Professional Documents
Culture Documents
978 1 4612 1718 3
978 1 4612 1718 3
Nonparametric Statistics
for Stochastic Processes
Second Edition
, Springer
D. Bosq
Universite Pierre et Marie Curie
Institut de Statistique
4 Place Jussieu
75 252 Paris cedex 05
France
AlI rights reserved. This work may not be translated or copied in whole or in part without the written
permission ofthe publisher Springer Science+Business Media, LLC, except for brief excerpts in connection
with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former
are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks
and Merchandise Marks Act, may according1y be used freely by anyone.
9 876 5 4 3 2 1
Apart from some improvements and corrections, this second edition con-
tains a new chapter dealing with the use of local time in density estimation.
This edition contains some improvements and corrections, and two new
chapters.
Chapter 6 deals with the use of local time in density estimation. The
local time furnishes an unbiased density estimator and its approximation
by a kernel estimator gives new insight in the choice of bandwidth.
5. Density estimation 8
6. Regression estimation and prediction 11
7. The local time density estimator 12
8. Implementation of nonparametric method 13
4. Asymptotic normality 53
5. Nonregular cases 57
Notes 65
CHAPTER 3. Regression estimation and prediction
for discrete time processes 67
1. Regression estimation 67
Notes 87
5. Sampling 118
Notes 127
6. Sampling 140
7. Nonparametric prediction in continuous time 141
Notes 144
CONTENTS xiii
Notes 167
Notes 184
4. Annex 185
References 197
Index 207
Notation
E(X I B), E(X I Xi, i E I), VeX I B), vex I Xi, i E I) conditional expecta-
tion, conditional variance (of X), with respect to B or to a(Xi , i E I).
8(a) , B(n,p), N(m,a 2 ),)..d Dirac measure, Binomial distribution, normal dis-
tribution, Lebesgue measure over ]Rd.
(X t , t E I) or (X t ) stochastic process.
C([a, b]) Banach space of continuous real functions defined over [a, b], equipped
with the sup norm.
~ weak convergence.
~ convergence in probability.
• end of a proof.
~E cardinal of E.
Synopsis
IJ
10
.J
- nonpar3metric p,edictor
·10
AR..'v(A predictor
·Il .....-----------~------------
~y'i'
We also try to explain why nonparametric forecasts are (in general) more
accurate than parametric ones. Finally we make suggestions for the implemen-
tation of functional estimators and predictors.
Now the rest of the synopsis is organized as follows. In 8.2 we construct the
kernel density estimator. The kernel regression estimator and the associated
predictor are considered in 8.3. The mathematical tools defined in Chapter
1 are described in 8.4. 8.5 deals with the asymptotic behaviour of the kernel
density estimator (cf. Chapters 2 and 4). 8.6 is devoted to the convergence of
regression estimators and predictors (cf. Chapters 3 and 5). In 8.7 we point
out the role of local time in density estimation for continuous time processes.
Finally 8.8 discusses sampling, and practical considerations (cf. Chapter 7) .
S.2. THE KERNEL DENSITY ESTIMATOR 3
If f is continuous over Inj and if an,j -an,j-l is small, then !n(x) is close to
f(x) for each x in I nj , However this estimator does not utilize all the informa-
tion about f(x) contained in data since observations which fall barely outside
Inj do not appear in !n(x). This drawback is particularly obvious if x = an,j-l.
Now in order to obtain smoother estimations, one can use other kernels (a
kernel on JRd is a bounded symmetric d-dimensional density such that II u lid
K(u) ----> 0 as II u 11----> 00 and f II u 112 K(u)du < 00). Let K be a kernel, the
associated kernel estimator is
fn(x) = nh~
1
8 (x - Xi)
n
K --,;;:-
u2
-- 1
For example if d = 1 and if K(u) = f<Ce 2, u E R then
v 27r
1 n 1 .(x_X.)2
fn(x) = - 2:--e-2 ~ , x ER
n i=l h n V'2-ff
We now consider the case where the data are realizations of a stochas-
tic process (Xd. In that case the Kolmogorov extension theorem states that
the distribution v of a stochastic process is completely specified by its finite-
dimensional distributions (cf. [A.GJ). Thus the general problem of estimating
v reduces to the estimation of these. To this aim, it is convenient to estimate
the associated densities if they do exist.
S.3. THE KERNEL REGRESSION ESTIMATOR AND PREDICTOR 5
Jr(x) = ~ r
ThT 10
T
K (x -hTXt) dt , x E JRd
r(x) = E(Y; I Xi = x) , x E JR .
, x E Inj , j E Z
L IJnj(X i)
i=l
i=l
where (K, h n ) is defined in S.2. The definition remains valid if (Xi, Yi) is
lR d x lR-valued and if data are dependent.
where the last term is structural and consequently cannot be controlled by the
statistician.
Now let us consider the bidimensional process (Xt , Yt) = (et, et+H), t E Z
and the associated regression estimator rn-H defined above. It induced a
natural nonparametric predictor via the formula
The final purpose of this book is the asymptotic study of fn+H' &+H and
of some more general predictors.
Some measures of that type are introduced in Chapter 1. The most impor-
tant should be the strong mixing coefficient (cf. [ROj). For the sake of clarity,
let us introduce it in a stationary context.
Let us recall that X is said to be (strictly) stationary if, for any integer
k and anytb ... ,tk, S inZ one has p(x tt+St· .. , x tk+S )=p(x tt,··" x)·
tic
For such a process a(k) does not depend on t. Now X is said to be strongly
mixing (or a-mixing) if lim a(k) = O. This condition specifies a form of
k-oo
asymptotic independence of the past and future of X.
Classical ARMA processes are strongly mixing with coefficients which de-
crease to zero at an exponential rate.
Among these inequalities the sharper is due to RIO (d. [RI]1993). RIO's
inequality is optimal in some sense (see (1.9) and Theorem 1.1).
Other important outcomes of the strong mixing condition are large de-
viation inequalities. An accurate lemma of BRADLEY (Lemma 1.2) gives
the "cost" of the replacement of dependent random variables by associated
8 SYNOPSIS
independent ones. Using this result and exponential type inequalities for inde-
pendent variables it is thus possible to establish large deviation inequalities for
strongly mixing processes.
where
(cf. Corollary 2.2). This result is (almost) optimal since the uniform rate of
where N(m) has the m-dimensional standard normal distribution. Note that
the precise form of this result allows to use it for constructing tests and confi-
dence sets for the density.
Here a(k) = O(k- 2 ), and the proof utilizes the BRADLEY lemma quoted
in SA.
The search for optimal rates is performed in a more general setting than in
discrete time, here! is supposed to be k times differentiable with kth partial
derivatives satisfying a Lipschitz condition of order >- (0 < >- :::; 1). Thus the
number r = k + >- characterizes the regularity of f. In that case it is interesting
to choose K in a special class of kernels (cf. Section 4.1).
Now this rate is achieved if the observed sample paths are slowly varying,
otherwise the rate is more accurate.
10 SYNOPSIS
then
sup E(fr(x) - f(x))2 =0 (T-pr/(pr+d»)
xERd
If p is large enough (p > 2) the local irregularity of the sample paths fur-
nishes additional information. This explains the improvement of the so called
"optimal rate" .
Apart from that, regression and density kernel estimators behave similarly.
where c is explicit (Theorem 3.1). Proof of this result is rather intricate since
it is necessary to use one of the exponential type inequalities established in
Chapter 1, in order to control the large deviations of r n - r .
Similarly, as in the density case, if the sample paths are irregular enough
the kernel estimator exhibits a parametric asymptotic behaviour, namely
S.6.2 Prediction
The asymptotic properties of the predictors fn+H and fT+H introduced in S.3
heavily depend on these of the regression estimators which generate them. De-
tails are given in Chapters 3 and 5.
Here we only indicate two noticeable results which are valid under a 'Prev-
mixing condition (a condition stronger than a-mixing).
Secondly, modifying slightly fT+H one obtains a new predictor, say ~T+H
such that for each compact interval t..
thus the non parametric predictor ~T+H reaches a parametric rate. This could
be a first explanation for the efficiency of nonparametric prediction methods.
Other explanations are given in Section 8.
iT(X) = lim!.x
010 c;
{t :0 :s t :s T, IXt - xl < :.}
2
where .x denotes Lebesgue measure.
However, the above technique suffers the drawback of perturbating the data.
Thus it should be better to use simple transformations as differencing (cf. 3.5.2)
or affine transformations (cf. [PO]).
In fact it is even possible to consider directly the original data and use them
for prediction! For example if (~n, n E Z) is a real square integrable Markov
process, the predictor fn+H introduced in 8.3 may be written as
n-H
fn+H = L Pin~HH
i=l
K C\~~i)
where Pin = H ; i = 1, ... , n
~ K(~\~~i)
thus ~n+H is a weighted mean and the weight Pin appears as a measure of
similarity (cf. [PO]) between (ei,eHH) and (~n,en+H)' In other words the
nonparametric predictor is constructed from the "story'? of the process (et).
Consequently trend and seasonality may be used to "tell this story".
S.8.2 Construction
The construction of a kernel estimator (or predictor) requires a choice of K
and hn . Some theoretical results show that the choice of reasonable K does
not much influence the asymptotic behaviour of In or Tn.
Note that, if the observed random variables are one-dimensional, the normal
x2
1 --
kernel K(x) = rn=e 2 and hn = (Tn n- 1/ 5 (where (Tn denotes the empirical
v21l"
standard deviation) are commonly used in practice (cf. appendix).
S.8.3 Sampling
The problem of sampling a continuous time process is considered in Sections
4.4 and 5.6.
Theorem 4.12 and 4.13 state that if X Cn ' X 2Cn , ... ,Xncn are observed (with
On -> 0 and Tn = nOn -> (0) then On = T;;d/2r is admissible provided h n =
rp-l/2r
.Ln •
example [RB-ST]).
In this chapter we present some inequalities for covariances, joint densities and
partial sums of stochastic discrete time processes when dependence is mea-
sured by strong mixing coefficients. The main tool is coupling with indepen-
dent random variables. Some limit theorems for mixing processes are given as
applications.
1.1 Mixing
In the present paragraph we point out some results about mixing. For the
proofs and details we refer to the bibliography.
Let (st, A , P) be a probability space and let 13 and C be two sub <T-field of
A. In order to estimate the correlation between 13 and C various coefficients
are used:
17
18 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES
where the "sup" may be omitted if (Xd is stationary. Similarly one defines
,6-mixing (or absolute regularity), cp-mixing and p-mixing.
where aj = O(e- rj ) , r > 0 and where the Et ' S are independent zero-mean real
random variables with a common density and finite second moment. Then the
series above converges in quadratic mean, and (Xd is p-mixing and therefore
a-mixing with coefficients which decrease to zero at an exponential rate.
where the Et'S are independent with common distribution l3 (1, ~).
Noting that X t has the uniform density over (0,1) and that
1.2. COUPLING 19
one deduces that X t is the fractional part of 2Xt +l, hence a(Xt ) C a(Xt+l) '
By iteration we get
a(Xt) C a(Xs, s 2: t + k)
thus
1 1
4' 2: ak 2: a(a(Xt),a(Xt )) = 4'
which proves that (X t ) is not a-mixing . •
In the Gaussian case there are special implications between the various
kinds of mixing : if (Xt ) is a Gaussian stationary cp-mixing process, then it is
m-dependent Le., for some m, a(Xs,s :::; t) and O'(Xs,s 2: t + k) are inde-
pendent for k > m . On the other hand we have Pk :::; 2;rrak for any Gaussian
process so that a-mixing and p-mixing are equivalent in this particular case.
However a Gaussian process may be a-mixing without being ,g-mixing.
The above results show that cp-mixing and ,g-mixing are often too restrictive
as far as applications are concerned. Further on we will principally use a and
p-mixing conditions and sometimes the 2-a-mixing condition:
(1.4)
1.2 Coupling
The use of coupling is fruitful for the study of weakly dependent random vari-
ables. The principle is to replace these by independent ones having respectively
the same distribution. The difference of behaviour between the two kinds of
variables is connected with the mixing coefficients of the dependent random
variables . We now state two important coupling results. For the proofs, which
are rather intricate, we refer to [B] and [BR1] .
It can be proved that" =" cannot be replaced by "<" , thus the result is optimal.
20 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES
Proof
Cov(X+, y+) =
Jr
R2
+
[P(X > u, Y > v) - P(X > u)P(Y > v)jdudv,
which implies
(1.9) Cov(X, Y) ~
r201
2"1 Jo [Qx(u)]2du .
Proof may be found in [R11].
We now present two inequalities which are less general but more tractable.
COROLLARY 1.1 Let X and Y be two real valued r·andom variables such
1 1 1
that X E Lq(P), Y E Lr(p) where q > I, r> 1 and - + - = 1- -, then
q r p
(1.10)
22 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES
(Davydov's inequality).
Suppose first that q and r are finite. Then Markov's inequality yields
which implies
Q (u) < IIXllq O<u:::;1.
x - u 1/ q
Now, using (1.5) we obtain
hence (1.10).
where (X, Y) is a ]Rd x ]Rd-valued random vector and where fz denotes the
density of the random vector Z with respect to Lebesgue measure.
The following statement connects 9 = g(X ,Y ) with a = a(O'(X), O'(Y)).
1.3. INEQUALITIES FOR COVARIANCES 23
(1.l3)
If in addition g satisfies the Lipschitz's condition
(1.l4) Ig(x',y') - g(x,y)1 ::; f(llx' - xl12 + lIy' _ YI12)1/2,
x,x',y,y' E JRd, for some constant f, then there exists a constant ")'(d, f) such
that
(1.l5) IIglioo ::; ,(d, f)a: 1/(2d+ l).
Furthermore one may choose ")'(d, f) = Vi 2 +f.j2 where lid denotes the volume
of the unit ball in JRd.
Proof
(1) If ai ::; Xi ::; bi ; i = 1, ... , n where aI, bl , ... , an, bn are constant then
(Hoeffding's inequality).
(2) If there exists c > 0 such that
(1.17)
i = 1, ... , n ; k = 3,4, ...
(Cramer's conditions) then
i=l
(Bernstein's inequality).
Proof
(1) First, let X be a real-valued zero-mean random variable such that
a ::; X ::; b. We claim that
(1.19) E(exp)"X)::;exp (
),,2(b -
8
a?) ,),,>0.
4t
Choosing)' = n it follows that
2)bi - ai)2
i=1
1
(2) For 0 < A < - according to Cramer's conditions (1.17) we have
c
(1.22)
Using (1.22) and the dominated convergence theorem we can deduce that
IT E(e
n
P(Sn 2: t) :::; e-)..t AX ,)
;=1
<
- e
->.t
exp
(A2 1~EX;)
_ AC .
t
Now the choice A = n leads to
2LEX; +ct
;=1
2
P(Sn 2: t) :s; exp (- n t )
4 LEX; +2ct
;=1
n ) 1/2 ( n )
t 2 kb) ( ~ EX; and tb :S c:(r) ~ EX;
On the other hand it can be seen that Cramer's conditions (1.17) are equiv-
alent to existence of E (e XX i ) for some, > O. We refer to [A - Zj for a
discussion.
We now turn to the study of the dependent case. For any real discrete time
process (Xt , t E Z) we define the strongly mixing coefficients as
Note that this scheme applies to a finite number of random variables since it
is always possible to complete a sequence by adding an infinite number of de-
generate random variables.
Vq = J2(q-l)p
(2 -l)PQ
Yu du
n
where p = - .
2q
Here e = bbp and ~ = min (~;, (0 - l)bP) for some b > 1 which will be
specified below.
If 8 = 1 + 2be then
thus
(1.29) P(!~
L.
W.!> ne)
4
::; 2exp (_..!::S2 ).
J 16pb2
1
Clearly
and
hence
and
Taking into account the above overestimate and using (1.31) we obtain
after some easy calculations
P (l~w·1
L > ne)
4 ~ 2exp (-~)
J Sv (q) 2
1
(1.32)
We would like to mention that although (1.26) is sharper than (1.25) when
E and a(.) are small enough, however (1.25) is more tractable in some practical
situations.
The next theorem is devoted to the general case where the Xt's are not
necessarily bounded but satisfy Cramer's conditions.
THEOREM 1.4 Let (Xt, t E Z) be a zero-mean real-valued process.
Suppose that there exists c > 0 such that
where
2
al
n
= 2-
q
+ 2 ( 1 + 25 m 2E + 5CE ),v.nth
. 2
m2 =
2
max EXt,
l~t~n
2
and
a2(k) = lln 1 + -
(5mk~) , with mk = max IIXtll k .
E l~t~n
Proof
1 ~ qr ~ n < (q + l)r.
Consider the partial sums
Zl Xl + Xr + l + + X(q-l)r+l
Z2 X2 + X r +2 + + X(q-l)r+2
Zr Xr + X2r + + Xqr
~ X qr+ 1 + + + Xn ifqr<n
0 otherwise.
32 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES
We clearly have
~
Choosing
2E
8 = 1 + - - and ~ = -
2E Yields
.
5mk 5
The proof will be complete if we exhibit a suitable overestimate for P (18..1 > ~t:).
For that purpose we write
theoretical results.
and
La(k)l-~ < +00
k;:::l
First we study the series L Cov(Xo, Xk) . By using (1.10) with q = rand
kEZ
1 2
-=l--weget
p r
r _(2a(k))1-2/r(EIXo
ICOV(XO,Xk)1 :::; 2_
r-2
n2/ r
which proves the absolute convergence of the series since L a(k)1-2/r < + 00.
k;:::l
Now clearly
Sn = n- 1
nVar-
n
L Cov(Xs,Xt ) ,
O$s ,t$n-l
S
nVar nn = L
n- l ( Ikl) CoV(XO,Xk) .
1 - --:;
k=-(n-l)
(2) If (X t ) is a-mixing with a(k) :::; apk, a> 0,0 < p < 1
then
(1.42) Bn -;. 0 a.s ..
y'nLog 2nLogn
Proof
. /Log 2nLogn
(1) Using (1.34) for n > m, e =V n TJ, TJ > 0 and q = [njm + 1]
we get
L
n>m
P ( IBnl
JnLog 2nLogn
> TJ) < +00 , TJ > 0
and the Borel Cantelli lemma (cf. [BI 2]) yields (1.41) . •
. . . Log 2nLogn
(2) Usmg agam (1.34) wIth e = y'n TJ, TJ > 0, k = 2 and
q = [L nL
og2n ogn
+ 1] leads to
Note that (1.41) and (1.42) are nearly optimal since the law of the iterated
logarithm implies that Sn f> 0 a.s. even for independent summands.
v'nLog 2 n
We now give a central limit theorem for strongly mixing processes.
THEOREM 1.7 Suppose that (Xt , t E /Z) is a zero-mean real-valued strictly
stationary process such that for some I > 2 and some b > 0
and
(1.43)
(1.44)
Proof
First (]'2 does exist by Theorem 1.5. Now consider the blocks
V{ = X p +1 + ... + X p +q
(1.45) P(IWj - Vjl > 0 ::; 11 (11 v:.J +~ cll 'Y ) ~ a(q) --.::L..
2.,+1 ;
. E:(],..fii
J = 1, ... , r ; where ~ = -- (E: > 0) , c = P (IIXolI'Y ( > 1).
r
.
smce P rv --
n
and -
fo rv
fo
-L-' so that (1.45)
. .
IS valId .
Logn r ogn
Consequently setting
~ _ VI + ... + Vr _ WI + ... + Wr
n - ufo 17fo
we obtain
thus
....:I..-
Vl+",+Vr
Now in order to obtain the asymptotic normality of fo it suffices
17 n
to prove that bon converges to zero in probability ([BIll). To this aim we use
38 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES
(1.46), we have
LV} LV;
Sn = _1_ _ + _1_ _ + Rn
ufo ufo ufo ufo
where
X r (p+q)+l + .. . + Xn if r(p + q) < n
o otherwise
r
LV;
It remains to show that 1 r;;; and ~ converge to zero in probability.
Uyn Uyn
LV;
_1_ ~ N ""N(O, l)
U ,;qr
therefore
r r
LV; LV;
_1_ _ = fqT_1__ ~ 0
ufo V-:;;: u,;qr
. fqT rr;;g;;
smce V-:;;: "" V~ .
Second, using Tchebychev's inequality we get
Rn ~O
ufo
Collecting the above results we obtain (1.44) . •
1.5. SOME LIMIT THEOREMS FOR STRONGLY MIXING 39
Notice that a functional central limit theorem may be shown when assump-
tions of Theorem 1.7 hold.
Notes
The strong mixing condition has been introduced by ROSENBLATT ([R01]) in 1956.
The basic properties of strong mixing conditions are studied by BRADLEY in [BR2J
but the most complete reference should be the book by OOUKHAN ([DKJ 1994).
The coupling lemma's are from [BJ and [BR1J with a slight improvement due to
RHOMARI ([RHJ 1994). The optimal RIO's inequality is in ([RIlJ 1993).
The second part of Lemma 1.3 is given in [B08J. Concerning the exponential
inequalities (1.16) and (1.18) some improvements may be found in the BENNETT's
paper [BEJ. The original forms and the proof's method of Theorems 1.3 and 1.4 are
obtained in [B06J. The present statement is an amelioration using some ideas of
RHOMARI. Related inequalities may be found in [OK1] and [C] .
Theorem 1.5 is a result of DAVY DOV ([0]), Theorem 1.6 is an easy consequence of
the exponential inequalities and Theorem 1.7 was obtained by IBRAGIMOV in [IB],
here the proof is simpler than the original one since we use the powerful Bradley's
lemma.
Chapter 2
This chapter deals with nonparametric density estimation for sequences of cor-
related random variables.
We shall see that, under mild conditions, it is possible to obtain the same
convergence rates and the same asymptotic distribution as in the LLd. case.
The asymptotic behaviour of the kernel estimate in some non regular cases
(errors in variables, chaotic data, singular distribution) is studied at the end of
the chapter.
lim liulldK(u) = O
Ilull ...... oo
and
r
l)Rd
II u 112 K(u)du < +00
41
42 CHAPTER 2. DENSITY ESTIMATION
(2.1)
since j.tn is not absolutely continuous with respect to Lebesgue measure over
JRd. So in order to obtain such an estimate it is necessary to transform j.tn
in a suitable way. The kernel method consists in a regularization of j.tn by
convolution with a smoothed kernel, leading to the kernel estimator
(2.3)
which can be written
(2.4) In(x) =
n
h1d ~
L... K
n t=l
(x- Xt)
-h-
n
are necessary and sufficient for the consistency of In. This will be clarified
below; from now on , we do suppose that (2.5) is satisfied unless otherwise
stated.
2.2. OPTIMAL ASYMPTOTIC QUADRATIC ERROR 43
We need some notations and assumptions : we suppose that for each couple
(t, t'), toft' the random vector (Xt, X t ,) has a d ensity and we set
(2 .6)
THEOREM 2.1 If f(x) > 0, ifH 1 (resp. H 2 ) holds and if {3 > 2 P - 1 (resp.
p-2
2d+ 1
(3 > -d--) then the choice h n = cnn-1/(dH ) where en - -t c > 0 leads to
+1
(2 .7) n 4 /(dH) E[fn(x) - f(X)]2 ---t C(c, K, f) > 0
where
c
C(c, K,f)=4
4
(
L
l:5i ,j:5d
8 f ' (x)
8x8x
'
2
J
J uiujK(u)du ) +7
f J
:I (x) 2
K.
44 CHAPTER 2. DENSITY ESTIMATION
Proof
The following decomposition is valid:
1
E(Jn(x) - f(x)? (Efn(x) - f(x))2 +; VarKhn (x - Xl)
+
1
n(n-l) L COV(Khn(X-Xt),Khn(X-Xt'))
I:SW-tl:Sn-1
: B~(x) + VJn(X) + Cn .
(2.8)
We treat each term separately. First we consider the bias :
(2.9)
Now Vfn(x) is nothing else but the variance of fn in the i.i.d. case. It can be
written
then writing Rd = {u :1I u II::; 7J} u {u :11 u II> 17} where 7J is small enough it is
easy to infer that
J
and
(2 .11 ) Kh n (x - u)f(u)du -> f(x) .
(In fact (2.10) and (2 .11) are two forms of a famous Bochner's lemma (see
[PAD , [C-L] and Chapter 4).
2.2. OPTIMAL ASYMPTOTIC QUADRATIC ERROR 45
(2.13)
1 1
where -
p
+ -q = 1.
On the other hand Billingsley's inequality (1.11) entails
(2.14)
thus
ICt,t'l :S I'n(lt' - tl)
where
Consequently
2 n-l
ICnl :S -
n t=1
L
I'n{t)
which implies
where Un ~ h;.2d/ q /3 •
Finally using the decomposition (2.8), the asymptotic results (2.9), (2.12),
(2.15) and the fact that h n ~ cn- 1/(4+d) we obtain the claim (2.7).
46 CHAPTER 2. DENSITY ESTIMATION
When H2 holds the proof is similar. The only difference lies in the overes-
timation of Ct,t' : using (2.14) in Lemma 1.3 we get
1/(2d+l}
(2.16) ICt,t,1 ~ i(d,£) [oP}(It' - tl) ]
and consequently
. -(2d+1}/i3 d . 2d + I
then the chOlce Vn c::e h n leads to nhnlCnl = 0(1) smce (3 > -d-- ' •
+1
Let us introduce the notion of " geometrically strongly mixing" (GSM) pro-
cess. We will say that (Xt ) is GSM if there exist Co > 0 and p E [0, I[ such
that
(2.17) a(k) ~ Copk k 2: 1.
Note that usual linear processes are GSM (see [DKJ).
The following lemma deals with simple almost sure convergence of fn for a
GSM process.
2.3. UNIFORM ALMOST SURE CONVERGENCE 47
--L.
LOgn)
2) If f E C2,d(b) for some b and if hn = en ( ----;- d+4
where en -+ c > 0,
then for all x E ]Rd and all integer k
(2.19) -L-1
ogkn
(n);rh
-L-
ogn
Un(x) - f(x)) -+ 0 a.s ..
Proof
1) The continuity of f at x and (2.11) yield
Efn(x) -+ f(x) ,
Hence
P(lfn(x) - Efn(x)1
(2.20)
48 CHAPTER 2. DENSITY ESTIMATION
which implies
nh d
Now setting Un = (Log~)2 we obtain the bound
thus
L P{/fn{x) - Efn{x)/ > 1':) < +00 , e > 0
n
Efn{x) - f{x) = 2.
h
2
2
JL
l<i '<d
_ ,J_
2
Xi
8f (x - Bhnv)ViVjK(v)dv
-8 8
Xj
Now set
LOgn)*
en = Logkn ( -n- ,
we get
ac2
l':~lIEfn(x) - f(x)1 ::; - L
n
ogk n
and the bound vanishes at infinity. Thus we only need to show that
(2.23)
where
1.
(2 .25) -L-1
ogk n
(n)*
-L
ogn
sup Ifn(x) - f(x)1
IIxll:5n'
-> 0 a.s . .
Proof
1) f being a uniformly continuous integrable function, it is therefore bounded.
Thus it is easy to see that for all 6 > 0
Choosing 15- 1 and n large enough that bound can be made arbitrarily
small, hence
sup IEfn(x) - f(x)l-> 0
xElRd
where for convenience we take II .II as the sup norm on ]Rd, defined by
II (Xl"'" Xd) 11= SUP1<i<d IXil· In the sequel we may and do suppose
that 'Y > 1. - -
(2.27)
and similarly
(2.28)
2.3. UNIFORM ALMOST SURE CONVERGENCE 51
it follows that
~n~~~+-L 1
og2 n
where ~~ = SUPl$j$v~ Ifn(xjn) - Efn(xjn)l·
Now, for all c > 0
v~
P(L1~ > c) ~ L P(lfn(xjn) - Efn(xjn)1 > c)
j=l
where Un --+ +00, hence EP(L1~ > c) < +00 which implies L1~ --+ 0 a s. .
which in turn implies L1n --+ 0 a.s. and the proof of (2.24) is complete . •
Note that the obtained rate in (2.25) is nearly optimal; in fact by applying a
theorem of STUTE (see [SED for Li.d. Xt's we obtain for all A > 0 and f > 0
n )
( Logn
2
<I+4
II:~~A
Ifn(x)f(x)
- f(x) I
-->
( (d +d4)& J 2)K
1/2
a.s ..
We now study the uniform behaviour of fn over the entire space. The results
are summarized in the following corollary.
COROLLARY 2.2 Let fn be the estimator associated with the normal kernel.
Then
1) If assumptions of Theorem 2.1 hold and if in addition E II Xo II < 00 we
have
(2.29) sup Ifn(x) - f(x)1 --> 0 a.s ..
xElRd
(2.30) -
limllxll_oo I x II d+2 f(x) < +00
(2.31) -L-1
ogk n
(n)*
-L
ogn
sup Ifn(x) - f(x)1
xElRd
--> 0 a.s . .
It may be shown that (2.29) and (2.31) are still valid if K has compact support.
Proof
1) Since (2.24) is valid it suffices to establish that
First note that we may choose 'Y > 2, then, by Markov inequality
2.4. ASYMPTOTIC NORMALITY 53
P (lim UII
t=l
Xt II> n2'Y) = 0 .
K ( X - hXt{w»)
n
< K(xn) t = 1, ... ,n
hence
sup fn{x,w)::;
IIxll>n~
1
hd
n n
L K (x ~ X
(no(w)
t=l n
t
)
+ (n - no{w»K{xn)
)
and finally
{21r)-d/2 (
(2.34) sup fn(x,w) ::; hd no{w) + exp (' ---h
1 n2'Y))
2 '
IIxll>n~ n n , 24 n
C 3 ) f E C2,d(b) .
Then we have
THEOREM 2.3 If C 1, C2, C3 hold, if a(k) = O(k- f3 ) where f3 ~ 2 and if
hn = L ~ n-
og ogn
m,
c > 0 then for all integer m and all distincts Xi'S such
that f(Xi) > 0
Proof
We first show that
Let us set n
Sn = LYi,n
t=l
where
From Bradley's lemma 1.2 there exist i.i.d. random variables WIn, ... ,
Wrn such that PWjn = PVjn and
P(IV}n - Wjnl > en) :::: 11 (" V}n;n Crt 1100) 1/2 a(q);j = 1, . .. ,r,
(2 .38)
Now consider r r
LVjn LWjn
(2.39) D. _ j=1 _ -,-j_=_1---:-7:::"
n - (rph~)1/2 (rph~)1/2'
.
which tends to zero provIded that e > 2(3 1( + ++ 2)
a d
d 4 ,thus
LWjn
Zn = (~:~~)1/2 ,n ~1 .
To this aim we apply the Liapounov condition (see [BII] p. 44). It suffices
to show that
r
L
j=1
E IWjn l
3
Now applying Schwarz inequality in (2.42) and the here above results we
obtain
that is
8
which tends to zero provided that a ::::: 3(d + 4)
Now write
n n
LVln
r )
j=1
Var (
(rph~)1/2
and
· .
Th ese con d ItlOns 2/3 - -/3--
are satls.fi e d I'f a < -/3-- 1 -dd -+ 2 w h'IC h IS. com-
8 1( 2) .
2 +1 2 +1 +4
d+
patible with a::::: 3(d + 4) and c> 2/3 a + d + 4 Sillce /3 ::::: 2.
2.5. NONREGULAR CASES 57
L
m
Zn ~ A;N;(m)
i=1
~n ~ 0
L VJn
r
(rph~) -1/2 ~ 0
j=l
(rph~r1/2 Dn ~ 0
which put together imply (2.36) by using [BIll (Theorem 4.1 p. 25).
in (2.36) we may also replace I(Xi) by In(Xi) and the proof of Theorem 2.3 is
therefore complete. •
THEOREM 2.4 Let f~ be the kernel estimate associated with the naive kernel
K = 1[-!,+W' Suppose that Dl1 D2, D3 are satisfied and that f E C2,d(b).
.
Then if h n = C
(L-n-
ogn ) I/(d+4)
°
,e> we have
L ) 1/(d+4)
Therefore by using (2.9), (2.12) and h n = c ( ~n we obtain
4/(dH»)
(2.46) B~(x) + Vfn(x) = 0 (( L~n )
.
Now settmg Bn = [hn hn] d
-"2' +"2 we get
where
cn(lt' - tl) = min(211 f 1100 h~ ,cp1t'-t l ).
From (2.47) we deduce that
LOgh- d ]
Now choosing Wn = [ LOgp~l we find that
Ie (x)1 < 4 II f
1100 Logh;;d +~ ~ t
n - nhd Logp-l nh2d ~ p
n n t>Wn
;n ,
which implies
4/(dH»)
(2.48) IGn(x)1 = 0 (( )
(2.49)
hence
thus
(2.50)
and that
therefore
n-1
::; n 2 6h 2d ~(
~ n - t) II 'Tr
t
go - f d
1100 h n
n t=l
(2.51) 6 +00
::; nh d
n t=l
L II 'Trtgo - f 1100 ,
finally (2.46), (2.48), (2.49), (2 .50) and (2.51) imply (2.45) . •
2.5. NONREGULAR CASES 61
fn(x, y}
1
= (n _ k}h~+2 ~ K
n-k (Xx-
hn
(t)
'
y - Yt+k
hn
) '
1 2
2) If 8 and 8 hold and if (~~;:;2 -+ +00 then for all 'Y E ] 0, ~ [ we have
(2.54) limhn inf fn(x,y»O a.s. .
ly-r(x)I<"yh n
8imilar results may be proved for other kernel estimates (see [BOI]) .
62 CHAPTER 2. DENSITY ESTIMATION
Proof
1) First we may and do suppose that Cr 2: 1 and that we use the sup norms
in ]Rk+l and ]Rk+2 . Now if n is large enough we may use 8 1 to write that
:S Iy - r(X(t))1
:S Iy - r(X(t))1
h
which entails Iy - r(X(t))1 > Cr 2n.
Consequently we have
Now put
( Y ) -_ (X -h X(t)
Unt X,
n
'
Y - r(x(t)))
hn ' 1:S t :S n - k ,
1 1
II Unt(x, y) II> 2 min(1 , cr ) =2 ,1:S t :S n - k
2) From Iy - r(x)1 < ,hn with 0 <, < ~ and II x - XCt) II:S ahn we deduce
that for n large enough, we have
(2.56)
2.5. NONREGULAR CASES 63
where
and by (2.56)
nh k + 1
COROLLARY 2.3 IfSI and S2 aTe satisfied and if (Lo;n)2 ...... +00 then faT
each 8 EjO, l[
(2 .58)
Proof
Let be a positive number. If h~I+c5ITn(X) - T(x)1 > € then we have
€
ITn(x) - T(X) I > ma:x(l , cr)hn , n ~ no using (2 .53) we obtain for n >
max(no , ne:)
Now (2.54) implies that this event has a probability zero, hence the result . •
We refer to [B01] for the construction of an estimate for k which uses again
Theorem 2.5.
64 CHAPTER 2. DENSITY ESTIMATION
Yt = Xt + et , t E IE
where the et'S are Li.d. and the processes (X t ) and (ed are independent.
(2.59)
(2.60) K- n (Y ) -- - 1
27T
1 IR
e
-iuy c/>K(U)
(-1) dU
c/>e uh n
, Y E lR .
In (2.60) c/>e denotes the characteristic function of eo and c/>K the Fourier
transform of the classical kernel K.
Then the loss is small: compare (2.61) with (2.7) and (2.62) with (2.25).
2) If 1>" is geometrically decreasing at infinite the convergence rates are
poor.
If, for instance, 1>,,(u) is of order exp( -alul i3 ) (a > 0) at infinite then
under some conditions
and
sup lin(x) - f(x)1 = O((Logn)-2/ i3 )
xED
where D is compact.
Notes
In the strongly mixing case one can mention ROUSSAS, ROSENBLATT, TRAN ,
TRUONG-STONE, MASRY, BOSQ, ROBINSON, PHAM-TRAN among others.
DELECROIX in [DE2] (1987) has first considered the case of an ergodic process.
In 1992, GYORFI and LUGOSI [GL] have shown that the kernel density estimator
is not universally consistent for an ergodic process.
Chaotic data and singular distributions are studied by BOSQ ([BOll and [BOS]).
Processes with errors have been recently considered by FAN and MASRY.
The construction and study of a non parametric predictor are the main purpose
of this chapter. In practice such a predictor is in general more efficient and
more flexible than the predictors based on BOX and JENKINS method, and
nearly equivalent if the underlying model is truly linear. This surprising fact
will be clarified at the end of the chapter.
We suppose that Zo admits a density fz(x,y) and that fz(x ,· ) and mOfz(x, ·)
are in L: 1 (>, d') for each x in JRd . Then, we may define the functional parameters
67
68 CHAPTER 3. REGRESSION ESTIMATION
and
1
L
n
Vn = - D(Xt,m(Y,))
n
t=l
and
(3.5)
1 ~
<Pn(x) = nh~ ~ m(Yt}K
- Xt)
(xh;:- , x E JR ,
d
(3.6)
Note that, if K is not strictly positive, definition (3.6) must be completed:
1 n
for fn(x) = 0 one may choose rn(x) = - L m(yt) which is clearly more nat-
n t=l
ural than the arbitrary Tn(X) = 0 used by many authors.
Pnt(x) =.!.,
n
1::; t::; n if fn(x) = O.
+ -d V (X)jK 2
c f(x) ,
Then we have
THEOREM 3.1 If 1 -> 4 hold, if f(x) > 0 and if (3 > max( 2(p - 1) , d + 2)
p-2
then the choice h n = enn-1/(d+4) where Cn -> c> 0 leads to
(3.8) n 4/(d+4)E(r n (x) -r(x)? -> C(x,c,K,f,r).
Proof
E(r n - r)2 = An + En + C n
where
and
2
en =- J2E[(rn - r)(c.pn - c.p)(fn - f)].
The quantity E(f - fn? has been already studied in Theorem 2.1 and the
other terms in An may be studied similarly. After some calculations one obtains
n 4 / (d+4) An -> C(x, c, K, I, r) .
Using successively Schwarz and Holder inequalities we get for n large enough :
Bn ::; 2n'n-(l+eh EUn - 1)2 + 2n'[EUn - 1)4]1/2
(3 .10)
[E (Ir n - rI2vllrnl<o;n~) jl/2V [p (lrn - rl > n-(l+eh: Irnl ::; n') jl/2V ,
1 1
where - + - = l.
v w
The first term in the bound is a O(n-C1'n- 4 /(dH) = o(n- 4 /(d+4). In the
second term we treat each factor separately. First we clearly have
(3 .12)
and
First we have
72 CHAPTER 3. REGRESSION ESTIMATION
On the other hand we may apply the exponential type inequality (1.34) to
the random variables Vt - EVt, 1 S t S n. Then, choosing q ~ n A , collecting
the above bounds and choosing v large enough, 'Y and € small enough and A
4
close enough to -d- we get En = o(n4/(d+4») and the proof of Theorem 3.1
+4
is now complete . •
As a simple example let us consider the case where the Zt'S are i.i.d. bi-
variate Gaussian variables with standard margins and Cov(Xt , Yt) = p. In this
case the classical estimator of rex) = px is Pnx where
and clearly
(3 .13) sup IPnx - pxl = +00 a.s ..
xEIR
THEOREM 3.2 Let (Zt) be a GSM strictly stationary process such that f
and'P belong to C 2 ,d(b) for some b and such that E(expalm(YoW) < +00 for
some a > 0 and some T > O.
nhd
Then if K is Lipschitzian, if n 1 -t +00 and ~f S is a compact set
(Logn)2+"T
such that infxEs f(x) > 0, we have
.
Furthermore if h n ~ (
Logn 2 -"T
(n)
1) 1/(d+4)
.
, then for each mteger k
n 2/(d+4)
(3.15) ( ) sup Irn(x) - r(x)1 -~ 0 a.s ..
Logkn(Logn)~+ 1-~;r:h xES
Proof (Sketch)
suplrnl 1
sup Irn - rl ~ ~f sup Ifn - fl + T f sup l'Pn - 'PI .
S l~ S l~ S
2
then, choosing A = cT(Logn)l/T where c > - and using Borel Cantelli lemma
a
we obtain
P(limsup{suplrnl > cT (Logn)1fr}) = o.
n S
The proof of (3.15) is similar but uses the more precise inequality (1.26)
instead of (1.25) . Details are omitted . •
(L ?-(1/T}) 1/{d+4}
then the choice h n -::: ( ogn n entails
Example 3.1
Suppose that m = IB where B E BIRd and that f(x) -:::11 x II-V for II x II
2
tending to infinity (p> d), then if 8 < -d- - 'YP we have
+4
Example 3.2
Suppose that (Zt) is a bivariate Gaussian process with X t '" N(o, 0"2)
(0" > 0). Then, if m is the identity, we have for each c EjO, ~[
n~-e
~ sup Irn(x) - r(x)1 -T 0 a.s ..
ogn Ixl~(7v'2€Logn
where (~t, t E Z) is a real strictly stationary process, and that m is the identity.
where i 2: 0, j 2: 0, i +j = 2.
then we have
or equivalently
for each Borelian real function F such that E(IF(~o)l) < +00.
Given the data 6, ... , ~N we want to predict the non-observed square inte-
grable real random variable
where 1 ::; H ::; N -k and where m is measurable and bounded on compact sets.
(3.19)
and we set
(3.20)
3.3. PREDICTION FOR A STATIONARY MARKOV PROCESS 77
Proof (sketch)
E(sup(rn(X) - r(x)?) = 2
xES
1 0
+00
vP(sup Irn(x) - r(x)1
xES
> v)dv
Note that if f and <p are twice continuously differentiable on S the rate is
not improved because the bias remains the same on the edge of S.
(3.21)
and
Proof
(3.23)
78 CHAPTER 3. REGRESSION ESTIMATION
and it is easy to prove that the condition E(expalm(eoW') < +00 implies
thus
(3.25)
Now let us write
~n E((rn(Xn +H ) -r(Xn+H))2Ixn+HV!'sJ
hence (3.22) . •
Note that while the rate in (3.20) is nearly optimal, there is a loss of rate
in the general case. As indicated above, the reason for it is the unpredictable
behaviour of r(x) for large values of II x II.
Example 3.3
Take a one dimensional Markov process (et) with f(x) ~ cexp( -c'lxjT)
(T ~1, c > 0, c' > 0), then, using (3.21) and (3.22) it is easy to check that
(3.27)
n 1 /(d+2)
----(-l-)-."c-lrn(Xn+H) - r(Xn+H)1
(Logn)'+ -, d+2
-----+ ° a.s ..
3.3. PREDICTION FOR A STATIONARY MARKOV PROCESS 79
Proof
Since
Irn(Xn+H) - r(Xn+H)1 :S sup Irn(x) - r(.r) I
xES
the result follows from Theorem 3.2 applied to the associated process (Zt) . •
COROLLARY 3.2 If conditions in Theorem 3.6 hold and if
then
(3.29)
Proof
rn,(x) = -
t.
t-1
m(Yt)K
,
(x ~ ~t) n
, xER
d
tK(X hn'-Xt)
t=l
(3 .32)
Now, in order to establish (3.32), we first apply (3.30) to eiuZn ' (x) , obtaining
Now
E (eiUZ",(Xn+H») --> e- 4 , uE R
3.4. PREDICTION FOR GENERAL PROCESSES 81
Note that, by using the precise form of (3.31), one may construct confidence
intervals for r(Xn+H)'
In order to take that fact into account we are induced to consider associated
processes of the form
~y,
~ t,N
K(XN+H,N-Xt,N)
h
(3.34) * ( X N+H,N )
rn = .;..---=.,n.,.---------
t-l n
L K (XN+H,N - Xt ' N)
t=l hn
(3.35)
(3.36) 'Tlt = ~t + St , t E Z
where (~t) is a non-observed strictly stationary process and (St) an unknown
deterministic sequence. For the estimation of (St) we refer to Section 4.
In the following 9 denotes the density of (~o , 6), f the density of ~o, r the
regression of 6 on ~o and c.p = r f ·
C- S is bounded and there exist real functions 7 and cp, and a b 2: 0 such that
3.4. PREDICTION FOR GENERAL PROCESSES 83
and
Example 3.4
If f and <p are bounded and if s is periodic with period T then C is valid
with
_ 1 T
f( ·) = T Lf(· - St)
t=l
J
and
~( . ) = T1 L
T
yg(. - St,y - sddy .
t=l
Example 3.5
t
---> S (respectively S is bounded
Example 3.6
A simple example of non periodic s should be
1 n
Finally, note that the condition -
n
L IStl ---> 0 may be compatible with the
t=l
appearance of some outliers.
The kind of robustness we deal with here consists in the fact that the kernel
predictor
(3.37) r n(1]n) =~
{:r 1]t+l K (1]n - 1]t )
---,;;:- /
~K
(:r (1]n - 1]t)
---,;;:-
(3.38)
Generalisation
(3.39) 1]t = et + St , t E Z,
Interpolation
Let (et, t E Z) be a real strictly stationary process observed at times
-nl, . .. , -1, +1, . .. n2. The interpolation problem consists in evaluating the
missing data eo.
3.4. PREDICTION FOR GENERAL PROCESSES 85
Obviously eo,n may also be used for detecting outliers by comparing an ob-
served random variable ~tD with its interpolate etDon' If we adopt the simple
scheme (3.36) we obtain a test problem with null hypothesis Ho : StD = O.
and
q(c) = inf{T/ : p(T/) :=; c} , 0 < c < 1.
Now a natural estimator for p is
Chaos
E(rn(x) - r(x))2 =0
L
(( ~n
)4/(d+4)) .
XP) = X + et , t E Z
t
where the et'S are i.i.d; and where (Xt ) and (et) are independent.
In the particular case where co has a known density, say fe, the estimator
takes the form
Now the asymptotic results are similar to those indicated in 2.5.3 : good
convergence rates if rPe is algebraically decreasing and poor rates if rPe; is geo-
metrically decreasing. See [MS].
Note that this model is different from (3.39) since here the observed process
is stationary.
3.4. PREDICTION FOR GENERAL PROCESSES 87
Notes.
Estimation of regression function by the kernel method was first investigated by
NADARAJA (1964) and WATSON (1964). A great number of people have studied
the problem in the i.i.d. case. An early bibliography has been collected by
COLLOMB (1981) . The case of time series has been studied by GYORFI-HARDLE-
SARDA and VIEU (1989) among others.
Theorem 3.1 is due to BOSQ and CHEZE (1994). Theorems 3.2, 3.3 are taken
from BOSQ (1991) and RHOMARI (1994) . Theorem 3.4 and results about predic-
tion and interpolation for kth Markov processes and general st ationary processes are
mainly due to RHOMARI (1994). For related results see the references.
We shall see that the situation is somewhat different from the discrete case.
In fact, if the observed process paths are slowly varying the optimal rates are
the same as in the discrete case. If, on the contrary, these paths are irregular
one obtains supemptimal rates in quadratic mean and uniformly almost surely.
It is noteworthy that these rates are preserved if the process is observed at
judicious discrete instants.
89
90 CHAPTER 4. KERNEL DENSITY ESTIMATION
Suppose that the Xt's have a common distribution J-t. We wish to estimate
J-t from the data (Xt,O ::::: t ::::: T). A primary estimator for J-t is the empirical
measure J-tT defined as
(4.1)
Now if J-t has a density, say f, one may regularize J-tT by convolution, leading
to the kernel density estimator defined as
(4.2) fr(x) = -1
d loT K (x- -hXt)
- dt,x E R d
ThT 0 T
In some situations we will consider the space Hk ,)' of the kernels of order
(k,).) (k E N,O < ). ::::: 1) Le. the space of mapping K : Rd ----; R bounded,
integrable, such that flR d K(u)du = 1 and satisfying the conditions
(4.3) and
Note that a kernel is a positive kernel of order (1,1). On the other hand
we will use two mixing coefficients :
continuous on the right and have a limit on the left at each t).
4.2.1 Consistency
Let us begin with a simple consistency result .
THEOREM 4.1 Iff is continuous at x and if 0:(2) E Ll (>.) then the condition
Th~d -+ +00 implies
(4.8) E(Jr(x) - f(x)? -+ O.
Furthermore if f E etH(R), K E Hk,A and hT ~ T- 1 /(2r+2d) where r = k + >.
then
(4.9) E(Jr(x) - f(x))2 = O(T-r/(r+d») .
Proof
Using the classical Bochner's lemma (see (2.11)) we get
(4.10) EJr(x) = r
}'J(d Khr(x - u)f(u)du h-r>- O f(x).
Now Fubini's theorem entails
VJr(x) =
(4.11)
r
T\ },O,TJ2Cov (Khr(x - Xs),Khr(x - Xt)) dsdt,
92 CHAPTER 4. KERNEL DENSITY ESTIMATION
Vfr(x)~4I1K II~
T 2 h}d
r
J[0,Tj2
Q(2)(lt-s[)dsdt.
and finally
8(k) f ]
- .J1 . (x) du
8x 1 .. . 8x Jd
d
and (4.6) implies
(4.15)
where
. 11
c) hmsup -T
T--++oo [o,TFnr
dsdt = fr < 00 .
The following lemma furnishes an upper bound for the variance of fT.
p-1
LEMMA 4.1 If A(r,p) and Mb , f3) hold for some r,p, 'Y, 13, with 13 ~ 2--
p-2
then
v Jr(x) :S _1 E
Th~
[~K2
h~
(x -hTXo)] .2.T r i[O ,Tj2nr
dsdt
(4.17)
+ (211 KII~ 8 (r) + 811:_1I~ I) :!~
p
p 2d 1
where q = - - and 1) = -(1 - -) - d ~ O.
p-1 q 13
Proof
Let us consider the decomposition
Th~VJr(x) = r
i[O,T]2nr
Cov (K (x -hXT ' K(~-hTXt)) Th~
s)dsdt
(4.18)
( 4.19) IT := h1d EK 2
T
(x- -h -o ) . T 1
T
X 1
[O,TJ 2 nr
dsdt.
94 CHAPTER 4. KERNEL DENSITY ESTIMATION
Concerning the second integral, we may use A(r, p) and Holder's inequality
with respect to Lebesgue measure for obtaining
(4 .20) I (K (--,;;;:-
COY
X-Xs) ,K (X-Xt))1
~ (2d)/q II K IIq Dp(r),(s,t) Il"r,
~ hT 2 .
we get
' .
JT'
hence
(4.22)
J' < 811 K II~ / h«2d)/q)(1-t)-d
T- ,6-1 T '
il I is bounded, then
h 1100 J K2
C = fr II f cd
were
lim
T--+oo
h~EK~T(X - Xo) = f(x) / K2
If (3 = 2P -
1 (in particular if P = +00 and (3 = 2) the same rates are valid
p-2
but with a constant greater than C.
In order to show that the above rates are achieved for some processes, let
us consider the family X of processes X = (Xt , t E lR) which satisfy the above
hypothesis uniformly, in the following sense : there exist positive constants
fo, Lo , bo , "10, (30 and Po such that for each X E X and with clear notations
• II fx Ii00S fo
• -.!:.. r
T J[O,Tj2nrx
dsdt :s: Lo (1 + Lo)
T
• Px =Po > 2 and bpo(rx):S: bo
Po -1
• I :s: "10 and (3 2 (30 > 2 - - 2 .
Po -
96 CHAPTER 4. KERNEL DENSITY ESTIMATION
Then we have
COROLLARY 4.1
(4 .29) sup
xElRd
Th~Vxh(x) T_=
-+ Lo foJK 2
thus
ThT
d sup V h(x)
'
x ElRd
-+ Lofo JK.
2
sup
xElRd
V (T - Thd
Lo [T/Lol K (X - YrT/LOI)) < II K II~ L2
T
h - T2h2d
T T
0
4.2. OPTIMAL AND SUPEROPTIMAL 97
J
and finally
(4.32) Th~ sup Vxfr(x) -+ Lofo K2
xElRd
Proof: Clear . •
The next theorem emphasizes the fact that the kernel estimator achieves
the best convergence rate in a minimax sense.
THEOREM 4.3 Let fT be the class of all measurable estimators of the den-
sity based on the data (X t , 0 S t S T) then
2
(4.34) lim inf )nf sup T2.Td Ex
2r ( _
fr(x) - fx(x)
)
> 0, x E JRd .
T->+oo fTEFT XEX,
Proof (sketch)
Let Xo be the class of processes X = (Xt , t E JR) JRd··valued and such that
Xt = y[t/Lo], t E JR
where (Yn , n E Z) is a sequence of LLd. r.v.'s with a density f belonging to
C~(e) and such that X EX.
therefore
lim infT-->+oo AT 2: lim infT-->+oo BT .
liminf BT > 0
T-->+oo
hence (4.34) . •
Finally let us indicate that, like in the discrete case (see 2.2), similar results
may be obtained replacing A(f,p) by
.11
[ (s, t), where r satisfies the condition
lim sup -T
T-+oo [O,Tj2nr
dsdt < +00.
In that case the condition f3 > 2(p - 1)/(p - 2) is replaced by the weaker
2d+ 1
condition f3 > -d--'
+1
THEOREM 4.4
Proof
1) Using (4.11) and the stationarity condition g.,t = glt - sl we get
therefore
TV fr(x) ::;
(4 .39)
2 f IKhT(x - y)KhT(X - z)1 (10+00 Igu(Y, z)ldu) dydz
taking limsup on both side and applying Bochner's lemma we obtain (4.35).
Now
110+
00
gu(Y, z)d7L - loT (1 - ;;) gu(Y, Z)d7L1 = Ihoo
gu + loT fgUI
: ; h= II g,. 11= du + loT f II gu 11= du; (y , z) E JR2d; then, the integrability of
II g,. 1100 and the dominated convergence theorem show that the bound vanishes
as T -> +00.
Hence
TVfr(x) =
(4.41 )
100 CHAPTER 4. KERNEL DENSITY ESTIMATION
Now the dominated convergence theorem entails that (y, z) ....... It'" 9v.(Y, z)du
is continuous at (x,x) and finally Bochner's lemma implies (4.36) . •
COROLLARY 4.3
1) If assumptions of Theorem 4.1 hold for each x,
ifC = sup
Jo
xElRd
roo
19v.(X,x)ldu < +00 and if f E C~(f) (r = k + A) and
K E Hk,).., then the choice hT = O(T- 1/ 2T ) leads to
(4.42)
T-.+oo
limsup sup TE(fT(x) - f(x))2
xElRd
~ 2GJK 2.
Example 4.2
Let (Xt, t 2: 0) be a real diffusion process defined by the stochastic differ-
ential equation
(4 .46)
Moreover, under some regularity assumptions on Sand (j, the kernel estimator
of J reaches the full rate ~ . In particular if Xo has the density f , conditions
of Corollary 4.3 are fulfilled (see [KU] and [LED .
where X denotes a r.v. with density f and F denotes the distribution function
associated with f. We have :
102 CHAPTER 4. KERNEL DENSITY ESTIMATION
IT = 2JII.2d
( KhT(X - y)KhT(X - z) {T
Juo
(1 - ~)
T
gu(Y , z)dudydz, T> Uo .
4.2. OPTIMAL AND SUPEROPTIMAL 103
Uo
II 9u 1100 duo
Since the integrand is positive we may apply Fubini's theorem for obtaining
We are now in a position to conclude : Firstly (b), (e) and Fatou 's lemma
imply
for >.2d
lim
T-oo l
0
(4.51) '" 1
1 - p(u) =u~o( +) ILog(u)!l-.a' 0 < (3 < 1
(X t ) is not a.s. continuous (see [AD]) but V fr(x) ~ T1 provided (a) is satisfied.
Finally note that, using Theorem 4.2 one can construct an estimator such
that T 1 -e E(fr(x) - f(x)? --+ 0 (c > 0) a soon as the Gaussian process (X t )
satisfies mild mixing conditions. We will give a more precise result in the next
subsection.
t"
Jo
cp(u) exp (_
2rr
x 2(
1+P
») du = +00
u
4.2. OPTIMAL AND SUPEROPTIMAL 105
(ILl
which is equivalent to io 'P(u)du = +00 since limlL->oP(u) 1 by mean
square continuity. Thus we have clearly the first implication.
Now it is easy to check that l oIL1 'P( u)du < 00 implies lIL1 II 9u 1100 du < 00.
21=
I)
Then Theorem 4.4 entails TV hex) -+ 91L(X, x)du < 00, hence the second
implication.
E (Xu -
u
XO)2 --+EX(?
U--+O
which implies io
{U 1
(E(Xu - X o ?r l/ 2du = +00 and therefore
T· Vh(x) -+ +00 . •
1 1
We now give sufficient conditions for rates between - / +d and -. We will
Tr r T
use conditions A'(p) where p E [1, +00] defined by
A'(p) - 9s,t exists for s f= t, II 9s ,t lip is locally integrable and
1 {
-T
i]O,T]2
II 9s,t lip dsdt = 1 {T
-T
Jo
(1 - -TU) II gu lip du
so that II 9u lip is integrable over ]0, +00[.
Then A'(p) is fulfilled with G p =
2 It"" II gIL lipduo In particular assumptions in Theorem 4.2 imply A' ( +00).
On the other hand if JIL~oo II gIL IiI du < +00 for some Uo > 0, A'(l) is satisfied
since II gIL 111 S; 2.
We now state a result which links the convergence rate with A' (p).
106 CHAPTER 4. KERNEL DENSITY ESTIMATION
THEOREM 4.7
1) If A'(p) holds for some p E [1 , +001 then
Proof
1) We have
Vfr(x):s
1
:S T 2h}d
( [
JIR2d Kq
-u) Kq (xh:;-
(xh:;- -v) dudv ) l/q [
J[O,T]> II g.,t lip dsdt
1 II K II~
:S T h2d-(2d)/q -T
T
1 l[O ,T]>
II g. ,t lipdsdt
hence (4.52) .
2) Clear. •
Note that the optimal rate is reached for p = 2 and the parametric rate for
p = +00 . If p = 1 one obtains the same rate as in Theorem 4.1.Note however
that each of these rate is not necessarily the best one when A'(p) holds.
We complete this section with an example which shows that if the observed
process is nonstationary any rate is possible. Consider the process
(4.54) 11' t
( '2 - k"Y) (11' t - k"Y )
X t = Yk cos (k + 1)"Y _ k"Y + Yk+l sin '2 (k + 1)"Y _ k"Y ;
k"Y :S t < (k + 1)"Y, k E Z; where 'Y is a strictly positive constant and where
(Yk, k E Z) is a sequence of i.i .d . real Gaussian zero mean r.v.'s with variance
4.2. OPTIMAL AND SUPEROPTIMAL 107
a 2 > O. The observation of (Xt ) over [0, T] is in fact equivalent to the ob-
servation of Yo, ... , Y[Tlh] and the best rate is T-1h since an asymptotically
optimal estimator is
-fT(X) = ---exp
1 ( - 1- 2 x2) ,x E I~
ST..j2; 2 ST
[T'h]
where ST = 1 ~ X·
[Tlh]+1 ~ J'
Note that the kernel estimator remains competitive since here r may be
chosen arbitrarily large.
Let us set
'Pa,{3(U) = u{3(Logu)-a , u> 1
where 1 ::; (3 ::; 2 if a = 0, 1 < (3 < 2 if a E R - {a}.
We will say that the kernel estimator has the rate ['Pa,{3(T)r~ if, for
hT = ['Pa,{3(TW~, we have
and
Now to each function 'Pa,{3 and each M > 0 we associate the following subfam-
ily of X* :
The next theorem shows that this rate is actually minimax over X(a, (3, M) :
108 CHAPTER 4. KERNEL DENSITY ESTIMATION
THEOREM 4.8
where :FT denotes the class of all measurable density estimators based on the
data (Xt, Os t S T) ..
1
stationary Gaussian process which is derivable in mean square and with auto-
+00
correlation pO such that Ip(u)1 < 1 and Ip(u)ldu < 00 for some Uo > 0.
Uo
The rate LifT is minimax in the above sense. Proofs of Theorem 4.7 and
Corollary 4.5 which are very technical are omitted. They appear in [BK-B02].
LEMMA 4.2 Let (Zt, t ~ 0) be a real continuous time process such that
(a) For each "1 > 0, there exists a real decreasing function CPr" integrable on
lR+ and satisfying
Proof
First let (Tn) be a sequence of real numbers which satisfies
Tn+l - Tn ~ a > 0, n ~ 1 where a is some constant.
thus L:n 'Pry(Tn) < +00 and the classical Borel-Cantelli lemma yields
P (lim;uP{IZTnI > 1]}) = 0,1] > 0 which in turn implies ZT" -+ 0 a.s.
T 1(k) T n, h
were nl = 1,
TJk) Tn2 where Tn2 - Tn, ~ t, Tn2 - 1 - Tn, < t,
The first part of the current proof shows that ZT~k) p=:;: 0 a.s. for each k. Now
let us set
no = {w : t f-+ Zt(w) is uniformly continuous, ZT(k)
p
-+ 0, k ~ I},
clearly p(no) = 1.
Then if wE no and 1] > 0 there exists k = k(1] ,w) such that It - 81 :s k1 im-
plies IZt(w) -Zs(w)1 < ~ . Consider the sequence (T~k)) : for each p and each n
such that np :s n < n p+1 we have ITn - Tn pI < ~, hence iZTn (w) - ZTnp (w)i <
1]
2
Now for p large enough we have iZTnp (w)i < ~ and consequently IZTn (w)1 <
1] for n large enough. This is valid for each 1] > 0 and each tV E no, thus ZTn -+ 0
a .s . •
110 CHAPTER 4. KERNEL DENSITY ESTIMATION
• hT = CT ( --r
logT)~ (CT ~ C > 0) .
LEMMA 4.3
p - 1 7r +
1) Ifa(u)~,u-{3,u>Owhere{3>max ( 2p_2'~
5d)
then
A
(4 .55) P(IZTI > 7]) ~ T!+j.L ,7] > O,T ~ 1
2) If (X t ) is GSM then
B
(4 .56) P(IZTI > 77) ~ TG(logm T)2 ,7]> 0, T ~ 1
Proof
We may and do suppose that CT = 1 and 7] < l.
1) Let us set
(4 .57) Yjn = 11
"6 j6
(j-l)6
Khr(X - Xt)dt; j = 1, .. . ,n
where nl5 = T , n = [T] (T:::: 1) and consequently 2 > 15 :::: l. Thus we have
1 n
(4.58) fr(x) - Efr(x) =- 2:)Yjn - EYjn).
n j=l
4.3. OPTIMAL AND SUPEROPTIMAL 111
aim we may use inequality (4.17) in Lemma 4.1 with po instead of T and p'
instead of p for convenience. We have readily
Consequently
(4.60) v (tYjn) ~ a~
j=l oh T
2 c
v (q) ~ Ao hd
T
Now we choose 10 = lOT = hT(logm T)T/ (T/ > 0) and we notice that
n EnO lOT
q = 2p 2: -2- = 2' hence
thus
(4.62)
7r + 5d
and since {3 > - - - we have the bound
2r
(4.64) <~
vT_ THJ-L (C4 > 0, /J > 0).
If TJ > 1 it is easy to see that C4 must be replaced by C4TJ HfJ . Collecting (4.62)
and (4.64) we arrive at (4.55).
2) If o{) tends to zero at an exponential rate (4 .62) remains valid but (4.64)
may be improved. From (4.63) we derive the bound
VT :S csQ'Ye-fJ'P + C6cl/2h:;d/2q,e-fJ'P
(4.65)
:S C7exP (-C sT2r'.;.d -<)
where C7 and Cs are strictly positive and ( > 0 arbitrarily small. Consequently
the bound in (4.62) is asymptotically greater than the bound in (4.65), hence
(4.56) . •
The next lemma shows that (ZT) satisfies condition (b) in Lemma 4.2 .
Proof
We only prove (4 .66) for
Now we put
logWT = UT + VT
positive.
The derivative UIr of UT is clearly a 0 (~ ) .
Concerning V';' first we have
Noting that
(4 6. 7) IaKa
aT
(Xj - Xt,j)
hT
I
Ihrl
:S h} ahT II Ka 1100 ; J
I -
= 1, -.. , d.
From (4.67) it is easy to deduce that
and finally
114 CHAPTER 4. KERNEL DENSITY ESTIMATION
I Cl C2
(log W T ) ::; T + rT K (
x-x, ) dt .
Jo hT
f:
where C is constant. Thus Wr is bounded hence (4.66) . Clearly the result
remains valid if K (x 1:;' ) dt = O. •
We are now in a position to state a first consistency result :
1 7r + 5d)
THEOREM 4.9 If a(u) ::; ,,(u- f3 , "( > 0, f3 > max ( 2 pp _- 2' ~ then
(4.68) -1- - -
logm T log T
(T)rrT.r Ih(x) - f(x)1 --->
T-oo
0 a.s.,
m~l,xElRd
Proof
(4.55) implies
P(IZT(x)1 >1J) = o (Tl+ .. )
and (4.66) implies condition (b) in Lemma 4.2.
Hence (4.68) by using Lemma 4.2 . •
m ~ 1,a > O.
Proof
Since K is clearly Lipschitzian we may use a method similar to the method
of the proof in Theorem 2.2 : we t ake as II . IIthe sup norm and we construct
~ ±SH!
a covering of {x :11 x II::; T a } with vf hypercubes where VT '" Ta+~. Thus
we have
(4.70) sup IZT(x)l::;
IIxll:5Ta
sup
l:5j:5 v::'
IZT(xjT) 1+ 0 ((1ogIT)w)
4.3. OPTIMAL AND SUPEROPTIMAL 115
where the XjT'S are the centers of the hypercubes and where w > O. Using
(4.56) we obtain
On the other hand (4.66) shows that T t-t SUPlSjSv~ IZT(xjT) is uniformly
continuous for each w since A does not depend on (x, w). Consequently we may
apply Lemma 4.2 and we obtain (4.69) from (4 .70) . •
COROLLARY 4.6 (Uniform optimal rate.)
If f is ultimately decreasing with respect to II . II, if (Xt} is GSM, if SUPO<t<T II
Xt II is measurable for each Tand if E( sup II X t Ila) < 00 for some-a-> 0
OSt:51
then
(4.71 ) -11 T (T)~
-1T sup Ih(x) - f(x)1 -> 0 a.s.
ogm og xElRd
Proof
Since f is ultimately decreasing we claim that limllull->oo II u I f(u) = O.
To prove this it suffices to note that for R large enough
r
JR/2SlIvIiSR
f(v)dv ~ f(eR)ad Rd
thus from Theorem 4.10 and (4.72) we deduce that it suffices to show that
sup IZT(x)1 -> 0 a.s ..
IIxll>Ta/2
To this aim we first note that sUPOStST II Xt II S T~a and I x II> T 2a imply
x-X T2a
II ~ II> 2hT' OStST.
Now let CK be such that K(u) = 0 if II u II~ CK and let To such that r:; > CK
for, T :2: To.
X - Xt)
We have K ( h:;- = 0 for T:2: To, hence
{ SUP
09ST
II Xt liS T2a,
2
II x II> T2a} => { sup
Ilxll>T2a
IZT(x)1 = o} .
116 CHAPTER 4. KERNEL DENSITY ESTIMATION
then we have
THEOREM 4.11 Under the conditions of Corollary 4.6 except that A(f,.\)
is replaced by H and that hT rv T-'Y where ~ "I < fr
we have for all m ~ 1 id,
(4.73) -11 T
ogm
(T)t
-1
og
T sup Ih(x) - f(x)1
xEIRd
-+ 0 a.s.
Proof
As in the proof of Lemma 4.3 we consider decomposition (4.58) and we
apply inequality (1.26) .
p
The main task is to evaluate the variance of LYJn. First (4.39) yields
j=l
r IKhr(x -
~ 2 JIRd y)Khr(x - z)1 r CXJ Igu(y,z)ldudydz
Jo
~ 2 Jo+CXJ II gu 1100 du ( / IKI) 2 =: M
thus
4.3. OPTIMAL AND SUPEROPTIMAL 117
therefore
v 2 (q) < .J:.... M + II K 1100 c
- p8 h~
Concerning the second term in the bound (1.26) it takes the form
Finally
Zl,T = logm T
1 (T)
logT
1/2
(fT(X) - EJr(x))
thus Lemma 4.2 implies that Zl ,T --+ 0 a .s.
1
logm T
(T) logT
1/2
IEJr(x) - f(x)1 S c(r)
T1 /2 - -yr
logm T. (logT)1/2
118 CHAPTER 4. KERNEL DENSITY ESTIMATION
I
which tends to zero since I ;::: 2r'
where N(m) denotes a random vector with normal distribution N(O, 2::) .
(4.75)
4.5 Sampling
In continuous time, data are often collected by using a sampling scheme. Vari-
ous sampling designs can be employed . In the following we only consider three
kinds of deterministic designs: dichotomy, irregular sampling, admissible sam-
pling.
4.5. SAMPLING 119
4.5.1 Dichotomy
Consider the data (XjT/N ; j = 1, . . . , N) where N = 2n; n = 1, 2, ... T being
fixed. Such a design may be associated with the accuracy of an instrument
used for observing the process (X t ) over [0, T] .
O"N =
2
TI "L....(WjT/
,
N - W(j-l)T/N)
2
j=l
(4.76)
where
o < c ~ c' < 1 and 0 < a ~ 2.
Then if hN = N-"( (0 < I < 1) and if the kernel K satisfies
Ju 4 K(u)du < +00 we have
_ 1 N
where VN = N 2 h 2 L:VK (Xj/N /h N ),
N j=l
120 CHAPTER 4. KERNEL DENSITY ESTIMATION
CN =- N 22h2
N
~1(N -
j=1
j) JK (hU
N
) K (hV
N
) f(u)f(v)dudv,
RN =N
2
L
N-1 (
1- N
j) fj/N(O,O) ,
)=1
and

$R_N \ge \frac{1}{\pi\sqrt{c'}}\,\frac{1}{N} \sum_{j=1}^{N-1} \left(1 - \frac{j}{N}\right) \left(\frac{N}{j}\right)^{\alpha/2},$

which appears as a Riemann sum for the function $\frac{1}{\pi\sqrt{c'}}\,(1-u)\,u^{-\alpha/2}$. Consequently

$\liminf R_N \ge \frac{1}{\pi\sqrt{c'}} \left( \frac{1}{1 - \frac{\alpha}{2}} - \frac{1}{2 - \frac{\alpha}{2}} \right), \quad 0 < \alpha \le 2.$
Using the inequality $|e^{au} - 1| \le au\,(1 + a^2 u)$ $(a > 0,\ u > 0)$ it is easy to check that $T_N$ tends to zero.
(4.78) $\lim_{N\to\infty} V \hat{f}_N(0) = \frac{1}{\pi T} \int_0^T \frac{1 - \frac{u}{T}}{\big(1 - \rho^2(u)\big)^{1/2}}\,du.$
THEOREM 4.14 If $(X_t,\ 0 \le t \le T)$ has cadlag sample paths, if $K$ is uniformly continuous and if $h_N \to h_T$ then

(4.79) $\hat{f}_N(x) \xrightarrow[N \to \infty]{} \hat{f}_T(x), \quad x \in \mathbb{R}^d.$
Proof
We have

$\hat{f}_N(x) = \int K_{h_N}(x - u)\,d\mu_N(u) \quad \text{and} \quad \hat{f}_T(x) = \int K_{h_T}(x - u)\,d\mu_T(u)$

where $\mu_N = \frac{1}{N} \sum_{j=1}^{N} \delta_{(X_{jT/N})}$ and $\mu_T$ are empirical measures.
Now let $\varphi$ be a continuous real function defined on $\mathbb{R}^d$; then for all $\omega$ in $\Omega$
(4.80) $\hat{f}_n(x) = \frac{1}{n h_n^d} \sum_{j=1}^{n} K\!\left(\frac{x - X_{t_j}}{h_n}\right), \quad x \in \mathbb{R}^d.$
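To fix ideas, the following sketch (ours, not the book's) implements (4.80) in dimension $d = 1$ with a Gaussian kernel; the function name kernel_density is a hypothetical choice.

```python
import numpy as np

def kernel_density(x, sample, h):
    """Kernel density estimator (4.80) with d = 1 and Gaussian kernel:
    f_n(x) = (1 / (n h)) * sum_j K((x - X_{t_j}) / h)."""
    sample = np.asarray(sample)
    u = (x - sample) / h                        # (x - X_{t_j}) / h_n
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return K.sum() / (len(sample) * h)
```

Here the sampled values $X_{t_1}, \ldots, X_{t_n}$ are passed as sample and h plays the role of $h_n$.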
(4.81)

Note that if $(\delta_n)$ and $(\delta_n')$ are both admissible then obviously $\delta_n' \asymp \delta_n$.
(1) $g_{s,t} = g_{|t-s|}$ exists for $s \ne t$ and $\|g_u\|_\infty \le \pi(u)$, $u > 0$, where $(1+u)\pi(u)$ is integrable over $]0, +\infty[$ and $u\pi(u)$ is bounded and ultimately decreasing. Furthermore $g_u(\cdot,\cdot)$ is continuous at $(x, x)$.
(2)
Proof
Let us begin with the following preliminary result:

(4.82)

where $H_n(y, z) = \sum_{i=1}^{\infty} \delta_n\, g_{i\delta_n}(y, z)$.

In order to prove (4.82) note first that $u\pi(u)$ and $\pi(u)$ are decreasing for $u$ large enough, $u > u_0$ say. Therefore, for $i - 1 > \delta_n^{-1} u_0$:
Now we have

$H_n - G_n = \sum_{i=n}^{\infty} \delta_n\, g_{i\delta_n} + \frac{1}{n} \sum_{i=1}^{n-1} i\,\delta_n\, g_{i\delta_n},$

hence

$\|H_n - G_n\|_\infty \le \int_{(n-1)\delta_n}^{+\infty} \pi(u)\,du + \frac{1}{n\delta_n} \int_{u_0}^{+\infty} u\,\pi(u)\,du,$
hence (4.82), since $\pi(u)$ and $u\pi(u)$ are integrable and $n\delta_n \to \infty$.
(4.83)

and

(4.84) $n\delta_n\, C_n \to 2 \int_0^{+\infty} g_u(x,x)\,du.$
Now the bias is given by (4.15) and (4.16); then by using (4.83) and (4.84) we obtain

$E\big(f_n^*(x) - f(x)\big)^2 \le \frac{a}{n h_n^d} + a' h_n^{2r} + \frac{a''}{n \delta_n}$

where $a$, $a'$ and $a''$ are positive constants. Hence

$E\big(f_n^*(x) - f(x)\big)^2 \le \frac{a'''}{n^{2r/(2r+d)}} + \frac{a''}{n \delta_n}$
(4.85) $\frac{a}{T_n} \ge E\big(f_n^{**}(x) - f(x)\big)^2 \ge \frac{a'''}{n^{2r/(2r+d)}} - \tilde{a}\,T_n\,\delta_n^{2r}$
The following corollary provides the exact asymptotic quadratic error asso-
ciated with an admissible sampling.
(4.86)

Proof:
Straightforward since the bias is given by (2.9). •
Note that if the whole sample path $(X_t,\ 0 \le t \le T_n)$ is available one obtains a smaller constant, namely

(4.87)
THEOREM 4.16 Under the conditions of Theorems 4.10 and 4.11, where $\delta_n \sim T_n^{-d/2r}$, $h_n^d = \delta_n$, and $r > d$:
(4.88) $\frac{1}{\log_m T_n}\left(\frac{T_n}{\log T_n}\right)^{1/2} \sup_{x \in \mathbb{R}^d} |f_n^*(x) - f(x)| \xrightarrow[n \uparrow \infty]{} 0 \quad \text{a.s.}$
Proof
Let us consider the random variables

$Z_{jn} = K\!\left(\frac{x - X_{j\delta_n}}{h_n}\right) - E\,K\!\left(\frac{x - X_{j\delta_n}}{h_n}\right), \quad 1 \le j \le n.$

Then

$f_n^*(x) - E f_n^*(x) = \frac{1}{n} \sum_{j=1}^{n} Z_{jn}.$
$v^2(q) \le \frac{2}{p\delta_n} + \frac{1}{\varepsilon\,\delta_n}$

with $p\delta_n \to \infty$ (since $\delta_n \sim T_n^{-d/2r}$ and $\frac{1}{2} > \frac{d}{2r} \Leftrightarrow r > d$), we get the bound
Finally

$P\left( \frac{1}{\log_m T_n} \left(\frac{T_n}{\log T_n}\right)^{1/2} \big|f_n^*(x) - E f_n^*(x)\big| > \eta \right)$
Notes
BANON (1978) was the first to consider density estimation in continuous time. In his pioneering work he studied the case of a stationary diffusion process by using a recursive estimator. Related results were obtained by BANON and NGUYEN (1978, 1981), NGUYEN (1979), NGUYEN and PHAM (1980, 1981).
Most of the above results are obtained under the so-called Rosenblatt $G_2$-condition. The strong mixing case has been investigated by DELECROIX (1980).
Results about intermediate rates have been established by BLANKE and BOSQ (1995-1998).
Most of the results in this chapter seem to be new, except of course Theorem 4.4.2 and some simple results which belong to the folklore of density estimation in continuous time.
Chapter 5

Regression estimation and prediction for continuous time processes
The main results are similar to those obtained in the previous chapter about
density estimation : if the process sample paths are irregular enough then a
parametric rate appears in regression estimation. This fact remains valid for
suitable sampled data and when nonparametric prediction is considered.
Optimal and superoptimal rates are studied in Sections 2, 3 and 5. Section 4
is devoted to limit in distribution. Section 6 deals with sampling and, finally,
applications to forecasting appear in Section 7.
Several of the proofs are not detailed or are omitted since they are easy
combinations of proofs given in Chapters 3 and 4.
Assuming that the Zt'S have the same distribution with density fz(x, y),
we wish to estimate the regression function E(m(Yo) I Xo = .) given the data
(Zt, 0:::; t :::; T) .
f(x) =
ifR
r fz(x, y)dy
dl
and
cp(x) =
JRd
r m(y)fz(x, y)dy
l
,x E JRd.
We may use $f$ and $\varphi$ for defining a version of the regression by setting
and
(5.4) $\varphi_T(x) = \frac{1}{T} \int_0^T m(Y_t)\,K_{h_T}(x - X_t)\,dt$

with $h_T \to 0^+$ as $T \to \infty$.
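As a concrete illustration (ours), the ratio $r_T = \varphi_T / f_T$ may be approximated from a discretized path by replacing the time integrals in $f_T$ and (5.4) with Riemann sums; with $m(y) = y$ this gives the familiar Nadaraya-Watson form. The function name kernel_regression is an assumption.

```python
import numpy as np

def kernel_regression(x, X, Y, h):
    """Approximate r_T(x) = phi_T(x) / f_T(x) from observations
    (X_{t_i}, Y_{t_i}) on a regular time grid, with m(y) = y and a
    Gaussian kernel; the common factors 1/(T h) cancel in the ratio."""
    X, Y = np.asarray(X), np.asarray(Y)
    w = np.exp(-0.5 * ((x - X) / h) ** 2)   # K((x - X_t) / h_T)
    s = w.sum()
    return np.dot(w, Y) / s if s > 0 else 0.0
```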
Note that $r_T$ may be written in the suggestive form

where

Furthermore we put
c) $\limsup_{T\to\infty} \frac{1}{T} \int_{[0,T]^2 \cap \Gamma_0} ds\,dt = \ell_{\Gamma_0} < \infty.$

$M^*(\gamma, \beta)$: $\alpha^{(2)}(|t-s|) \le \gamma\,|t-s|^{-\beta}$, $(s,t) \notin \Gamma^*$, where $\gamma > 0$, $\beta > 0$.
Proof
We first derive a preliminary result, namely
(5.9)
where x is omitted.
Concerning $V\varphi_T$ we may use the same method as for $Vf_T$ in Theorem 4.2; we obtain

(5.12) $\limsup_{T\to\infty} T h_T^d\, V\varphi_T(x) \le M_2\, \ell_{\Gamma'}\, f(x) \int K^2.$
It remains to study the "pseudo-bias"

and $\chi^*$ is similar.
Finally, collecting (5.10), (5.11), (5.12) and (5.13) we get (5.8) with

(5.14) $c(x) = \frac{2(1 + M_2)}{c_d\, f(x)}\,(\ell_\Gamma + \ell_{\Gamma^*}) \int K^2 + \frac{c_4}{2 f^2(x)} \big(\sigma^2(x)\,\chi\,f(x) + \chi^2(x)\big). \;\bullet$
COROLLARY 5.1
The proof of Corollary 5.1 is analogous to the proofs of Corollaries 4.1 and 4.2.
In particular, given a sequence $(U_n, V_n)$, $n \in \mathbb{Z}$, of i.i.d. $\mathbb{R}^{d+d'}$-valued random variables one may construct the process

then for a suitable choice of $L_0$ and $(U_n, V_n)$, $n \in \mathbb{Z}$, $(Z_t,\ t \in \mathbb{R})$ belongs to $\mathcal{Z}$ and satisfies

(5.16) $E_Z\big(r_T(x) - r^{(Z)}(x)\big)^2 \sim \frac{C_Z}{T^{4/(d+4)}},$
(5.16) shows that the "optimal rate" is achieved and (5.15) that better rates
are feasible. These rates are considered in the next section.
$h(x', x'') = \int_{]0,+\infty[} |g_u(x', x'')|\,du.$
(5.17) $\limsup_{T\to\infty} T\,E\left(r_T(x) - \frac{E\varphi_T(x)}{E f_T(x)}\right)^2 \le C_1(x),$
(5.19) $T\,V\varphi_T(x) = \frac{2}{h_T^{2d}} \int_0^T \left(1 - \frac{u}{T}\right) \mathrm{Cov}\left(m(Y_0)\,K\!\left(\frac{x - X_0}{h_T}\right),\ m(Y_u)\,K\!\left(\frac{x - X_u}{h_T}\right)\right) du,$

where the covariance, say $\gamma_u$, may be written

$\gamma_u = \int m(y_1)\,m(y_2)\,K\!\left(\frac{x - x_1}{h_T}\right) K\!\left(\frac{x - x_2}{h_T}\right) g_u(x_1, y_1; x_2, y_2)\,dx_1\,dx_2\,dy_1\,dy_2,$

therefore

(5.20) $\limsup_{T\to\infty} T\,V\varphi_T(x) \le 2 \int_0^{+\infty} |G_u(x,x)|\,du.$
$\limsup_{T\to\infty} T\,V f_T(x) \le 2 \int_0^{+\infty} |g_u(x,x)|\,du$

and

(5.21) $\limsup_{T\to\infty} T\,E\left(r_T(x) - \frac{E\varphi_T(x)}{E f_T(x)}\right)^2 \le \frac{4(M_2 + 1)}{f(x)} \int_0^{+\infty} \big(|G_u(x,x)| + |g_u(x,x)|\big)\,du,$

hence (5.17).
$H$ – $g_u$, $G_u$, $J_u$ exist, are bounded, continuous at $(x,x)$, and $\|g_u\|_\infty$, $\|G_u\|_\infty$ and $\|J_u\|_\infty$ are integrable over $]0, +\infty[$.
(5.23)
The proof is a combination of the proofs of Theorems 3.1 and 4.4.2 and is
therefore omitted.
$\limsup_{T\to\infty} \frac{1}{T} \int_{[0,T]^2} \|G_{s,t}\|_p\,ds\,dt = G_p < +\infty.$

In $A''(p)$, $p$ belongs to $[1, +\infty]$. In the case where $G_{s,t} = G_{|t-s|}$, $A''(p)$ is satisfied as soon as $\|G_u\|_p$ is integrable. In particular $H$ implies $A''(+\infty)$.
$V\varphi_T(x) = \frac{1}{T^2} \int_{[0,T]^2 \times \mathbb{R}^{2d}} K_{h_T}(x - x_1)\,K_{h_T}(x - x_2)\,G_{s,t}(x_1, x_2)\,ds\,dt\,dx_1\,dx_2,$

and the rest is clear. The special cases $p = 1$ and $p = \infty$ may be treated similarly. •
Note that the optimal rate is reached for p = 2 while the superoptimal rate
is achieved for p = +00.
$A_T = \begin{bmatrix} V\varphi_T & \mathrm{Cov}(f_T, \varphi_T) \\ \mathrm{Cov}(f_T, \varphi_T) & V f_T \end{bmatrix};$

moreover we suppose that $d = 1$ and that $T h_T A_T \to L$, a constant regular matrix.
THEOREM 5.5 If $C'$ holds, $f(x) > 0$ and $\alpha(u) = O(e^{-\gamma u})$ $(\gamma > 0)$, then the choice $h_T = c\,T^{-\lambda}$ $(c > 0,\ \frac{1}{4} < \lambda < \frac{1}{2})$ entails

(5.26) $\frac{r_T(x) - r(x)}{\sqrt{(u' A_T u)(x)}} \xrightarrow{w} \mathcal{N}(0,1).$
COROLLARY 5.2

(5.27) $\left(\frac{f_T(x)}{V_T(x)}\right)^{1/2} \big(r_T(x) - r(x)\big) \xrightarrow{w} \mathcal{N}(0,1)$

where

(5.28) $V_T(x) = \frac{1}{f_T^2(x)}\,\frac{1}{T h_T} \int_0^T m^2(Y_t)\,K^2\!\left(\frac{x - X_t}{h_T}\right) dt - r_T^2(x).$
(5.29) $\frac{1}{\mathrm{Log}_k T}\left(\frac{T}{\mathrm{Log}\,T}\right)^{2/(4+d)} \sup_{x \in A} |r_T(x) - r(x)| \to 0 \quad \text{a.s.}$
2) If $H$ holds, if $d = 1$, and if $h_T \simeq T^{-\gamma}$ where $\frac{1}{4} \le \gamma < \frac{1}{2}$, then for each $k$

(5.30) $\frac{1}{\mathrm{Log}_k T}\left(\frac{T}{\mathrm{Log}\,T}\right)^{1/2} \sup_{x \in A} |r_T(x) - r(x)| \to 0 \quad \text{a.s.}$
Proof (sketch)
Let us consider the decomposition

$r_T - \frac{E\varphi_T}{E f_T} = r_T\,\frac{E f_T - f_T}{E f_T} + \frac{\varphi_T - E\varphi_T}{E f_T}$

and let us set $M = \max(1, \|m(Y_0)\|_\infty)$ and $\eta = \inf_{x \in A} f(x)$; then for $T$ large enough we have

$\sup_{x \in A} \left| r_T(x) - \frac{E\varphi_T(x)}{E f_T(x)} \right|$
(5.31) $\frac{1}{\mathrm{Log}_k T}\left(\frac{T}{\mathrm{Log}\,T}\right)^{2/(4+d)} \sup_{x \in A} |f_T(x) - E f_T(x)| \to 0 \quad \text{a.s.}$
A similar result may be established for $\varphi_T$. This can be done by using the same scheme as in the density case (cf. Lemma 4.2, Lemma 4.3 and Lemma 4.4).
One finally obtains

$\frac{1}{\mathrm{Log}_k T}\left(\frac{T}{\mathrm{Log}\,T}\right)^{2/(4+d)} \sup_{x \in A} \left| r_T(x) - \frac{E\varphi_T(x)}{E f_T(x)} \right| \to 0 \quad \text{a.s.}$
5.6 Sampling
This section will be short because the reader can easily guess that regression and density estimators behave alike when sampled data are available. Consequently the results in Section 4.5 remain valid.
If the data are $X_{t_1}, \ldots, X_{t_n}$ with $0 < t_1 < \cdots < t_n$ and $\min_{1 \le j \le n-1}(t_{j+1} - t_j) \ge m > 0$, then the asymptotic quadratic and uniform errors are the same as those of $r_n$ studied in Chapter 3.
(5.32)
Then under conditions similar to those of Theorems 4.13 and 5.6 it may be proved that $\delta_n = T_n^{-d/4}$ is admissible provided $h_n \simeq T_n^{-1/4}$, and that

where $\Delta$ is any compact set such that $\inf_{x \in \Delta} f(x) > 0$.
and the kernel regression estimator based on the data $(Z_t,\ 0 \le t \le T - H)$. The nonparametric predictor is

that is

(5.33)

where the kernel $K$ has a compact support $S_K$, is strictly positive over $S_K$ and has a continuous derivative. Note that these conditions, together with left continuity of paths, entail that the denominator in (5.33) is strictly positive with probability 1.
If the sample paths of $(\xi_t)$ are regular, the rates are similar to those obtained in Chapter 3, specifically in Theorem 3.5 and Corollary 3.1. We therefore focus
(5.34) $\frac{1}{\mathrm{Log}_k T}\left(\frac{T}{\mathrm{Log}\,T}\right)^{1/2} \big[r_T(\xi_T) - r(\xi_T)\big]\,\mathbf{1}_{\{\xi_T \in \Delta\}} \xrightarrow{P} 0$
Proof
Using (5.13), (5.31) and (5.35) it is easy to see that it is enough to study the asymptotic behaviour of $\theta_T = E\big(\sup_{x \in \Delta} |f_T(x) - E f_T(x)|^2\big)$ and $\theta'_T = E\big(\sup_{x \in \Delta} |\varphi_T(x) - E\varphi_T(x)|^2\big)$. We only consider $\theta_T$ since $\theta'_T$ can be treated similarly.
Now we may and do suppose that $\Delta = [0,1]$; then using the condition $|K(x'') - K(x')| \le \ell\,|x'' - x'|$, where $\ell = \|K'\|_\infty$, we obtain
(5.37) $\sup_{x \in \Delta} |f_T(x) - E f_T(x)| \le \sup_{1 \le j \le k_T} |f_T(x_j) - E f_T(x_j)| + \frac{2\ell}{k_T h_T^2},$
$\theta_T \le \frac{2 k_T}{T} \int_0^{+\infty} \|g_u\|_\infty\,du + \frac{8 \ell^2}{k_T^2 h_T^4},$

thus

$\theta_T = O(T^{-1/2})$

and since the bias term is a $O(T^{-1/2})$ too, (5.36) follows. •
The last result requires a stronger assumption: let us suppose that $(\xi_t)$ is $\varphi_{\mathrm{rev}}$-mixing (cf. subsection 3.3.3) and consider the predictor defined for $T$ large enough by

$\hat{\xi}_{T+H} = r_{T'}(\xi_T)$

where $T' = T - H - \mathrm{Log}\,T \cdot \mathrm{Log}_2 T$.
(5.39)
Consequently

which proves (5.39). •
Notes
N. CHEZE-PAYAUD (1994) has proved Corollaries 5.1 and 5.2, and Theorems 5.3 and 5.5. Results about the quadratic error associated with admissible sampling may be found in [CP].
The other results have been obtained by the author of the present work.
Chapter 6

The local time density estimator
In Section 1 we define local time and study its possible existence. The asso-
ciated estimator is defined in Section 2. Its consistency under mild ergodicity
conditions is presented in Section 3. Section 4 deals with parametric rates of
convergence, asymptotic normality and law of the iterated logarithm. Finally
a short discussion compares the local time estimator with the kernel estimator.
(6.1)
In the following "a.s." is in general omitted and we often write $\ell_T(x)$ instead of $\ell_T(x, \omega)$.
By definition we have
Then, by linearity and monotone convergence we get
By using (6.4) it is easy to check that
6.1.2 Existence
The following statement gives two classical existence criteria for local time.
THEOREM 6.1
1) Let $X = (X_t,\ t \in [0,1])$ be a measurable real process with absolutely continuous sample paths. Then the condition
(6.10)
where $I_x = \{t : X_t = x\}$.
2) Let $X = (X_t,\ t \in [0,1])$ be a measurable real process. Then $X$ admits a square integrable local time (i.e. $\ell_1 \in L^2(\lambda \otimes P)$) if and only if
In the sequel we will use conditions which are slightly stronger than (6.11), namely:

(A) $f_{s,t}(y, z)$ is defined and measurable over $(D^c \cap [0,T]^2) \times U$ where $U$ is an open neighbourhood of $D = \{(x,x),\ x \in \mathbb{R}\}$.
We now state a theorem concerning existence of local time with some reg-
ularity properties.
THEOREM 6.2 If (A) and (B) hold then $X$ has a local time $\ell_T$ such that

(C) $\sup_{a \le x \le b} E\big(Z_h^K(x) - \ell_T(x)\big)^2 \xrightarrow[h \to 0]{} 0, \quad a < b,\ K \in \mathcal{K}_1.$
(6.13) $\sup_{a \le x \le b} E\big(Z_h^{K^{(1)}}(x) - Z_{h'}^{K^{(2)}}(x)\big)^2 \xrightarrow[(h,h') \to (0,0)]{} 0, \quad a < b,$

$I_{h,h'}^{K^{(1)},K^{(2)}}(x) = \int_{\mathbb{R}^2} K_h^{(1)}(x - u)\,K_{h'}^{(2)}(x - v)\,F_T(u,v)\,du\,dv.$
Hence
Actually $\ell_T^K$ does not depend on $K$, since (6.14) implies that $Z_h^{K^{(1)}}$ and $Z_h^{K^{(2)}}$ have the same limit whatever $K^{(1)}$ and $K^{(2)}$ in $\mathcal{K}_1$.
We now prove that a version of $\ell_T =: \ell_T^K$ is a version of local time.
For this purpose we choose $K = \mathbf{1}_{[-\frac{1}{2}, +\frac{1}{2}]}$ and consider $(h_n) \downarrow 0$. To each integer $k$ we may associate a sequence $(\nu(n,k),\ n \ge 1)$ of integers such that $\nu(n,k) \uparrow \infty$ and

$Z_{h_{\nu(n,k)}}^K(x, \omega) \xrightarrow[n \to \infty]{} \ell_T(x, \omega)$

(6.15)

Now by using a lemma in [GH2] (p. 13, 2°) and recalling that $K = \mathbf{1}_{[-\frac{1}{2}, +\frac{1}{2}]}$ we conclude that $\ell_T$ is a (measurable) version of local time, defined over $\mathbb{R} \times \Omega_0$. Finally it is straightforward to establish continuity in mean square for $Z_h^K$ and, since $Z_h^K$ converges uniformly over compact sets, we deduce that $\ell_T$ is continuous in mean square. •
2) If (A) and (B) hold and if $\ell_T$ is defined by (C), then $E f_T^0$ is a continuous version of $f$.
Proof
2) with probability 1
(6.19)
hence
(6.20)
and
(6.21) $\sup_{B \in \mathcal{B}} |\mu_T(B) - \mu(B)| \to 0 \quad \text{a.s.}$
Proof
Without loss of generality we may suppose that

$X_t = U_t X_0, \quad t \in \mathbb{R},$

where $U_t \xi(\omega) = \xi(T_t \omega)$, $\omega \in \Omega$, and $(T_t,\ t \in \mathbb{R})$ is a measurable group of transformations of $(\Omega, \mathcal{A}, P)$ which preserves $P$ (cf. [DO]).
$\int g(x)\,dx = \int E^{\mathcal{I}} g_0(x)\,dx = E^{\mathcal{I}} \int g_0(x)\,dx = 1 \quad \text{(a.s.)}$
which means that, almost surely, the empirical measure $\mu_n$ converges in variation to the measure $\nu$ with density $g$.
On the other hand the ergodic theorem implies

Note that, in discrete time, (6.21) is not possible since the empirical measure $\mu_n$ is orthogonal to $\mu$.
Proof
Continuity of $f$ comes directly from (L). Now let us consider $\varepsilon > 0$ and $a = x_0 < x_1 < \cdots < x_N = b$ such that $w_f(\Delta_i, \delta) < \varepsilon$, $i = 0, \ldots, N-1$, where $\Delta_i = [x_i, x_{i+1}]$ and $\delta = \max(x_{i+1} - x_i)$.

$\|f_n^0 - f\|_{C[a,b]} = \max_{0 \le i \le N-1} \sup_{x \in \Delta_i} \left| \frac{1}{n} \sum_{j=0}^{n-1} \ell^{(j)}(x) - f(x) \right| \le \varepsilon + \max_{0 \le i \le N-1} \sup_{x \in \Delta_i} \left| \frac{1}{n} \sum_{j=0}^{n-1} \ell^{(j)}(x) - f(x_i) \right|$

where $T_n = \max_{0 \le i \le N-1} \left| \frac{1}{n} \sum_{j=0}^{n-1} \ell^{(j)}(x_i) - f(x_i) \right|$ and where $\ell^{(j)}$ denotes local time over $[j, j+1]$; hence

$\|f_n^0 - f\|_{C[a,b]} \le \varepsilon + \max_{0 \le i \le N-1} \sup_{x \in \Delta_i} \frac{1}{n} \sum_{j=0}^{n-1} w_{\ell^{(j)}}(\Delta_i, \delta) + T_n.$
By Theorem 6.3, $T_n \to 0$ a.s. On the other hand the sequence $(w_{\ell^{(j)}}(\Delta_i, \delta);\ j = 0, 1, \ldots)$ is stationary. Then the ergodic theorem gives

$\overline{\lim}_n \|f_n^0 - f\|_{C[a,b]} \le \varepsilon + \max_{0 \le i \le N-1} E\,w_{\ell^{(0)}}(\Delta_i, \delta) \quad \text{a.s.} \;\bullet$
$|f_T^0(x) - f_{[T]}^0(x)| \le \frac{1}{T}\,\big|\ell_T(x) - \ell_{[T]}(x)\big| + \frac{T - [T]}{T}\,\big|f_{[T]}^0(x) - f(x)\big| + \frac{T - [T]}{T}\,f(x), \quad x \in [a,b].$
$\|f_T^0 - f_{[T]}^0\|_{C[a,b]} \le \frac{\|\ell^{([T])}\|_{C[a,b]}}{[T]} + \frac{\|f_{[T]}^0 - f\|_{C[a,b]}}{[T]} + \frac{\|f\|_{C[a,b]}}{[T]}$

and finally (6.27) together with the ergodic theorem applied to $\big(\|\ell^{(n)}\|_{C[a,b]}\big)$ give

$\|f_T^0 - f_{[T]}^0\|_{C[a,b]} \xrightarrow[T \to \infty]{} 0 \quad \text{a.s.,}$

hence (6.26). •
Proof
Let $K \in \mathcal{K}_1$; from Theorem 6.2 we get

$\frac{1}{T} \int_{[0,T]^2} g_{s,t}(x,x)\,ds\,dt = 2 \int_0^T \left(1 - \frac{u}{T}\right) g_u(x,x)\,du,$

hence (6.29). •
1) If

$G(x) = \limsup_{T \to \infty} \frac{1}{T} \int_{[0,T]^2} g_{s,t}(x,x)\,ds\,dt < \infty$

we have

(6.30) $\limsup_{T \to \infty} T\,V f_T^0(x) \le G(x).$
we have

Proof: clear. •
Note that $E\big(f_T(x) - f_T^0(x)\big)^2 = O\left(\frac{1}{T^2}\right)$. Thus $f_T$ and $f_T^0$ have the same asymptotic efficiency, and (4.48) and Theorem 4.5 show that $f_T^0$ is asymptotically minimax.
We now give some converses of Theorems 6.2 and 6.5. First let us introduce an additional condition concerning $f_{s,t}$:

(A') – (A) holds, $f_{s,t}$ is continuous at each point of $D$ and there exists $x_0 \in \mathbb{R}$, which does not depend on $(s,t)$, $s \ne t$, $(s,t) \in D^c \cap [0,T]^2$, such that

$E\big(Z_h^K(x_0)\big)^2 = \int_{[0,T]^2 \times \mathbb{R}^2} K_h(x_0 - y)\,K_h(x_0 - z)\,f_{s,t}(y,z)\,ds\,dt\,dy\,dz.$
but $f_{s,t}$ is continuous at $(x_0, x_0)$ (cf. (A')), thus the liminf is a limit, hence

$E\,\ell_T^2(x_0) \ge \int_{[0,T]^2} f_{s,t}(x_0, x_0)\,ds\,dt$

$F_T(y,z) = \int_{[0,T]^2} f_{s,t}(y,z)\,ds\,dt \le \int_{[0,T]^2} f_{s,t}(x_0, x_0)\,ds\,dt < \infty,$
Conversely, if (B) holds, then (A') and (B) imply (C) by Theorem 6.2. •
In order to obtain equivalence for the parametric rate we now introduce a con-
dition stronger than (A') :
(A'') – $f$ is continuous and bounded; $f_u(y,z)$ is defined for all $u \ne 0$, symmetric with respect to $u$, measurable with respect to $(u, y, z)$, continuous over $D$ and such that $\|f_u\|_\infty = f_u(x_0, x_0)$ where $x_0$ does not depend on $u$.

(D) – $\int_{T_0}^{+\infty} \|g_u\|_\infty\,du < \infty$ for some positive $T_0$.
THEOREM 6.7 If X satisfies (A") and (D) the following conditions are
equivalent :
(1) $\int_0^{\infty} \|g_u\|_\infty\,du < \infty$ (C.L. condition)

(2)
$T \cdot V f_T(x) \xrightarrow[T \to \infty]{} c_x^{(K_0)}, \quad x \in \mathbb{R}$

$T \cdot V f_T^0(x) \xrightarrow[T \to \infty]{} c_x, \quad x \in \mathbb{R}.$

Moreover:

$c_x^{(K)} = c_x^{(K_0)} = c_x = \int_{-\infty}^{+\infty} g_u(x,x)\,du, \quad x \in \mathbb{R}.$
Proof

$T \cdot V f_T(x) = 2 \int_0^{T_0} \left(1 - \frac{u}{T}\right) \big(\varphi_T(u) - \psi_T\big)\,du + I_T$

where

$\varphi_T(u) = \int_{\mathbb{R}^2} K_{h_T}(x - y)\,K_{h_T}(x - z)\,f_u(y,z)\,dy\,dz \xrightarrow[T \to \infty]{} f_u(x,x),$

hence
$T\,V f_T^{K_0}(x) \xrightarrow[T \to \infty]{} c_x^{K_0} - 2 \int_{T_0}^{+\infty} g_u(x,x)\,du =: c_1$
thus
$\int_0^{T_0} f_u(x_0, x_0)\,du < \infty,$

and

$\int_0^{T_0} \|g_u\|_\infty\,du < \infty,$

so that

$\int_0^{\infty} \|g_u\|_\infty\,du < \infty.$
This result, together with (A''), implies (A) and (B). So Theorem 6.2 entails (C). Finally (5.12) is nothing but (5.8). In particular $c_x = \int_{-\infty}^{+\infty} g_u(x,x)\,du$, $x \in \mathbb{R}$.
• (4) ⇒ (1).
Since (A'') implies (A') and (C) holds, (B) holds by Theorem 6.6. In particular
$(T - T_0) \int_0^{T_0} f_u(x_0, x_0)\,du \le \int_0^{T_0} (T - u)\,f_u(x_0, x_0)\,du \le \int_0^{T} (T - u)\,f_u(x_0, x_0)\,du < \infty,$
hence

$\int_0^{T_0} \|g_u\|_\infty\,du \le \int_0^{T_0} \|f_u\|_\infty\,du + T_0\,\|f\|_\infty^2 < \infty$

and finally

$\int_0^{\infty} \|g_u\|_\infty\,du < \infty.$

The proof of Theorem 6.7 is therefore complete. •
(6.34) $V f_T(0) \sim \begin{cases} \dfrac{1}{T} & \text{if } \gamma > \dfrac{1}{2}, \\[4pt] \dfrac{\mathrm{Log}\,T}{T} & \text{if } \gamma = \dfrac{1}{2}, \\[4pt] T^{-\gamma} & \text{if } \gamma < \dfrac{1}{2}. \end{cases}$

The proof is straightforward and therefore omitted.
where $\beta > 1$ and $a > 0$.

(c) There exists $\delta > \frac{2\beta}{\beta - 1}$ such that $E\,\ell_1^\delta(x) < \infty$, $x \in \mathbb{R}$.

Note that (b) implies the existence of $\ell_T$ such that $E\,\ell_T^2(x) < \infty$. (c) is satisfied by diffusion processes (see [BA-YR]) and more generally by Markov processes (see Lemma 6.3 below) under mild regularity conditions. If $X$ is geometrically strongly mixing, the condition on $\delta$ becomes $\delta > 2$.
(6.35) $\sqrt{T}\,\big(f_T^0(x_1) - f(x_1), \ldots, f_T^0(x_k) - f(x_k)\big) \xrightarrow[T \to \infty]{\mathcal{D}} N_k \sim \mathcal{N}(0, \Gamma)$

$\Gamma = \left[ \int_{-\infty}^{+\infty} g_u(x_i, x_j)\,du \right]_{1 \le i,j \le k}.$
Proof (sketch):
As above we may suppose that $T$ is an integer. On the other hand it suffices to prove (6.35) for $k = 1$ since for $k > 1$ we can use the CRAMER-WOLD device (cf. [BI2]). Finally Theorem 1.7 gives the desired result. •
$\int_{-\infty}^{+\infty} g_u(x,x)\,du = \sum_{k=-\infty}^{+\infty} \mathrm{Cov}\big(\ell_1(x), \ell^{(k)}(x)\big), \quad x \in \mathbb{R}$

$\left( \int_{-\infty}^{+\infty} g_u(x,x)\,du \right)^{1/2},$ then we have
the STRASSEN set defined as $S = \big\{ \varphi : [0,1] \to \mathbb{R} :\ \varphi \text{ absolutely continuous},\ \varphi(0) = 0,\ \int_0^1 \varphi'^2(t)\,dt \le 1 \big\}$.
Proof
(6.36) is an easy consequence of results of STRASSEN ([S]) and RIO ([R2]). •
(6.37)
(6.38)

LEMMA 6.3 Let $X = (X_t,\ t \ge 0)$ be a stationary Markov process such that $f_s(y,z)$ exists for $s > 0$ and $(y,z) \mapsto \int_0^{\varepsilon} f_s(y,z)\,ds$ is finite for some positive $\varepsilon$ and continuous over $\mathbb{R}^2$. Then $X$ satisfies (6.38).
Proof is left to the reader.
We now state our theorem :
THEOREM 6.10 If (i) and (ii) hold and if $x \mapsto E\,\ell_T(x)$ is continuous then

(6.39) $\left( \frac{T}{\mathrm{Log}\,T \cdot \mathrm{Log}\,\mathrm{Log}\,T} \right)^{1/2} |f_T^0(x) - f(x)| \xrightarrow[T \to \infty]{} 0 \quad \text{a.s.,} \quad x \in \mathbb{R}$
We now turn to uniform convergence, for which we need two additional conditions:

(iii) $\inf_{a \le x \le b} E\,|\ell_1(x) - f(x)|^2 > 0$; $(a,b) \in \mathbb{R}^2$, $a < b$

(iiii) $w_{\ell^{(1)}}([a,b], \delta) \le V_1\,\delta^{\gamma}$, $\delta > 0$, where $\gamma > 0$ and where $V_1$ is an integrable random variable which does not depend on $\delta$.
THEOREM 6.11 If (i)–(iiii) hold and if $x \mapsto E\,\ell_T(x)$ is continuous, then for all $(a,b) \in \mathbb{R}^2$, $a < b$
Now let us choose $\delta_n = [n^{\beta}]^{-1}$ where $\beta > \frac{1}{2\gamma}$ (cf. (iiii)). We have the decomposition

where $V_i$ is the r.v. associated with $w_{\ell^{(i)}}$ in (iiii). Note that such a $V_i$ does exist since $X$ is stationary. Now

$\varepsilon_n^{-1} \delta_n^{\gamma} \sim \frac{n^{\frac{1}{2} - \gamma\beta}}{(\mathrm{Log}\,n \cdot \mathrm{Log}\,\mathrm{Log}\,n)^{1/2}} \to 0 \quad \text{as } n \to \infty$
and the ergodic theorem gives

$\frac{1}{n} \sum_{i=1}^{n} V_i \xrightarrow{\text{a.s.}} E V_1.$
Consequently

$P\left( \sup_j \varepsilon_n^{-1}\,\big|f_n^0(j\delta_n) - f(j\delta_n)\big| > \eta \right) \le c\,n^{\beta - \frac{\delta}{2}}\,(\mathrm{Log}\,n \cdot \mathrm{Log}\,\mathrm{Log}\,n)^{\delta/2}, \quad \eta > 0.$
Now we have

hence

(6.47) $\varepsilon_n^{-1} \max_j \sup_{j\delta_n \le x \le (j+1)\delta_n} |f(x) - f(j\delta_n)| \to 0.$

Finally (6.45), (6.46), (6.47) together with (6.44) imply $\varepsilon_n^{-1} \|f_n^0 - f\| \to 0$ a.s., which in turn implies (6.42). •
6.5 Discussion
We have seen that $f_T^0$ has many interesting properties: in particular unbiasedness and asymptotic efficiency for a large class of processes. The kernel estimator $f_T$ is not unbiased and does not reach the parametric rate for all continuous densities. Note that Theorem 6.2 and the definition of $\ell_T$ imply that

(a) $E\,w_{\ell_T}(h) \le c_T\,h^{\lambda}, \quad h > 0$

where $w_{\ell_T}$ is the modulus of continuity of $\ell_T$ and $c_T > 0$ and $\lambda > 0$ are constants.
(b) $E\,|X_t - X_s| \le d_T\,|t - s|^{\gamma}, \quad (s,t) \in [0,T]^2$

where $d_T > 0$ and $\gamma > 0$ are constants.
(6.50)
Proof
First we consider, for all $x$,

$E\,\big\|\delta_n^{(1)}\big\|_\infty \le \frac{\|K'\|_\infty}{T h_n^2} \sum_{i=1}^{n} \int_{(i-1)T/n}^{iT/n} E\,\big|X_{iT/n} - X_t\big|\,dt,$

thus

(6.51) $E\,\big\|\delta_n^{(1)}\big\|_\infty \le \frac{d_T\,\|K'\|_\infty}{\gamma + 1}\,\frac{T^{\gamma}}{n^{\gamma} h_n^2}.$
$\delta_n^{(2)}(x) = \frac{1}{T} \int_{\mathbb{R}} K(z)\,\big[\ell_T(x - h_n z) - \ell_T(x)\big]\,dz.$

Hence

(6.52) $\big|\delta_n^{(2)}(x)\big| \le \frac{1}{T} \int_{\mathbb{R}} K(z)\,w_{\ell_T}(h_n |z|)\,dz, \quad x \in \mathbb{R}.$
Theorem 6.12 shows that the local time point of view gives new insight into this kernel estimator, since the choice of bandwidth will be influenced by this approximation aspect.
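To make the approximation aspect concrete, here is a small sketch (ours) of the two estimators on a discretized path: the local time estimator $f_T^0 = \ell_T/T$ computed via the occupation time of a small window (the naive-kernel version of $Z_h^K$), and its kernel counterpart; names and parameters are illustrative.

```python
import numpy as np

def local_time_density(x, path, T, eps):
    """f_T^0(x) = l_T(x) / T approximated through occupation time:
    l_T(x) ~ (1 / (2 eps)) * lambda{t <= T : |X_t - x| <= eps},
    the Lebesgue measure being replaced by a Riemann sum."""
    path = np.asarray(path)
    dt = T / len(path)
    occupation = np.sum(np.abs(path - x) <= eps) * dt
    return occupation / (2.0 * eps * T)

def kernel_density_ct(x, path, T, h):
    """Kernel density estimator f_T(x) on the same discretized path."""
    path = np.asarray(path)
    dt = T / len(path)
    K = np.exp(-0.5 * ((x - path) / h) ** 2) / np.sqrt(2.0 * np.pi)
    return K.sum() * dt / (T * h)
```

For small $\varepsilon$ and $h$ the two values are close, in line with Theorem 6.12.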
Notes
The role of local time in density estimation was first noticed by NGUYEN and PHAM (1980). KUTOYANTS (1995, 1997) has studied the unbiased estimator (6.32). See also DOUKHAN and LEON (1994), BOSQ (1997). Concerning approximation of local time we refer to AZAIS-FLORENS (1987) and DAVYDOV (1997, 1998) among others. Apart from Theorem 6.1, the results in this chapter are new. They appear in BOSQ-DAVYDOV (1998), except Theorem 6.12 which is original.
Chapter 7
Implementation of
nonparametric method and
numerical applications
For positive $\zeta_t$'s an often used method is the so-called BOX and COX transformation defined as

$T_\lambda(\zeta_t) = \frac{\zeta_t^{\lambda} - 1}{\lambda}, \quad \lambda > 0$

where $\bar{\zeta}_n = \frac{1}{n} \sum_{t=1}^{n} \zeta_t$.
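A minimal sketch (ours) of the transformation; the log case $\lambda \to 0$ is included for convenience.

```python
import numpy as np

def box_cox(z, lam):
    """Box-Cox transformation T_lambda(z) = (z**lam - 1) / lam, lam > 0,
    applied to positive data; log(z) is the limiting case lam -> 0."""
    z = np.asarray(z, dtype=float)
    return np.log(z) if lam == 0 else (z ** lam - 1.0) / lam
```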
The observed process is then of the form $\eta_t = \mu_t + \sigma_t + \xi_t$, where $(\mu_t)$ is a slowly varying function (the "trend component"), $(\sigma_t)$ a periodic function with known period $\tau$ (the "seasonal component") and $(\xi_t)$ a stationary zero mean process.
If $\mu$ and $\sigma$ have a parametric form, their estimation may be performed by the least squares method. Suppose for instance that

(7.2)

and that

(7.3)

where $\sigma_{kt} = \mathbf{1}_{\{t \equiv k\,(\mathrm{mod}\ \tau)\}}$, $k = 1, \ldots, \tau$.
(7.4)

Now, given the data $\eta_1, \ldots, \eta_n$, the least squares estimators of $a_1, \ldots, a_p$, $c_1, \ldots, c_\tau$ are obtained by minimizing
The elimination of $\mu_t$ and $\sigma_t$ is an alternative technique which seems preferable to estimation because it is more flexible:
(7.5) $\hat{\mu}_t = \frac{1}{2q + 1} \sum_{j=-q}^{q} \eta_{t+j}, \quad q + 1 \le t \le n - q$
and

$\nabla^k \eta_t = \nabla\big(\nabla^{k-1} \eta_t\big), \quad k \ge 1;$

then if $\mu_t$ has the polynomial form (7.2) we get

(7.6)
In the general case where both trend and seasonality appear, the first step is to approximate the trend by using a moving average which eliminates the seasonality. If the period $\tau$ is even, one may set $q = \frac{\tau}{2}$ and put

where $\bar{v}_j$ denotes the average of the quantities $\eta_{j+i\tau}$, $q < j + i\tau \le n - q$.
Then, considering the artificial data $\eta_t - c_t$ (where $c_t = c_k$ if $t \equiv k\,(\mathrm{mod}\ \tau)$), one obtains a model with trend and without seasonality which allows the use of (7.5).
Note that differencing may also be used for seasonality. Here the difference operator is given by

$\nabla_\tau\,\eta_t = \eta_t - \eta_{t-\tau}.$
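The following sketch (ours) implements the moving-average trend estimate (7.5) and the differencing operators just described; function names are illustrative.

```python
import numpy as np

def moving_average_trend(eta, q):
    """Trend estimate (7.5): mu_t = (1/(2q+1)) * sum_{|j| <= q} eta_{t+j},
    returned for the indices where the full window is available."""
    kernel = np.ones(2 * q + 1) / (2 * q + 1)
    return np.convolve(eta, kernel, mode="valid")

def difference(eta, k=1):
    """Iterated difference operator nabla^k eta."""
    return np.diff(eta, n=k)

def seasonal_difference(eta, tau):
    """Seasonal differencing: (nabla_tau eta)_t = eta_t - eta_{t - tau}."""
    eta = np.asarray(eta)
    return eta[tau:] - eta[:-tau]
```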
Clearly all the above techniques suffer from the drawback of perturbing the data. Thus, if $s_t = \mu_t + \sigma_t$ does not vary too much, the model (3.36) may be considered.
Some theoretical results (cf. [EP]) show that the choice of K does not much
influence the asymptotic behaviour of fn or rn : the naive kernel, the normal
kernel and the Epanechnikov kernel are more or less equivalent.
a) Plug-in method
The best asymptotic choice of $h_n$ at a point $x$ is given by (2.7): if $h_n = c_n n^{-1/5}$ where $c_n \to c > 0$ and if the assumptions of Theorem 2.1 hold, then

(7.9) $c_0(x) = \left( \frac{f''^2(x)}{f(x)} \right)^{-1/5} \left( \frac{\big(\int u^2 K(u)\,du\big)^2}{\int K^2} \right)^{-1/5}$
Now, it may be easily proved that
(7.11)

For that purpose we may choose $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$ and consider the case where $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-x^2/2\sigma^2}$. Then $c_0(f)$ may be approximated by $\hat{\sigma}_n = \left( \frac{1}{n} \sum (\xi_t - \bar{\xi}_n)^2 \right)^{1/2}$, and a convenient choice of $h_n$ is

(7.12)

(7.13)

where $\xi_{(1)}, \ldots, \xi_{(n)}$ denote the order statistics associated with $\xi_1, \ldots, \xi_n$.
(7.14)

Now the final estimates $f_n^*$ and $f_n^{**}$ are constructed from $\hat{f}_n$ and $\hat{f}_n^*$ by setting

(7.15)

and

(7.16) $h_n^{**} = \left( 2\sqrt{\pi} \int \hat{f}_n''^2(x)\,dx \right)^{-1/5} n^{-1/5},$

hence

(7.17)

and

(7.18) $f_n^{**}(x) = \frac{1}{n h_n^{**}} \sum_{t=1}^{n} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x - \xi_t}{h_n^{**}} \right)^2 \right), \quad x \in \mathbb{R}.$
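A sketch (ours) of the normal-reference part of this recipe: $\hat{\sigma}_n$ and a bandwidth proportional to $\hat{\sigma}_n n^{-1/5}$ (we use the classical constant $(4/3)^{1/5} \approx 1.06$; the exact constant in (7.12) is not recoverable from this text), followed by the Gaussian kernel estimate as in (7.18).

```python
import numpy as np

def normal_reference_bandwidth(xi):
    """h_n proportional to sigma_n * n**(-1/5), cf. (7.12); the constant
    (4/3)**(1/5) ~ 1.06 is the usual normal-reference choice."""
    xi = np.asarray(xi)
    return (4.0 / 3.0) ** 0.2 * xi.std() * len(xi) ** (-0.2)

def gaussian_estimate(x, xi, h):
    """Gaussian kernel estimate at x, as in (7.18)."""
    xi = np.asarray(xi)
    u = (x - xi) / h
    return np.exp(-0.5 * u ** 2).sum() / (len(xi) * h * np.sqrt(2.0 * np.pi))
```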
b) Cross-validation
If the regularity of f is unknown one can employ an empirical maximum
likelihood method for the determination of h.
Let us suppose again that $K$ is the normal kernel and consider the empirical likelihood

$L(h) = \prod_{t=1}^{n} f_{n,h}(\xi_t), \quad h > 0,$

where

$f_{n,h}(x) = \frac{1}{nh} \sum_{s} K\!\left( \frac{x - \xi_s}{h} \right), \quad x \in \mathbb{R},$

and where

$f_{n-1,h}^{(t)}(\xi_t) = \frac{1}{(n-1)h} \sum_{s \ne t} K\!\left( \frac{\xi_t - \xi_s}{h} \right).$
We now have

and

Then the empirical maximum likelihood estimate $\hat{h}_n$ does exist, hence the estimate

(7.19) $\tilde{f}_n(x) = \frac{1}{n \hat{h}_n} \sum_{t=1}^{n} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x - \xi_t}{\hat{h}_n} \right)^2 \right), \quad x \in \mathbb{R}.$
The above method, being based on the maximum likelihood, is not robust and is difficult to handle when the data are correlated.
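Despite this caveat, the criterion itself is easy to code; here is a sketch (ours) of the leave-one-out likelihood and a grid search for $\hat{h}_n$.

```python
import numpy as np

def loo_log_likelihood(h, xi):
    """sum_t log f_{n-1,h}^{(t)}(xi_t) for the Gaussian kernel,
    leaving out the term s = t."""
    xi = np.asarray(xi)
    n = len(xi)
    u = (xi[:, None] - xi[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(K, 0.0)              # remove the s = t term
    f = K.sum(axis=1) / ((n - 1) * h)
    return np.log(np.maximum(f, 1e-300)).sum()

def cv_bandwidth(xi, grid):
    """Empirical maximum likelihood bandwidth over a finite grid."""
    return max(grid, key=lambda h: loo_log_likelihood(h, xi))
```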
Let us now suppose that the observed process $(\xi_t)$ is $\alpha$-mixing. We intend to specify an optimal $h_n$ with respect to the measure of accuracy

with

$f_n^{(t)}(x) = \frac{1}{h\,\bar{\gamma}(t)} \sum_{s} K\!\left( \frac{x - \xi_s}{h} \right) \gamma(t - s);$

$\gamma$ is a given function such that $\gamma(0) = 0$, $\gamma(x) = 1$ if $x > \nu_n$, $0 \le \gamma(x) \le 1$ if $x \le \nu_n$, where $\nu_n$ is a positive integer and $\bar{\gamma}(t) = \sum_{s=1}^{n} \gamma(t - s)$. Here $\gamma$ defines a smooth leave-out procedure.

where $H_n$ is a finite subset of $[c_1 n^{-a}, c_2 n^{-b}]$ with $c_1 > 0$, $c_2 > 0$, $0 < b \le \frac{1}{2r+1} < a < \frac{1}{2r + \frac{1}{2}}$.
Under some regularity conditions HART and VIEU have proved that $\hat{h}_n$ is asymptotically optimal in the following sense:

$\frac{\mathrm{ISE}(\hat{h}_n)}{\inf_{h \in H_n} \mathrm{ISE}(h)} \to 1;$

we refer to [GHSV] for a complete proof of this result.
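For illustration, a sketch (ours) of the smooth leave-out estimate $f_n^{(t)}$ with the simplest admissible $\gamma$: the indicator of $|t - s| > \nu_n$ (so that $\gamma(0) = 0$).

```python
import numpy as np

def smooth_leave_out(t, x, xi, h, nu):
    """f_n^{(t)}(x) = (1 / (h * gamma_bar(t))) * sum_s K((x - xi_s) / h)
    * gamma(t - s), with gamma(u) = 1 if |u| > nu and 0 otherwise."""
    xi = np.asarray(xi)
    s = np.arange(len(xi))
    gamma = (np.abs(t - s) > nu).astype(float)
    K = np.exp(-0.5 * ((x - xi) / h) ** 2) / np.sqrt(2.0 * np.pi)
    return np.dot(gamma, K) / (h * gamma.sum())
```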
Conclusion
Note first that other interesting methods are discussed in [BE-DE], particularly the double kernel method.
Now, the comparison between the various methods is somewhat difficult. It should be noticed that the normal kernel (or the EPANECHNIKOV kernel) and $h_n = \hat{\sigma}_n n^{-1/5}$ are commonly used in practice and that they provide good results in many cases for constructing $f_n$ or $r_n$.
7.1.4 Prediction
The nonparametric predictor comes directly from the regression estimator
where (K, h n ) is chosen as indicated in the previous Section.
(7.20)
In the general case it is necessary to find a suitable $k$ (or $k_N$, see 3.4.1). For convenience we suppose that $(\xi_t)$ is a real process and $H = 1$. Now let us consider

(7.21) $\Delta_N(k) = \sum_{t=n_0}^{N} \big( \xi_t - \hat{\xi}_t(k) \big)^2, \quad 1 \le k \le k_0$

where $n_0$ and $k_0$ are given and $\hat{\xi}_t(k)$ stands for the predictor of $\xi_t$ based on the data $\xi_1, \ldots, \xi_{t-1}$ and associated with the regression $E\big( \xi_t \mid (\xi_{t-1}, \ldots, \xi_{t-k}) = \cdot \big)$.
We finally obtain the predictor defined by (3.34). Note that the above method remains valid if the process is not stationary, provided the data are of the form (3.36). Otherwise one can stationarize the process by using the methods indicated in the previous sections of the current chapter.
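A sketch (ours) of the selection of $k$ by minimizing (7.21); predictor is a user-supplied function (hypothetical) returning the forecast of the next value from a history and an order $k$.

```python
def choose_k(xi, n0, k0, predictor):
    """Return the k in {1, ..., k0} minimizing Delta_N(k) =
    sum_{t=n0}^{N-1} (xi_t - predictor(xi[:t], k))**2, cf. (7.21)."""
    N = len(xi)
    delta = {}
    for k in range(1, k0 + 1):
        delta[k] = sum((xi[t] - predictor(xi[:t], k)) ** 2
                       for t in range(n0, N))
    return min(delta, key=delta.get)
```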
Recall the ARMA$(p,q)$ model

(7.22) $\xi_t = m + \phi_1 \xi_{t-1} + \cdots + \phi_p \xi_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}$

where $(\varepsilon_t)$ is a white noise (i.e. the $\varepsilon_t$'s are i.i.d. and such that $0 < \sigma^2 = E\varepsilon_t^2 < \infty$, $E\varepsilon_t = 0$) and $m$; $\phi_1, \ldots, \phi_p$; $\theta_1, \ldots, \theta_q$ are real parameters.
If the polynomials

$\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$

and

$\theta(z) = 1 - \theta_1 z - \cdots - \theta_q z^q$

have no common zeroes and are such that $\phi_p \theta_q \ne 0$ and $\phi(z)\,\theta(z) \ne 0$ for $|z| \le 1$, then (7.22) admits a unique stationary solution.
Now the BOX and JENKINS (BJ) method mainly consists of the following
steps:
2. Identification of (p,q).
$\mathrm{EMO} = \frac{1}{k} \sum_{t=n-k+1}^{n} \big| \xi_t - \hat{\xi}_t \big|$

• The EMP defined as

$\mathrm{EMP} = \frac{1}{k} \sum_{t=n-k+1}^{n} \left| \frac{\xi_t - \hat{\xi}_t}{\xi_t} \right|$

where $\hat{\xi}_t$ is the predictor of $\xi_t$ constructed from the data $\xi_1, \ldots, \xi_{t-1}$ and $\hat{q}_t$ the empirical quantile associated with the theoretical quantile $q_t$ defined by
The NP-predictor is better than BJ 12 times out of 17 for the EMO and 14 times out of 17 for the EMP.
and

with

$\sigma_t^2 = a_0 + \sum_{i=1}^{q'} a_i\,\varepsilon_{t-i}^2 + \sum_{j=1}^{p'} \beta_j\,\sigma_{t-j}^2, \quad t \in \mathbb{Z}$

where $a_0, a_1, \ldots, a_{q'}, \beta_1, \ldots, \beta_{p'}$ are positive real parameters such that

$\sum_{i=1}^{q'} a_i + \sum_{j=1}^{p'} \beta_j < 1.$

If the conditional distribution of $\varepsilon_t$ given $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$ is gaussian, $(\xi_t)$ is strictly stationary.
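A simulation sketch (ours) of such a GARCH noise, with Gaussian conditional distribution; all names are illustrative.

```python
import numpy as np

def simulate_garch(n, a0, a, b, rng=None):
    """Simulate eps_t with sigma_t^2 = a0 + sum_i a[i] * eps_{t-1-i}**2
    + sum_j b[j] * sigma_{t-1-j}**2; requires sum(a) + sum(b) < 1."""
    rng = rng or np.random.default_rng(0)
    q, p = len(a), len(b)
    burn = max(p, q)
    eps = np.zeros(n + burn)
    sig2 = np.full(n + burn, a0 / (1.0 - sum(a) - sum(b)))
    for t in range(burn, n + burn):
        sig2[t] = (a0
                   + sum(a[i] * eps[t - 1 - i] ** 2 for i in range(q))
                   + sum(b[j] * sig2[t - 1 - j] for j in range(p)))
        eps[t] = np.sqrt(sig2[t]) * rng.normal()
    return eps[burn:]
```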
Concerning the robust nonparametric predictors, they are based on $\alpha$-truncated means and on estimators of the conditional median and the conditional mode.
Here we only describe the conditional mode predictor. It is constructed by considering a kernel estimator of the conditional density of $\xi_t$ given $\xi_{t-1}^{(k)} = (\xi_{t-1}, \ldots, \xi_{t-k})$, namely

(7.23) $f_n(y \mid x) = \frac{\sum_{t=k+1}^{n} h_n^{-1} K_0\big( h_n^{-1}(y - \xi_t) \big)\,K_1\big( h_n^{-1}(x - \xi_{t-1}^{(k)}) \big)}{\sum_{t=k+1}^{n} K_1\big( h_n^{-1}(x - \xi_{t-1}^{(k)}) \big)}, \quad y \in \mathbb{R},\ x \in \mathbb{R}^k,$

(7.24) $\hat{\xi}_{n+1} = \arg\max_{y} f_n\big( y \mid \xi_n^{(k)} \big).$
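A sketch (ours) of (7.23)–(7.24): the conditional density is evaluated on a grid of candidate values $y$ and the maximizer is returned; constant factors that cancel in the argmax are dropped, and $K_0$, $K_1$ are taken Gaussian.

```python
import numpy as np

def conditional_mode_predictor(xi, k, h, grid):
    """Predict xi_{n+1} as argmax_y of the kernel estimate of the
    conditional density of xi_t given (xi_{t-k}, ..., xi_{t-1})."""
    xi = np.asarray(xi)
    n = len(xi)
    hist = np.array([xi[t - k:t] for t in range(k, n)])  # past k-vectors
    targets = xi[k:]                                     # associated xi_t
    x = xi[n - k:]                                       # current history
    K1 = np.exp(-0.5 * np.sum(((x - hist) / h) ** 2, axis=1))
    dens = [np.dot(K1, np.exp(-0.5 * ((y - targets) / h) ** 2))
            for y in grid]
    return grid[int(np.argmax(dens))]
```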
The method for choosing parameters is the same as in 7.2.3. The comparisons
appear in figures 2 and 3. Parametric (resp. theoretical) and nonparametric
forecasts are more or less equivalent.
[Figure: observed data with BJ and NP predictions, $t = 133, \ldots, 144$]
Life insurance
BJ : SARIMA (3,0,0) (0,1, 1h2
NP : Conditional mode k = 6
Figure 2
[Figure: observed data with theoretical predictions]
(7.25) $dX_t = m\,X_t\,dt + \sigma\,X_t\,dW_t, \quad t \ge 0,$

where $m \ne 0$ and $\sigma > 0$ are constants and where $(W_t)$ is a standard Wiener process. The initial condition $X_0$ is supposed to be constant and strictly positive.
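A sketch (ours) of an exact simulation of (7.25) via the closed form $X_t = X_0 \exp\big((m - \sigma^2/2)t + \sigma W_t\big)$.

```python
import numpy as np

def simulate_gbm(x0, m, sigma, T, n, rng=None):
    """Simulate (7.25) on a grid of n steps using the explicit solution
    X_t = x0 * exp((m - sigma**2 / 2) * t + sigma * W_t)."""
    rng = rng or np.random.default_rng(0)
    dt = T / n
    dW = rng.normal(scale=np.sqrt(dt), size=n)
    W = np.concatenate(([0.0], np.cumsum(dW)))
    t = np.linspace(0.0, T, n + 1)
    return x0 * np.exp((m - 0.5 * sigma ** 2) * t + sigma * W)
```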
(7.26)
Parametric models are useful, but the nonlinear character and the complexity of financial evolutions suggest that, in some situations, nonparametric methods are well adapted to their analysis.
The author shows that these correlations are not well explained by linear
correlation coefficients, Principal Component Analysis or ARMA models.
Here we only give the results concerning the variations of French ten-year yields.
The nonparametric predictor is constructed with the Gaussian kernel (figure
1) or the EPANECHNIKOV kernel (figure 2), kn = 14 (or 15) and hn is chosen
(7.27)
Notes
Among the great quantity of methods for implementing nonparametric estimators and predictors we have chosen those which are often used in practice.
The subsections concerning stationarization are inspired by GUERRE (1995) and BROCKWELL and DAVIS (1991). (7.12) appears in DEHEUVELS and HOMINAL (1980) while (7.13) comes from BERLINET and DEVROYE (1994).
The smooth leave-out procedure is in GYORFI, HARDLE, SARDA and VIEU (1989). A discussion concerning cross-validation may be found in BRONIATOWSKI (1993). See also MARRON (1991).
The numerical applications presented here appear in CARBON and DELECROIX
(1993), ROSA (1993), POGGI (1994), MILCAMPS (1995).
7.4 Annex
Table 1
Table 2
Table 3
Table 4
Table 5
Table 6
Table 7
Table 8
Table 9
Table 10
Table 11
Table 12
Table 13
Table 14
Table 15
Table 16
Table 17
French car registrations (April 1987 – September 1988)

t   X_t     X̂_t (BJ)   X̂_t (NP, k = 36)
1 192.1 183.1 197.1
2 156.7 173.8 179.1
3 151.2 170.5 180.7
4 195.9 161.9 167.9
5 146.1 136.3 138.7
6 129.6 134.4 144.1
7 232.3 189.1 195.9
8 197.8 190.0 192.5
9 208.9 193.9 204.6
10 160.6 148.6 156.3
11 160.0 153.9 164.5
12 218.0 206.2 221.3
13 189.0 188.6 197.1
14 184.0 181.3 185.6
15 141.6 179.8 195.5
16 210.0 163.6 160.3
17 157.1 133.9 135.4
18 146.7 134.4 141.1
Table 18
[Figure: observed data with BJ and NP predictions]
[Figure: BJ and NP prediction errors]
[Figure: observed data and predictions with ± 3 standard deviation bands]

[Figure: observed data and predictions with ± 3 standard deviation bands]
[A-Z] ARAK T. and ZAITSEV A. (1988). Uniform limit theorems for sums of independent random variables. Publ. Steklov Math. Institute, 1.
[A-G] ASH R.B. and GARDNER M.F. (1975). Topics in stochastic processes. Academic Press.
[BW-PR] BASAWA I.V. and PRAKASA RAO B.L.S. (1980). Statistical inference for stochastic processes. Academic Press.
[B-D] BROCKWELL P.J. and DAVIS R.A. (1991). Time series: theory and methods. Springer Verlag.
[DK-LN] DOUKHAN P. and LEON J. (1994). Asymptotics for the local time of a strongly dependent vector-valued Gaussian random field. Université de Paris-Sud, Mathématiques 94.31.
[GY] GYORFI L. (1997). How far one can learn probability law from data? Publ. ISUP, Vol. 41, Fasc. 3, p. 3-20.
[GHSV] GYORFI L., HARDLE W., SARDA P., VIEU P. (1989). Nonparametric Curve Estimation from Time Series. Lecture Notes in Statist. Springer Verlag.
[IB] IBRAGIMOV I.A. (1962). Some limit theorems for stationary processes. Theor. Prob. Appl. 7, 349-382.
[NG-PH1] NGUYEN H.T. and PHAM D.T. (1980). Sur l'utilisation du temps local en statistique des processus. C.R. Acad. Sci. Paris, 290, A, 165-170.
[PH-T1] PHAM D.T. and TRAN L.T. (1985). Some strong mixing properties of time series models. Stoch. Proc. Appl. 19, 297-303.
[PH-T2] PHAM D.T. and TRAN L.T. (1991). Kernel density estimation under a locally mixing condition. In: Nonparametric Functional Estimation and Related Topics (G. ROUSSAS, Ed.), NATO ASI Series V. 335, 419-430.
[SI] SILVERMAN B.W. (1986). Density estimation for Statistics and Data Analysis. Chapman and Hall.
[ST-TR] STONE C.J. and TRUONG Y.K. (1992). Nonparametric function estimation involving time series. Annals of Statist. 20, 1, 77-97.
[SU] STOUT W.F. (1974). Almost sure convergence. Academic Press.
[SR] STRASSEN V. (1964). An invariance principle for the law of the iterated logarithm. Z. für Wahrscheinlichkeitstheorie und verw. Geb. 3, 211-226.
[SE] STUTE W. (1982). A law of the logarithm for kernel density estimators. Ann. Probab. 10, 414-422.
[TR1] TRAN L.T. (1989). The $L^1$ convergence of kernel density estimates under dependence. The Canad. J. of Statist. 17, 2, 197-208.
[TR2] TRAN L.T. (1990). Kernel density and regression estimation for dependent random variables and time series. Techn. report, Univ. Indiana.
[TR3] TRAN L.T. (1993). Nonparametric function estimation for time series by local average estimators. Ann. Statist. 21(2), 1040-1057.
[T-S] TRUONG Y.K. and STONE C.J. (1992). Nonparametric function estimation involving time series. Ann. Statist. 20, 77-98.
Absolute regularity 18
Adaptive methods 14
Admissible sampling 14, 122, 140
α-mixing 7, 18
ARCH 1, 178
ARMA process 1, 177, 179, 180
Asymptotic normality 9, 11, 36, 54, 75, 80, 89, 118, 138, 160-161
Autoregressive processes (infinite dimensional) 14
B
β-mixing 18
Berbee's lemma 19
Bernstein's inequality 24
Billingsley's inequality 22
Black-Scholes formula 182
Bochner's lemma 44, 100
Borel-Cantelli lemma in continuous time 108
Box-Cox transformation 170
Box-Jenkins (method) 1, 177
Bradley's lemma 20
C
Cadlag 90, 121
Car registrations 184
Central limit theorem 36
Chaos 86
Chaotic data 57
Conditional mode predictor 180
Consistency of local time density estimator 150
Coupling 19
Covariance inequalities 20, 21, 22
Cramer's conditions 24
Cross validation 175-176
Cyclical method 172
208 INDEX
D
Davydov's inequality 21-22
Density kernel estimator 3, 42, 90
Deseasonalization 14, 170-172
Dichotomy 119, 140
Differencing 13
Differentiable sample paths 104
Diffusion process 101
Double kernel method 176
Dynamical system 86
E
Electricity consumption 184
Elimination of trend and seasonality 171-172
Empirical measure 12, 42, 68
Epanechnikov kernel 42, 176, 182
Ergodic 150, 153
Ergodic theorem 151, 152, 154
Errors in variables (processes with) 64-65, 86
Exogenous variables 15, 177
Exponential type inequalities 7, 24-33
F
φ-mixing 18
φ_rev-mixing 80, 143
Forecasting : see Prediction
Full rate 101
G
GARCH 180
Gaussian process 10, 19, 74, 100, 104, 122, 153, 160
General stationary processes (prediction for) 81
Geometrically strongly mixing (GSM) processes 46, 90
H
Histogram 3
Hoeffding's inequality 24
I
Implementation of nonparametric method 169-177
Intermediate rates 102-107, (minimaxity of) 107-108
Interpolator 85
Irregular sampling 121
Iterated logarithm (functional law of) 161
INDEX 209
K
Kernel 39
Kernel of order (k, A) 90
Kolmogorov extension theorem 4
Kutoyants theorem 102

L
Large deviations inequalities 7, 24-33
Law of large numbers 34-35
Linear process 18, 46
Local time 145, 146
Local time estimator 149
Local time for semimartingales (existence) 146
Logistic trend 83-84
M
Markov process 76, 141
Markov process of order k 76
Martingale 82
m-dependent 19
Minimax 9, 46, 97, 101, 102, 107, 108
Minimaxity of intermediate rates 107-108
MISE 91
Mixing 7,17
Mixture 4
N
Naive kernel 3, 40
Nonparametric predictor 1, 6, 76, 82, 141, 172, 177
Nonstationary process (prediction for) 82
O
Optimal rate 8, 93
Occupation measure 145
Ornstein-Uhlenbeck process 122
Outliers 84
P
p-adic Process 58
Parametric rate 10
Parametric predictors 177, 180
Periodic 83
Plug-in method 173
Pollution 184
210 INDEX
Pseudo-regression 84
Q
Quadratic error (asymptotic) 43, 69, 91, 125, 155
R
Rate 33
Regression kernel estimator 69, 130
Regression with error 86
Regressogram 5
Rio's inequality 20
Robust 5, 14, 180
S
Sampling 14, 15, 118, 140
SARIMA process 1
Seasonality 13, 170
Semiparametric 14
Similarity 13
Singular distribution 61
Stationary process 7
Statistical error of prediction 6
Superoptimal rate 10, 98, 104, 116, 136, 155, 157
T
Trend 170
Two-α-mixing 19
U
Unbiased density estimator 3, 12, 150, 156
Uniform convergence 8, 10, 11, 46, 72, 108, 139, 153, 162
V
Variance (stabilization of) 169
W
Wavelets 15, 127
Y
Yields 2, 182.