You are on page 1of 219

Lecture Notes in Statistics 110

Edited by P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg,


I. Olkin, N.Wennuth, S.Zeger
Springer Science+Business Media, LLC
D. Bosq

Nonparametric Statistics
for Stochastic Processes

Estimation and Prediction

Second Edition

, Springer
D. Bosq
Universite Pierre et Marie Curie
Institut de Statistique
4 Place Jussieu
75 252 Paris cedex 05
France

Llbrary of Congress Catalog1ng-ln-Publ1catlon Data

Bosq, Denls, 19a9-


Nonparametrlc statlstlcs for stochastlc processes , estimatlon and
predlctlon I D. Bosq. -- 2nd ed.
p. cm. -- (Lecture notes In statistlcs ; 110)
Includes blbltographical references and tndex.
ISBN 978-0-387-98590-9 ISBN 978-1-4612-1718-3 (eBook)
DOI 10.1007/978-1-4612-1718-3
1. Nonparametrtc stattsttcs. 2. Stochasttc processes.
a. Esti.atlon theory. I. Tttle. II. Serles, Lecture -notes in
statlstlcs (Springer-Verlag) ; v. 110.
Oa278.8.B67 1998
519.5·4--dc21 98-28496

Printed on acid-free paper.

© 1998 Springer Science+Business Media New York


Originally published by Springer-Verlag New York, Inc. in 1998

AlI rights reserved. This work may not be translated or copied in whole or in part without the written
permission ofthe publisher Springer Science+Business Media, LLC, except for brief excerpts in connection
with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former
are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks
and Merchandise Marks Act, may according1y be used freely by anyone.

Camera ready copy provided by the author.

9 876 5 4 3 2 1

ISBN 978-0-387-98590-9 SPIN 10687139


To MARlE, CAMILLE, ROMANE and LUCAS.
Preface to the first edition

Recently new developments have taken place in the theory of nonpara-


metric statistics for stochastic processes. Optimal asymptotic results have
been obtained and special behaviour of estimators and predictors in con-
tinuous time has been pointed out.

This book is devoted to these questions. It also gives some indica-


tions about implementation of nonparametric methods and comparison
with parametric ones, including numerical results. Ma.ny of the results
presented here are new and have not yet been published, expecially those
in Chapters IV, V and VI.

Apart from some improvements and corrections, this second edition con-
tains a new chapter dealing with the use of local time in density estimation.

I am grateful to W. Hardie, Y. Kutoyants, F. Merlevede and


G. Oppenheim who made important remarks that helped much to improve
the text.

I am greatly indebted to B. Heliot for her careful reading of the manus-


cript which allowed to ameliorate my english. I also express my gratitude to
D. Blanke, L. Cotto and P. Piacentini who read portions of the manuscript
and made some useful suggestions.

I also thank M. Gilchrist and J. Kimmel for their encouragements.


My aknowlegment also goes to M. Carbon, M. Delecroix, B. Milcamps
and J .M. Poggi who authorized me to reproduce their numerical results.

My greatest debt is to D. Tilly who prepared the typescript with care


and efficiency.
Preface to the second edition

This edition contains some improvements and corrections, and two new
chapters.

Chapter 6 deals with the use of local time in density estimation. The
local time furnishes an unbiased density estimator and its approximation
by a kernel estimator gives new insight in the choice of bandwidth.

Implementation and numerical applications to Finance and Economics


are gathered and developed in Chapter 7.

It is a pleasure to thank the readers who have offered useful comments


and suggestions, in particular the anonymous reviewers of this second edi-
tion. I am also indebted to Springer-Verlag for their constant support and
assistance in preparing this edition.
Contents
Preface to the first edition vii
Preface to the second edition ix
SYNOPSIS 1

1. The object of the study 1

2. The kernel density estimator 3


3. The kernel regression estimator and the induced predictor 5
4. Mixing processes 7

5. Density estimation 8
6. Regression estimation and prediction 11
7. The local time density estimator 12
8. Implementation of nonparametric method 13

CHAPTER 1. Inequalities for mixing processes 17


1. Mixing 17
2. Coupling 19
3. Inequalities for covariances and joint densities 20

4. Exponential type inequalities 24

5. Some limit theorems for strongly mixing processes 33


Notes 39
CHAPTER 2. Density estimation for discrete time
processes 41
1. Density estimation 42
2. Optimal asymptotic quadratic error 43
xii CONTENTS

3. Uniform almost sure convergence 46

4. Asymptotic normality 53

5. Nonregular cases 57

Notes 65
CHAPTER 3. Regression estimation and prediction
for discrete time processes 67

1. Regression estimation 67

2. Asymptotic behaviour of the regression estimator 69

3. Prediction for a stationary Markov process of order k 75

4. Prediction for general processes 81

Notes 87

CHAPTER 4. Kernel density estimation for continuous


time processes 89

1. The kernel density estimator in continuous time 89

2. Optimal and superoptimal asymptotic quadratic error 91

3. Optimal and superoptimal uniform convergence rates 108

4. Asymptotic normality 118

5. Sampling 118
Notes 127

CHAPTER 5. Regression estimation and prediction


in continuous time 129

1. The kernel regression estimator in continuous time 129

2. Optimal asymptotic quadratic error 131


3. Superoptimal asymptotic quadratic error 134
4. Limit in distribution 138
5. Uniform convergence rates 139

6. Sampling 140
7. Nonparametric prediction in continuous time 141
Notes 144
CONTENTS xiii

CHAPTER 6. The local time density estimator 145

1. Local time 145


2. Estimation by local time 149
3. Consistency of the local time density estimator 150
4. Rates of convergence 154
5. Discussion 165

Notes 167

CHAPTER 7. Implementation of nonparametric method


and numerical applications 169
1. Implementation of nonparametric method 169

2. Comparison between parametric and nonparametric predictors 177


3. Some applications to Finance and Economics 182

Notes 184

4. Annex 185

References 197

Index 207
Notation

AC, Au B, An B complement of A, union of A and B, intersection of A and


B.
o _
A , A interior of A, closure of A.

(n, A, P) Probability space: n non empty set, A a-Algebra of subsets of n,


P Probability measure on A.

BIRd a-Algebra of Borel sets on ]Rd.

a(Xi , i E I) a-Algebra generated by the random variables Xi, i E I.


i.i.d. r.v.'s independent and identically distributed random variables.

EX, V X, Px, Ix expectation, variance, distribution, density (of X).

E(X I B), E(X I Xi, i E I), VeX I B), vex I Xi, i E I) conditional expecta-
tion, conditional variance (of X), with respect to B or to a(Xi , i E I).

Cov(X, Y), Corr(X, Y) covariance, correlation coefficient (of X and Y) .

8(a) , B(n,p), N(m,a 2 ),)..d Dirac measure, Binomial distribution, normal dis-
tribution, Lebesgue measure over ]Rd.

(X t , t E I) or (X t ) stochastic process.
C([a, b]) Banach space of continuous real functions defined over [a, b], equipped
with the sup norm.

LP(E,B,/-l) (or LP(E), or LP(B), or LP(/-l)) space of (classes) of real B - BIR


measurable functions I such that
xvi NOTATION

II f ilp= (/elfIPdJ.L) lip < +00 (1 :s: p < +00),


II f 1100= inf{a : J.L{f > a} = O} < +00 (p = +00) .
r f(x)dx integral of f with respect to Lebesgue measure on Rd.
JJRd
lA indicator of A: lA(x) = 1, x E A ; = 0, x rf. A .

Logkx defined recursively by Logk(x) = Log(Logk_lX) ifLogk_lx ;::: e, Logk(x) =


1 if Logk_lx < e; k ;::: 2.

[xl integer part of x.

f 0 g defined by (f 0 g) (x, y) = f(x)g(y) .

Un ~ Vn or Un S:' V n . There exist constants Cl and C2 such that


o < Cl Vn < Un < C2 Vn for n large enough.
Un ---> 0.
Vn

Un :s: CVn for some c> 0.

~ weak convergence.

~ convergence in probability.

~ almost sure convergence.


q.m. .
---> convergence m mean square .

• end of a proof.

~E cardinal of E.
Synopsis

S.l The object of the study


Classically time series analysis has two purposes. One of these is to construct a
model which fits the data and then to estimate the model's parameters. The
second object is to use the identified model for prediction.

The popular so-called BOX-JENKINS approach gives a complete solution


of the above mentioned problems through construction, iclentification and fore-
casting of an ARMA process or more generally a SARIMA process (cf. [B-J],
[G-M], [B-D]).

Unfortunately the underlying assumption of linearity which supports the


B-J's theory is rather strong and, therefore, inadequate in many practical sit-
uations.

That inadequacy appears in the forecasts, especially jf the horizon is large.


Consicleration of nonlinear parametric models, like bilinear or ARCH processes,
does not seem to give a notable improvement of the forecasts.

On the contrary a suitable nonparametric predictor supplies rather precise


forecasts even if the underlying model is truly linear and if the horizon is re-
mote. This fact explains the expansion of non parametric methods in time series
analysis during the last decade. Note however that parametric and nonpara-
metric methods are complementary since a parametric model tries to explain
the mechanism which generates the data.

It is important to mention that the duality highlighted at the beginning of


the current section is not conspicuous in a non parametric context because the
underlying model only appears through regularity condit.ions whereas estimat-
ing and forecasting are basic.
2 SYNOPSIS

Figure 1 gives an example of comparison between nonparametric and para-


metric forecasts . Other numerical comparisons appear in Chapter 7.

IJ

10

.J

- nonpar3metric p,edictor
·10
AR..'v(A predictor

·Il .....-----------~------------
~y'i'

Forecasting of french ten years yields


The nonparametric predictor gives better indications
about signs of variation
Figure I
In this book we present optimal asymptotic results on density and regression
non parametric estimation with applications to prediction, as well in discrete
time as in continuous time.

We also try to explain why nonparametric forecasts are (in general) more
accurate than parametric ones. Finally we make suggestions for the implemen-
tation of functional estimators and predictors.

Note that we do not pretend to provide an encyclopaedic treatment of non-


parametric statistics for stochastic processes. Actually our work focuses on
density estimation by kernel and local time methods and prediction by kernel
method.

Now the rest of the synopsis is organized as follows. In 8.2 we construct the
kernel density estimator. The kernel regression estimator and the associated
predictor are considered in 8.3. The mathematical tools defined in Chapter
1 are described in 8.4. 8.5 deals with the asymptotic behaviour of the kernel
density estimator (cf. Chapters 2 and 4). 8.6 is devoted to the convergence of
regression estimators and predictors (cf. Chapters 3 and 5). In 8.7 we point
out the role of local time in density estimation for continuous time processes.
Finally 8.8 discusses sampling, and practical considerations (cf. Chapter 7) .
S.2. THE KERNEL DENSITY ESTIMATOR 3

8.2 The kernel density estimator


We now describe the popular kernel density estimator. For the sake of sim-
plicity we first suppose that the data XI, .. . ,Xn come from a sequence of real
independent random variables with a common density f belonging to some
family F.

If F is large (for example if F contains the continuous densities) it is well


known that no unbiased estimator of f can exist (see [ROI]) . This is due to
the fact that the empirical measure

is not absolutely continuous with respect to Lebesgue measure. On the other


hand the supremum of the likelihood is infinite.

Then, a primary density estimator should be the histogram defined as


~ Vnj . .
fn(x) = ( ) , x E Inj , J E Z,
n an,j - an,j-l
where Inj = [an,j_I,an,j[ and (anj' j E Z) is a strictly increasing sequence
n
such that lajnl-> 00 as Ijl-> 00, and where Vnj = L l[a n ,j _ l,a n ,j[(Xi ),
i=l

If f is continuous over Inj and if an,j -an,j-l is small, then !n(x) is close to
f(x) for each x in I nj , However this estimator does not utilize all the informa-
tion about f(x) contained in data since observations which fall barely outside
Inj do not appear in !n(x). This drawback is particularly obvious if x = an,j-l.

A remedy should be the construction of an adaptable histogram defined


as
fn*() vn(x) , x
x = -h- E ~,
n n
n
where vn(x) = L l[x_~,x+~](Xi) and where hn is a given positive number,
i=l

Note that f~ may be written under the form

where Ko = 1[_!,+!] is the so-called naive kernel.


4 SYNOPSIS

The accuracy of f~ depends heavily on the choice of the "bandwidth" h n .


This choice must conciliate two contradictory requirements : the smallness of
[x - h2n , x+ h2n] and a large number of observations falling in this interval.

Since Evn(x) ~ nhnf(x) (provided hn be small and f(x) > 0) we obtain


the conditions :

If the X;'s are JRd-valued, f~ is defined as

and (C1 ) becomes

Now in order to obtain smoother estimations, one can use other kernels (a
kernel on JRd is a bounded symmetric d-dimensional density such that II u lid
K(u) ----> 0 as II u 11----> 00 and f II u 112 K(u)du < 00). Let K be a kernel, the
associated kernel estimator is

fn(x) = nh~
1
8 (x - Xi)
n
K --,;;:-

u2
-- 1
For example if d = 1 and if K(u) = f<Ce 2, u E R then
v 27r
1 n 1 .(x_X.)2
fn(x) = - 2:--e-2 ~ , x ER
n i=l h n V'2-ff

which is a mixture of Gaussian densities with respective means Xi and variance


h~ .

We now consider the case where the data are realizations of a stochas-
tic process (Xd. In that case the Kolmogorov extension theorem states that
the distribution v of a stochastic process is completely specified by its finite-
dimensional distributions (cf. [A.GJ). Thus the general problem of estimating
v reduces to the estimation of these. To this aim, it is convenient to estimate
the associated densities if they do exist.
S.3. THE KERNEL REGRESSION ESTIMATOR AND PREDICTOR 5

If (Xd is a discrete time process one may use In as well. Finally if (X t , t E


JR) denotes a d-dimensional continuous time process observed over the time
interval [0, TJ, the kernel density estimator is defined by setting

Jr(x) = ~ r
ThT 10
T
K (x -hTXt) dt , x E JRd

where hT is a given positive number.

For other classical methods of density estimation we refer to [PRJ. The


method of kernels has several advantages : it is natural, easy to compute and
robust; moreover it reaches optimal rates as we shall see in Chapter 4.

S.3 The kernel regression estimator and the


ind uced predictor
In the context of regression estimation the analogue of the histogram is the
so-called regressogram : let us consider LLd. bidimensional random variables
(X 1, Y1 ), ... , (Xn , Yn ) such that a specified version r of the nonlinear regression
of Y; on Xi does exist :

r(x) = E(Y; I Xi = x) , x E JR .

The regressogram has the following form

, x E Inj , j E Z
L IJnj(X i)
i=l

where (Inj, j E Z) is defined in 8.2. Note that Tn is defined only if


n

i=l

Tn(X) may be interpreted as an estimator of E(Y; I Xi E I nj ). It clearly


suffers from the same drawback as the histogram and the remedy is similar.
This brings us to define the kernel regression estimator (cf. [NA], [WAD as
6 SYNOPSIS

where (K, h n ) is defined in S.2. The definition remains valid if (Xi, Yi) is
lR d x lR-valued and if data are dependent.

Now, if (Xt, Yt), t E lR, is a lR d x lR-valued continuous time process observed


over the time interval [0, T], the kernel regression estimator is defined as

Let us now turn to prediction. For the sake of simplicity we consider a


real square integrable Markov process (et, t E Z) and the data ,en . Theel, .. .
problem is to construct a statistical predictor of en+H where the horizon
H is a strictly positive integer.

Such a predictor, say €n+H , is an approximation of r(en) = E(en+H I en)


based on 6, ... ,en; and the statistical error of prediction is defined as
E(€n+H -r(en))2.

Note that the total error of prediction is


2 2 2
~
E(en+H - en+H) = E(en+H ~
- r(en)) + E(r(en) - en+H)

where the last term is structural and consequently cannot be controlled by the
statistician.

Now let us consider the bidimensional process (Xt , Yt) = (et, et+H), t E Z
and the associated regression estimator rn-H defined above. It induced a
natural nonparametric predictor via the formula

Similarly if (et, t E lR) is a real square integrable Markov process observed


over [0, Tj , the nonparametric predictor of eT+H associated with rT-H is given
by
rT - H et+HK (eThT- et) dt
Jo
I
~
eT+H = T- H K (eTh~ et) dt
SA. MIXING PROCESSES 7

The final purpose of this book is the asymptotic study of fn+H' &+H and
of some more general predictors.

8.4 Mixing processes


In order to obtain rates of convergence for functional estimators it is necessary
to have measures of dependence between the observed variables at one's dis-
posal.

Some measures of that type are introduced in Chapter 1. The most impor-
tant should be the strong mixing coefficient (cf. [ROj). For the sake of clarity,
let us introduce it in a stationary context.

Let us recall that X is said to be (strictly) stationary if, for any integer
k and anytb ... ,tk, S inZ one has p(x tt+St· .. , x tk+S )=p(x tt,··" x)·
tic

Let X = (Xt, t E Z) be a strictly stationary process, its strong mixing


coefficient of order k is defined as

a(k) = sup IP(B n C) - P(B) P(C)I , k ~ 1


B E a(X., s ::; t)
C E a(Xs, s ~ t + k)

For such a process a(k) does not depend on t. Now X is said to be strongly
mixing (or a-mixing) if lim a(k) = O. This condition specifies a form of
k-oo
asymptotic independence of the past and future of X.
Classical ARMA processes are strongly mixing with coefficients which de-
crease to zero at an exponential rate.

Now if X is strongly mixing it is possible to derive some useful covariance


inequalities. An example is the following: if Y E LOC(a(Xs, s ::; t»
and
Z E LOO(a(Xs, s ~ t + k»
then

ICov(Y, Z)I ::; 4 II Y 110011 Z 1100 a(k).

Among these inequalities the sharper is due to RIO (d. [RI]1993). RIO's
inequality is optimal in some sense (see (1.9) and Theorem 1.1).

Other important outcomes of the strong mixing condition are large de-
viation inequalities. An accurate lemma of BRADLEY (Lemma 1.2) gives
the "cost" of the replacement of dependent random variables by associated
8 SYNOPSIS

independent ones. Using this result and exponential type inequalities for inde-
pendent variables it is thus possible to establish large deviation inequalities for
strongly mixing processes.

As an example we give the following (cf. Theorem 1.3). Let (Xt, t E Z)


be a zero-mean real-valued strictly stationary bounded process. Then for each
integer q E [1, %]
and each E > 0

where

and a ( [~] ) is the strong mixing coefficient of order [~] .


This inequality allows to derive limit theorems for strongly mixing processes
(cf. Theorems 1.5,1.6,1.7).

8.5 Density Estimation


S.5.1 Discrete case
Chapter 2 deals with density estimation for discrete time processes. The main
problem is to achieve the optimal rates, that is the same rates as in the LLd .
case.

First it can be shown that, under some regularity assumptions, if f is twice


differentiable and if (Xl, t E Z) satisfies a mild mixing condition then, for a
suitable choice of (h n ),
n 4/ (d+4) E{fn(x) - f(x)f --+ c
where c is explicit (Theorem 2.1). Thus the optimal rate in quadratic mean is
achieved. The proof uses the covariance inequality stated in S.4.

Concerning uniform convergence it may be proved that for each k :::: 1 we


have
L )2/(d+4»)
sup Ifn(x) - f(x)1 =0 ( Logkn ( ogn a.s.
xERd n
S.5. DENSITY ESTIMATION 9

(cf. Corollary 2.2). This result is (almost) optimal since the uniform rate of

convergence in the LLd. case is 0 ( (L~n) 2/(d+4)).


Here the main assumption is that (Xd is strongly mixing with a(k) :::;
apk (a > 0, 0 < p < 1), and the proof uses the large deviation inequality
presented in SA.

We also establish the following weak convergence result (Theorem 2.3) :

( hd)1/2 ( !n(Xi) - !(Xi) 1 < . < ) ~~ N(m)


n n (fn(Xi))1/2 II K 112' - t - m

where N(m) has the m-dimensional standard normal distribution. Note that
the precise form of this result allows to use it for constructing tests and confi-
dence sets for the density.

Here a(k) = O(k- 2 ), and the proof utilizes the BRADLEY lemma quoted
in SA.

The end of Chapter 2 is devoted to the asymptotic behaviour of !n in some


unusual situations : chaotic data, singular distribution, processes with errors
in variables.

S.5.2 Continuous case


The problem of estimating density by the kernel method for continuous time
processes is investigated in Chapter 4.

The search for optimal rates is performed in a more general setting than in
discrete time, here! is supposed to be k times differentiable with kth partial
derivatives satisfying a Lipschitz condition of order >- (0 < >- :::; 1). Thus the
number r = k + >- characterizes the regularity of f. In that case it is interesting
to choose K in a special class of kernels (cf. Section 4.1).

Then it can be shown that under mild regularity conditions


lim sup sup T 2r /(2r+d)E x (fT(x) - f(x))2 < +00
T->oo XEX, xElRd

where Xl denotes a suitable family of continuous time processes (Corollary


4.2). Furthermore the rate T- 2r / (2r+d) is minimax (Theorem 4.3) .

Now this rate is achieved if the observed sample paths are slowly varying,
otherwise the rate is more accurate.
10 SYNOPSIS

The phenomenon was first pointed out by CASTELLANA and LEADBET-


TER in 1986 (cf. [C-L]). The following is an extension of their result : if the
density f(x" x,) exists for all (s , t), s of. t and if for some p E [1, +ooj we have

(Cp) lim sup ~


T-.oo
r
J]O,TF
I f(x" x,) - f 0 flip dsdt < (X)

then
sup E(fr(x) - f(x))2 =0 (T-pr/(pr+d»)
xERd

(Theorem 4.6); in particular if (Coo) holds then

sup E(fr(x) - f(x))2


xERd
=0 (-T1)
1
From now on will be called "superoptimal rate" or "parametric
T
rate".

Condition (Cp ) first measures the asymptotic independence between Xs and


X t when It - sl is large, second, and above all, the local behaviour of f(x"x,)
when It - sl is small.

If p is large enough (p > 2) the local irregularity of the sample paths fur-
nishes additional information. This explains the improvement of the so called
"optimal rate" .

The situation is especially simple in the Gaussian case : if (Xt ) is a real


stationary Gaussian process, regular enough and if K is a strictly positive
kernel, then Corollary 4.4 entails the following alternative:

• If 1€ (EIX" - XO)2)-1/2 du < (X) then E(fT - f? = 0 (~)

• If 1€ (EIX" - Xo12) -1/2 du = (X) then T E(fT - f)2 --4 00 .

In particular if (X t ) has differentiable sample paths the superoptimal rate is


not achieved.

Now the same phenomenon appears in the study of uniform convergence:


using a special Borel-Cantelli lemma for continuous time processes (cf. Lemma
4.2) one can obtain an optimal rate under mild conditions, but also a superop-
timal rate under stronger conditions. In fact it can be proved that

:~fd Ifr(x) - f(x)1 =0 (( L ~T)1/2 LogkT ) a.s., k::::: 1 .


S.6. REGRESSION ESTIMATION AND PREDICTION 11

8.6 Regression estimation and prediction


S.6.1 Regression estimation
Contrary to the density, the regression cannot be consistently estimated uni-
formly over the whole space.
This because the magnitude of rex) for II x II large is unpredictable. However it
is possible to establish uniform convergence over suitable increasing sequences
of compact sets (d. Theorem 3.3).

Apart from that, regression and density kernel estimators behave similarly.

For example, under mild conditions we have

where c is explicit (Theorem 3.1). Proof of this result is rather intricate since
it is necessary to use one of the exponential type inequalities established in
Chapter 1, in order to control the large deviations of r n - r .

Concerning uniform convergence, a result of the following type may be


obtained (Theorem 3.2) :

sup Irn{x) - r(x)1 = 0 ( (LOgn)a)


2/(dH) a.s.
xES n

where S is a compact set and a is a positive number.

Now, in continuous time, the following result is valid (Corollary 5.1)

lim sup sup T 4/(4+d) Ez(rT(x) - r(z)(x))2 < 00


T .... oo ZEZ

where Z is a suitable family of processes an where Ez denotes expectation with


respect to P z and r(Z) the regression associated with Z.

Similarly, as in the density case, if the sample paths are irregular enough
the kernel estimator exhibits a parametric asymptotic behaviour, namely

T · E(rT(x) - r(x))2 --t C

where c is explicit (Theorem 5.3).

Finally it may be proved that rn and rT have a limit in distribution which


is Gaussian (d. Theorem 3.4 and 5.5 and Corollary 5.2).
12 SYNOPSIS

S.6.2 Prediction
The asymptotic properties of the predictors fn+H and fT+H introduced in S.3
heavily depend on these of the regression estimators which generate them. De-
tails are given in Chapters 3 and 5.

Here we only indicate two noticeable results which are valid under a 'Prev-
mixing condition (a condition stronger than a-mixing).

Firstly ~n+H is asymptotically normal and consequently one may construct


a confidence interval for ~n+H (Theorem 3.7).

Secondly, modifying slightly fT+H one obtains a new predictor, say ~T+H
such that for each compact interval t..

thus the non parametric predictor ~T+H reaches a parametric rate. This could
be a first explanation for the efficiency of nonparametric prediction methods.
Other explanations are given in Section 8.

S.7 The local time density estimator


Let X = (Xt , t E JR) be a real continuous time process. Its (possible) local
time may be defined (almost everywhere) as

iT(X) = lim!.x
010 c;
{t :0 :s t :s T, IXt - xl < :.}
2
where .x denotes Lebesgue measure.

Thus I~ = i: is the density of empirical measure J.LT, that is

Consequently I~ appears as a natural estimator of density.

This estimator has many interesting properties, in particular it is unbiased


(lemma 6.2) . Now if sample paths are irregular then iT is square integrable and
I~ reaches the same parametric rates as the kernel estimator IT (theorems
6.5 and 6.10) . It also satisfies the central limit theorem and the law of iterated
logarithm.
S.B. IMPLEMENTATION OF NONPARAMETRIC METHOD 13

Finally it should be noticed that fr is an approximation of f~. In fact we


have>. a.e.
o
fr(x) =
. 1 (
~m Th Jo K
(x- -hXt)
- dt.

A more useful approximation is given by the kernel estimator associated


with discrete data (cf. Theorem 6.12) . This approximation gives new insight
on the kernel estimator since the choice of bandwith may be influenced by this
aspect.

S.8 Implementation of nonparametric


method
S.8.1 Stationarization
The first step of implementation consists in transformations of the data in order
to obtain stationarity. This can be performed by removing trend and season-
ality after a preliminary estimation (cf. 3.5.2).

However, the above technique suffers the drawback of perturbating the data.
Thus it should be better to use simple transformations as differencing (cf. 3.5.2)
or affine transformations (cf. [PO]).

In fact it is even possible to consider directly the original data and use them
for prediction! For example if (~n, n E Z) is a real square integrable Markov
process, the predictor fn+H introduced in 8.3 may be written as

n-H
fn+H = L Pin~HH
i=l

K C\~~i)
where Pin = H ; i = 1, ... , n
~ K(~\~~i)
thus ~n+H is a weighted mean and the weight Pin appears as a measure of
similarity (cf. [PO]) between (ei,eHH) and (~n,en+H)' In other words the
nonparametric predictor is constructed from the "story'? of the process (et).
Consequently trend and seasonality may be used to "tell this story".

Asymptotic mathematical results related to that observation appear in sub-


section 3.4.2 (see Theorem 3.8).
14 SYNOPSIS

S.8.2 Construction
The construction of a kernel estimator (or predictor) requires a choice of K
and hn . Some theoretical results show that the choice of reasonable K does
not much influence the asymptotic behaviour of In or Tn.

On the contrary the choice of h n turns to be crucial for the estimator's


accuracy. Some indications about this choice are given in subsection 3.5.3.

Note that, if the observed random variables are one-dimensional, the normal
x2
1 --
kernel K(x) = rn=e 2 and hn = (Tn n- 1/ 5 (where (Tn denotes the empirical
v21l"
standard deviation) are commonly used in practice (cf. appendix).

S.8.3 Sampling
The problem of sampling a continuous time process is considered in Sections
4.4 and 5.6.

The most important concept is "admissible sampling" given a process


(Xt, t E R) with irregular paths, we have seen that superoptimal rates are
achieved by nonparametric estimators. For such a process we will say that a
sampling is admissible if it corresponds to the minimal number of data pre-
serving the superoptimal rate (in mean square or uniformly).

Theorem 4.12 and 4.13 state that if X Cn ' X 2Cn , ... ,Xncn are observed (with
On -> 0 and Tn = nOn -> (0) then On = T;;d/2r is admissible provided h n =
rp-l/2r
.Ln •

S.8.4 Advantages of nonparametric methods


One may summarize the advantages of non parametric methods as follows :

1) They are robust,

2) Deseasonalization of data is not necessary,

3) In some situations parametric rates are achieved.

Now we do not pretend that the nonparametric kernel method is a "panacea".


In discrete time, general "adaptive" methods may be considered (cf. [BIJ-[MAJ
for the LLd. case). In continuous time, a new method is considered in [B02]
where continuous time processes are interpreted as infinite dimensional au-
toregressive processes. Semiparametric techniques are also of interest (see for
S.B. IMPLEMENTATION OF NONPARAMETRlC METHOD 15

example [RB-ST]).

Concerning the near future of nonparametric we finally enumerate some


important topics: study of orthogonal series estimators and predictors (in par-
ticular wavelets), image reconstruction, errors in data, presence of exogeneous
variables, sampling, estimation and prediction in large dimension, use of local
time type estimators ....
Chapter 1

Inequalities for mixing


processes

In this chapter we present some inequalities for covariances, joint densities and
partial sums of stochastic discrete time processes when dependence is mea-
sured by strong mixing coefficients. The main tool is coupling with indepen-
dent random variables. Some limit theorems for mixing processes are given as
applications.

1.1 Mixing
In the present paragraph we point out some results about mixing. For the
proofs and details we refer to the bibliography.

Let (st, A , P) be a probability space and let 13 and C be two sub <T-field of
A. In order to estimate the correlation between 13 and C various coefficients
are used:

• a = a(13,C) = sup jP(BnG)-p(B)P(G)j,


BE13
GEC
• (3={3(13 ,C)=E sup jP(G)-P(Gj13)j,
GEC
• r.p = r.p(13 , C) = sup jP(G) - P(G j B)j ,
B E 13, P(B) > 0
GEC

17
18 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES

• p = p(8, C) = sup Icorr(X, Y)I .


X E £2(8)
Y E £2(C)
These coefficients satisfy the following inequalities :

(1.1) 2a ~ (J :s: <p

(l.2) 4a :s: p :s: 2<pl/2.


Now a process (Xt, t E Z) is said to be a-mixing (or strongly mixing) if

where the "sup" may be omitted if (Xd is stationary. Similarly one defines
,6-mixing (or absolute regularity), cp-mixing and p-mixing.

By (l.I) and (1.2) we have the following scheme:

<p-mixing ==? (J-mixing


.IJ. .IJ.
p-mixing ==? a-mixing

It can be shown that the converse implications do not take place.

As an example, consider the linear process


+00
(1.3) Xt = L ajEt_j , t E Z
j=O

where aj = O(e- rj ) , r > 0 and where the Et ' S are independent zero-mean real
random variables with a common density and finite second moment. Then the
series above converges in quadratic mean, and (Xd is p-mixing and therefore
a-mixing with coefficients which decrease to zero at an exponential rate.

The existence of a density for Et is crucial as the well known following


example shows: consider the process
+00
Xt = LTi-1Et_j ,t E Z
j=O

where the Et'S are independent with common distribution l3 (1, ~).

Noting that X t has the uniform density over (0,1) and that
1.2. COUPLING 19

one deduces that X t is the fractional part of 2Xt +l, hence a(Xt ) C a(Xt+l) '
By iteration we get
a(Xt) C a(Xs, s 2: t + k)
thus
1 1
4' 2: ak 2: a(a(Xt),a(Xt )) = 4'
which proves that (X t ) is not a-mixing . •

In the Gaussian case there are special implications between the various
kinds of mixing : if (Xt ) is a Gaussian stationary cp-mixing process, then it is
m-dependent Le., for some m, a(Xs,s :::; t) and O'(Xs,s 2: t + k) are inde-
pendent for k > m . On the other hand we have Pk :::; 2;rrak for any Gaussian
process so that a-mixing and p-mixing are equivalent in this particular case.
However a Gaussian process may be a-mixing without being ,g-mixing.

The above results show that cp-mixing and ,g-mixing are often too restrictive
as far as applications are concerned. Further on we will principally use a and
p-mixing conditions and sometimes the 2-a-mixing condition:

(1.4)

This condition is weaker than strongly mixing.

1.2 Coupling
The use of coupling is fruitful for the study of weakly dependent random vari-
ables. The principle is to replace these by independent ones having respectively
the same distribution. The difference of behaviour between the two kinds of
variables is connected with the mixing coefficients of the dependent random
variables . We now state two important coupling results. For the proofs, which
are rather intricate, we refer to [B] and [BR1] .

LEMMA 1.1 (Berbee's lemma)


Let (X, Y) be a lRd x lR d' -valued random vector. Then there exists a lR d ' -
valued random vector y* such that

(1) Py • = Py and y* is independent of X ,

(2) P(Y* i Y) = ,g(O'(X) , a(Y)) .

It can be proved that" =" cannot be replaced by "<" , thus the result is optimal.
20 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES

LEMMA 1.2 (Bradley's lemma)


Let (X, Y) be a Rd x IR-valued random vector such that Y E LP(P) for some
p E [1, +ooj. Letc be a real number such that IIY +clip > 0, ande EjO, IIY +cllpj.
Then, there exists a random variable Y' such that

(1) Py. = Py and y* is independent of X,

(2) P(IY' - YI > e) ~ 11 (e-11IY + cll p)P/(2 p+l) [a(O'(X), 0'(Y))j2P/(2P+l) .

In the original statement of this lemma, 11 is replaced by 18 and c = 0 but the


proof is not different. We will see the usefulness of Lemma 1.2 in Section 1.4.

1.3 Inequalities for covariances and joint


densities
Essential to the study of estimator's quadratic error are covariance inequalities.
The following Rio's inequality is optimal up to a constant factor.

THEOREM 1.1 (Rio's inequality)


Let X and Y be two integrable real-valued random variables and let
Qx(u) = inf{t : P(IXI > t) ~ u} be the quantile function of IXI. Then if
QxQy is integrable over (0,1) we have

(1.5) ICov(X, Y)I ~ 2 fo2CX Qx(u)Qy(u)du


where a = a(O'(X), O'(Y)).

Proof

Putting X+ = sup(O, X) and X- = sup(O, -X) we get

(1.6) Cov(X, Y) = Cov(X+, y+) + Cov(X - , Y-)


-Cov(X-, Y+) - Cov(X+, Y-) .

An integration by parts shows that

Cov(X+, y+) =
Jr
R2
+
[P(X > u, Y > v) - P(X > u)P(Y > v)jdudv,

which implies

(1.7) Cov(X+,Y+) ~ r inf(a,P(X > u),P(Y > v))dudv.


jRt
1.3. INEQUALITIES FOR COVARIANCES 21

Now apply (1.6), (1.7) and the elementary inequality


(a !\ a !\ c) + (a !\ a !\ d) + (a !\ b !\ c) + (a !\ b !\ d) S; 2[(2a) !\ (a + b) !\ (c + d) J

to a = P(X > u) , b = P(-X > u) , c = P(Y > v) , d = P(-Y > v) to


obtain

ICov(X, y)1 S; 2 r inf(2a, P(IXI > u) , P(IYI > v)dudv


J'Ut 2+
= : T.

It remains to prove that

(1.8) T r201 Qx(u)Qy(u)du.


= 2 Jo
For that purpose consider a r.v . U with uniform distribution over [O, lJ and a
bivariate r.v. (Z, T) defined by
(Z,T) = (0,O)lu~201 + (Qx(U),Qy(U))lU<201 .
Thus
E(ZT) = Jo r201 Qx(u)Qy(u)du
and
(Z > u,T > v) = (U < 2a, U < P(IXI > u),U < P(IYI > v)),
hence
E(ZT) = J'Ut2 P(Z > u, T > v)dudv
+
= J'Ut2 inf(2a, P(IXI > u) , P(WI > v))dudv
+
which entails (1.8) and the proof is thus complete . •

Conversely it can be proved that if f..t is a symmetric probability distribu-


tion over IR and if a E ]0, iJ, there exists two r.v.'s X and Y with common
distribution f..t such that a(O"(X),O"(Y)) S; a and

(1.9) Cov(X, Y) ~
r201
2"1 Jo [Qx(u)]2du .
Proof may be found in [R11].

We now present two inequalities which are less general but more tractable.
COROLLARY 1.1 Let X and Y be two real valued r·andom variables such
1 1 1
that X E Lq(P), Y E Lr(p) where q > I, r> 1 and - + - = 1- -, then
q r p

(1.10)
22 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES

(Davydov's inequality).

In particular if X E UJO(P), Y E UXl(P) then

(1.11) ICov(X, Y) I :::; 411X lloo IIYlioo a


(Billingsley's inequality).
Proof

Suppose first that q and r are finite. Then Markov's inequality yields

P (IX I > IIXllq)


u 1/ q <u
- ,
0 < u <_ 1

which implies
Q (u) < IIXllq O<u:::;1.
x - u 1/ q
Now, using (1.5) we obtain

ICov(X, Y)I < 2 r2Q IIXllq IIYllr du


- Jo u 1/ q u 1/ r

hence (1.10).

If q = r = +00 we clearly have


Qx(u) :::; Qx(O) = IIXlloo
thus
212Q Qx(u)Qy(u)du :::; 4aIIXlloollYlloo .•
Note that (1.10) is valid if q = +00 and r > 1. If q = 1 or r = 1 the resulting
inequality becomes trivial.

We now consider the local measure of dependence defined by

(1.12) g(X ,Y)(x,y) = f(x,Y)(x , y) - fx(x)fy(y); X,y E ]Rd,

where (X, Y) is a ]Rd x ]Rd-valued random vector and where fz denotes the
density of the random vector Z with respect to Lebesgue measure.
The following statement connects 9 = g(X ,Y ) with a = a(O'(X), O'(Y)).
1.3. INEQUALITIES FOR COVARIANCES 23

LEMMA 1.3 If (X, Y) has an absolutely continuous distribution with respect


to Lebesgue measure on JR2d then

(1.l3)
If in addition g satisfies the Lipschitz's condition
(1.l4) Ig(x',y') - g(x,y)1 ::; f(llx' - xl12 + lIy' _ YI12)1/2,
x,x',y,y' E JRd, for some constant f, then there exists a constant ")'(d, f) such
that
(1.l5) IIglioo ::; ,(d, f)a: 1/(2d+ l).
Furthermore one may choose ")'(d, f) = Vi 2 +f.j2 where lid denotes the volume
of the unit ball in JRd.

Proof

By the definition of a: it is clear that


a:::; sup lP(x ,y)(D) - (Px 1)9 Py)(D)1 =: S.
DE.13~d

Now using Scheffe's theorem (cf. [BIll p. 224) we obtain


1
S = "2 l1g111
hence (1.l3).
On the other hand, set
B(x,c) = {x': IIx' -xII::; c} ,c > 0, x E JRd .

Then, for any (x , y) E JR2d, we have


a: :::: IP(X E B(x,c), Y E B(y, c)) - P(X E B(x , c))P(Y E B(y,c))1

:::: IfB(X,E)XB(Y ,E) g(u, v)dudvl =: I.


Now by the mean value property we get
I= lI}c 2d lg(x',y')1
for some (x',y') in B(x,c) x B(y,c).

On the other hand (1.14) yields


Ig(x,y)1 :::; Ig(x',y')1 + t'cV2
hence
Ig(x, y)1 :::; v::
dC
2d + fcV2
choosing c = a: 1/(2d+l) we obtain (1.15) . •
24 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES

1.4 Exponential type inequalities


We now turn to the study of large deviations for partial sums of strongly mix-
ing processes.

Let us begin with exponential type inequalities for independent random


variables.
THEOREM 1.2 Let Xl, ... , Xn be independent zero-mean real-valued ran-
n
dam variables and let Sn = L Xi. The following inequalities hold
i=l

(1) If ai ::; Xi ::; bi ; i = 1, ... , n where aI, bl , ... , an, bn are constant then

(1.16) P(fS.1 ~ 2~p


t) S ( , t >0

(Hoeffding's inequality).
(2) If there exists c > 0 such that
(1.17)
i = 1, ... , n ; k = 3,4, ...
(Cramer's conditions) then

(1.18) P(ISnl ~ t) ::; 2exp (- n


4LExl +2ct
t
2
) , t>0

i=l

(Bernstein's inequality).
Proof
(1) First, let X be a real-valued zero-mean random variable such that
a ::; X ::; b. We claim that

(1.19) E(exp)"X)::;exp (
),,2(b -
8
a?) ,),,>0.

In order to prove (1.19) we consider the convexity inequality


b- x x - a Ab
e AX < __ e Aa
- b-a
+- -e
b-a' -
a < x < b.
-
1.4. EXPONENTIAL TYPE INEQUALITIES 25

Replacing x by X and taking the expectation, it follows that

E(e AX ) ~ _b_e,Xa _ _ a_e'xb =: 'P.


b-a b-a
Thus
'P = [1 - p + pe'x(b-a)Je-p'x(b- ,~)
=: exp( 1/;(u))
a
where p = --b ' u = ),(b - a), 1/;(u) = -pu + Log(1- p + peU ).
a-
Now it is easy to check that 1/;(0) = 1/;'(0) = 0 and
"( ) p(1 - p)e- U 1
1/; u = (p + (1 _ p)e-uj2 ~ 4 '
consequently the Taylor's formula leads to
u2 ),2(b - a)2
1/;(u)~8= 8
hence (1.19) . •

We are now in a position to establish (1.16).


The main tool is the famous "Bernstein's trick" : since
(1.20)
we have
P(Sn ~ t) ~ e-,XtE(e'xSn)
n
~ e-,Xt IT E(e AXi ).
i=1
Now applying (1.19) to XI, ... ,Xn we obtain

4t
Choosing)' = n it follows that
2)bi - ai)2
i=1

P(Sn ~ t) ~ exp - n 2t2 ) =: A.


(
Z)b i - ai?
;=1
26 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES

Similarly an application of (1.19) to the random variables -Xi shows


that
P(Sn :::; -t) = P( -Sn 2: t) :::; A
and the proof is complete since

(1.21) P(ISnl 2: t) = P(Sn 2: t) + P(Sn :::; -t) . •

1
(2) For 0 < A < - according to Cramer's conditions (1.17) we have
c

(1.22)

Using (1.22) and the dominated convergence theorem we can deduce that

Using again the Bernstein's trick we obtain

IT E(e
n
P(Sn 2: t) :::; e-)..t AX ,)
;=1

<
- e
->.t
exp
(A2 1~EX;)
_ AC .

t
Now the choice A = n leads to
2LEX; +ct
;=1

2
P(Sn 2: t) :s; exp (- n t )

4 LEX; +2ct
;=1

and it suffices to use (1.21) to get the desired result . •


1.4. EXPONENTIAL TYPE INEQUALITIES 27

It should be noticed that these inequalities are optimal up to a constant


in the exponent as the following Kolmogorov's converse exponential inequality
shows: if conditions in Theorem 1.2 (1) hold with b; = --a; = b, i = 1, ... ,n,
then, for any, > 0 there exist kb) > 0 and c:b) > 0 such that if

n ) 1/2 ( n )
t 2 kb) ( ~ EX; and tb :S c:(r) ~ EX;

it can be inferred that

(1.23) P(Sn 2 t) 2 exp (- ~ +,


2LEX;
t2)
;=1

We refer to [STj for a proof of this inequality.

On the other hand it can be seen that Cramer's conditions (1.17) are equiv-
alent to existence of E (e XX i ) for some, > O. We refer to [A - Zj for a
discussion.
We now turn to the study of the dependent case. For any real discrete time
process (Xt , t E Z) we define the strongly mixing coefficients as

(1.24) a(k) = supa(o-(X.,s:S t),o-(X.,s 2 t+k)) ;1. = 1,2, . .. .


tEZ

Note that this scheme applies to a finite number of random variables since it
is always possible to complete a sequence by adding an infinite number of de-
generate random variables.

The following theorem provides inequalities for bounded stochastic pro-


cesses.

THEOREM 1.3 Let (X t , t E Z) be a zero-mean real-valued process such that


sup IIXt ll oo :S b. Then
l:'St:'Sn

(1) For each integer q E [1, %] and each c: > 0

(1.25) P(ISnl > nc:):S 4exp (-;:2q)


+22 (1 + ~) qa ([;q])'
1/2
28 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES

(2) For each integer q E [1, ~] and each e: > 0

(1.26) P(ISnl > ne:)::; 4exp (-8V~~q)q)


+22 (1 + :br /2
qa ([~]),
where
2 be:
v 2(q) = p2a2(q) +2
with p = 2n and a 2 (q) = max E (([jpJ + 1 - jp)X[jpl+1 + X[jp]+2 +
q O$J$2q-1
... + X[(j+l)P] + ((j + l)p - [(j + l)pJ) X[(H1)p+1](
Proof
(1) Consider the auxiliary continuous time process Yt = X[t+1], t E R. We
clearly have Sn = Yudu. Ion
Let us now define "blocks" as follows

Vq = J2(q-l)p
(2 -l)PQ
Yu du
n
where p = - .
2q

Using recursively Bradley's lemma 1.2 we may define independent r.v.'s


W 1 , ... , Wq such that P Wj = PVj
, j = 1, ... ,q and

(1.27) P(IWj _ "}I >~) ::; 11 (",,} ~ ell 00 ) 1/2 a([pJ)

Here e = bbp and ~ = min (~;, (0 - l)bP) for some b > 1 which will be
specified below.

Note that, for each j ,

II"} + ell oo ~ e -IIVjlloo ~ (0 - l)bp > 0


so that 0 < ~ :S II"} + cll oo as required in Lemma 1.2 .
1.4. EXPONENTIAL TYPE INEQUALITIES 29

Now, according to the choice of c and ~, (1.27) may be written

P(IWj - Vjl >~) <


-
11 ( min ((ne/(4q))
(8 + l)b
P
, (8 - l)bp)
) 1/2
a([pJ)

: ; 11 (max (88 +-11, 4qbp(8ne + 1))) 1/2 a([pJ).

If 8 = 1 + 2be then

P(IWj - Vjl >~) ::; 11 (2 + ;b)


1/2 (2b)

1/2
a([pJ)

thus

(1.28) P(IWj - Vjl > ~) ::; 11 ( 1+ €4b) 1/2


a([pJ) .

On the other hand we may apply Hoeffding's inequality (1.16) to the


Wj's. We then obtain

(1.29) P(!~
L.
W.!> ne)
4
::; 2exp (_..!::S2 ).
J 16pb2
1

We are now in a position to conclude.

Clearly

(1.30) P(ISnl > ne) ::;

and

{!t Vj! > ~e} {! t C Vj ! > ~e ;IVj - Wj I ::; C j = 1, ... ,q }

u {y IVj - Wjl > ~} ,


30 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES

hence

P (I~~I > ~e) ~ P (I~Wjl > ~e -q~) + ~P(lVj - wjl >~)


~ P (I~Wjl > :e) + ~P(I~ _ wjl > ~).
Consequently (1.2S) and (1.29) give the upper bound

P (I~VjI > ~e) ~ 2exp (- ::2 q) + 11 (1 + ~y/2 qa(w]) ,


and the same bound is valid for the Vi's. According to (1.30) , inequality
(1.25) is thus established . •
(2) The proof of (1.26) is similar except that, here, we use Bernstein's in-
equality (LIS) instead of the Hoeffding's one.
So we have

Now, since PWj = P Vj we have

and

E (Ij~+1)P YudU) 2 = E (([jpJ + 1 - jp )X[jPI+l + X[jpI+2 + .. .


+X(j+1)p] + ((j + l)p - [(j + l)p]) X[(j+1)p]+d 2 .

Taking into account the above overestimate and using (1.31) we obtain
after some easy calculations

P (l~w·1
L > ne)
4 ~ 2exp (-~)
J Sv (q) 2
1

which entails (1.26) . •


1.4. EXPONENTIAL TYPE INEQUALITIES 31

Note that by using (1.ll) it is easy to see that

(1.32)

We would like to mention that although (1.26) is sharper than (1.25) when
E and a(.) are small enough, however (1.25) is more tractable in some practical
situations.

The next theorem is devoted to the general case where the Xt's are not
necessarily bounded but satisfy Cramer's conditions.
THEOREM 1.4 Let (Xt, t E Z) be a zero-mean real-valued process.
Suppose that there exists c > 0 such that

(1.33) EIXt/k ~ ck - 2 k!EX; < +00 ; t = 1, ... , n ; k = 3,4, ...


then for each n 2: 2, each integer q E [1, ~], each E > 0 and each k 2: 3
(1.34) P(ISnl > nE) ~

where
2
al
n
= 2-
q
+ 2 ( 1 + 25 m 2E + 5CE ),v.nth
. 2
m2 =
2
max EXt,
l~t~n
2

and
a2(k) = lln 1 + -
(5mk~) , with mk = max IIXtll k .
E l~t~n

Proof

Let q and r be integers such that

1 ~ qr ~ n < (q + l)r.
Consider the partial sums

Zl Xl + Xr + l + + X(q-l)r+l
Z2 X2 + X r +2 + + X(q-l)r+2

Zr Xr + X2r + + Xqr
~ X qr+ 1 + + + Xn ifqr<n
0 otherwise.
32 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES

We clearly have

Now, in order to get an upper bound for P 4nE) we apply recur-


(IZII> 5r
sively Bradley's lemma 1.2 : let k be an integer 2: 2, 0 a real> 1 and ~ such
that

o< ~ $ (0 -1)mk $ IIXU-l)r+l + omkllk $ (0 + 1)mk j j = 1, .. . ,q .


We may and do suppose that mk is strictly positive, otherwise the inequality
should be trivial.

Then, there exist independent r.v.'s Yj j 1, . . . ,q such that PYj


PX U - 1 )r+l and
(1.36) P(llj - X(j-l) r +ll > ~) S
k

11 ( "X(j-l)T+l +omkllk)2k"+T (a (r )) 2k-IT.


2k

~
Choosing
2E
8 = 1 + - - and ~ = -
2E Yields
.
5mk 5

(1.37) P (Ilj - X(j-I)r+ll > 2;) S 11 (1 + 5;k) trn (a(r))d:h.


Now elementary computations give

(1.38) P (IZII > ~:E) $ P (IY1+ ... + Yql > 2~E)


+ t
J=1
P (Wj - X(j-l)r+ll > 2;) .
Applying Bernstein's inequality (1.18) to the lj's we obtain

(1.39) P (WI + .. . +Yql > 2;E) S 2exp ( - 25mr: 5a) .


Thus combining (1.37) , (1.38) and (1.39) we get an upper bound for
P (I Z II > 4~E) . Clearly the same bound remains valid for Z2, . . . ,Zr .
1.5. SOME LIMIT THEOREMS FOR STRONGLY MIXING 33

The proof will be complete if we exhibit a suitable overestimate for P (18..1 > ~t:).
For that purpose we write

< exp( ->.nt:/5)E(e>.6) , >. > 0


>.k )
1 + ~ kTE I8..lk .
00
::; exp( ->.nt:/5) (

Now Minkowski's inequality and (1.33) entail

Hence for a suitable >.

thus choosing>. = 8/(n - qr)c, 0 < 8 < 1 we get

P ( 8.. > -nt:)


5
::; 8c 1 - 8
m~)
( 1 + -2 -
2
(81'5c
- exp - -- -n) . r

Using the same method for -8.. we obtain


P ( 18..1>-
nt:)
5
::;21+
( 2
_ 8c 1-8
_ 2
m 2_ ) exp --q .
2
(8t:)
5c

Choosing 8 = a/(5m~ + ce) yields

P (18..1 > ~t:) ::; 2 (1 + 5(5mf+ a)) exp (- 5(5mr+ ce) q) .


Collecting the above bounds we obtain the claimed result according to (1.35) .

1.5 Some limit theorems for strongly mixing
processes
It is well known that the laws of large numbers hold for stochastic processes
provided classical ergodicity conditions (cf. [DO]). However the Statistician
needs some convergence rate in order to convince himself of applicability of the
34 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES

theoretical results.

The present section is devoted to the study of convergence rates under


strongly mixing conditions. To this aim we use the inequalities in the previous
sections.
We first state a result concerning the weak law of large numbers .
THEOREM 1.5 Let (Xt , t E Z) be a zero-mean real-valued stationary process
such that for some r > 2

and
La(k)l-~ < +00
k;:::l

then the series L Cov(Xo , Xk) is absolutely convergent, has a nonnegative


kEZ
sum (72 and
Sn 2
(1.40) nVar- -7 (7 •
n
Proof

First we study the series L Cov(Xo, Xk) . By using (1.10) with q = rand
kEZ
1 2
-=l--weget
p r

r _(2a(k))1-2/r(EIXo
ICOV(XO,Xk)1 :::; 2_
r-2
n2/ r

which proves the absolute convergence of the series since L a(k)1-2/r < + 00.
k;:::l
Now clearly
Sn = n- 1
nVar-
n
L Cov(Xs,Xt ) ,
O$s ,t$n-l

(Xd being stationary it follows that

S
nVar nn = L
n- l ( Ikl) CoV(XO,Xk) .
1 - --:;
k=-(n-l)

Thus an application of the Lebesgue dominated convergence theorem entails

lim nVar Sn = (]"2 > 0


n-+oo n -
1.5. SOME LIMIT THEOREMS FOR STRONGLY MIXING 35

and the theorem is thus established. •

The following proposition provides pointwise results.


THEOREM 1.6 Let (Xt, t E Z) be a zero-mean real-valtted process satisfying
Cramer's conditions (1. 33}. We have the following
(I) If (Xt ) is m-dependent, then
Bn
(1.41) -;. 0 a.s ..
ynLog2nLogn

(2) If (X t ) is a-mixing with a(k) :::; apk, a> 0,0 < p < 1
then
(1.42) Bn -;. 0 a.s ..
y'nLog 2nLogn
Proof

. /Log 2nLogn
(1) Using (1.34) for n > m, e =V n TJ, TJ > 0 and q = [njm + 1]
we get

p ( IBnl > TJ)


JnLog2nLogn

:::; ( 4(m + 1) + 2 ( 1 + 0 ( LOg2n.LOgn)))


n exp( -dLog 2 nLogn)

where d is some positive constant. Therefore

L
n>m
P ( IBnl
JnLog 2nLogn
> TJ) < +00 , TJ > 0

and the Borel Cantelli lemma (cf. [BI 2]) yields (1.41) . •
. . . Log 2nLogn
(2) Usmg agam (1.34) wIth e = y'n TJ, TJ > 0, k = 2 and

q = [L nL
og2n ogn
+ 1] leads to

p ( ISnl > TJ) =


y'nLog 2 nLogn
O(Log 2 nLogn exp( -d'Log 2nLogn)) + O( n exp( -d"Log 2nLogn))
where d' and d" are some positive constant. Hence (1.42) using again
Borel-Cantelli lemma. •
36 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES

Note that (1.41) and (1.42) are nearly optimal since the law of the iterated
logarithm implies that Sn f> 0 a.s. even for independent summands.
v'nLog 2 n
We now give a central limit theorem for strongly mixing processes.
THEOREM 1.7 Suppose that (Xt , t E /Z) is a zero-mean real-valued strictly
stationary process such that for some I > 2 and some b > 0

and
(1.43)

where a is a positive constant and {3 > ~2'


1-
+00
then, if (]'2 = L Cov(Xo, Xk) > 0 we have
k=-oo

(1.44)

Proof

First (]'2 does exist by Theorem 1.5. Now consider the blocks

V{ = X p +1 + ... + X p +q

v; = Xrp+(r-l)q+l + ... + Xr(p+q)


where
rep + q) ::; n < rep + q + 1)
and l' "" Logn , P"" _n_ _ n l / 4 , q "" nl/4.
Logn
Using Lemma 1.2 we construct independent random variables W l ,.··, Wr
such that PWj = PVj and

(1.45) P(IWj - Vjl > 0 ::; 11 (11 v:.J +~ cll 'Y ) ~ a(q) --.::L..
2.,+1 ;

. E:(],..fii
J = 1, ... , r ; where ~ = -- (E: > 0) , c = P (IIXolI'Y ( > 1).
r

Note that for n large enough we have


1.5. SOME LIMIT THEOREMS FOR STRONGLY MIXING 37

.
smce P rv --
n
and -
fo rv
fo
-L-' so that (1.45)
. .
IS valId .
Logn r ogn
Consequently setting

~ _ VI + ... + Vr _ WI + ... + Wr
n - ufo 17fo
we obtain

thus
....:I..-

(1.46) P(I~nl > c) ::; llr (


IIVI +
~'
ell ) 2-,+1
a(q)~
2
=: m n·

Now let us prove the asymptotic normality of


WI + ...r;;+ Wr . First using
17" n
(1.43) and combinatorial arguments it can be checked that for 2 < "(' < "( and
"(' enough close to 2

EIWJ'I" <_ TiP,'/2 .


,.J -- 1, ... , r

where Ti is a positive constant. We refer to [YO] for the details.

On the other hand, using stationarity and (1.40) we get

We are now in a position to show that Liapounov's condition (see [BI2])


holds. Actually

and rI-;f -+ 0 since "(' > 2 . Consequently

WI + ... + Wr = (rp) 1/2 WI + ... + Wr ~N rv N(O, 1).


ufo n 17.jfP

Vl+",+Vr
Now in order to obtain the asymptotic normality of fo it suffices
17 n
to prove that bon converges to zero in probability ([BIll). To this aim we use
38 CHAPTER 1. INEQUALITIES FOR MIXING PROCESSES

(1.46), we have

so that (1.43) easily yields


lim mn = O.
n~oo

Finally consider the identity


r r

LV} LV;
Sn = _1_ _ + _1_ _ + Rn
ufo ufo ufo ufo
where
X r (p+q)+l + .. . + Xn if r(p + q) < n
o otherwise
r

LV;
It remains to show that 1 r;;; and ~ converge to zero in probability.
Uyn Uyn

First we clearly have


r

LV;
_1_ ~ N ""N(O, l)
U ,;qr
therefore
r r

LV; LV;
_1_ _ = fqT_1__ ~ 0
ufo V-:;;: u,;qr
. fqT rr;;g;;
smce V-:;;: "" V~ .
Second, using Tchebychev's inequality we get

Rn ~O
ufo
Collecting the above results we obtain (1.44) . •
1.5. SOME LIMIT THEOREMS FOR STRONGLY MIXING 39

Notice that a functional central limit theorem may be shown when assump-
tions of Theorem 1.7 hold.

Notes
The strong mixing condition has been introduced by ROSENBLATT ([R01]) in 1956.
The basic properties of strong mixing conditions are studied by BRADLEY in [BR2J
but the most complete reference should be the book by OOUKHAN ([DKJ 1994).
The coupling lemma's are from [BJ and [BR1J with a slight improvement due to
RHOMARI ([RHJ 1994). The optimal RIO's inequality is in ([RIlJ 1993).

The second part of Lemma 1.3 is given in [B08J. Concerning the exponential
inequalities (1.16) and (1.18) some improvements may be found in the BENNETT's
paper [BEJ. The original forms and the proof's method of Theorems 1.3 and 1.4 are
obtained in [B06J. The present statement is an amelioration using some ideas of
RHOMARI. Related inequalities may be found in [OK1] and [C] .

Theorem 1.5 is a result of DAVY DOV ([0]), Theorem 1.6 is an easy consequence of
the exponential inequalities and Theorem 1.7 was obtained by IBRAGIMOV in [IB],
here the proof is simpler than the original one since we use the powerful Bradley's
lemma.
Chapter 2

Density estimation for


discrete time processes

This chapter deals with nonparametric density estimation for sequences of cor-
related random variables.

We consider here the popular convolution kernel estimate. That natural


and simple method has well resisted to the other suggested estimates such as
Projection estimates (in particular wavelets), Nearest Neighbour estimates, Re-
cursive estimates and, more generally, estimates based on 6- sequences.

We shall see that, under mild conditions, it is possible to obtain the same
convergence rates and the same asymptotic distribution as in the LLd. case.

The asymptotic behaviour of the kernel estimate in some non regular cases
(errors in variables, chaotic data, singular distribution) is studied at the end of
the chapter.

Let us define a d-dimensional kernel as an application K : lRd --> lR where


K is a bounded symmetric density with respect to Lebesgue measure, such that

lim liulldK(u) = O
Ilull ...... oo

and
r
l)Rd
II u 112 K(u)du < +00

where II . II denotes any norm over lR d .

41
42 CHAPTER 2. DENSITY ESTIMATION

Given a kernel K and a smoothing parameter h we set

(2.1)

Typical examples of kernels are :


• The naive kernel K = 1[ -2, ' +2,]d
• The normal kernel K(u) = (27r)-d/2 exp( - II u 112 /2), u E JRd

• The Epanechnikov kernel K(u) = (~v'5)d g(1-1) 1[-v'S,+v'SJ(Ui),

u = (UI, ... ,Ud) E ]Rd.

2.1 Density estimation


Let (X t , t E IE) be a ]Rd-valued stochastic process. Suppose that the Xt'S have
a common density I and X!, ... , Xn are observed.

An estimate for I cannot be constructed directly from the empirical measure


1 n
(2.2) j.tn = - LO(X')
n t=l

since j.tn is not absolutely continuous with respect to Lebesgue measure over
JRd. So in order to obtain such an estimate it is necessary to transform j.tn
in a suitable way. The kernel method consists in a regularization of j.tn by
convolution with a smoothed kernel, leading to the kernel estimator
(2.3)
which can be written

(2.4) In(x) =
n
h1d ~
L... K
n t=l
(x- Xt)
-h-
n

Practical considerations leading to In are discussed in S.2.

Clearly the choice of (h n) is crucial for the efficiency of In. In fact , it


appears that, under some general assumptions, the conditions
(2.5) hn->O(+) ,nh~->+oo (n->+oo)

are necessary and sufficient for the consistency of In. This will be clarified
below; from now on , we do suppose that (2.5) is satisfied unless otherwise
stated.
2.2. OPTIMAL ASYMPTOTIC QUADRATIC ERROR 43

2.2 Optimal asymptotic quadratic error


Let us begin the study of (fn) by evaluating the asymptotic quadratic error.
We will show that, under mild conditions, this error turns out to be the same
as in the Li .d. case.

We need some notations and assumptions : we suppose that for each couple
(t, t'), toft' the random vector (Xt, X t ,) has a d ensity and we set

gt ,t' = f(x"x,,) - f ® f , toft'.

Furthermore we suppose that gt ,t' satisfies one of these two hypothesis:

• H1 . 8p = sUP!t'-tl2:1 II 9t ,t' IIp< +00 for some p E]2 , +00] .

• H2 . Igt,dz') - gt,dz)1 ~ f II z' - z II ; z, z ' E R2d for some constant f.


On the other hand let us denote by C2 ,d(b) the space of twice continuously
differentiable real valued functions f, defined on R d , and such that II f lloo~ b
and II f(2) lloo~ b where f(2) denotes any partial derivative of order 2 for f.
We suppose that the density f belongs to C2 ,d(b).

Finally (X t ) is supposed to be 2-et-mixing (see 1.4) and such that

(2 .6)

for some positive constants 'Y and (3.

Now we state the result .

THEOREM 2.1 If f(x) > 0, ifH 1 (resp. H 2 ) holds and if {3 > 2 P - 1 (resp.
p-2
2d+ 1
(3 > -d--) then the choice h n = cnn-1/(dH ) where en - -t c > 0 leads to
+1
(2 .7) n 4 /(dH) E[fn(x) - f(X)]2 ---t C(c, K, f) > 0

where

c
C(c, K,f)=4
4
(
L
l:5i ,j:5d
8 f ' (x)
8x8x
'
2

J
J uiujK(u)du ) +7
f J
:I (x) 2
K.
44 CHAPTER 2. DENSITY ESTIMATION

Proof
The following decomposition is valid:
1
E(Jn(x) - f(x)? (Efn(x) - f(x))2 +; VarKhn (x - Xl)

+
1
n(n-l) L COV(Khn(X-Xt),Khn(X-Xt'))
I:SW-tl:Sn-1
: B~(x) + VJn(X) + Cn .
(2.8)
We treat each term separately. First we consider the bias :

Bn(x) JlRd Kh n (x - u)[J(u) - f(x)]du


JlRd K(v)[f(x - hnv) - f(x)]dv .
By using Taylor's formula and the symmetry of K we get

where 0 < () < 1. Thus a simple application of Lebesgue dominated convergence


theorem gives

(2.9)

Now Vfn(x) is nothing else but the variance of fn in the i.i.d. case. It can be
written

then writing Rd = {u :1I u II::; 7J} u {u :11 u II> 17} where 7J is small enough it is
easy to infer that

(2 .10) h~ JKL (x - u)f(u)du -> f(x) J K2

J
and
(2 .11 ) Kh n (x - u)f(u)du -> f(x) .

(In fact (2.10) and (2 .11) are two forms of a famous Bochner's lemma (see
[PAD , [C-L] and Chapter 4).
2.2. OPTIMAL ASYMPTOTIC QUADRATIC ERROR 45

Hence (2.10) and (2.11) imply

(2.12) nh~Vfn{x) -+ f{x) J K2.

The covariance term C n remains to be studied. First note that

Cov (Khn{X - Xt),Khn{X - X t')) = J Khn{X - u)Kh.{x - v)9t,du,v)dudv


= :Ct,t' .
Thus, if HI holds, Holder inequality yields

(2.13)

1 1
where -
p
+ -q = 1.
On the other hand Billingsley's inequality (1.11) entails

(2.14)

thus
ICt,t'l :S I'n(lt' - tl)
where

Consequently

2 n-l
ICnl :S -
n t=1
L
I'n{t)

which implies

where Un ~ h;.2d/ q /3 •

Now elementary calculations give

(2.15) nh~ICnl = 0(1).

Finally using the decomposition (2.8), the asymptotic results (2.9), (2.12),
(2.15) and the fact that h n ~ cn- 1/(4+d) we obtain the claim (2.7).
46 CHAPTER 2. DENSITY ESTIMATION

When H2 holds the proof is similar. The only difference lies in the overes-
timation of Ct,t' : using (2.14) in Lemma 1.3 we get
1/(2d+l}
(2.16) ICt,t,1 ~ i(d,£) [oP}(It' - tl) ]
and consequently

ICnl ~ n2 [vn~ i(d, £) [oP} (It' - tl)] 1/(2d+1}


+ h;;2d t~ 4 II K II~ iC{3
]

. -(2d+1}/i3 d . 2d + I
then the chOlce Vn c::e h n leads to nhnlCnl = 0(1) smce (3 > -d-- ' •
+1

In order to obtain a uniform result let us introduce the family


X =X(£,b,i,{3) oflRd-valued stochastic processes (Xt,t E Z) satisfying H2
for a fixed £, such that f E C2 ,d(b) and satisfying (2.6) with the same i and
2d+ 1
{3 > -d--' Then we clearly have
+1
COROLLARY 2.1
SUPXEX lim n..... oo n 4 /(d+4} SUPxEIRd E(fn(x) - f(x))2 < +00 .

Finally it can be proved that n- 4 /(d+4} is the best attainable rate in a


minimax sense. We shall establish this kind of result in a more general
context in Chapter 4 (see Theorem 4.3) .

2.3 Uniform almost sure convergence


The quadratic error is a useful measure of the accuracy of a density estimate.
However it is not completely satisfactory since it does not provide information
concerning the shape of the graph of f whereas the similarity between the
graph of fn and that of f is crucial for the user.

A good measure of this similarity should be the uniform distance between


fn and f· In the current section we study the magnitude of this distance.

Let us introduce the notion of " geometrically strongly mixing" (GSM) pro-
cess. We will say that (Xt ) is GSM if there exist Co > 0 and p E [0, I[ such
that
(2.17) a(k) ~ Copk k 2: 1.
Note that usual linear processes are GSM (see [DKJ).

The following lemma deals with simple almost sure convergence of fn for a
GSM process.
2.3. UNIFORM ALMOST SURE CONVERGENCE 47

LEMMA 2.1 Let (Xt • t E Z) be a strictly stationary GSM lI~d-valued process


and let f be the density of X t .
nhd
1) If f is continuous at x and if (Log~)2 -+ +00 then

(2.18) fn(x) -+ f(x) a.s.

--L.
LOgn)
2) If f E C2,d(b) for some b and if hn = en ( ----;- d+4
where en -+ c > 0,
then for all x E ]Rd and all integer k

(2.19) -L-1
ogkn
(n);rh
-L-
ogn
Un(x) - f(x)) -+ 0 a.s ..

Proof
1) The continuity of f at x and (2.11) yield
Efn(x) -+ f(x) ,

thus it suffices to prove that

fn(x) - Efn(x) -+ 0 a.s ..

For that purpose we apply inequality (1.26) to the random variables

Yin = Kh n (x - Xl) - EKhn (X - Xl) .1::; t ::; n.

Note that IYinl ::;11 K 1100 h:;;.d an choose q = qn = [foh~d/2] . Then by


using the GSM assumption (2 .17) and Billingsley's
inequality (1.11) it is easy to infer that

therefore for all c > 0 we have for n large enough

Hence

P(lfn(x) - Efn(x)1
(2.20)
48 CHAPTER 2. DENSITY ESTIMATION

which implies

P(/fn(x) - Efn(x)/ > e) :S /3exp (_'Vnh~2)

where /3 = /3(1':, K, d) and, = ,(I':, K, d) are strictly positive.

nh d
Now setting Un = (Log~)2 we obtain the bound

(2.21) P{/fn{x) - Efn{x)/ > e):S /3r.:-


n1'yU n

thus
L P{/fn{x) - Efn{x)/ > 1':) < +00 , e > 0
n

and the Borel-Cantelli lemma entails the desired result . •


2) Concerning the bias we have established the following expression in the
proof of Theorem 2.1

Efn{x) - f{x) = 2.
h
2
2
JL
l<i '<d
_ ,J_
2

Xi
8f (x - Bhnv)ViVjK(v)dv
-8 8
Xj

where B = B(f, x, hn , v) E ]0,1[.

Then since f E C2 ,d{b) we have


(2.22) /Efn{x) - f{x) / :s ah~
where a = a(d, K, b) does not depend on n.

Now set
LOgn)*
en = Logkn ( -n- ,

we get
ac2
l':~lIEfn(x) - f(x)1 ::; - L
n
ogk n
and the bound vanishes at infinity. Thus we only need to show that

To this aim we again apply inequality (1.26) with


2.3. UNIFORM ALMOST SURE CONVERGENCE 49

Substituting in (1.26) and symplifying we obtain for n large enough

(2.23)
where

and where the Ci'S are strictly positive constants.

Therefore for all 1] °


> we have

hence (2 .19) by using Borel-Cantelli lemma. •

Now we can state a uniform result on an increasing sequence of compact


sets.
THEOREM 2.2 Let (X t , t E Z) be a strictly stationary GSM JRd-valued pro-
cess and let f be the density of X t .
Let f n be the kernel estimate associated with a kernel K 'which satisfies a Lip-
schitz condition.
hd
1) If f is uniformly continuous and if (L:g~)2 --> +00 then, for all 'Y > 0,

(2 .24) sup Ifn(x) - f(x)1


IIxll:5n'
--> ° a.s ..

1.

2) If f E C2 ,d(b) for some b and if h n = en -n


( LOgn) ",'4 where Cn --> C >0
then, for all 'Y > 0 and all integer k

(2 .25) -L-1
ogk n
(n)*
-L
ogn
sup Ifn(x) - f(x)1
IIxll:5n'
-> 0 a.s . .

Proof
1) f being a uniformly continuous integrable function, it is therefore bounded.
Thus it is easy to see that for all 6 > 0

SUPxElRd IEfn(x) - f(x)1 ::; SUPxElRd SUPllyll:5 6 If(x - y) - f(x)1


(2.26)
+ 2 II f 1100 hZII>6h;: 1 K(z)dz
50 CHAPTER 2. DENSITY ESTIMATION

Choosing 15- 1 and n large enough that bound can be made arbitrarily
small, hence
sup IEfn(x) - f(x)l-> 0
xElRd

Now we have to show that

sup Ifn(x) - Efn(x)l-> 0 a.s., 'Y> 0


Ilxll$n"Y

where for convenience we take II .II as the sup norm on ]Rd, defined by
II (Xl"'" Xd) 11= SUP1<i<d IXil· In the sequel we may and do suppose
that 'Y > 1. - -

The idea of the proof is to consider a covering of

Bn = {x :11 x II~ n'Y}

by I/~ closed hypercubes B jn = {X :11 x - Xjn II~ ::}, 1 ~ j ~ I/~ such


that Bjn n Bj'n = 0, 1 ~ j,j' ~ I/~, j -I- j' and to write

sup Ifn(x) - Efn(x) I = sup fl jn =: fln


IIxll$n"Y l$j$v~

where f'ljn = SUPxEB j n Ifn(x) - Efn(x)l·

Now by assumption there exists £ > 0 such that

IK(u') - K(u)1 ~ £11 u' - u I ; u, u' E IRd


Consequently

(2.27)

and similarly

(2.28)
2.3. UNIFORM ALMOST SURE CONVERGENCE 51

it follows that
~n~~~+-L 1
og2 n
where ~~ = SUPl$j$v~ Ifn(xjn) - Efn(xjn)l·
Now, for all c > 0

v~
P(L1~ > c) ~ L P(lfn(xjn) - Efn(xjn)1 > c)
j=l

and noting that (2.21) does not depend on x we get


P(L1~ > c) ~ l/~ n'Yvf3~ Un

where Un --+ +00, hence EP(L1~ > c) < +00 which implies L1~ --+ 0 a s. .
which in turn implies L1n --+ 0 a.s. and the proof of (2.24) is complete . •

2) Since f E C2 ,d(b) we may apply (2 .22) and setting again


2
LOgn) T+4
Cn = Logkn ( -n-
we obtain

which shows the uniform convergence of the bias.

It remains to be shown that

c;;-l sup Ifn(x) - Efn{x)1 ---+ 0 a.s ..


Iixli$n'Y

For that purpose we again consider a covering of Bn by hypercubes B jn ,


1 ~ j ~ l/~ but this time we choose l/n = [n
'Y+;:rh ] .

Then by using (2.27) and (2.28) we obtain

-lA < -lA' 2R 1


cn Un - cn Un + -L--
ogkn (Logn )3/fd+4l
.

Now (2.23) entails

Clearly El/~vn < +00 and (2 .25) follows . •


52 CHAPTER 2. DENSITY ESTIMATION

Note that the obtained rate in (2.25) is nearly optimal; in fact by applying a
theorem of STUTE (see [SED for Li.d. Xt's we obtain for all A > 0 and f > 0

n )
( Logn
2
<I+4
II:~~A
Ifn(x)f(x)
- f(x) I
-->
( (d +d4)& J 2)K
1/2

a.s ..

We now study the uniform behaviour of fn over the entire space. The results
are summarized in the following corollary.
COROLLARY 2.2 Let fn be the estimator associated with the normal kernel.
Then
1) If assumptions of Theorem 2.1 hold and if in addition E II Xo II < 00 we
have
(2.29) sup Ifn(x) - f(x)1 --> 0 a.s ..
xElRd

2) If assumptions of Theorem 2.2 hold and if

(2.30) -
limllxll_oo I x II d+2 f(x) < +00

then for every integer k

(2.31) -L-1
ogk n
(n)*
-L
ogn
sup Ifn(x) - f(x)1
xElRd
--> 0 a.s . .

It may be shown that (2.29) and (2.31) are still valid if K has compact support.

Proof
1) Since (2.24) is valid it suffices to establish that

sup Ifn(x) - f(x)1 --> 0 a.s ..


IIxll>n'Y

Now f is an integrable uniformly continuous function , hence


(2.32) sup f(x) --> 0
Ilxll>n'Y

and it remains to check that

(2.33) sup fn(x) --> 0 a.s . .


IIxll>n'Y

First note that we may choose 'Y > 2, then, by Markov inequality
2.4. ASYMPTOTIC NORMALITY 53

Therefore by Borel-Cantelli lemma

P (lim UII
t=l
Xt II> n2'Y) = 0 .

Now let us setno = lim {O


II X t II::; ~'Y} .
For every w E no there exists no{w) such that n ~ no{w) implies
n'Y
II Xt{w) II::; 2; t = 1, ... , n. Therefore, if II x II> n'Y we have
x - Xt(w) n'Y
II h II> -h ; t = 1, ... ,no
n 2 n

Then choosing Xn E Rd such that II Xn II = nh'Y we obtain


2 n

K ( X - hXt{w»)
n
< K(xn) t = 1, ... ,n

hence

sup fn{x,w)::;
IIxll>n~
1
hd
n n
L K (x ~ X
(no(w)
t=l n
t
)
+ (n - no{w»K{xn)
)

and finally

{21r)-d/2 (
(2.34) sup fn(x,w) ::; hd no{w) + exp (' ---h
1 n2'Y))
2 '
IIxll>n~ n n , 24 n

wE no,n ~ no(w), which implies (2.33) . •


2) The proof is similar. From (2.30) we get

c:;;l sup fn(x,w) ---> 0 , wE no


IIxll>n~

and by using Theorem 2.2 we come to (2.31) . •

2.4 Asymptotic normality


Let (Xt, t E Z) be a JRd-valued Strictly Stationary Process with marginal dis-
tributions satistying the following conditions
C1) The density ft" ... ,t, of (X t ., . .. ,Xt ,) exists whenever tl < t2 < t3 < t4
and
sup II ft., ..,t, 1100< +00 .
t. <t,<t3<t,
54 CHAPTER 2. DENSITY ESTIMATION

C2 ) SUPt.<t2 II ft. ,t2 - f 1)9 f 1100< +00 .

C 3 ) f E C2,d(b) .
Then we have
THEOREM 2.3 If C 1, C2, C3 hold, if a(k) = O(k- f3 ) where f3 ~ 2 and if
hn = L ~ n-
og ogn
m,
c > 0 then for all integer m and all distincts Xi'S such
that f(Xi) > 0

(2 .35) ( nhd)1/2 ( fn(Xi) - f(xi) , 1::; i ::;


n (fn(Xi) J K2)1/2
m) ~ N(m)
where N(m) denotes a random vector with standard normal distribution in Rm.
Before the proof, let us mention that the precise form of (2.35) is useful for
constructing confidence sets for (f(X1)" '" f(xm))·

Proof
We first show that

(2 36) ( hd)1/2 (fn(Xi) - Efn(x;) 1 < . < ) ~ (N(m) N(m))


. n n (f(Xi) J K2)1/2' - Z - m 1 , • •• , m .

According to the Cramer-Wold device (see [BIll p.49) it suffices to prove


m
that whatever (AI, ... , Am) E Rm such that L IAil i= 0
i=l

( 2.37) ( hd)I/2 ~ Afn(Xi) - Efn(xi) ~ ~ AN(m)


n n L... t (f(x .) JK2)1/2 L... t t .
t=1 t t=l

Let us set n
Sn = LYi,n
t=l

where

t = 1, . .. , n, and consider the blocks


V 1n = YIn + ... +Y pn = Y p+ 1 ,n + ... +Yp+q,n , ... ,
, V{n
Vrn = y(r - l)(p+q)+l,n + .. + Y rp+(r-l)q,n , V:n = Y rp+(r-l)q+l,n + ... +
Yr(p+q),n,
where
r(p + q) ::; n < r(p + q + 1) .
2.4. ASYMPTOTIC NORMALITY 55

From Bradley's lemma 1.2 there exist i.i.d. random variables WIn, ... ,
Wrn such that PWjn = PVjn and

P(IV}n - Wjnl > en) :::: 11 (" V}n;n Crt 1100) 1/2 a(q);j = 1, . .. ,r,

. ~ IAil1i K 1100 (d)I/2


wIth en = 2p 8 (J(xd J K2)1/2 ,en = £ rphn ,£ > o.
Therefore, if r, p, q tend to infinity we have

(2 .38)

Now consider r r

LVjn LWjn
(2.39) D. _ j=1 _ -,-j_=_1---:-7:::"
n - (rph~)1/2 (rph~)1/2'

by using (2.38) we get for all £ >0


P (lD.nl > £) = O(pl/4r3/4a(q)h;;d/4)

Then choosing r ~ n a , p ~ n I - a , q ~ n C , 0 < a < 1, 0 < e < 1 where a


and e will be specified below, we obtain

(2.40) P(ID.nl > £) = 0 (n 1 / 2(a-2CJ3+ffi) (LogLogn)d/4)

.
which tends to zero provIded that e > 2(3 1( + ++ 2)
a d
d 4 ,thus

(2.41) D. n -> 0 in probability .

We now prove the asymptotic normality of


r

LWjn
Zn = (~:~~)1/2 ,n ~1 .

To this aim we apply the Liapounov condition (see [BII] p. 44). It suffices
to show that
r

L
j=1
E IWjn l
3

(2.42) Zn = (rVarWIn )3/2 -> 0 .


56 CHAPTER 2. DENSITY ESTIMATION

By using the same arguments as in Theorem 2.1 it can be shown that

VarWln = VarVln rv ph~ L >.; .


i=1

On the other hand, the uniform boundedness of ftt, .. ,t4 implies

Now applying Schwarz inequality in (2.42) and the here above results we
obtain

that is

8
which tends to zero provided that a ::::: 3(d + 4)
Now write
n n

Sn = LVJn + LVln + tin ,


j=1 j=1

it is easy to see that

LVln
r )
j=1
Var (
(rph~)1/2

and

Var ( tin 1/2) = O(na- 1)


(rph~)

which both tend to zero provided that 0 < c < 1 - a.

· .
Th ese con d ItlOns 2/3 - -/3--
are satls.fi e d I'f a < -/3-- 1 -dd -+ 2 w h'IC h IS. com-

8 1( 2) .
2 +1 2 +1 +4
d+
patible with a::::: 3(d + 4) and c> 2/3 a + d + 4 Sillce /3 ::::: 2.
2.5. NONREGULAR CASES 57

Collecting the above results we obtain

L
m
Zn ~ A;N;(m)
i=1
~n ~ 0

L VJn
r
(rph~) -1/2 ~ 0
j=l
(rph~r1/2 Dn ~ 0

which put together imply (2.36) by using [BIll (Theorem 4.1 p. 25).

Now in (2.36) Eln(Xi) may be replaced by I(Xi) since I E C2 ,d(b), in fact,


using (2 .9), we get

Finally since Lemma 2.1 shows that

in (2.36) we may also replace I(Xi) by In(Xi) and the proof of Theorem 2.3 is
therefore complete. •

2.5 Nonregular cases


In this section we will show that In may have an interesting behaviour in some
unusual situations.

2.5.1 Chaotic data


We begin with an example which shows that asymptotic independence is not
a necessary condition for obtaining sharp rates of convergence for In.

Let r be a mapping from E to E where E belongs to B]Rd and (Xt ;


t = 0, 1, 2, ... ) a sequence of r .v.'s which satisfies

(2.43) Xt = r(Xt - d ; t = 1,2, .. .


we suppose that the following conditions hold
D 1 ) r preserves /.L, a probability on E having a density I with respect to
Lebesgue measure.
58 CHAPTER 2. DENSITY ESTIMATION

D2 ) There exists 'Yo> 0, P E]O , 1[ and c > °such that

for each hypercube E in Rd satisfying p,(E) < 'Yo .


D3 ) Xo has a density go and, for each t ~ 1, X t has a density 'Trtgo such that
+00
L II 'Jrtgo - f 1100< +00 .
t=1

A typical example of a model satisfying Db D 2 , D3 should be the p - adic


Process.
(2.44) X t = pXt - 1 (modI) , t ~ 1 ,
where p ~ 2 is an integer and where Xo has a density go concentrated on [0, 1]
and with a bounded derivative.
Other examples may be found in [LM] and [RU].

Clearly the process defined by (2.43) is not strongly mixing. However we


have the following astonishing result.

THEOREM 2.4 Let f~ be the kernel estimate associated with the naive kernel
K = 1[-!,+W' Suppose that Dl1 D2, D3 are satisfied and that f E C2,d(b).
.
Then if h n = C
(L-n-
ogn ) I/(d+4)
°
,e> we have

(2.45) E (J~(x) - f(X))2 = 0((L~n) I


4 d+4) ,x E E.
Thus despite the high dependence between the Xt'S, the convergence rate of
f~ is the same as in the Li.d. case, up to a logarithm. This fact comes from
the chaotic character of r. Note that (2.45) shows that f~ is useful for the
empirical determination of the invariant measure of a dynamical system.

Proof of Theorem 2·4


First assume that go = f, then from D 1 , 'Jr t go =I for each t and with
similar notations as in Theorem 2.1 we have

where En, V In and en are defined in (2.8).


2.5. NONREGULAR CASES 59

L ) 1/(d+4)
Therefore by using (2.9), (2.12) and h n = c ( ~n we obtain

4/(dH»)
(2.46) B~(x) + Vfn(x) = 0 (( L~n )

.
Now settmg Bn = [hn hn] d
-"2' +"2 we get

IP(Xt E Bn,Xt , E Bn) - P(Xt E Bn)P(Xt , E Bn)1 :::: 2J1(Bn)


:::: 2 II f 1100 h~ .
Thus from D2 it follows that for n large enough

IP(Xt E B n , Xt' E Bn) - P(Xt E Bn)P(Xt, E Bn)1


(2.47)
< cn(lt' - tJ)

where
cn(lt' - tl) = min(211 f 1100 h~ ,cp1t'-t l ).
From (2.47) we deduce that

LOgh- d ]
Now choosing Wn = [ LOgp~l we find that

Ie (x)1 < 4 II f
1100 Logh;;d +~ ~ t
n - nhd Logp-l nh2d ~ p
n n t>Wn

;n ,
which implies
4/(dH»)
(2.48) IGn(x)1 = 0 (( )

and (2.45) is proved if 90 = f.


We now turn to the general case. First we have (with clear notations)
60 CHAPTER 2. DENSITY ESTIMATION

then by using D3 we easily get

(2.49)

On the other hand

hence

thus
(2.50)

It remains to find a bound for

To this aim we note that if 0 < t < t'

and that

therefore
n-1
::; n 2 6h 2d ~(
~ n - t) II 'Tr
t
go - f d
1100 h n
n t=l
(2.51) 6 +00
::; nh d
n t=l
L II 'Trtgo - f 1100 ,
finally (2.46), (2.48), (2.49), (2 .50) and (2.51) imply (2.45) . •
2.5. NONREGULAR CASES 61

2.5.2 Singular distribution


Consider a stationary Process with marginal distributions concentrated on S, a
closed set in JRd of Lebesgue measure zero. Then we will see that fn explodes in
a neighbourhood of S and vanishes elsewhere. Making use of these properties
it is thus possible to construct an estimate for S.

More precisely we consider a G8M strictly stationary JRd-valued process


(Zt, t E Z) with components (Yi, X t , .. . , Xt-k) where k := d - 2 ;::: 0 and such
that
(2.52) Yi = r(Xt ,·· . ,Xt - k } , t E Z .
We consider the density estimate written under the form

fn(x, y}
1
= (n _ k}h~+2 ~ K
n-k (Xx-
hn
(t)
'
y - Yt+k
hn
) '

where n > k , x E JRk+1, Y E JR, X(t) = (X t , . .. , Xt - k).


For convenience we take K = l[_~ ~lk+2 and, as usual, we suppose that (h n )
2 '+2
tends to zero.

The assumptions we shall use are the following

8 1} There exists Cr > 0 and a neighbourhood V of x such


Ir(x"} - r(x/}1 ::; Cr II x" - Xl II ;xl , x" E V.

8 2 ) X(t) has a density fx continuous and strictly positive at x.


Then the zero-infinite behaviour of fn is given by the next theorem.
THEOREM 2.5
1) If 8 1 holds then for n ;::: no where no depends only on (h n ) we have

(2.53) sup fn(x, y) = o.


ly-r(xll>max(l,c r )h n

1 2
2) If 8 and 8 hold and if (~~;:;2 -+ +00 then for all 'Y E ] 0, ~ [ we have
(2.54) limhn inf fn(x,y»O a.s. .
ly-r(x)I<"yh n

8imilar results may be proved for other kernel estimates (see [BOI]) .
62 CHAPTER 2. DENSITY ESTIMATION

Proof
1) First we may and do suppose that Cr 2: 1 and that we use the sup norms
in ]Rk+l and ]Rk+2 . Now if n is large enough we may use 8 1 to write that

II X(t) - x 11< h2n implies Ir(X(t)) - r(x)1 :S Cr h2n .

Thus, if Iy - r(x)1 > crhn we have

:S Iy - r(X(t))1
:S Iy - r(X(t))1

h
which entails Iy - r(X(t))1 > Cr 2n.

Consequently we have

Now put

( Y ) -_ (X -h X(t)
Unt X,
n
'
Y - r(x(t)))
hn ' 1:S t :S n - k ,

by using (2 .55) we get

1 1
II Unt(x, y) II> 2 min(1 , cr ) =2 ,1:S t :S n - k

thus we obtain fn(x,y) = 0 hence (2 .53) . •

2) From Iy - r(x)1 < ,hn with 0 <, < ~ and II x - XCt) II:S ahn we deduce
that for n large enough, we have

Now, if a = min (1 ~:,,~), we obtain II Unt(x,y) II:S ~ which implies


K(unt(x,y)) = 1 for t = 1, . . . , n - k.

Therefore for all y such that Iy - r(x)1 < ,hn, we have

(2.56)
2.5. NONREGULAR CASES 63

where

Now it suffices to note that f~ is a kernel density estimate, then, applying


Lemma 2.1 we obtain

f~(x) --+ fx(x) a.s.

and by (2.56)

lim inf (2a)-k- 1 hn fn(x,y) ~ fx(x) a.s.


Iy-r(x}l<-rh n

which is the claimed result . •

As an application we now consider an estimate Tn of l' defined by

(2.57) fn(X ,Tn(X)) = ma:xfn(x,y)


yEIR

where Tn(X) is chosen in such a way that it should be measurable.

The consistency of Tn is given by the following

nh k + 1
COROLLARY 2.3 IfSI and S2 aTe satisfied and if (Lo;n)2 ...... +00 then faT
each 8 EjO, l[
(2 .58)

Proof
Let be a positive number. If h~I+c5ITn(X) - T(x)1 > € then we have

ITn(x) - T(X) I > ma:x(l , cr)hn , n ~ no using (2 .53) we obtain for n >
max(no , ne:)

and from (2 .57)

fn(x, y) = 0 , Y E IR , n ~ ma:x(no, no) .

Now (2.54) implies that this event has a probability zero, hence the result . •

We refer to [B01] for the construction of an estimate for k which uses again
Theorem 2.5.
64 CHAPTER 2. DENSITY ESTIMATION

2.5.3 Processes with errors in variable.


Let (Xt, t E IE) be a real strictly stationary process. Suppose that each X t is
observed with a random error et, thus the observed process can be written as

Yt = Xt + et , t E IE

where the et'S are Li.d. and the processes (X t ) and (ed are independent.

Here we have a Deconvolution Problem: estimate the density I of X t


from the data YI ,· . . , Yn, the density Ie of et being known. That last assump-
tion is somewhat restrictive but it ensures the identifiability of the problem.

The commonly used estimate is given by

(2.59)

where Kn is the deconvolution kernel:

(2.60) K- n (Y ) -- - 1
27T
1 IR
e
-iuy c/>K(U)
(-1) dU
c/>e uh n
, Y E lR .

In (2.60) c/>e denotes the characteristic function of eo and c/>K the Fourier
transform of the classical kernel K.

Under some technical assumptions, asymptotic results may be derived.


Clearly the convergence rates are weaker than those in the regular case. Here
we only give some indications, we refer to the literature for a complete exposi-
tion.

Two important cases may be considered

1) If <Pe is algebraically decreasing at infinite it is possible to reach good


convergence rates.

In fact under some regularity and strongly mixing conditions and if


IE C2,d(b) it may be proved that
(2.61) n~ E(jn(x) - I(X))2 -> C >0
where c is specified and where ;3 is such that

lim u{3<Pe(u) = a#-O .


lul--oo
2.5. NONREG ULAR CASES· 65

On the other hand we have


Logn
!~g lin(x) - (~)
5+2)3
2 )
(2.62) f(x)1 = 0 ( a.s;

where D is a compact set.

Then the loss is small: compare (2.61) with (2.7) and (2.62) with (2.25).
2) If 1>" is geometrically decreasing at infinite the convergence rates are
poor.

If, for instance, 1>,,(u) is of order exp( -alul i3 ) (a > 0) at infinite then
under some conditions

and
sup lin(x) - f(x)1 = O((Logn)-2/ i3 )
xED
where D is compact.

Unfortunately an improvement is not possible since the above rates and


the last two ones specifically are optimal (see [Fl]).

For multidimensional versions of the above results and asymptotic normality


we refer to the bibliography.

Notes

The kernel density estimator was introduced by ROSENBLATT ([R01]) in 1956


and PARZEN [PAl in 1962. A great number of people have studied this estimate in
the i.i.d. case.

In the strongly mixing case one can mention ROUSSAS, ROSENBLATT, TRAN ,
TRUONG-STONE, MASRY, BOSQ, ROBINSON, PHAM-TRAN among others.

DELECROIX in [DE2] (1987) has first considered the case of an ergodic process.
In 1992, GYORFI and LUGOSI [GL] have shown that the kernel density estimator
is not universally consistent for an ergodic process.

Chaotic data and singular distributions are studied by BOSQ ([BOll and [BOS]).
Processes with errors have been recently considered by FAN and MASRY.

The choice of bandwidth will be discussed in Chapter 7.


Chapter 3

Regression estimation and


prediction for discrete
time processes

The construction and study of a non parametric predictor are the main purpose
of this chapter. In practice such a predictor is in general more efficient and
more flexible than the predictors based on BOX and JENKINS method, and
nearly equivalent if the underlying model is truly linear. This surprising fact
will be clarified at the end of the chapter.

In Sections 1 and 2 we will study the kernel regression estimator obtaining


optimal rates in quadratic mean and uniformly almost surely and deriving the
asymptotic distribution . Section 3 will be devoted to prediction for a kth order
Markov process. Prediction in the general case will be presented in Section
4. This section also contains some ideas about related topics : interpolation,
outliers detection, chaos, regression with error.

3.1 Regression estimation


Let Zt = (Xt, Yi), t E Z be a JRd x JRd' -valued strictly stationary process
and let m be a Borelian function of JRd' into JR such that E(lm(Yo)l) < +00.

We suppose that Zo admits a density fz(x,y) and that fz(x ,· ) and mOfz(x, ·)
are in L: 1 (>, d') for each x in JRd . Then, we may define the functional parameters

(3.1) f(x) = J fz(x, y)dy , x E JRd.

67
68 CHAPTER 3. REGRESSION ESTIMATION

(3.2) <p(x) = J m(y)fz(x,y)dy , x E JRd.

and

r(x) <p(x)/f(x) if f(x) > 0


(3.3)
Em(Yo) if f(x) = o.
Clearly, rO is a version of E(m(Yo)IXo = .) . We will say that r( ·) is a
regression parameter.
Typical example of regression parameters are

r(x) P(Y E B I X = x) (B E Bad)


r(x) E(yk I X = x) (k ~ I,d' = 1)
r(x) V(Y I X = x) (d' = 1)

The problem is to construct an estimator of r based on the data


Zt, 1 ::; t ::; n. As for density the method uses a convolution kernel which
regularizes the empirical measures.

Consider the empirical measure

1
L
n
Vn = - D(Xt,m(Y,))
n
t=l

and its marginal distribution

A regularization of /In and Vn by convolution leads to natural estimators of


f and <p :
(3.4) ~ (x- -hX-t )
fn(x) = nh1 d LK , x E JR
d
n t=l n

and

(3.5)
1 ~
<Pn(x) = nh~ ~ m(Yt}K
- Xt)
(xh;:- , x E JR ,
d

where K is a strictly positive kernel (see Chapter 2) and hn a smoothing pa-


rameter satisfying limn --+ oo h n = O( +) .
3.2. ASYMPTOTIC BEHAVIOUR 69

Practical considerations leading to Tn are discussed in 8.3.

Consequently the kernel estimator of r is defined as

(3.6)
Note that, if K is not strictly positive, definition (3.6) must be completed:
1 n
for fn(x) = 0 one may choose rn(x) = - L m(yt) which is clearly more nat-
n t=l
ural than the arbitrary Tn(X) = 0 used by many authors.

Note also that an interesting form of Tn should be the weighted mean


n

(3.7) Tn(X) = LPnt(X)m(yt)


t=l

where Pnt(X) = K X-Xt) I ~


(h;:- ~ K (X-X)
T , 1 ::; t ::; n if fn(x) =J 0 and

Pnt(x) =.!.,
n
1::; t::; n if fn(x) = O.

3.2 Asymptotic behaviour of the regression


estimator
3.2.1 Quadratic error
In order to obtain the exact asymptotic quadratic error for Tn, we need the
following assumptions and notation :
1. The density f(x.,x,) exists for s i- t, belongs to C2,d(b) and satisfies
sup I f(x.,xtl - f 0 f IIp< +00
s#t

for some P Ej2, +ooj.


2. f and <p belong to C 2 ,d(b), fz belongs to C 2 ,d+d' (b) for some b.
3. There exists a > 0 such that
E(expalm(yt)l) < +00.

4. The strong mixing coefficient a(·) of (Zt) satisfies


a(k) ::; ,k-(3 , k ~ 1

where, > 0 and !3 > O.


70 CHAPTER 3. REGRESSION ESTIMATION

Now, if I(x) > 0, we set

C(x , c,K,f,r) = c: ( L f)~ + f)~ogf


l$ i ,j$d
f)
Xl XJ
.
Xl
:r. jUiUjK(U)dU)
XJ
2

+ -d V (X)jK 2
c f(x) ,

where v(x) = j m(y)29~~~~) dy - r(x)2 is a version of V(m(Yo) IXo = x).

Then we have

THEOREM 3.1 If 1 -> 4 hold, if f(x) > 0 and if (3 > max( 2(p - 1) , d + 2)
p-2
then the choice h n = enn-1/(d+4) where Cn -> c> 0 leads to
(3.8) n 4/(d+4)E(r n (x) -r(x)? -> C(x,c,K,f,r).

Proof

We omit x and write


f - In r 'Pn - 'P
(3 .9) rn - r = (rn - r ) - I - + 7(f - In) + -1-'
thus

E(r n - r)2 = An + En + C n
where

and
2
en =- J2E[(rn - r)(c.pn - c.p)(fn - f)].

The quantity E(f - fn? has been already studied in Theorem 2.1 and the
other terms in An may be studied similarly. After some calculations one obtains
n 4 / (d+4) An -> C(x, c, K, I, r) .

It remains to prove the asymptotic negligibility of En and C n with respect


to n- 4 /(d+4) . We only consider En since the treatment of C n is similar.
3.2. ASYMPTOTIC BEHAVIOUR 71

Given I > °and c > 0, we have

Bn :S (n' + Irl)E(lrn - rlllrnl<o;n~)Un - 1)2 (Ilrn-rl<O;lI<-l-<h


+Ilrn-rl>n<-l-<h) .

Using successively Schwarz and Holder inequalities we get for n large enough :
Bn ::; 2n'n-(l+eh EUn - 1)2 + 2n'[EUn - 1)4]1/2
(3 .10)
[E (Ir n - rI2vllrnl<o;n~) jl/2V [p (lrn - rl > n-(l+eh: Irnl ::; n') jl/2V ,
1 1
where - + - = l.
v w
The first term in the bound is a O(n-C1'n- 4 /(dH) = o(n- 4 /(d+4). In the
second term we treat each factor separately. First we clearly have

11K 1100 h~ (EUn - f?)1/2


(3.11)
d-2) .
o (nT+4
On the other hand since

we have, writing 2v = 2(v - 1) +2 ,

(3 .12)

In order to evaluate the last factor, we put

and

First we have
72 CHAPTER 3. REGRESSION ESTIMATION

then, assumption 3 and Markov inequality imply

PeEn) S nexp( -an"Y)E(expalm(Yo)1) .

On the other hand we may apply the exponential type inequality (1.34) to
the random variables Vt - EVt, 1 S t S n. Then, choosing q ~ n A , collecting
the above bounds and choosing v large enough, 'Y and € small enough and A
4
close enough to -d- we get En = o(n4/(d+4») and the proof of Theorem 3.1
+4
is now complete . •

3.2.2 Uniform almost sure convergence


Uniform convergence of a regression estimator may be obtained over compact
sets, but in general not over the whole space, even if some information about
the behaviour of rex) for II x II large is available.

As a simple example let us consider the case where the Zt'S are i.i.d. bi-
variate Gaussian variables with standard margins and Cov(Xt , Yt) = p. In this
case the classical estimator of rex) = px is Pnx where

and clearly
(3 .13) sup IPnx - pxl = +00 a.s ..
xEIR

In fact it is impossible to construct a regression estimator Rn such that

sup IRn(x) - pxl -> 0 a.s.


xEIR

since this property implies that, whatever (un) i 00,

Un ( Rn~:n) - p) -> 0 a.s.

and consequently it should be possible to obtain an estimator of P with an


arbitrary sharp rate of convergence.

However it is possible to establish uniform convergence over suitable increas-


ing sequences of compact sets. In the following, we will say that a sequence (Sn)
of compact sets in Rd is regular (with respect to f) if there exists a sequence
(f3n) of real numbers and a'Y > 0 such that for each n

inf f(x) ::::: f3n > 0 and 8(Sn) S n"Y


xES n
3.2. ASYMPTOTIC BEHAVIOUR 73

where 6(Sn) denotes the diameter of Sn .

We first consider convergence on a fixed compact set. In that case the


obtained rate is optimal (up to a logarithm).

THEOREM 3.2 Let (Zt) be a GSM strictly stationary process such that f
and'P belong to C 2 ,d(b) for some b and such that E(expalm(YoW) < +00 for
some a > 0 and some T > O.

nhd
Then if K is Lipschitzian, if n 1 -t +00 and ~f S is a compact set
(Logn)2+"T
such that infxEs f(x) > 0, we have

(3.14) sup Irn(x) - r(x)l-t 0 a.s ..


xES

.
Furthermore if h n ~ (
Logn 2 -"T
(n)
1) 1/(d+4)
.
, then for each mteger k

n 2/(d+4)
(3.15) ( ) sup Irn(x) - r(x)1 -~ 0 a.s ..
Logkn(Logn)~+ 1-~;r:h xES

Proof (Sketch)

We omit x and write

suplrnl 1
sup Irn - rl ~ ~f sup Ifn - fl + T f sup l'Pn - 'PI .
S l~ S l~ S

First, by using (3.7) we get for each A >0


P(sup Irnl > A) ~ P( sup Im(yt)1 > A) .
S l:::;t::;n

Now, Markov inequality entails

P(suplrnl > A) ~ nexp(-aAT)E(expalm(YoW),


S

2
then, choosing A = cT(Logn)l/T where c > - and using Borel Cantelli lemma
a
we obtain
P(limsup{suplrnl > cT (Logn)1fr}) = o.
n S

On the other hand, using inequality (1.25) and a covering of S it is easy to


prove that suPs l'Pn - 'PI -t 0 a.s. and (Logn)l/T sups Ifn - II ---t 0 a.s. (see
74 CHAPTER 3. REGRESSION ESTIMATION

the proof of Theorem 2.2), hence (3.14).

The proof of (3.15) is similar but uses the more precise inequality (1.26)
instead of (1.25) . Details are omitted . •

The following theorem considers a varying compact set :


THEOREM 3.3 If assumptions of Theorem 3.2 hold, if (8 n ) is a sequence of
real numbers such that there exists a regular sequence of compact sets satisfying

8n (Logn) (1+t:) - (1- t:) d;4


-'----=--'----::-;-;-:--:-:-------+0
i3
n n 2 /{d+4}

(L ?-(1/T}) 1/{d+4}
then the choice h n -::: ( ogn n entails

(3.16) 8n sup Irn(x) - r(x)1


xESn
-T 0 a.s ..

The proof is similar to that of Theorem 3.2 and therefore omitted.

Example 3.1

Suppose that m = IB where B E BIRd and that f(x) -:::11 x II-V for II x II
2
tending to infinity (p> d), then if 8 < -d- - 'YP we have
+4

nO sup ItPnt(X)IB(Yt) - P(Yo E BIXo = X)I-T 0 a.s.


Ilxll~n~ t=1

where Pnt(x) is defined with (3.7).

Example 3.2

Suppose that (Zt) is a bivariate Gaussian process with X t '" N(o, 0"2)
(0" > 0). Then, if m is the identity, we have for each c EjO, ~[
n~-e
~ sup Irn(x) - r(x)1 -T 0 a.s ..
ogn Ixl~(7v'2€Logn

3.2.3 Limit in distribution


We now state some conditions which ensure weak convergence of
'Yn(rn(x) - r(x)) for a suitable choice of ("In).
3.3. PREDICTION FOR A STATIONARY MARKOV PROCESS 75

For convenience we suppose that

where (~t, t E Z) is a real strictly stationary process, and that m is the identity.

Now the assumptions are

1. EIXol4+6 < +00 for some positive 8.


82 + 4
2. (X t ) is a-mixing with a(k) = O(k- 13 ) where !3 > -U·
3. f and r.p are in C2,1(b) for some b.
4. fOE (Yo21Xo = .) is continuous at x.

5. sup II E (YoY/IXt = . ,Xo = .) 1100< +00


tEZ

where i 2: 0, j 2: 0, i +j = 2.

6. sup II f(xo ,xtl 1100< +00.


t~k

then we have

THEOREM 3.4 If conditions 1 ----> 7 hold, if f(x)v(x) > 0 and if


then
nh~+2 ----> 0

(3.17) fn((X)) (rn(x) - r(x)) ~N rv N(O, 1)


Vn x

where Vn is the kernel estimator of v.


The proof is rather intricate but similar to the proof of Theorem 2.3 and there-
fore omitted.

3.3 Prediction for a stationary Markov process


of order k
Let (~t, t E Z) be a Rdo-valued strictly stationary process. Suppose that (~d is
a Markov process of order k, namely
76 CHAPTER 3. REGRESSION ESTIMATION

or equivalently

for each Borelian real function F such that E(IF(~o)l) < +00.

Given the data 6, ... , ~N we want to predict the non-observed square inte-
grable real random variable

where 1 ::; H ::; N -k and where m is measurable and bounded on compact sets.

For that purpose let us construct the associated process

and consider the kernel regression estimator r n based on the data


(Zt,l ::; t ::; n) where n = N - k + 1 - H. In the present case d = kdo and
d' = do.

From r n we construct the predictor

(3.19)

and we set

3.3.1 Quadratic prediction error


We study here the asymptotic quadratic prediction error first for a bounded
process and then in the general case. In the bounded case, we have the following

THEOREM 3.5 If (~t) is a bounded, GSM, strictly stationary kth order


Markovian process and if the associated process (Zd admits functional parame-
ters f and'P continuously differentiable on S = {J > O} then, for a Lipschitzian
(LOgn)2-e) 1/(d+2)
K and h n ~ ( where
n
o < c < 2 we have

(3.20)
3.3. PREDICTION FOR A STATIONARY MARKOV PROCESS 77

Proof (sketch)

First we clearly have

Now, an integration by parts gives

E(sup(rn(X) - r(x)?) = 2
xES
1 0
+00
vP(sup Irn(x) - r(x)1
xES
> v)dv

Finally, in order to bound above the integrand, it suffices to make a covering


of S and then use the exponential inequality (1.26) (see the proofs of Theorems
2.2 and 3.2) . •

Note that if f and <p are twice continuously differentiable on S the rate is
not improved because the bias remains the same on the edge of S.

We deal with the general case in the following


THEOREM 3.6 If (~t) is a GSM, strictly stationary, kth order Markovian
process and if the associated process (Zt) satisfies: f and <p E C 2 ,d(b) for
some band E(expalltn < 00 for some a > 0 and some 7 > 0; then for a
. .. ((LOgn?-~) l/(dH)
Lzpschztzwn K, hn c:::: n and each reg'ular sequence (Sn) of
compact sets

(3.21)

and

(3.22) E ((rn (Xn+H ) - r(Xn+H) )21xn+H (j. Sn) =


=0 ((Logn)2/'r P(II XO II> b(Sn)1/2) .

Proof

The proof of (3 .21) is similar to that of (3.20) and is therefore omitted.

Concerning (3.22) first, note that, since conditional expectation is a con-


traction in L4(n, A, P) (see [RAJ) we have

(3.23)
78 CHAPTER 3. REGRESSION ESTIMATION

On the other hand (3.7) yields

rn(Xn+H):S sup Im(et+k-1+H)1


l$t$n

and it is easy to prove that the condition E(expalm(eoW') < +00 implies

(3.24) E ( sup Im(et+k-1+HW) = O((Logn)p/T),


l$t$n

thus
(3.25)
Now let us write

~n E((rn(Xn +H ) -r(Xn+H))2Ixn+HV!'sJ

:s 2E (r;(X n +H )lxn+HV!'sJ + 2E (r2(Xn+H)Ixn+HV!'Sn)


then by using Schwarz inequality, (3.23) and (3.25) we get

hence (3.22) . •

Note that while the rate in (3.20) is nearly optimal, there is a loss of rate
in the general case. As indicated above, the reason for it is the unpredictable
behaviour of r(x) for large values of II x II.

Example 3.3

Take a one dimensional Markov process (et) with f(x) ~ cexp( -c'lxjT)
(T ~1, c > 0, c' > 0), then, using (3.21) and (3.22) it is easy to check that

3.3.2 Almost sure convergence of the predictor


The empirical error Irn(Xn+H ) - r(Xn+H)1 gives a good idea of the predictor's
accuracy. We now study its asymptotic behaviour. As above we separate the
bounded case and the general case.
COROLLARY 3.1 If conditions in Theorem 3.5 hold, then, for each € > 0,

(3.27)
n 1 /(d+2)
----(-l-)-."c-lrn(Xn+H) - r(Xn+H)1
(Logn)'+ -, d+2
-----+ ° a.s ..
3.3. PREDICTION FOR A STATIONARY MARKOV PROCESS 79

Proof

Since
Irn(Xn+H) - r(Xn+H)1 :S sup Irn(x) - r(.r) I
xES

the result follows from Theorem 3.2 applied to the associated process (Zt) . •
COROLLARY 3.2 If conditions in Theorem 3.6 hold and if

on(Logn)(l+f; )-(1- f;);rh


--"-''--.:::..-'::----;::-;,...,..,...,.,---------->0
f3n n 2 /(d+4)
then
(3 .28)
Furthermore, if

then
(3.29)
Proof

For (3.28) it suffices to write

Onlrn(Xn+H) - r(Xn+H )IIXn+HESn :S sup Irn(r) - r(x)1


xES n
and then to apply Theorem 3.3.

A simple application of Tchebychev inequality and (3.22) entails (3 .29) . •

In example 3.3 we obtain, up to a logarithm, the rate n- 2 / 25 .

3.3.3 Limit in distribution


In order to derive the asymptotic distribution of rn(X n+ H ) we need an inde-
pendence asymptotic condition stronger than a-mixing : a process (~t, t E Z)
is said to be 'Prev -mixing if the reversed process (~-t,t E Z) is 'P-mixing. For
such a process we have the following
LEMMA 3.1 Let (~t) be a 'Prev-mixing process and let TJ be a O'(~t, t ::; k)-
measurable bounded complex random variable, then for each positive integer
P
(3.30)
where 'Prev(.) is the 'P-mixing coefficient of (~-t).
80 CHAPTER 3. REGRESSION ESTIMATION

Proof See [RS-IO]

THEOREM 3.1 If (~t) is i.prev-mixing and if conditions of Theorem 3.4 hold,


then

(3.31) (/ K2)-1 (n::~~~n;)J)) 1/2 [rn(Xn+H) - r(Xn+H)] ~N


where N '" N(O, 1).
Proof
Let us first consider the kernel estimator

rn,(x) = -
t.
t-1
m(Yt)K
,
(x ~ ~t) n
, xER
d

tK(X hn'-Xt)
t=l

where n' = n - [LogLogn] and where hn' = hn .


Let us similarly define fn' and Vn' , and set

2 n' f n' (X )) 1/2


1 n 'hd d
Zn'(X) = (/ K )- ( Vn' (X) (rn' (X) - r(x)) , x E R .

Then it is easy to check that (3.31) is valid if and only if

(3 .32)

Now, in order to establish (3.32), we first apply (3.30) to eiuZn ' (x) , obtaining

(3.33) IE (eiuZn'(X)IXn+H) _E(eiUZn'( X»)1 :S4i.prev(n+H-n'),


and since E (eiUZ",(x») -> e- u
; , we also have

E (eiuZn'(X) IXn+H) -> e-'4- a.s ..

Now

and the dominated convergence theorem implies

E (eiUZ",(Xn+H») --> e- 4 , uE R
3.4. PREDICTION FOR GENERAL PROCESSES 81

which proves (3.32). Theorem 3.7 is therefore established . •

Note that, by using the precise form of (3.31), one may construct confidence
intervals for r(Xn+H)'

3.4 Prediction for general processes


The assumptions used in the above section allowed us to obtain good rates.
However these assumptions are rather restrictive for applications. In the cur-
rent section we consider some more realistic conditions concerning the observed
process. We will successively study the general stationary case, the nonstation-
ary case and some related topics (interpolation, chaos, regression with error).

3.4.1 Prediction for general stationary processes


Most of the stationary processes encountered in practice are not Markovian
even if they can be approached by a kth order Markov process for a suitable k.
In some cases the process is Markovian but k is unknown. Some methods for
choosing k are available in literature, particularly in the linear case: see [B-D]
and [G-M]. Finally, in practice, k appears as a "truncation parameter" which
may depend on the number of observations.

In order to take that fact into account we are induced to consider associated
processes of the form

where limN_oo kN = 00 and limN_oo N - kN = 00. Here the observed process


(~t) is JRdo-valued and strictly stationary.

The predictor of m(~N+H) is defined as

~y,
~ t,N
K(XN+H,N-Xt,N)
h
(3.34) * ( X N+H,N )
rn = .;..---=.,n.,.---------
t-l n

L K (XN+H,N - Xt ' N)
t=l hn

where n = N - kN + 1- Hand K = K~kN where Ko is a do-dimensional kernel.

Now some martingale considerations imply that E(m(EN+H) I EN, ... ,


EN-kN+l) is close to E(m(EN+H)IE.,s :s
N) for large N. Then under reg-
ularity conditions similar to those of Section 3.3 and using the same methods
82 CHAPTER 3. REGRESSION ESTIMATION

it may be proved that

(3.35)

provided kN = O((LogN)8) for some b > O.


It is clearly hopeless to reach a sharp rate in the general case. In fact, it can
be proved that a (Logn)-8' rate is possible. For precise results and details we
refer to [RH].

3.4.2 Prediction for nonstationary processes


We now consider a simple form of nonstationarity supposing that an observed
process ('Tld admits the decomposition

(3.36) 'Tlt = ~t + St , t E Z
where (~t) is a non-observed strictly stationary process and (St) an unknown
deterministic sequence. For the estimation of (St) we refer to Section 4.

Now, if an estimator s of S is available, one may consider the artificial data


€t = 'Tlt - St ; t = 1, ... , n
and use it for prediction. However that method suffers from a drawback : €t
is perturbed and cannot be considered as a good approximation of ~t (see for
example [G-MJ).

Here we only make regularity assumptions on S and do not try to estimate


it. In fact we want to show that the non parametric predictor considered in
Section 3.3 exhibits some kind of robustness with respect to the nonstationar-
ity produced by s.

In order to simplify the exposition we assume that ('Tlt) is a real valued


Markov process and that we want to predict 'Tln+l given 'Tll, ··· , 'Tln.

In the following 9 denotes the density of (~o , 6), f the density of ~o, r the
regression of 6 on ~o and c.p = r f ·

Concerning S we introduce the condition

C- S is bounded and there exist real functions 7 and cp, and a b 2: 0 such that
3.4. PREDICTION FOR GENERAL PROCESSES 83

and

Example 3.4
If f and <p are bounded and if s is periodic with period T then C is valid
with
_ 1 T
f( ·) = T Lf(· - St)
t=l

J
and

~( . ) = T1 L
T
yg(. - St,y - sddy .
t=l

Example 3.5

If f satisfies a Lipschitz condition and if Sn

t
---> S (respectively S is bounded

and.!.. IStl ---> 0) then 10 = f(- - s) (respectively 1= I) . Furthermore,


n t=l
if <p satisfies a Lipschitz condition, if f is bounded and if s is bounded and
1 n
:; ; L IStl ---> 0, then ~ = <p .
t=l

Example 3.6
A simple example of non periodic s should be

St = 1/[1 + exp( -at + b)] ,t >0

with a > 0 ; it corresponds to a logistic trend.

1 n
Finally, note that the condition -
n
L IStl ---> 0 may be compatible with the
t=l
appearance of some outliers.

We now define a pseudo-regression by setting

r(x) ~(x) if 1(x) > 0


f(x)
1 n
E~o + lim sup :;;; L St if f(x) = O.
t=l
84 CHAPTER 3. REGRESSION ESTIMATION

r(x) appears as an approximation of E(1]n+1l1]n = x). If for instance Sn -+ s,


we have for a continuous rand f(x) > 0 :

r(x) = s + r(x - s) = n-+oo


lim E(1]n+1l1]n = x).

If s is periodic with period T, a rough estimate of


1 T
I r(x) - E(1]n+ll1]n = x) I is IT 2:)Sn+l - st}!.
t=1

The kind of robustness we deal with here consists in the fact that the kernel
predictor

(3.37) r n(1]n) =~
{:r 1]t+l K (1]n - 1]t )
---,;;:- /
~K
(:r (1]n - 1]t)
---,;;:-

is a good approximation of r(1]n). This property is specified in the following


statement:
THEOREM 3.8 If(et) satisfies the conditions of Theorem 3.5 and ifC holds,
then for some 8' 2 0

(3.38)

besides 8' > 0 if 8 > o.


The proof is similar to that of Theorem 3.2 and therefore omitted. Details may
be found in [BO] .

Generalisation

One may consider the model

(3.39) 1]t = et + St , t E Z,

where (St) is a bounded process independent of (et). Then an analogous result


may be established.

3.4.3 Related topics


We now briefly consider some extensions.

Interpolation
Let (et, t E Z) be a real strictly stationary process observed at times
-nl, . .. , -1, +1, . .. n2. The interpolation problem consists in evaluating the
missing data eo.
3.4. PREDICTION FOR GENERAL PROCESSES 85

Consider the associated process

where tEEn = {-nl + en,· . . ,-kn - 1, en + 1, ... , n2 - k n } with


0< en < nl = nl(n), 0 < k n < n2 = n2(n).

Making use of a strictly positive en +kn-dimensional kernel Kn we construct


the interpolator
~ Y;t,n K n (Xo,n h- Xt,n)
L
~ _ tEEn n
(3.40)
O,n - 2: Kn (Xo,n;: Xt,n)
tEEn n
which may be interpreted as an approximation of

Then, with slight modifications, the results concerning the nonparametric


predictor remain valid for ~O,n . For details we refer to [RH].

Obviously eo,n may also be used for detecting outliers by comparing an ob-
served random variable ~tD with its interpolate etDon' If we adopt the simple
scheme (3.36) we obtain a test problem with null hypothesis Ho : StD = O.

In order to construct a test we suppose that to = 0 and under Ho we set

p(T/) = P(I~o - E(~o I Xo,n)1 > T/) , T/ > 0

and
q(c) = inf{T/ : p(T/) :=; c} , 0 < c < 1.
Now a natural estimator for p is

It induces an estimator for q defined by

hence a critical region of the form


86 CHAPTER 3. REGRESSION ESTIMATION

Chaos

Consider the dynamical system defined by (2.43) :

Xt=r(X t - 1) , t=1,2, ...


then, if d = 1, r is the regression of X t on X t - 1 and it can be estimated by the
kernel method. Under classical conditions we have (cf. [MAD

E(rn(x) - r(x))2 =0
L
(( ~n
)4/(d+4)) .

Note that Tn (defined by (2.57)) furnishes an alternative estimator for r.

Regression with error

The problem of regression with error may be stated as follows :

Let(XP) ,yt), t E Z be a JR2-valued strictly stationary process observed at


times 1, .. . ,n. (XP)) has the decomposition

XP) = X + et , t E Z
t

where the et'S are i.i.d; and where (Xt ) and (et) are independent.

The problem is to estimate rO = E(m(yt)IXt = .) where m is some real


mesurable function such that EI(m(yt)1 < +00.

In the particular case where co has a known density, say fe, the estimator
takes the form

where Kn is given by (2.60) .

Now the asymptotic results are similar to those indicated in 2.5.3 : good
convergence rates if rPe is algebraically decreasing and poor rates if rPe; is geo-
metrically decreasing. See [MS].

Note that this model is different from (3.39) since here the observed process
is stationary.
3.4. PREDICTION FOR GENERAL PROCESSES 87

Notes.
Estimation of regression function by the kernel method was first investigated by
NADARAJA (1964) and WATSON (1964). A great number of people have studied
the problem in the i.i.d. case. An early bibliography has been collected by
COLLOMB (1981) . The case of time series has been studied by GYORFI-HARDLE-
SARDA and VIEU (1989) among others.

Theorem 3.1 is due to BOSQ and CHEZE (1994). Theorems 3.2, 3.3 are taken
from BOSQ (1991) and RHOMARI (1994) . Theorem 3.4 and results about predic-
tion and interpolation for kth Markov processes and general st ationary processes are
mainly due to RHOMARI (1994). For related results see the references.

Prediction for non stationary processes is taken from BOSQ (1991) .


Chapter 4

Kernel density estimation


for continuous time
processes

In this chapter we investigate the problem of estimating density for continuous


time processes when continuous or sampled data are available.

We shall see that the situation is somewhat different from the discrete case.
In fact, if the observed process paths are slowly varying the optimal rates are
the same as in the discrete case. If, on the contrary, these paths are irregular
one obtains supemptimal rates in quadratic mean and uniformly almost surely.
It is noteworthy that these rates are preserved if the process is observed at
judicious discrete instants.

The link between appearance of superoptimal rates and existence of local


time will be considered in Chapter 6.

In Section 1 we introduce the kernel estimator in a continuous time context.


Section 2 is devoted to quadratic error while Section 3 deals with uniform con-
vergence. Asymptotic normality appears in Section 4. Sampling is considered
in Section 5.

4.1 The kernel density estimator in continuous


time
Let (Xt, t E JR) be a JRd-valued continuous time process defined on a probability
space (n, A, P). In all the following we assume that (Xt ) is measurable (Le.

89
90 CHAPTER 4. KERNEL DENSITY ESTIMATION

(t,w) ----; Xt(w) is BIR ® A - BIRd measurable).

Suppose that the Xt's have a common distribution J-t. We wish to estimate
J-t from the data (Xt,O ::::: t ::::: T). A primary estimator for J-t is the empirical
measure J-tT defined as

(4.1)

Now if J-t has a density, say f, one may regularize J-tT by convolution, leading
to the kernel density estimator defined as

(4.2) fr(x) = -1
d loT K (x- -hXt)
- dt,x E R d
ThT 0 T

where K is a kernel (see Chapter 2) and where hT ----; O( +) as T ----; +00.

In some situations we will consider the space Hk ,)' of the kernels of order
(k,).) (k E N,O < ). ::::: 1) Le. the space of mapping K : Rd ----; R bounded,
integrable, such that flR d K(u)du = 1 and satisfying the conditions

(4.3) and

r U~' ... U~d K(U1,""


llR d
Ud)du1 ... dUd = 0,
aI, ... ,ad E N; a1 + ... + ad = j, 1 ::::: j ::::: k.

Note that a kernel is a positive kernel of order (1,1). On the other hand
we will use two mixing coefficients :

a(2)(u) = sup sup IP(A n B) - P(A)P(B)I, u ~ 0


tEIR
(4.4) A a(Xt )
E
BE a(Xt +u )

a(u) = sup sup IP(A n B) - P(A)P(B)I, u ~ 0


tEIR
(4.5) A E a(X. , s::::: t)
BE a(X., s ~ t + u)
In particular we will say that (Xt ) is GSM if a( u) ::::: apu , u > 0
(a> 0, 0 < p < 1).
Using the fact that a(X.) is countably generated and employing the exten-
sion theorem (cf. [BI3]) it is easy to check measurability of a(2) (.). Similarly
a(-) is measurable as soon as (X t ) is CADLAG (Le. the paths of (X t ) are
4.2. OPTIMAL AND SUPEROPTIMAL 91

continuous on the right and have a limit on the left at each t).

Concerning the properties of f we introduce the space e~(I!) (r = k + >.,


o < >. ::; 1, kEN) of real valued function f, defined on ]Rd, which are k times
differentiable and such that
af(k) , af(k) I ' A •
(4.6) Iaxi' ... a~Jx) - axi' ... ax~d (x) ::; q x - x II ,
x,x' E ]Rd;jl + ... + jd = k. Note that e 2,d(b) is included in e~(b).

Finally it is interesting to note that the problem of estimating the finite


dimensional distributions P(X" "",x'm) of (X t ) may be reduced to the problem
of estimating 11 by considering the JRmd-valued process

(4.7) yt = (Xt, X t +(t2- t ,)" " , Xt+(tm-t,j) , t E R.

4.2 Optimal and superoptimal asymptotic


quadratic error
In the current section we will assume that the X t '8 have the same dis-
tribution but not a stronger condition like stationarity. We will see later the
usefulness of that degree of freedom.

4.2.1 Consistency
Let us begin with a simple consistency result .
THEOREM 4.1 Iff is continuous at x and if 0:(2) E Ll (>.) then the condition
Th~d -+ +00 implies
(4.8) E(Jr(x) - f(x)? -+ O.
Furthermore if f E etH(R), K E Hk,A and hT ~ T- 1 /(2r+2d) where r = k + >.
then
(4.9) E(Jr(x) - f(x))2 = O(T-r/(r+d») .
Proof
Using the classical Bochner's lemma (see (2.11)) we get

(4.10) EJr(x) = r
}'J(d Khr(x - u)f(u)du h-r>- O f(x).
Now Fubini's theorem entails
VJr(x) =
(4.11)
r
T\ },O,TJ2Cov (Khr(x - Xs),Khr(x - Xt)) dsdt,
92 CHAPTER 4. KERNEL DENSITY ESTIMATION

then by using covariance inequality (1.11) we obtain

Vfr(x)~4I1K II~
T 2 h}d
r
J[0,Tj2
Q(2)(lt-s[)dsdt.

The integral on the right side, say I, may be written as

I 2 l{ dt J~ Q(2) (It - s[)ds


(4.12)
2 J: dt J~ Q(2) (u)du ~ 2T Jo+ oo Q(2)(u)du

and finally

(4 .13) Vfr(x) ~ 81~~JI~ 1+ 00


Q(2)(u)du = 0 (I/Th}d)

which leads to (4.8) by using (4.10) and

(4.14) E(fT(X) - f(X))2 = Vfr(x) + (Efr(x) - f(x)?


Now, in order to prove (4.9) we study the bias of fro
Taylor formula and (4.3) entail

bT(X) Efr(x) - f(x) = JlRd K(u)[J(x - hTU) - f(x)]du


_ hiT 1 IRd
K(u)
.+
J1
L 31
u1 , ...
id 8(k)f
u d, i 1 d (x - BhTU)du
+ . _k)1 . . .. )d.8x 1 ... 8X'..d
... Jd-

where 0 < B < 1. Now using again (4.3) we obtain

bT(X) = hiT 1 IRd


K(u)
.
J1
L
+ +. ... Jd -k
-
uh
)1· ··
u jd
1,· · · d,
· )d·
[ 8(k) f
31
8x 1 ··
id
.8xd
(x - BhTU)

8(k) f ]
- .J1 . (x) du
8x 1 .. . 8x Jd
d
and (4.6) implies
(4.15)
where

thus (4.14), (4.13) and (4.15) yield

E(fr(x) - f(X))2 = 0 (~) + 0 (h¥) = O(T-r/(r+d)) .•


ThT
4.2. OPTIMAL AND SUPEROPTIMAL 93

4.2.2 Optimal rate


We now show that under mild mixing conditions the kernel estimator has at
less the same rate, as in the Li.d. case. This rate will be called "optimal rate".

In the following 9. ,t = fx.,x, - f ® f . We state the main assumptions

A(r,p) - There exists r E BJR2 containing D = {(s, t) E]R2 : s = t}


and p Ej2, +ooj such that

a) 9.,t exists for (s, t) rt r,

. 11
c) hmsup -T
T--++oo [o,TFnr
dsdt = fr < 00 .

Mb,f3) - a(2)(lt - sl) :S lit - sl-,8; (s,t) rt r


where I > 0 and 13 > o.

The following lemma furnishes an upper bound for the variance of fT.
p-1
LEMMA 4.1 If A(r,p) and Mb , f3) hold for some r,p, 'Y, 13, with 13 ~ 2--
p-2
then
v Jr(x) :S _1 E
Th~
[~K2
h~
(x -hTXo)] .2.T r i[O ,Tj2nr
dsdt
(4.17)
+ (211 KII~ 8 (r) + 811:_1I~ I) :!~
p

p 2d 1
where q = - - and 1) = -(1 - -) - d ~ O.
p-1 q 13
Proof
Let us consider the decomposition

Th~VJr(x) = r
i[O,T]2nr
Cov (K (x -hXT ' K(~-hTXt)) Th~
s)dsdt
(4.18)

The first integral can be bounded above by

( 4.19) IT := h1d EK 2
T
(x- -h -o ) . T 1
T
X 1
[O,TJ 2 nr
dsdt.
94 CHAPTER 4. KERNEL DENSITY ESTIMATION

Concerning the second integral, we may use A(r, p) and Holder's inequality
with respect to Lebesgue measure for obtaining

(4 .20) I (K (--,;;;:-
COY
X-Xs) ,K (X-Xt))1
~ (2d)/q II K IIq Dp(r),(s,t) Il"r,
~ hT 2 .

thus, setting ET = {(s, t) : It - r


sl ~ h (2d) /Q.6 }

we get

Jr : T~~ iO,T)2nr nET


c COY [K (X ~TXs ) ,K ( X ~TXt ) ] dsdt
~ -;. r
ThT }O$S"5,t$T,t_s$h:;.(2d)/Q{3
II K II~ h¥d)/QDp(r)dsdt,
hence
(4.21)
On the other hand, Billingsley's inequality (1.11) yields

' .
JT'

hence
(4.22)
J' < 811 K II~ / h«2d)/q)(1-t)-d
T- ,6-1 T '

and finally (4.19), (4.21) and (4.22) imply (4.17) . •

We are now in a position to state the result

THEOREM 4.2 (Optimal rate).


p-l
1) II A(f,p) and M(T,,6) hold lor some f , p,/,,6 with,6 > 2 - -2 , I is
p-
continuous at x and Th~ --> +00 then

(4 .23) limsupTh~Vfr(x) ~ f r / (x)jK 2;


T-+oo

il I is bounded, then

(4.24) limsup sup Th~Vfr(x) ~ er II f 1100 jK2.


T-+oo xElRd
4.2. OPTIMAL AND SUPEROPTIMAL 95

2) If in addition f E C~(f)(r = k + >.) and if hT := cTT-l/(2r+d) where


CT -> C > 0, then

(4.25) limsup sup T 2r /(2r+d) E(fT(X) - f(X))2 S C


T--++oo xElRd

h 1100 J K2
C = fr II f cd
were

+c2r (2: it + . . +jd =k j/.jd! JRd II u II>' IUll j, .. . ludljdIK(u)ldu) 2.


Proof
1) Using (4.17) and noting that here T} is strictly positive and that

lim
T--+oo
h~EK~T(X - Xo) = f(x) / K2

we get (4.23). Concerning (4.24) it suffices to note that

2) From (4.15) we deduce that

(4.26) sup IEfr(x) - f(x)12 S cZr)h¥


xElRd

where c(r) is given by (4.16) . Thus (4.25) is a straightforward consequence


of (4.24) and (4.26) . •

If (3 = 2P -
1 (in particular if P = +00 and (3 = 2) the same rates are valid
p-2
but with a constant greater than C.

In order to show that the above rates are achieved for some processes, let
us consider the family X of processes X = (Xt , t E lR) which satisfy the above
hypothesis uniformly, in the following sense : there exist positive constants
fo, Lo , bo , "10, (30 and Po such that for each X E X and with clear notations

• II fx Ii00S fo

• -.!:.. r
T J[O,Tj2nrx
dsdt :s: Lo (1 + Lo)
T
• Px =Po > 2 and bpo(rx):S: bo
Po -1
• I :s: "10 and (3 2 (30 > 2 - - 2 .
Po -
96 CHAPTER 4. KERNEL DENSITY ESTIMATION

Then we have
COROLLARY 4.1

(4.27) lim max sup


T~+oo XEX XElRd
Th~Vxh(x) = L O foJK 2
where Vx denotes the variance if the underlying process is x.
Proof
An easy consequence of (4.17) is

(4 2. 8) lim sup max sup Th~Vxh(x):S L OfoJK 2.


T~+oo XEX xElRd

It remains to exhibit a process X in X such that

(4 .29) sup
xElRd
Th~Vxh(x) T_=
-+ Lo foJK 2

To this aim we consider a sequence (Yn , n E Z) of Li.d. JRd-valued random


variables with a density f such that II f 11 00 = f(xo) = fo for some Xo· Now let
us set
(4 .30) X t = Yrt/Lol , t E R ,

then X belongs to X with r = UnEz {(s,t) : [;J = [;J = n}, C r = L o,


bpo(r) = 0 and a(2)(lt - 81) = 0 if (8, t) Ere .

Now for that particular process fT takes a special form, namely

(4.31) f () = [T/Lolf' ()+T-LolT/LolK(X - YrT/Lol)


T x T/Lo T x Th~ hT

where iT is a kernel estimator of f associated with the LLd . sample


Yo, . . . , YrT/Lol- l.

Then from (2.12) it is easy to deduce that

[T/Lolh~ sup ViT(X) -+ foJK 2


xE lRd

thus
ThT
d sup V h(x)
'
x ElRd
-+ Lofo JK.
2

On the other hand

sup
xElRd
V (T - Thd
Lo [T/Lol K (X - YrT/LOI)) < II K II~ L2
T
h - T2h2d
T T
0
4.2. OPTIMAL AND SUPEROPTIMAL 97

J
and finally
(4.32) Th~ sup Vxfr(x) -+ Lofo K2
xElRd

which implies (4.27) . •

It should be noticed that the process X is not stationary.


COROLLARY 4.2 Let Xl = {X : X E X, fx E O~(e)}. The choice
hT = cT T-I/(2r+d) where CT ---> C > 0 implies
(4.33) limsup sup sup T 2r /(2r+d)Ex(fT(X) - f( ;r))2 SO'
T->+oo XEX, xElRd
L ofoJK 2
where 0' -- cd + ere
2 2
(r)'

Proof: Clear . •

The next theorem emphasizes the fact that the kernel estimator achieves
the best convergence rate in a minimax sense.
THEOREM 4.3 Let fT be the class of all measurable estimators of the den-
sity based on the data (X t , 0 S t S T) then
2
(4.34) lim inf )nf sup T2.Td Ex
2r ( _
fr(x) - fx(x)
)
> 0, x E JRd .
T->+oo fTEFT XEX,
Proof (sketch)
Let Xo be the class of processes X = (Xt , t E JR) JRd··valued and such that
Xt = y[t/Lo], t E JR
where (Yn , n E Z) is a sequence of LLd. r.v.'s with a density f belonging to
C~(e) and such that X EX.

If iT E fT then it induces an estimator f[T ] which belongs to the family


frr ] of the measurable density estimators based on Li.d. data Y1 , · · · , y[T/L o]'

Conversely each estimator ftT] E frTJ generates iT E fT by setting

iT(X; Xt, 0 S t S T) = /tT] (x; Xl,' .. , X[T])'


Now we clearly have
2r ( _ ) 2
A T := _inf sup T2.Td Ex fr(x) - f(x)
hEFT XEX

2: _inf sup T~Ex (iT (x) - f(x))2


hEFT XEX,

= • inf • sup T~E (Nr] (x) - f(x)f =: BT


fIT] EFITJ fEC~(l)
98 CHAPTER 4. KERNEL DENSITY ESTIMATION

therefore
lim infT-->+oo AT 2: lim infT-->+oo BT .

Now applying a theorem of Ibragimov-Hasminski [IB-HAJ (in fact a d-


dimensional version of this theorem) we obtain

liminf BT > 0
T-->+oo

hence (4.34) . •

An easy adaptation of the above proof should provide an analogous mini-


max result for the discrete case.

Finally let us indicate that, like in the discrete case (see 2.2), similar results
may be obtained replacing A(f,p) by

A/(r) - gs,t exists for (s , t) tt f and is Lipschitzian uniformly with respect to

.11
[ (s, t), where r satisfies the condition
lim sup -T
T-+oo [O,Tj2nr
dsdt < +00.

In that case the condition f3 > 2(p - 1)/(p - 2) is replaced by the weaker
2d+ 1
condition f3 > -d--'
+1

4.2.3 Superoptimal rate


The following theorem produces a surprising result : if the distribution of
(Xs, X t ) is not too close to a singular distribution for Is - tl small then fr
. 1
converges at the "superoptlmal rate" T
Processes for which the rate T- 1 is reached will be called "irregular paths
processes" .

THEOREM 4.4

1) If gs,t = glt - sl exists for s i: t, if


(y,z) ...... r Ig,,(y,z)ldu is defined for any (y , z)
l]o,+oo[
E ]R2d is bounded

and is continuous at (x , x) then

(4.35) lim sup T .Vfr(x) :s 2(/ IKI)2 roo Ig,,(x,x)ldu


T-++oo 10
4.2. OPTIMAL AND SUPEROPTIMAL 99

2) If g.,t = glt-sl exists for s i= t, if u ->11 gu 1100 is integrable on ]0, +oo[


and if gu is continuous at (x, x) for each u > 0 then

(4.36) T· Vfr(x) -> 2 Jo


r+ oo
gu(x,x)d7L.

Proof
1) Using (4.11) and the stationarity condition g.,t = glt - sl we get

(4.37) T· V fr(x) = 2 loT (1 - ;;) Cov (KhT(X - Xo), KhT(X - Xu)) du .

Now for each u > 0


Cov (KhT(X - Xo), KhT(X - Xu)) =
(4.38)
r
JllI. 2d
KhT(X - y)KhT(X - z)gu(y,z)dydz

therefore
TV fr(x) ::;
(4 .39)
2 f IKhT(x - y)KhT(X - z)1 (10+00 Igu(Y, z)ldu) dydz
taking limsup on both side and applying Bochner's lemma we obtain (4.35).

2) Since (u, y, z) >--+ KhT(X-y)KhT(X-Z) (1 - ;;) gu(Y, Z)I[O,T)(U) is integrable


we may apply Fubini's theorem to (4.37) and (4.38) leading to
TVfr(x) =
(4.40)

Now

110+
00
gu(Y, z)d7L - loT (1 - ;;) gu(Y, Z)d7L1 = Ihoo
gu + loT fgUI
: ; h= II g,. 11= du + loT f II gu 11= du; (y , z) E JR2d; then, the integrability of
II g,. 1100 and the dominated convergence theorem show that the bound vanishes
as T -> +00.
Hence
TVfr(x) =
(4.41 )
100 CHAPTER 4. KERNEL DENSITY ESTIMATION

Now the dominated convergence theorem entails that (y, z) ....... It'" 9v.(Y, z)du
is continuous at (x,x) and finally Bochner's lemma implies (4.36) . •
COROLLARY 4.3
1) If assumptions of Theorem 4.1 hold for each x,
ifC = sup
Jo
xElRd
roo
19v.(X,x)ldu < +00 and if f E C~(f) (r = k + A) and
K E Hk,).., then the choice hT = O(T- 1/ 2T ) leads to

(4.42)
T-.+oo
limsup sup TE(fT(x) - f(x))2
xElRd
~ 2GJK 2.

2) If assumptions of Theorem 4.2 hold, f E C 2,d(b) and K is a (positive)


kernel then hT = 0(T-l/4) implies

(4.43) T E(fT(X) - f(x)? r-::;:' 2 Jo


roo gv.(x,x)du.
3) More generally if assumptions of Theorem 4.2 hold, f satisfies an Holder
condition of order A (0 < A ::; 1) and K is a (positive) kernel then
hT = O(e- T ) implies (4.43).
Proof: Clear . •

The various choice of hT in the above corollary allow to obtain asymptotic


efficiency. Note however that, from a practical point of view, these choice are
somewhat unrealistic since they do not balance variance and (bias)2 for finite T.

On the other hand it is interesting to note that a choice of hT which would


give asymptotic efficiency for all continuous f is not possible since the bias
depends on the modulus of continuity of f. An estimator which captures this
global efficiency will appear in Chapter 6.

Let us now give examples of applicability of Corollary 4.3 .


Example 4.1
Let (X t , t E IR) be a Gaussian real stationary process with zero mean and
autocorrelation function
(4.44)
when u --> 0, where 0 < B < 2.

Then it is easy to verify that

(4.45) Igu(x, y)1 ~ alp(u)11Iul>b + (c + dlul- 4) 1IuI9,u;",0


4.2. OPTIMAL AND SUPEROPTIMAL 101

where a, b, c, d are suitable constants.

Consequently, conditions in Corollary 4.3 are satisfied as soon as p is inte-


grable on ]0, +00[.

Example 4.2
Let (Xt, t 2: 0) be a real diffusion process defined by the stochastic differ-
ential equation

(4 .46)

where Sand (j satisfy a Lipschitz condition and the condition

1 = 1(S) = l (j-2(X) exp {21X S(Y)(j-2(Y)dY } dx < +00


and where (Wt, t 2: 0) is a standard Wiener process.

It may be proved that such a process admits a stationary distribution with


density given by

Moreover, under some regularity assumptions on Sand (j, the kernel estimator
of J reaches the full rate ~ . In particular if Xo has the density f , conditions
of Corollary 4.3 are fulfilled (see [KU] and [LED .

Actually it is possible to obtain a minimax result. For this purpose let us


introduce a vicinity of the model (4.46) defined as

Here ~ E U6 = {~ : ~ is measurable and II ~ 1100< 8} .


Now let us denote S the class of functions S such that (4.47) has a unique
solution and sup 1(8 +~) < 00 . The corresponding expectation will be
f:>EU6
denoted Ef:> and the corresponding density function ff:> . Finally put

J = {4J 2( )E (l{x>X} - F(X))2}-1


x (j(X)f(X)

where X denotes a r.v. with density f and F denotes the distribution function
associated with f. We have :
102 CHAPTER 4. KERNEL DENSITY ESTIMATION

THEOREM 4.5 (Y. KUTOYANTS)


If S E Sand J > 0 then

lim lim )nf sup TEe,. (fr(x) - h(x))2 2': r1.


6!O TToo hEFT HEU.
Proof: See [K] . •

Now it can be proved that fr reaches the minimax bound J- 1 . Furthermore


if condition in Theorem 4.4.2 hold we have

(4.48) r 1= 2 fooo gu(x, x)du ;


we refer to [K], [LE2] and [V] for details.

4.2.4 Intermediate rates


It is natural to formulate the following problem : what are all the possible
rates for density estimators in continuous time? We give a partial answer in
the present subsection.

We begin with a proposition which shows that, in some sense, conditions in


1
Theorem 4.4 are necessary for obtaining the superoptimal rate T .

THEOREM 4.6 Let (Xt, t E JR) be a JRd-valued process such that

(a) g8 ,t = glt-81 exists for s =1= t and 1+00 II


Uo
gu 1100 du < 00 fOT tiD > o.

(b) f is continuous at x and f(xo,x u ) is continuous at (x, x) for u > o.

(c) foUl f(xo,x u ) (x, x)du = +00 for U1 > o.


Then if K is a strictly positive kernel

(4.49) lim TV fr(x) = +00 ,


T-oo
and consequently T E (fr(x) - f(x))2 -> +00.
Proof
We first consider the integral

IT = 2JII.2d
( KhT(X - y)KhT(X - z) {T
Juo
(1 - ~)
T
gu(Y , z)dudydz, T> Uo .
4.2. OPTIMAL AND SUPEROPTIMAL 103

Using (a) we obtain the bound

IITI ::; 2 1+00

Uo
II 9u 1100 duo

On the other hand, (b) implies

lim ( KhT(X - y)KhT(X - z)f(y)f(z)dydz = f2(x).


hT-O JR,2d
Then, by using (4.40) we get
TVfr(x) =
2Jo("0 (1-~)
T
du ( KhT(X -
JR,2d
y)KhT(X - z)f(Xo,Xu)(y,z)dydz + 0(1).

Now, since T ~ 2uo implies 2 (1 - ~) ~ 1 it suffices to show that JT ---+ 00


where

Since the integrand is positive we may apply Fubini's theorem for obtaining

where the inner integral is finite for >.2d-almost all (y, z) .

Now by considering the affine transformation (y, z) H (x - hTV, X - hTW)


and by using the image measure theorem (see [RAJ) we obtain

Jr = { K(v)K(w)dvdw f(Xo,X u )(x - hTV,X - hTw)du.


JR.2. Jo
{UO

We are now in a position to conclude : Firstly (b), (e) and Fatou 's lemma
imply

for >.2d
lim
T-oo l
0

almost all (v, w).


UO f(Xo ,Xu )(x - hTV,X - hTW)du= +00

Applying again Fatou's lemma we get JT ---+ 00 thus


TVfr(x) ---+ 00

and the proof is therefore complete. •

In the Gaussian case we have the following


104 CHAPTER 4. KERNEL DENSITY ESTIMATION

COROLLARY 4.4 Let (Xt, t E JR) be a real stationary Gaussian process,


continuous in mean square and such that
(aJ ICov(Xo, Xu)1 < V X o, u> 0 and
+00
1
ICov(Xo, Xu)ldu < 00, Uo > O.
UD

Then if K is a strictly positive kernel we have

In particular if (X t ) has differentiable sample paths then


T Vfr(x) --+ +00.
We see that, at least for Gaussian processes, the "full rate" is closely linked
with the irregularity of sample paths. It is interesting to note that, in order to
reach the full rate, continuity of (X t ) is not required.
For example if the autocorrelation satisfies

(4.51) '" 1
1 - p(u) =u~o( +) ILog(u)!l-.a' 0 < (3 < 1

(X t ) is not a.s. continuous (see [AD]) but V fr(x) ~ T1 provided (a) is satisfied.
Finally note that, using Theorem 4.2 one can construct an estimator such
that T 1 -e E(fr(x) - f(x)? --+ 0 (c > 0) a soon as the Gaussian process (X t )
satisfies mild mixing conditions. We will give a more precise result in the next
subsection.

Proof of Corollary 4·4


We may and do suppose that EXo = 0 and EX5 = 1 and we put
p(u) = E(XoXu) .

Let us set cp(u) = (1 - P2 (u) ) - 1/2 , u > 0 then we have


<p(u) (<p(U)2 2
exp --2-(Y - 2p(u)yz + z) ;
2 )
f(Xo ,Xu)(Y, z) = ~

(y,z) E JR2d, u > O.

Here condition (c) in Theorem 4.5 may be written

t"
Jo
cp(u) exp (_
2rr
x 2(
1+P
») du = +00
u
4.2. OPTIMAL AND SUPEROPTIMAL 105

(ILl
which is equivalent to io 'P(u)du = +00 since limlL->oP(u) 1 by mean
square continuity. Thus we have clearly the first implication.

Now it is easy to check that l oIL1 'P( u)du < 00 implies lIL1 II 9u 1100 du < 00.

21=
I)

Then Theorem 4.4 entails TV hex) -+ 91L(X, x)du < 00, hence the second
implication.

Finally, if (X t ) has differentiable sample paths, they are differentiable in


mean square too (see [IB]-[HAJ) and consequently

E (Xu -
u
XO)2 --+EX(?
U--+O

Condition (a) implies EX~2 > 0, then

which implies io
{U 1
(E(Xu - X o ?r l/ 2du = +00 and therefore
T· Vh(x) -+ +00 . •

1 1
We now give sufficient conditions for rates between - / +d and -. We will
Tr r T
use conditions A'(p) where p E [1, +00] defined by
A'(p) - 9s,t exists for s f= t, II 9s ,t lip is locally integrable and

lim sup ~ { II 9s,t lip dsdt = G p < +00.


T->+= J] I) ,T]2

Note that if 9s,t = 9lt-sl we have

1 {
-T
i]O,T]2
II 9s,t lip dsdt = 1 {T
-T
Jo
(1 - -TU) II gu lip du
so that II 9u lip is integrable over ]0, +00[.
Then A'(p) is fulfilled with G p =
2 It"" II gIL lipduo In particular assumptions in Theorem 4.2 imply A' ( +00).
On the other hand if JIL~oo II gIL IiI du < +00 for some Uo > 0, A'(l) is satisfied
since II gIL 111 S; 2.

We now state a result which links the convergence rate with A' (p).
106 CHAPTER 4. KERNEL DENSITY ESTIMATION

THEOREM 4.7
1) If A'(p) holds for some p E [1 , +001 then

(4 5. 2) limsupTh¥d) /PVfr(x) :S Gp II K II~


T_=
where q = p/(p - 1).
2) If in addition f E C~(f) (r = k + ,\), K E Hk ,A and hT = cT T-p/(2pr+2d)
(CT ---+ C > 0) then

limsuPT_= Tpr/(pr+d)E(fr(x) - f(x»)2


(4 5. 3)

Proof

1) We have

V fr(x) = T2~}d lO,T]> dsdt [l 2d K (x h~ u) K (x h~ v) g.,t(u , V)dUdV] .

Applying Holder inequality in the inner integral we get

Vfr(x):s
1
:S T 2h}d
( [
JIR2d Kq
-u) Kq (xh:;-
(xh:;- -v) dudv ) l/q [
J[O,T]> II g.,t lip dsdt
1 II K II~
:S T h2d-(2d)/q -T
T
1 l[O ,T]>
II g. ,t lipdsdt

hence (4.52) .

2) Clear. •

Note that the optimal rate is reached for p = 2 and the parametric rate for
p = +00 . If p = 1 one obtains the same rate as in Theorem 4.1.Note however
that each of these rate is not necessarily the best one when A'(p) holds.

We complete this section with an example which shows that if the observed
process is nonstationary any rate is possible. Consider the process

(4.54) 11' t
( '2 - k"Y) (11' t - k"Y )
X t = Yk cos (k + 1)"Y _ k"Y + Yk+l sin '2 (k + 1)"Y _ k"Y ;

k"Y :S t < (k + 1)"Y, k E Z; where 'Y is a strictly positive constant and where
(Yk, k E Z) is a sequence of i.i .d . real Gaussian zero mean r.v.'s with variance
4.2. OPTIMAL AND SUPEROPTIMAL 107

a 2 > O. The observation of (Xt ) over [0, T] is in fact equivalent to the ob-
servation of Yo, ... , Y[Tlh] and the best rate is T-1h since an asymptotically
optimal estimator is

-fT(X) = ---exp
1 ( - 1- 2 x2) ,x E I~
ST..j2; 2 ST
[T'h]
where ST = 1 ~ X·
[Tlh]+1 ~ J'

Note that the kernel estimator remains competitive since here r may be
chosen arbitrarily large.

Finally, I being any strictly positive number, we have a family of processes


for which any rate of the form T-1h is attained.

4.2.5 Minimaxity of intermediate rates


We now consider a family of intermediate rates which are minimax in a specific
sense.
Let X* be the family of Rd valued process X = (X t , t E JR) such that the Xt's
have a common bounded density f E C~(£) with f(x) " 0 where x is a fixed
point in Rd.

Let us set
'Pa,{3(U) = u{3(Logu)-a , u> 1
where 1 ::; (3 ::; 2 if a = 0, 1 < (3 < 2 if a E R - {a}.

We will say that the kernel estimator has the rate ['Pa,{3(T)r~ if, for
hT = ['Pa,{3(TW~, we have

Iimsup['Pa,{3(T)]~Ex(fT(X) - f(x)f < 00,


T-+=

and

Now to each function 'Pa,{3 and each M > 0 we associate the following subfam-
ily of X* :

X(a, (3, M) = {X E X* :11 f 11=::; M and fr has the


rate ['Pa,{3(T)r~ at each x E R d }.

The next theorem shows that this rate is actually minimax over X(a, (3, M) :
108 CHAPTER 4. KERNEL DENSITY ESTIMATION

THEOREM 4.8

liminf )nf sup [CPa,{1(T)]~Ex(/T(x) - f(x)? > 0, x E lR d


T--->oo lEFT XEX(a,{1,M)

where :FT denotes the class of all measurable density estimators based on the
data (Xt, Os t S T) ..

In particular Theorem 4.7 implies minimaxity of the optimal rate (d.


4.2.2) and the superoptimal rate (d. 4.2.3) since they correspond to the
respective choice a = 0, (3 = 1 and a = 0, (3 = 1 + ~.
2r
As a byproduct we can exhibit the rate associated with a Gaussian process:

COROLLARY 4.5 Let X = (Xt, t E lR) be a real zero-mean measurable

1
stationary Gaussian process which is derivable in mean square and with auto-
+00
correlation pO such that Ip(u)1 < 1 and Ip(u)ldu < 00 for some Uo > 0.
Uo

Then, if K is a strictly positive kernel, we have


T T
0< liminf L TExUT(x)- f(x))2 S lim sup L TEx(h(x)- f(x)? < 00 ; x ER
T--->oo og T---> oo og

The rate LifT is minimax in the above sense. Proofs of Theorem 4.7 and
Corollary 4.5 which are very technical are omitted. They appear in [BK-B02].

4.3 Optimal and superoptimal uniform


convergence rates
For the study of uniform convergence we need a Borel-Cantelli type lemma
for continuous time processes :

LEMMA 4.2 Let (Zt, t ~ 0) be a real continuous time process such that

(a) For each "1 > 0, there exists a real decreasing function CPr" integrable on
lR+ and satisfying

P(IZtl > "1) s CP1/(t) ,t > °,


(b) The sample paths of (Zd are uniformly continuous with probability 1.
Then
lim ZT = 0 a.s ..
T-+oo
4.3. OPTIMAL AND SUPEROPTIMAL 109

Proof
First let (Tn) be a sequence of real numbers which satisfies
Tn+l - Tn ~ a > 0, n ~ 1 where a is some constant.

Since 'Pry is decreasing we have

thus L:n 'Pry(Tn) < +00 and the classical Borel-Cantelli lemma yields
P (lim;uP{IZTnI > 1]}) = 0,1] > 0 which in turn implies ZT" -+ 0 a.s.

Let now (Tn) be any sequence of real numbers satisfying Tn i +00.


To each positive integer k we may associate a subsequence (T~k)) of (Tn)
defined as follows :

T 1(k) T n, h
were nl = 1,
TJk) Tn2 where Tn2 - Tn, ~ t, Tn2 - 1 - Tn, < t,

The first part of the current proof shows that ZT~k) p=:;: 0 a.s. for each k. Now
let us set
no = {w : t f-+ Zt(w) is uniformly continuous, ZT(k)
p
-+ 0, k ~ I},

clearly p(no) = 1.

Then if wE no and 1] > 0 there exists k = k(1] ,w) such that It - 81 :s k1 im-
plies IZt(w) -Zs(w)1 < ~ . Consider the sequence (T~k)) : for each p and each n

such that np :s n < n p+1 we have ITn - Tn pI < ~, hence iZTn (w) - ZTnp (w)i <
1]
2
Now for p large enough we have iZTnp (w)i < ~ and consequently IZTn (w)1 <
1] for n large enough. This is valid for each 1] > 0 and each tV E no, thus ZTn -+ 0
a .s . •
110 CHAPTER 4. KERNEL DENSITY ESTIMATION

4.3.1 Optimal rates


We make the following assumptions

• A(f,p) holds for some (f,p),

• f is bounded and belongs to C~(C) , r = k + A,


• K E H k ,). and K = K~d where Ko has compact support and continuous
derivative,

• hT = CT ( --r
logT)~ (CT ~ C > 0) .

We first derive upper bounds for P(IZTI > 7]) where


ZT = log~ T CO~T )2iTI (fr(x) - Efr(x)) .

LEMMA 4.3
p - 1 7r +
1) Ifa(u)~,u-{3,u>Owhere{3>max ( 2p_2'~
5d)
then
A
(4 .55) P(IZTI > 7]) ~ T!+j.L ,7] > O,T ~ 1

where A and J.L do not depend on x.

2) If (X t ) is GSM then

B
(4 .56) P(IZTI > 77) ~ TG(logm T)2 ,7]> 0, T ~ 1

where Band C do not depend on x.

Proof
We may and do suppose that CT = 1 and 7] < l.
1) Let us set

(4 .57) Yjn = 11
"6 j6

(j-l)6
Khr(X - Xt)dt; j = 1, .. . ,n

where nl5 = T , n = [T] (T:::: 1) and consequently 2 > 15 :::: l. Thus we have

1 n
(4.58) fr(x) - Efr(x) =- 2:)Yjn - EYjn).
n j=l
4.3. OPTIMAL AND SUPEROPTIMAL 111

In o,d" to apply inoquality (1.26) we haw to ",udy V (t, ljn). Th 'hi>;

aim we may use inequality (4.17) in Lemma 4.1 with po instead of T and p'
instead of p for convenience. We have readily

v (~ (p6 Khr(x _ Xt)dt) ~ ~E [~K2 (~- Xo)]


po 10 poh hT hT T
(4.59) 14(1-.1 )-d
.~ r
po J[0 ,p6j2nr
dsdt + (2 II K II~' op,(r) + 8II (3K- IIZx,1 .r:) hf pohT
:

where q' = p' j(p' - 1).


p' -1
Therefore, since (3 > 2 - - we have
p' - 2

where a = a(K, 11 f 1100 , d, /, (3) does not depend on x.

Consequently

(4.60) v (tYjn) ~ a~
j=l oh T

then noting that IY jn - EYjnl ~ 2 II K1100 hTd we obtain


v2(q) < ~ + 11 K 1100 c E > O.
- poh~ h~'

Now choosing p = [c- 1 o- 1 j we get

2 c
v (q) ~ Ao hd
T

where Ao is a positive constant.

Therefore, substituying in (1.26) we arrive at

P(lh(x) - Eh(x)1 > c) ~ 4exp (- 8~o cqh~ )


( 4.61)
+22 ( 1 +
8 II K
d
II 00 ) 1/2 qa(p) =: UT + VT .
EhT
112 CHAPTER 4. KERNEL DENSITY ESTIMATION

Now we choose 10 = lOT = hT(logm T)T/ (T/ > 0) and we notice that
n EnO lOT
q = 2p 2: -2- = 2' hence

thus
(4.62)

We now turn to the study of VT . Using the elementary inequality


(1 + w)1/2 :s 1 + W 1/ 2 we get
(4 .63)

then, after some easy calculations, we obtain


/2 fJ
VT :s C3T/2+ fJ T-
I 2r13 - 3r - 3d
4r+2d (logT)
r+{3;;1
4r+ (logm T)l + ,

7r + 5d
and since {3 > - - - we have the bound
2r

(4.64) <~
vT_ THJ-L (C4 > 0, /J > 0).

If TJ > 1 it is easy to see that C4 must be replaced by C4TJ HfJ . Collecting (4.62)
and (4.64) we arrive at (4.55).
2) If o{) tends to zero at an exponential rate (4 .62) remains valid but (4.64)
may be improved. From (4.63) we derive the bound

VT :S csQ'Ye-fJ'P + C6cl/2h:;d/2q,e-fJ'P
(4.65)
:S C7exP (-C sT2r'.;.d -<)
where C7 and Cs are strictly positive and ( > 0 arbitrarily small. Consequently
the bound in (4.62) is asymptotically greater than the bound in (4.65), hence
(4.56) . •

The next lemma shows that (ZT) satisfies condition (b) in Lemma 4.2 .

LEMMA 4.4 (ZT) satifies the uniform Lipschitz condition

(4.66) sup IZT(x,w) - Zs(x ,w)1 :S AIT - SI;


xElRd,wEn

T> 1, S> 1; where A does not depend on (x,w, S, T).


4.3. OPTIMAL AND SUPEROPTIMAL 113

Proof
We only prove (4 .66) for

WT = log~T CO~T) ~ fr(x); T> 1


A
and with the constant 2' since the result for EWT is an easy consequence of
this one, because

IEWs - EWTI :S EIWs - WTI


:S sup IWs - WTI·
w ,x

Now we put

logWT = UT + VT

where UT = -logm+l T (2rr+d)


+ d log (T)
+ logT + log T 1

and VT = log (foT K (x ~TXt) dt) where the integral is supposed to be

positive.
The derivative UIr of UT is clearly a 0 (~ ) .
Concerning V';' first we have

Noting that

aKa (Xj - Xt ,j) = _ hr( . _ X -)K' (Xj - Xt ,j)


aT hT h} x J t,J a hT j = 1, .. . ,d

and that for some CK, Kh(u) = 0 if lui:::: CK we obtain

(4 6. 7) IaKa
aT
(Xj - Xt,j)
hT
I
Ihrl
:S h} ahT II Ka 1100 ; J
I -
= 1, -.. , d.
From (4.67) it is easy to deduce that

and finally
114 CHAPTER 4. KERNEL DENSITY ESTIMATION

I Cl C2
(log W T ) ::; T + rT K (
x-x, ) dt .
Jo hT

Using the relation Wr = WT(log WT )' it is then easy to find that

f:
where C is constant. Thus Wr is bounded hence (4.66) . Clearly the result
remains valid if K (x 1:;' ) dt = O. •
We are now in a position to state a first consistency result :

1 7r + 5d)
THEOREM 4.9 If a(u) ::; ,,(u- f3 , "( > 0, f3 > max ( 2 pp _- 2' ~ then

(4.68) -1- - -
logm T log T
(T)rrT.r Ih(x) - f(x)1 --->
T-oo
0 a.s.,

m~l,xElRd

Proof
(4.55) implies
P(IZT(x)1 >1J) = o (Tl+ .. )
and (4.66) implies condition (b) in Lemma 4.2.
Hence (4.68) by using Lemma 4.2 . •

We now state a uniform result

THEOREM 4.10 If (X t ) is GSM then

(4.69) sup IZT(x)I-.O a.s.


IIxll:5ra

m ~ 1,a > O.
Proof
Since K is clearly Lipschitzian we may use a method similar to the method
of the proof in Theorem 2.2 : we t ake as II . IIthe sup norm and we construct
~ ±SH!
a covering of {x :11 x II::; T a } with vf hypercubes where VT '" Ta+~. Thus
we have
(4.70) sup IZT(x)l::;
IIxll:5Ta
sup
l:5j:5 v::'
IZT(xjT) 1+ 0 ((1ogIT)w)
4.3. OPTIMAL AND SUPEROPTIMAL 115

where the XjT'S are the centers of the hypercubes and where w > O. Using
(4.56) we obtain

On the other hand (4.66) shows that T t-t SUPlSjSv~ IZT(xjT) is uniformly
continuous for each w since A does not depend on (x, w). Consequently we may
apply Lemma 4.2 and we obtain (4.69) from (4 .70) . •
COROLLARY 4.6 (Uniform optimal rate.)
If f is ultimately decreasing with respect to II . II, if (Xt} is GSM, if SUPO<t<T II
Xt II is measurable for each Tand if E( sup II X t Ila) < 00 for some-a-> 0
OSt:51
then
(4.71 ) -11 T (T)~
-1T sup Ih(x) - f(x)1 -> 0 a.s.
ogm og xElRd

Proof
Since f is ultimately decreasing we claim that limllull->oo II u I f(u) = O.
To prove this it suffices to note that for R large enough

r
JR/2SlIvIiSR
f(v)dv ~ f(eR)ad Rd

where eR denotes a vector such that IleR11 = 2R and ad is a positive constant.


Hence it is easy to check that

(4.72) _1_ (~) 2r+d sup f(x) -> 0,


logm T logT II x ll>Ta/2

thus from Theorem 4.10 and (4.72) we deduce that it suffices to show that
sup IZT(x)1 -> 0 a.s ..
IIxll>Ta/2

To this aim we first note that sUPOStST II Xt II S T~a and I x II> T 2a imply
x-X T2a
II ~ II> 2hT' OStST.
Now let CK be such that K(u) = 0 if II u II~ CK and let To such that r:; > CK
for, T :2: To.
X - Xt)
We have K ( h:;- = 0 for T:2: To, hence

{ SUP
09ST
II Xt liS T2a,
2
II x II> T2a} => { sup
Ilxll>T2a
IZT(x)1 = o} .
116 CHAPTER 4. KERNEL DENSITY ESTIMATION

Therefore for T large enough

P (SUPllxll>T2a IZT(x) 1> 77) ~ P (sUPO~t~T II X t II> T~a)


~ P (sUPO~t~T II X t II> Ta) , 77 > O.
Now since T -+ sup IZT(x)1 is uniformly continuous for each w, we may
Ilxll >Tn
apply Lemma 4.2, hence (4 .71) . •

4.3.2 Superoptimal rate


We now state a result which shows that a full rate is also reached in the setting
of uniform convergence.

We consider the hypothesis :

then we have
THEOREM 4.11 Under the conditions of Corollary 4.6 except that A(f,.\)
is replaced by H and that hT rv T-'Y where ~ "I < fr
we have for all m ~ 1 id,
(4.73) -11 T
ogm
(T)t
-1
og
T sup Ih(x) - f(x)1
xEIRd
-+ 0 a.s.

Proof
As in the proof of Lemma 4.3 we consider decomposition (4.58) and we
apply inequality (1.26) .
p

The main task is to evaluate the variance of LYJn. First (4.39) yields
j=l

p8V (:8 loPD Khr(x - Xt)dt)

r IKhr(x -
~ 2 JIRd y)Khr(x - z)1 r CXJ Igu(y,z)ldudydz
Jo
~ 2 Jo+CXJ II gu 1100 du ( / IKI) 2 =: M

thus
4.3. OPTIMAL AND SUPEROPTIMAL 117

therefore
v 2 (q) < .J:.... M + II K 1100 c
- p8 h~

then, choosing p = [h~c-1l + 1 we obtain


v 2 (q) S (2M+ II K 1100)ch rd.
Consequently

exp ( - 8V~~q) q) S exp ( - 8(2M + ~I K 1100)ch~q) .


We now choose c = (lO~T) 1/2 (logm T) 1/2TJ (TJ > 0)
n T
and we note that q = -2p = -2r hence
pu

exp ( - 8V~~q) q) S TcI:g~ T (c> 0)

where c depends only on M, II K 1100 and TJ·

Concerning the second term in the bound (1.26) it takes the form

exp ( -c'log ~T!--Yd) (c' > 0).

Finally

P (lOg~ T CO~T ) ! IJr(x) - EJr(x)1 > TJ) S 1/J.,,(T)

where 1/J." is integrable on J1, +00[.


On the other hand it is easy to see that Lemma 4.4 remains valid if ZT is
replaced by

Zl,T = logm T
1 (T)
logT
1/2
(fT(X) - EJr(x))
thus Lemma 4.2 implies that Zl ,T --+ 0 a .s.

The bias is again given by (4.15) thus

1
logm T
(T) logT
1/2
IEJr(x) - f(x)1 S c(r)
T1 /2 - -yr
logm T. (logT)1/2
118 CHAPTER 4. KERNEL DENSITY ESTIMATION

I
which tends to zero since I ;::: 2r'

Finally uniform convergence is obtained by using the same process as in the


proofs of Theorem 4.10 and Corollary 4.6 . •

4.4 Asymptotic normality


We now give a result concerning the limit in distribution of h.

Assumptions are the following

(i) X = (Xt , t E R) is a real measurable strictly stationary strongly mixing


process with mixing coefficient (au) such that, for some a Ell, +00],
~k
L a k(a-I )la < 00 .
k~l

(ii) gu = f(xo,x u ) - f 0 f exists and is continuous for every u f= O. Further-


more u ...... 11 gu 1100 is integrable.
THEOREM 4.12 Suppose that (i) and (ii) are satisfied. Let (Xl, ... , Xk) be a
finite collection of distinct real numbers such that the matrix
L= (/+00 gU(Xi , Xj)dU)
-00 l~i , j~m
is positive definite .
h .2

Then if lim -hITI = 1 and hT ;::: cT- (30 1)(2 a 1) h


were c > 0 zs. a constant,
T->oo T
we have
(4 .74) vT(h(x;) - Eh(Xi) , I::; i ::; m) ~ N(m).

where N(m) denotes a random vector with normal distribution N(O, 2::) .

If in addition f is differentiable with derivative satisfying a Lipschitz con-


dition and hT = o(T-1/4) then

(4.75)

Proof: Cf. [BMPl . •

4.5 Sampling
In continuous time, data are often collected by using a sampling scheme. Vari-
ous sampling designs can be employed . In the following we only consider three
kinds of deterministic designs: dichotomy, irregular sampling, admissible sam-
pling.
4.5. SAMPLING 119

4.5.1 Dichotomy
Consider the data (XjT/N ; j = 1, . . . , N) where N = 2n; n = 1, 2, ... T being
fixed. Such a design may be associated with the accuracy of an instrument
used for observing the process (X t ) over [0, T] .

In some parametric cases estimators based on that sampling are consistent.


A well known example should be the observation of a Wiener process (Wt, t ~ 0)
at times jT / N. The associated estimator of the parameter 0"2 is
N

O"N =
2
TI "L....(WjT/
,
N - W(j-l)T/N)
2

j=l

which is clearly consistent in quadratic mean and almost surely.

Now if (Xb t E R) is a process with identically distributed margins the


density kernel estimator is

(4.76)

The following theorem shows that iN is not consistent.


THEOREM 4.13 let (Xt, t E R) be a zero mean real stationary Gaussian
process with an autocorrelation function p satisfying
0< cuD: ~ 1 - p2(U) ~ c'uD: , 0 < u ~ T

where
o < c ~ c' < 1 and 0 < a ~ 2.
Then if hN = N-"( (0 < I < 1) and if the kernel K satisfies
Ju 4 K(u)du < +00 we have

(4 .77) liminf ViN(O) > 4 - ~ > O.


N~+oo - 7rR(2 - a)(4 - a) 271"

In particular V iN (0) tends to infinity if a = 2.


Proof (sketch) :
We may and do suppose that T = 1 and EXJ = 1. Now let us consider the
decomposition

_ 1 N
where VN = N 2 h 2 L:VK (Xj/N /h N ),
N j=l
120 CHAPTER 4. KERNEL DENSITY ESTIMATION

CN =- N 22h2
N
~1(N -
j=1
j) JK (hU
N
) K (hV
N
) f(u)f(v)dudv,

RN =N
2
L
N-1 (
1- N
j) fj/N(O,O) ,
)=1

and

First, Bochner's lemma implies VN ---+ ° and CN ---+ - f 2 (0).


1 2
Now, since fj/N(O , 0) = -(1 - P (j/N))-' we have
1

21l"

RN :::: -
1 1
- L.
N-1 (
1-
. )
2.
(N)0./2
--:-
1l".J2 N )=1
N J

1
which appears to be a Riemann sum for the function 0(1- u)u- a / 2 . Con-
1l"yc'
sequently

liminf RN:::: 1 (1
r;
1l"y c·1- -
1)
--a- - --a-
2- -
,0 < a:::; 2.
2 2

Finally by using 1 - p2(j/N) :::: c (~ ) 0. and the inequality

le au -11 ::; au (1 + a2u) (a> 0, u > 0) it is easy to check that TN tends to zero.

Collecting the above results one obtains (4.77) . •

Under slightly different hypotheses it may be established that

(4.78)
N-+oo
A 1
lim VJN(O) = -T
7r
lT
0
1- u
( 1 - P2( u ))1/2 du .

In conclusion it appears that the condition hN ---+ is not appropriate in


the dichotomy context. It is then necessary to adopt another point of view by
°
considering iN as an approximation of JT and by letting hN tend to hT . Thus
we have the following.
4.5. SAMPLING 121

THEOREM 4.14 If (Xt, 0 ::::; t ::::; T) has cadlag sample paths, if K is uni-
formly continuous and if hN -+ hT then
(4 .79) - + /T(x) , x E Rd.
iN(X) N_oo

Proof
We have
iN(X) = l KhN(X - U)dJlN(U)

=l
and
/T(x) KhT(X - U)dJlT(U)
N
where JlN =~L 8(X j T/N) and JlT are empirical measures.
N j=1

Now let <p be a continuous real function defined on lR'. , then for all
w inn

i'Rr <pdJlN = T~ . NT ;.... <P(XjT/ N) - +


~ N _oo T
~ iorT <p (Xt)dt
3=1
since t ...... <p 0 Xt(w) is Riemann integrable over [0, TJ .
In particular

u)dJlT (u) = /T(x) .


i'Rr KhT(X - U)dJlN(U) -+
N-oo iorT KhT(X -
On the other hand

l (KhT(X - u) - KhN(X - u)) dJlN(U);::! 0

since K is uniformly continuous. Hence (4.79) . •

Note that convergence in (4.79) is uniform with respect to x .

4.5.2 Irregular sampling


Consider the data X t " . . . ,Xtn where 0 < tl < . . . <tn and
min (tj+l - tj) :2': m > 0 for some m . The corresponding estimator is
1$3$n-l

( 4.80) -
in(x)=nhd~K
1 ~ (X -h X tj )
,x ER .
d
n j=1 n

Then it is not difficult to see that the asymptotic behaviour of 7n is the


same as that of in studied in Chapter 2. Thus all the results in Chapter 2
remain valid with slight modifications.
122 CHAPTER 4. KERNEL DENSITY ESTIMATION

4.5.3 Admissible sampling


We now consider a process (X t , t E JR) with irregular paths observed at sam-
pling instants. In order to modelize the fact that the observations are frequent
during a long time we assume that these sampling instants are 8n , 28n , ... , n8n
where 8n -> 0 and Tn = n8n -> +00.

Here the kernel estimator is defined as

(4.81)

Now we will say that (8 n ) is an admissible sampling if the superoptimal rate


remains valid when the observations are X Cn , X 2Cn , ... , Xnc n with a minimal
sample size n.

More precisely (8n ) is admissible if

(a) For a suitable choice of (h n )

E(f~(x) - f(x))2 = 0 (;n) .


(b) 8n is maximal (Le. n is minimal) that is, if (8~) is a sequence satisfying
(a) then 8~ = O(8n ) .

Note that if (8 n ) and (8~) are both admissible then obviously 8~ !::: 8n ·

In order to specify an admissible sampling we need the following assump-


tions

(1) g8,t = glt-81 exists for s 1= t and II gu 1100::; 1l"(u), U > 0 where (1+u)1l"(u) is
integrable over )0, +oo[ and u1l"(u) is bounded and ultimately decreasing.
Furthermore gu (-, .) is continuous at (x , x).

(2)

These assumptions are satisfied if, for example, (X t ) is an ORNSTEIN-


UHLENBECK process (i.e. a zero mean stationary Gaussian process with
autocorrelation exp(-8u), U> 0 (8 > 0)) .
THEOREM 4.15 If (1) and (2) hold, if f E C!(£) and K E H k ,>.
(k + A = r) then 8n = T;:d/2r is admissible provided h n = T;:1/2r.
4.5. SAMPLING 123

Proof
Let us begin with the following preliminary result :

( 4.82)

00
where Hn(y, z) = L ongi8..(Y, z)
i=l

and Gn(y , z )= ~ (1 - ~) 0ngi8 n (y , z ).

In order to prove (4.82) note first that u7r(u) and 7r(u) are decreasing for U
large enough, U > Uo say.
Therefore

i - 1> 8;;-'UO

On the other hand

Now we have
00 n-1
Hn - G n = ""'
~ ongi8n + ~n~
""' iOngi8
n
i=n i=l

hence for n large enough

II Hn - G n 1100

+ ~
nUn
roo
Juo
U7r(u)du ,
124 CHAPTER 4. KERNEL DENSITY ESTIMATION

hence (4.82) since 7r(u) and (U7r(u)) are integrable and n8n -+ 00.

We now study the variance of f~ by using the classical decomposition

where \In stands for the sum of variance and where

For Vn we have again the well known result

(4.83)

Concerning C n note that

and

If Kh n (x - y)Khn (x - z)[Hn(y, z) - G(y, Z)]dYdZI ::; II Hn - G 1100

where G(y, z) = It:>:) gu(y, z)du.


Consequently assumption (2) and (4.82) entail

Since G is continuous at (x, x) we find

(4.84) n8n C n -+
roo
2 10 gu(x,x)du.

Now the bias is given by (4.15) and (4.16), then by using (4.83) and (4.84)
we obtain
a a"
E (J~(x) - f(x))2::; h d + a'h~r + -8
n n n n
where a, a' and a" are positive constants. Hence

a'" a"
E (J~(x) - f(x))2 ::; n2r/{2r+d) + n8n
4.5. SAMPLING 125

and since n8n = Tn

E (f~(x) - f(x))2 ::; ;n + (8 );!::a


/I
a'" T: '

thus the full rate is obtained by choosing 8n = T;;d/2r as announced above.

It remains to prove that 8n is minimal: let us consider a sequence (8~)


which generates the full rate and let us note that there exists a1 > 0 such that

(4.85)
a
-.!. > ** _
Tn - E(fn f(x))
2
alii
2: n2r/(2r+d) - a
_ '" (Tn6' 2r)
_~
~

where f~* is associated with the sampling (Xj 6:J.


Then (4.85) yields
8'n -< ~T-!.:
a'II n
= 0(8n )

and the proof of Theorem 4.15 is therefore complete . •

The following corollary provides the exact asymptotic quadratic error asso-
ciated with an admissible sampling.

COROLLARY 4.7 If (1) and (2) hold, if f E C 2,1(b) and


if f(x)f"(x) > 0 then the choice 8n = )..T;;1/4 (n > 0)

and h n = (ab)..)1/5 T;;1/4 where a = f(x) J K2 and b = f"2(x)


(Ju) 2
2K(u)du
leads to

(4 .86)

Proof:
Straightforward since the bias is given by (2 .9) . •

Note that if all the sample path (Xt,O ::; t ::; Tn) is available one obtains a
smaller constant, namely

(4.87)

The reason is that a diagonal variance term appears in (4.84) .

The following theorem shows that the superoptimal uniform convergence


rate still remains valid if the sampling is admissible.
126 CHAPTER 4. KERNEL DENSITY ESTIMATION

THEOREM 4.16 Under the conditions of Theorems 4.10 and 4.11 where
= rp-d/2r
<: ,...,
Un .in hd
' n = Un ,
<:
an d r >d:

( 4.88) I
1
T,
(T,)
I:
1/2
sup If~(x) - f(x)1 --> 0 a.s.
ogm n og.in xE lRd nToo

Proof
Let us consider the random variables

Z
In
= K
hn
(x -hn Xj6 n ) _ EK
hn
(x -hn Xj6n )
'
1 <_ J' <_ n.
t
Then
f~(x) - Ef~(x) = .!. Zjn .
n j=l

As in Theorem 4.11 we use inequality (1.26) .


First if POn -+ +00 we may use the proof of Theorem 4.11. We obtain

consequently since h~ = {in ,

2 1
V (q) ~ - +-

POn {in

and choosing P ~ .!.€ we obtain v 2 (q) ~ Un


: .
Finally the first term of the bound in (1.26) has an exponent of the form
€2q €2q{in n T T€ €2q 2
~() ':::: - - but q = -2 = - 2 <: ,..., -z-, therefore z-( ) ':::: T€ .
V q C P PUn Un V q

Now choosing € = Co~;'n) 1/2 (log~ Tn) (a choice which is compatible

with p{in -+ 00 since {in ,..., T;:d/2r and.!. > !i <=} r > d), we get the bound
2 2r

exp (-c(logTn)(logm Tn)2) = lIT '


T~ og", n
4.5. SAMPLING 127

Now, since (Xt ) is GSM, the second term is a

where I is arbitrarily small.

Finally

P logm Tn (Y:)
(
1
log;'n
1/2
If~(x) - Ef~(x)1 > 1/
)

= 0 (T;c'log~ Tn) = 0 (n-c"IOgn)

since n = [Tn~]. Hence the a.s. convergence.

Concerning the bias we have


1/2 y:l/2 h r
1 ( ~) IEf*-f < n n
log log Tn log Tn n I (log Tn) 1/2 log log Tn

o ((lOg Tn)!/; log log Tn) --+ 0


Finally in order to obtain an uniform result it suffices to use the same cov-
ering method as in Theorem 4.10 and to conclude as in Corollary 4.6 . •

Notes

BANON (1978) was the first who considered density estimation in continuous
time. In his pioneer work he studied the case of a stationary diffusion process by us-
ing a recursive estimator. Related results were obtained by 13ANON and NGUYEN
(1978, 1981), NGUYEN (1979), NGUYEN and PRAM (1980, 1981).

Most of the above results are obtained under the so called Rosenblatt G 2 -condition.
The strong mixing case has been investigated by DELECROIX (1980).

Estimators based on 8-sequences are studied in PRAKASA RAO (1979, 1990).


The special case of wavelets appears in LEBLANC (1995, 1997).

CASTELLANA and LEADBETTER (1986) have obtained the surprising ~-rate


(Theorem 4.4, 2).

Results about intermediate rates has been established by BLANKE and BOSQ
(1995-1998) .
128 CHAPTER 4. KERNEL DENSITY ESTIMATION

The limit in distribution (Theorem 4.12) is due to BOSQ, MERLEVEDE and


PELIGRAD (1997) .

Deterministic or random sampled data are considered by NGUYEN and PHAM


(1981), MASRY (1983), PRAKASA RAO (1990).

Most of the results in this chapter seem to be new except of course Theorem
4.4, 2 and some simple results which belong to the folklore of density estimation in
continuous time.
Chapter 5

Regression estimation and


prediction in continuous
time

Despite its great importance in practice, non parametric regression estima-


tion in continuous time has not been much studied up to now . The current
chapter is perhaps the first general work on that topic.

The main results are similar to those obtained in the previous chapter about
density estimation : if the process sample paths are irregular enough then a
parametric rate appears in regression estimation. This fact remains valid for
suitable sampled data and when nonparametric prediction is considered.
Optimal and superoptimal rates are studied in Sections 2, 3 and 5. Section 4
is devoted to limit in distribution. Section 6 deals with sampling and, finally,
applications to forecasting appear in Section 7.

Several of the proofs are not detailed or are omitted since they are easy
combinations of proofs given in Chapters 3 and 4.

5.1 The kernel regression estimator in


continuous time
Let Zt = (Xt, Y t ), (t E R) be a Rd x R d ' -valued measurable stochastic process
defined on a probability space (n, A , P). Let m be a Borelian function of R d '
into R such that (w , t) ...... m 2 (Yi(w)) is P 0 AT-integrable for each positive T
(AT stands for Lebesgue measure on [0, T]).

129
130 CHAPTER 5. REGRESSION ESTIMATION

Assuming that the Zt'S have the same distribution with density fz(x, y),
we wish to estimate the regression function E(m(Yo) I Xo = .) given the data
(Zt, 0:::; t :::; T) .

Consider the following functional parameters :

f(x) =
ifR
r fz(x, y)dy
dl

and
cp(x) =
JRd
r m(y)fz(x, y)dy
l
,x E JRd.

We may use f and r.p for defining a version of the regression by setting

r(x) cp(x)/ f(x) if f(x) > 0


(5.1)
Em(Yo) if f(x) o.
Now let K be a d-dimensional convolution kernel (cf. Chapter 2), the
kernel regression estimator is defined as

rT(x) CPT(X)/ Jr(x) if Jr(x) > 0


(5.2)
~ J: m(yt)dt if Jr(x) =0
where
(5.3) 1 iT
Jr(x) = T 0 Khr(X - Xt)dt

and
(5.4) 1 iT
CPT(X) = T 0 m(yt)Khr(X - Xt)dt

with lim hT
T->oo
= O( +).
Note that TT may be written under the suggestive form

(5.5) rT(x) = iT ptT(x)m(yt)dt

e
where

PtT(X) K ~TXt ) / iT K ( X ~TXt ) dt if Jr(x) >0


(5.6)
1
if Jr(x) = O.
T
In the following, in order to simplify the exposition, we will suppose that
K is a strictly positive kernel, unless otherwise stated.
5.2. OPTIMAL ASYMPTOTIC QUADRATIC ERROR 131

5.2 Optimal asymptotic quadratic error


First we study the case where m(yt) is supposed to be bounded. In fact we
introduce a slightly more general assumption, namely.

Eo - There exists a positive constant M such that

sup E(m 2 (yt) I BT):S M2 a.s.


O~t~T<+oo

where BT = a(Xt , 0 :S t :S T).


Eo is clearly satisfied if m(yt) is bounded but also in some special situa-
tions, for example if the processes (m(yt)) and (X t ) are independent. Another
interesting case should be the model

(5.7) m(yt) = r(Xt ) + et ; t E JR

where r(X t ) is bounded, (ee) is a square integrable strictly stationary process


and where (Xd and (et) are independent.

Now let us set


g:,t = f(z"z,) - fz ® fz , 8 1= t
and
Gs,t(X,X') = ( m(y)m(y')g:t(x, y; x',y')dydy' ,
J{R2d l I

(X, x') E JR2d .

Furthermore we put

~p.(r·) = sup II Gs,t lb·


(8,t)~r·

where r* is some two dimensional Borel set and 2 < p* :s +00.


On the other hand aF) will denote the 2 - a-mixing coefficient of (Zt) that
is
a~2)(u) = supa(a(Zt), a(Zt+u)) , U 2 o.
tEIR

With the above notation we may introduce the following conditions :

A * (r* , p*). There exist r* E BIR 2, containing


D = {(8, t) E JR2 : 8 = t}, and p* E]2, +00] such that

a) 9:,t and G.,t exist for (8, t) (j. r*


132 CHAPTER 5. REGRESSION ESTIMATION

b) t1po (r*) < +00

T--+oo
1
c) lim sup -T r
J[O ,T]2nr O
ds dt = fro < 00.

M*b, (3). ai2)(lt - sl) :S ,It - sl-.6; (s, t) tf. r* where, > 0, (3 > O.

CO. fz E C2,d+d' (b), f E C2,d(b'), t.p E C2,d(b") for some b,b',b".

Note that, if m is bounded, these conditions may be simplified. In partic-


ular if fz E C 2 ,d+d,(b) then f E C 2,d(b) and t.p E C2,d(b I m 1100). Note in
addition that we will also use some assumptions introduced in Chapter 4.

We are now in a position to state the" optimal rate" result

THEOREM 5.1 If B o, A(r,p), A*(r*,p*), C* and M*b,(3) hold with


min(p,p")-1 . / .
(3 > . ( ) 2 then the ch02ce hT = cT- 1 (d+4), c> 0, entazls
mm p, p" -

(5.8) limsupT 4/(d+4) E(rT(x) - r(x))2 :S c(x)


T-+oo

provided f(x) > O. c(x) is an explicit constant.


If p = p* = +00 the condition for (3 becomes (3 > 2 and the rate T- 4 /(d+4)
remains valid for (3 = 2.

Proof
We first derive a preliminary result, namely

(5.9)

where x is omitted.

For that purpose by using (5.5) , the Cauchy-Schwarz inequality and Bo we


obtain

E [r~UT - EJr)2] = E [E (r~UT - EJr)21 BT)]

=E [UT - EJr)2 E [ (JOT PtTm(yt)dt fI BT]

= E [(UT - EJr? J[O,Tj2PSTPtTE(m(Ys )m(yt) I BT) dsdt]

:S E rUT - EJr)2 (JoT PtT [E(m2(yt) I BT)f/2 dtf]


:S M EUT - EJr)2
5.2. OPTIMAL ASYMPTOTIC QUADRATIC ERROR 133

which proves (5.9).

Now from the decomposition


EcpT Efr - h CPT - EtpT
TT - E h = TT E fr + Eh

and (5.9) we get

By Theorem 4.2 we have

(5.11) limsupTh~Vh(x) S fr/(x) jK2 .


T-oo

Concerning V CPT we may use the same method as for V IT in Theorem 4.2, we
obtain
(5.12) limsupTh~VcpT(x) S M 2f r ' / (x)jK 2.
T-oo
It remains to study the "pseudo-bias"

ECPT Efr - I cP - ECPT


T - Eh = T Eh + Eh .

Using C* and classical methods one easily obtains


E CPT 2 2 2 2
(5 .13) hmsuphT ( T
. 4
- --
) < -2T -Xf + -2 -X<p
T_oo Eh - P 4 P 4
where

and X~ is similar.

Finally collecting (5.10), (5.11), (5.12) and (5.13) we get (5.8) with

(5.14)
(
ex) =
2(1 + M2)
cd I(x) (fr + f r ·)
j K
2 c4
+ 2J2(x) (T
2 2
(x)xf(x)
+ X~(x» .•

We now turn to a more general case, replacing Eo by


Eb - E(expalm(yt)l) S M' for some a > 0 and some M' > O.
134 CHAPTER 5. REGRESSION ESTIMATION

First, it can be established that (5.8) remains valid, provided


min(p*,p) - 1 )
(3 > max ( 2 . ( ) 2' 2 + d ; the proof uses arguments similar to those
mm p*,p -
in Theorems 3.1 and 4.2.

Now we introduce the family Z of processes Z = (Zt, t E JR) which satisfy


the above hypotheses uniformly with respect to a, M', r, p, r*, p', /, (3, b,
b' , b" and we consider a kernel of the type K = K~d, then we have

COROLLARY 5.1

(5.15) lim sup T 4 /(d+4) Ez(rr(x) - r(z)(x))2 = Cz


r-+oo ZEZ

where Cz is explicit and r(Z) denotes the regression associated with Z.

The proof of Corollary 5.1 is analogous to the proofs of Corollaries 5.1 and 5.2.
In particular, given a sequence (Un, Vn ), n E Z of LLd. JRd+d' -valued random
variables one may construct the process

then for a suitable choice of Lo and (Un, Vn ), nEZ, (Zt, t E lR) belongs to Z
and satisfies
2 Cz
(5.16) Ez(rr(x) - r(z)(x)) rv T4/(d+4) ,

details are omitted.

(5.16) shows that the "optimal rate" is achieved and (5.15) that better rates
are feasible. These rates are considered in the next section.

5.3 Superoptimal asymptotic quadratic error


We now show that, if the sample paths are irregular enough, the kernel
estimator exhibits a parametric asymptotic behaviour.

In order to express that irregularity we need some notations. Consider

g.,t = f(x.,xtl - f 181 f j s i= t


and suppose that 98,t = 9It-81, then we put

h(x',x") = r
J1o,+oo[
Igu(x' ,x")ldu.
5.3. SUPEROPTIMAL ASYMPTOTIC 135

Similarly if g:,t = gjs-tl' s # t, we put


H(x', x") = f IGu(x',x")ldu
J]O,+oo[
where
Gu(x', x") = f m(y)m(y')g~(x',y;x",y')dydy'.
J'Jf. 2dl
Now the "irregularity" assumption is

II - hand H exist and are continuous at (x,x).

The following theorem gives the parametric rate


THEOREM 5.2 If Bo and h hold and if f(x) is strictly positive then

(5.17) . ( ECPT(X))2
h~~:;!,pTE rT(x) - Efr(x) ~ CI(X) ,

if in addition C' holds then the choice hT = c T- I / 4 , c > 0 entails

(5.18) limsupTE(rT(x) - rex)? ~ C2(X) .


T-oo
CI (x) and C2 (x) are explicit.
Proof
We first study VcpT(X). According to (5.4) we have

(5.19)
TVcpT(X) = h~d loT (1 - f) Cov ( m(Yo)K (x ~TXO) ,
m(Yu)K ( X,,;u )) du,
where the covariance, say IU, may be written

IU = J m(YI)m(Y2)K (X-
~
Xl) K (x - X2) •
~ gu(XI,YI;X2,Y2)dxldX2dYldY2

= L2d K (x ~TXI ) K (x ~TX2) Gu(XI, X2) dXI dX2 ,

therefore

TVcpT(X)::;2 f KhT(X-XI)KhT (x-X2) rooIGU(XI,X2)ldu


J'Jf. 2d Jo
using h we obtain

(5.20) limsupTVcpT(x)
T--+oo
~ 2 1
0
+00
IGu(x,x)ldu .
136 CHAPTER 5. REGRESSION ESTIMATION

Now, by 11, (4.35) is valid, thus

limsupTVJr(x) S 2
T-'>oo
10
+00
!9u(X,x)!du

and (5.10) implies

. (ErpT(X»)2
iImsuPT-'>oo TE rT(x) - EJr(x)
(5.21)
4(M2 + 1) roo
S f(x) io
(!Gu(x , x )! + !9u(X,x)l)du
hence (5.17) .

Concerning (5 .18) it is an easy consequence of (5.13) and of the choice


hT = cT- 1/ 4 • •
Under stronger conditions it is possible to substitute "lim" for "lim sup"
in (5.18). To this aim let us define the function
9;;(X' , x",y) = f(xo,zu)(x',x",y) - f(x')fz(x",y)
and supposing that 9;,~ = g~~s l let us set

Ju(x', x") = r m(y)g~*(x',x",y)dy


jlR df
, u > O.

Now we need the following assumption:

h - 9u, G u , J tt exist, are bounded, continuous at (x, x), and II gu 1100 , II G u 1100
and II J u 1100 are integrable over 10, +00[.

We then have the following result


THEOREM 5.3 If m(Yo) is bounded, if C* and 12 hold and if (Zt) is GSM
then f(x) > 0 and the choice hT = cT- 1/ 4 , C > 0 leads to
(5 .22) T · E(rT(x) - r(x»2 ---+ C 2(x)
where

(5 .23)
5.3. SUPEROPTIMAL ASYMPTOTIC 137

The proof is a combination of the proofs of Theorems 3.1 and 4.4.2 and is
therefore omitted.

Now, to complete this section we state a result which offers intermediate


rates . The main assumption is

A"(p) - G(s, t) exists for s i= t, II Gs,t lip is locally integrable and

lim sup
T-+oo
fr J[O ,Tj2
II Gs,t lip ds dt = Gp < +00.
In A"(p), p belongs to [1, +00] . In the case where G" ,t = Glt-sl, A"(p) is
satisfied as soon as II G u lip is integrable. In particular h implies A" (+00).

Intermediate rates depend on p and are specified in the following statement:


THEOREM 5.4 Under the conditions Eo, A'(p) , A"(p) and C*, and if
hT = cT-p/(4p+2d) and f(x) > 0 then
(5.24) limsupT 2p/(2p +d) E(rT(x) - r(x))2 :s: D(x)
T-+oo

where D(x) is explicit.


Proof
Owing to Theorem 4.6 and formulas (5.10) and (5 .13) we only have to study

VcpT(X) = ;2 r
J[O,Tj2 xIR2d
KhT(X-xt}KhT(X-X2)Gs ,t(;rl,x2)dsdtdxldx2 .

Supposing that 1 < p < 00 and using Holder inequality, we arrive at

VcpT(X) :s: ;2 (r K~T(X


JlRd
- XI)dXI)2/Q r
J[O ,Tj2
II Gs,t lip dsdt

where q = -P_, hence


p-1

Th>}d)/PVr.pT(x) :s: frJ[O,Tj2


II Gs ,t lip dsdt· II K II~
and taking the lim sup on both sides we get

(5.25 ) limsupTh>}d )/PVr.pT(x) :s: Gp II K II;


T-+oo

and the rest is clear. The special cases p = 1 and p =, 00 may be treated
similarly. •

Note that the optimal rate is reached for p = 2 while the superoptimal rate
is achieved for p = +00.
138 CHAPTER 5. REGRESSION ESTIMATION

5.4 Limit in distribution


In order to specify the asymptotic distribution of rT we introduce some
notation (where x is omitted)

V'PT Cov(fT,'PT) ]
AT= [
Cov(fT,'PT) Vh
moreover we suppose that d = 1 and that ThTAT -> L a constant regular
matrix.

On the other hand we set

lIT = [v w] AT [ : ] where v and ware real numbers.

Then we have the following weak convergence result :

THEOREM 5.5 IfC' hold, f(x) > 0 and a(u) = G(e-,IL) b> 0) then the
choice hT = cT->' (C > 0, ~ < A < ~) entails

rT(x) - rex) ~ N
(5.26)
y'(u'ATu) (x)

where N has a standard normal distribution.

(5.26) is an extension of a Schuster's result obtained in the Li.d. case. Proof is


omitted.

A confidence interval can be constructed from the following corollary :

COROLLARY 5.2

fT(X) w
(5.27) Vr(x) (rT(x) - rex»~ ----> N

where
(5.28) 1 1
VT(x) = hex) ThT
iT
0
2
m (Yi) K
-
(x---,;;:-
Xt ) 2
dt - rT(x) .

It should be noticed that asymptotic normality of h may be obtained


from (5 .26) or (5 .27) (see [CP]). Compare with Theorem 4.12.
5.5. UNIFORM CONVERGENCE RATES 139

5.5 Uniform convergence rates


We will now discuss uniform convergence. For this purpose we use a kernel
K = K~d where Ko has compact support and continuous derivative. Then,
if the functional parameters are twice continuously differentiable the obtained
rates appear to be the same as in the density case as soon as the sup norm is
taken on a compact set, say fl, such that infxEA f(x) > o.

We summarize the results about optimal and superoptimal rates in the


following theorem :
THEOREM 5.6 Suppose that m(Yo) is bounded, C· hold and (Zt) is GSM
then
1) If A(f, p) and A * (f* ,p*) hold, the choice hT ~ T- 1/ (4+d) entails for each
k~ 1

(5.29)
1
Lo gk T
(T T
Log
)2/(4+d)
sup IrT(x) - r(x)1 --> 0 a.s. .
xEA

1 1
2) If h hold, if d = 1, and if hT ~ T-"'( where 4: ::; , < 2 then for each
k 1
(T)
~
1 1/2
(5.30) -L T L
T sup IrT(x) - r(x)1 -----* 0 a.s ..
ogk og xEA

Proof (sketch)
Let us consider the decomposition
Et.pT Eh-h t.pT-Et.pT
rT - E h = rT Eh + Eh

and let us set M = max(l, II m(Yo) 1 00) and T] = infxEA f(x), then for T large
enough we have

Et.pT(X) I
sUPxEA IrT(x) - E h(x)
(5 .31 )

::; 2t;( (SUPxEAlh(x) - Eh(x)1 + sUPxEAIt.pT(X) - Et.pT(X)1)


Now, under the conditions in 1) , Theorem 4.11 implies

-LT
1 (T)
T
L
2/(4+d)
sup Ih(x) - Eh(x)1 --> 0 a .s._
ogk og xEA
A similar result may be established for t.pT . This can be done by using the
same scheme as in the density case (cf. Lemma 4.2, Lemma 4.3 and Lemma
140 CHAPTER 5. REGRESSION ESTIMATION

4.4) .
One finally obtains

---
1 (T
LogkT
---
LogT
)2/(Hd)
xE.c.
I
sup TT(X) -
E'PT(x)
( )
Efr x
I ---> 0 a.s.

and (5.29) follows from C'.

The proof of (5 .30) is similar . •

5.6 Sampling
This section will be short because the reader can easily guess that regression
and density estimators behave alike when sampled data are available. Conse-
quently the results in section 4.4 remain valid.

In particular if data are constructed by dichotomy, that is by considering


X T / 2n, X 2T/ 2",
... , XT the kernel regression estimator is not consistent un-
der natural assumptions.

If the data are X tp ' .. , Xtn with 0 < tl < ... < tn and min (tj+l -tj) 2:
l:'SJ:'Sn-l
m > 0 then the asymptotic quadratic and uniform errors are the same as that
of Tn studied in Chapter 3.

We now consider a process (Zt. t E JR), with irregular paths, observed at


times bn , 2bn , .. . , nbn where bn --+ 0 and Tn = nbn --+ 00. The associated kernel
estimator is

(5.32)

In the same way as in subsection 4.4.3 we will say that (b n ) is an admissible


sampling if
(a) for a suitable choice of (h n )

(b) bn is maximal (Le. n is minimal) that is, if (b~) is a sequence satisfying


(a) then 8~ = O(8n ).
5.7. NONPARAMETRIC PREDICTION 141

Then under conditions similar to these of Theorems 4.13 and 5.6 it may be
proved that On = T;;d/4 is admissible provided h n ~ T;;I/4, and that

1 rp (LT:,) 1/2 sup ITn(X) - r(x)1 ---70 a.s.


-L
ogk.Ln og.Ln xEll.

where /::;. is any compact set such that inf f(x) > o.
xEll.

5.7 Nonparametric prediction in


continuous time
Let (~t, t E R) be a strictly stationary measurable process. Given the data
(~t,0::; t ::; T) we would like to predict the non-observed square integrable real
random variable (T+H = m(€T+H) where the horizon H satisfies 0 < H < T
and where m is measurable and bounded on compact sets.

In order to simplify the exposition we suppose that (€t) is a real Markov


process with sample paths which are continuous on the left.

Now let us consider the associated process

and the kernel regression estimator based on the data (Zt, 0 ::; t ::; T - H) .
The nonparametric predictor is

that is

(5.33)

o
where the kernel K has a compact support SK, is strictly positive over SK
and has continuous derivative. Note that these conditions together with left
continuity of paths entails that the denominator in (5.33) is strictly positive
with probability 1.

We now study the asymptotic behaviour of (T+H as T tends to infinity,


H remaining fixed. As usual (T+H is an approximation of r(~T) = E((T+H I
€s,s::; T) = E((T+H I €T) .

If the sample paths of (~d are regular, the rates are similar to those obtained
in Chapter 3, specifically in Theorem 3.5 and Corollary 3.1. We therefore focus
142 CHAPTER 5. REGRESSION ESTIMATION

our attention on the superoptimal case in order to exhibit sharper rates.

Let us first indicate the almost sure convergence rate.

COROLLARY 5.3 If 12 and C* hold, (~d is GSM and if one chooses


1 1
hT : : : : T--Y where -4 < 'V
- I
< -2 then

(5 .34) LogkT
1 (T) LogT
1/ 2
[rT(~T) - r(~T)] IeTE~ ~ 0

for each integer k 2: 1 and each compact set b. such that


infxED. f(x) > o.
Proof (sketch)
We have

hence (5.34) using the same method as in Theorem 5.6. •

We presently turn to convergence in mean square. First we have the fol-


lowing results :

COROLLARY 5.4 If conditions in Corollary 5.3 hold and if hT : : : : T-l/8


then
(5 .36) E [(rT+H(~T) - r(~T)fI~TED.] = 0(T- 1/ 2 )
for each closed interval b. such that infxED. f(x) > o.

Proof
Using (5 .13), (5.31) and (5.35) it is easy to realize that it is enough to
study the asymptotic behaviour of OT = E (suPXED.lfr(x) - Efr(xW) and
ofr, = E(suPxED.IIPT(x)-
EIPT(XW) . We only consider OT since or can be treated similarly.
Now we may and do suppose that b. = [0,1]' then using the condition
IK(x") - K(x')1 ::; £Ix" - x'i where £ =11 K'O 1100 we obtain

(5 .37) sup Ifr(x) - Efr(x)l::; sup Ifr(xj) - Efr(xj)1 + k h2 '
xED. 1 ~ j~kT T T

where Xj = 1T' 1 ::; j ::; kT and kT = [Tl/2].


5.7. NONPARAMETRlC PREDICTION 143

Now (5.37) implies

(5.38) OT ::; 2E ( sup l!r(xj) - E!r(XjW) + k~eh24


l~j~kT T T

which in turn implies

From 12 and (4.39) we infer that

kT
OT ::; 2y;
1+00 II
OTT
gu 1100 du + k28eh4
2

thus
OT = O(T- 1/ 2 )
and since the bias term is a O(T- 1 / 2 ) too, (5 .36) follows . •

The last result requires a stronger assumption: let us suppose that (~t)
is 'Prev-mixing (cf. subsection 3.3.3) and consider the predictor defined for T
large enough by
(j'+H = rT'(~T)
where T' =T - H - LogT . Log2T.

Then we have the following superoptimal rate :

COROLLARY 5.5 If 12 and C' hold, if (~t) is 'Prev-mixing with


'Prev(P) ::; apT' (a> 0, 0 < p < 1) then the choice hT ::::: T- 1 / 4 entails

(5.39)

for each compact set .6. .


Proof
First we have
DT E [(rT'(~T) - r(~T ))2IeTE~ ]
fIR E [(rT' (~T) - r(~T))2IeTE~ I ~T = x ] dPeT(x)
thus
DT = L E [(rT'(x) - r(x))2 I ~T = x] f( ;r)dx .
144 CHAPTER 5. REGRESSION ESTIMATION

Now Lemma 3.1 entails


DT(X) : E [(rT' (x) - r(x))2 I ~T = xl
(5.40)
:s E[rT'(x) - r(x) ]2 + 8suPxE6Im(x)l<Prev(T - T') ,

x E 6, except on a P~o-null set.

Consequently

(5.41 ) DT(X) :s sup E[rT,(x) - r(x)]2 + 8 sup Im(x)lap(T-T')


xE6 xE6

Using the bound (5.41) in (5.40) we get

DT :s sup E[rT'(x) - r(x)]2 + 8 sup Im(x)lap(T-T') .


xE6 xE6

Now from 12 we may obtain uniform majorizations as in the proof of The-


orem 5.2, hence
DT = 0 (;,) + O(p(T-T' »)

O(j)
which proves (5.39) . •

Note that the nonparametric predictor (T+H reaches a parametric


rate. Once more this fact proves the value of non parametric methods.

Notes

All the results in this chapter are new or recent.

N. CHEZE-PAYAUD (1994) has proved Corollaries 5.1 and 5.2, and Theorems
5.3 and 5.5. Results about quadratic error associated with admissible sampling may
be found in [CPl.
The other results have been obtained by the a uthor of the present work.
Chapter 6

The local time density


estimator

In this Chapter we use local time for constructing an unbiased estimator


of density when continuous sample is available. This estimator appears to be
natural since it is the density of empirical measure.

In Section 1 we define local time and study its possible existence. The asso-
ciated estimator is defined in Section 2. Its consistency under mild ergodicity
conditions is presented in Section 3. Section 4 deals with parametric rates of
convergence, asymptotic normality and law of the iterated logarithm. Finally
a short discussion compares the local time estimator with the kernel estimator.

6.1 Local time


6.1.1 Definition
Let X = (X t , t E R) be a real measurable continuous time process defined on
a probability space (fl, A , P) .

Let us consider the occupation measure l/T defined as

(6.1)

If l/T is a .s. absolutely continuous with respect to Lebesgue measure>. then


a local time (LT) for X is defined as a measurable random function CT(X ,W)
such that CT("W) is a version of d:r for almost all win n.
145
146 CHAPTER 6. THE LOCAL TIME DENSITY ESTIMATOR

In the following" a.s." is in general omitted and we often write CT (x) instead
of CT(X,W).
By definition we have

(6.2) faT IB(Xt )dt = Is CT(x)dx, BE BIR.

1:
Then, by linearity and monotone convergence we get

(6.3) faT <p(Xt)dt = 00


<p(x)CT(x)dx, <p E M+

where M+ denotes the class of positive Borelian functions.

In the framework of continuous semi martingales one defines another


type of local time, say L T , by using the MEYER and TANAKA formula (cf.
[YR]) :

(6.4) LT(x) = IXT - xl -IXo - xl - faT sgn(Xt - x)dXt , x E lR

where sgn(u) = llR+(u) -llR_(u), U ER

1:
By using (6.4) it is easy to check that

(6 .5) faT <p(Xt}d[X]t = 00


<p(x)LT(x)dx, <p E M+

where [X ] denotes the quadratic variation of the semimartingale X .

Comparing (6.5) with (6.3) we see that LT is an occupation density with


respect to d[XJt, Therefore if [X]T = faT u 2 (t , Xddt where u 2(-,.) is strictly
positive then CT does exist and is given by

(6.6) CT(x) = loT (J-2(t , x)d£X(t), x E R

In particular if X is a diffusion process defined as

(6 .7) X t = Xo + l S(Xs)ds + 1t u(Xs)dWs , t ~0


where Sand (J satisfy the usual conditions and where W is a standard Wiener
process, (cf. [CH-W]) then (6.6) becomes
LT(X)
(6.8) CT(X) = u 2(x) , x ER

Finally if X is a standard Brownian motion CT = LT'


6.1. LOCAL TIME 147

6.1.2 Existence
The following statement gives two classical existence criteria for local time.

THEOREM 6.1
1) Let X = (Xt , t E [0,1)) be a measurable real process with absolutely
continuous sample paths. Then the condition

(6.9) P(X'(t) = 0) = a for almost allt E: [0,1]

is necessary and sufficient for existence of LT and

(6.10)

where Ix = {t : X t = x}.
2) Let X = (Xt, t E [0,1)) be a measurable real process. Then X admits a
square integrable local time (i .e. £1 E L2(>\ @ P)) if and only if

(6.11) liminf ~ { P(IX t - Xsi ~ €)dsdt < 00.


<:10 € i[0,112

This theorem is due to GEMAN and HOROWITZ ([GH1], [GH2]). Proof is


omitted.

In the sequel we will use conditions which are slightly stronger then (6.11)
namely:

(A) fs,t(Y, z) is defined and measurable over (DC n [0, T])2) x U where U is
an open neighbourhood of D = {(x,x), x E JR}.

(B) The function


FT(y,z) = { fs ,t(y ,z)dsdt
i[O,Tj2

is defined in a neighbourhood of D and is continuous at each point of D.

Let us set Kl = {K : JR --; JR, K is a bounded density, a.e. continuous and


with compact support SK} ; and consider the random functions

(6.12)Zh(X) = ZhK (x) = liT (x-


h 0 K - hXt)
- dt, x E JR, h > 0, K E K 1 ·

We now state a theorem concerning existence of local time with some reg-
ularity properties.
148 CHAPTER 6. THE LOCAL TIME DENSITY ESTIMATOR

THEOREM 6.2 If (A) and (B) hold then X has a local time PT such that
2
sup E(Zf(x)-PT(x)) -+0, a<b,KEKI (C).
a$x$b h_O

Moreover x f-> PT(x) is continuous in mean square.


Proof
We first prove that if K(I) and K(2) E Kl we have

K(I) K(2») 2
(6.13) sup E ( Zh (x) - Zh' (x) (h, h/)~ (0,0) 0, a < b.
a$x$b

To this aim we consider

E ( K~I) (x - Xs)K~;) (x - Xt)dsdt


JrO,Tj2
By using Fubini theorem for positive functions together with (A) and (B) we
obtain for h and hi small enough

I hh
,
-1
K(l), ,K(2) (x ) -
JR2
Kh(1) ( x-u ) K h(2)
, ( x-v ) rT
D (
u,v )dUdv.

Hence

!I~~~),K(2) (x) - FT(x,x)! = 112 K(1)(U)K(2)(v)'Ph ,h,(u,V) dUdV I

where 'Ph,h'(U,V) = FT(x - hu,x - hlv) - FT(x,x).


Consequently
K(I),K(2)
!Ih,h' (x) - FT(x,x)
!~ sup l'Ph,h'(U,v)l·
u E SK(i)
V E SK(2)

Now it is easy to see that FT is uniformly continuous over D n (SK(l) x SK(2»).


Therefore
K(I) K(2)
!Ih,h' ! 0 . K(l) K(2) y
sup , (x) -FT(X,X) (h,h')-=:(O,O)' , E "--1
a$x$b

hence, (6.13) since


6.2. ESTIMATION BY LOCAL TIME 149

Now (6.13) means that

(6.14) II ZhK(I) - ZhI;'(2) liB


(h,h')~ (0,0)

where B = LOO([a, b], L2(n, A, P)) equipped with the norm

II 9 IIB= ess . sup


a$x$b
(rin l(x,W)dP(W)) 1/2, 9 E B .

Since B is complete (cf. [MAD there exists f!f. E B such that

II zt; - f!f. liB h~O 0, K E K 1.

Actually f!f. does not depend on K since (6.14) implies that Zt;<I) and Zt;<2)
have the same limit whatever K(l) and K(2) in K 1 .
We now prove that a version of fT =: f!f. is a version of local time.
For this purpose we choose K = 1[_! ,+!] and consider (h n ) ! 0. To each
integer k we may associate a sequence (v(n, k), n ~ 1) of integers such that
v( n, k) i 00 and
zt;(,., n, k)(X,W) n ----->
---+ 00
fT(X,W)

W E nk , Ixi ~ k ; where p(n k ) = l.


Furthermore we can suppose that (v(n, k + 1)) is a subsequence of (v(n, k)) for
all k.
Therefore, if no = nk~lnk and v(n) = : v(n , n) we get

(6 .15)

Now by using a lemma in [GH2] (p. 13,2°) and recalling that K = 1[_!,+!] we
conclude that fT is a (measurable) version of local time, defined over IR x no.
Finally it is straighforward to establish continuity in mean square for zt; and,
since zt; converges uniformly over compact sets, we deduce that fT is contin-
uous in mean square. •

6.2 Estimation by local time


Since fT is the density of VT = T· J.lT it is natural to define a density estimator
by setting
(6.16) O( X
jT )= T
fT(X) , x E JR .
150 CHAPTER 6. THE LOCAL TIME DENSITY ESTIMATOR

The following lemma shows that iT is an unbiased density estimator.


This is an astonishing property : recall that no unbiased density estimator
exists in discrete time !

LEMMA 6.1 (unbiasedness)


Let (Xt , 0:::; t :::; T) be a real measurable process such that Px, = p" 0 :::; t :::; T
and which admits a local time iT. Then

1) P, is absolutely continuous with density f and Ef~ is a version of f.

2) If (A) and (B) hold and if iT is defined by (C) then Ef~ is a continuous
version of f ·

Proof

1) From (6.2) we get

P,T(B) = 1f~(x)dx , BE BJR

and FUBINI theorem entails

(6.17) p,(B) = 1Ef~(x)dx, BE BJR

which means that p, has a density f and that Ef~ is a version of f.

2) By Theorem 6.2, x ,.... f~(x) is continuous in mean square hence x ,....


Ef~(x) is continuous as announced . •

f~ has other interesting properties, like recursivity and invariance,


which are not satisfied by the kernel estimator f f (cf. [BO-D)).

6.3 Consistency of the local time density


estimator
We now show that under mild conditions f¥ is a consistent estimator. The
results of this section are due to Y. DAVYDOV.

THEOREM 6.3 Let X = (X t , t E lR) be a strictly stationary real ergodic


process with local time. Then as T --+ 00

1) For almost all x


(6.18) f~(x) --+ f(x) a.s.
6.3. CONSISTENCY OF THE LT DENSITY ESTIMATOR 151

2) with probability 1
(6.19)
hence
(6.20)
and
(6.21) sup IfLT(B) - fL(B)1 -> 0 a.s.
BEB.

3) If the local time is square integrable

(6.22) E(fr(x) - f(x))2 -> 0 a.e

Proof
Without loss of generality we may suppose that

Xt = UtXO , t E JR
where UT~(W) = ~(Ttw), wEn and (Tt , t E JR) is a measurable group of
transformations in (n, A, P) which preserves P (cf. [DO]) .

On the other hand since iT is increasing with T we may and do suppose


that T is an integer.
Let us set
gk(X) = UkI'O(X) , k E Z, x E JR
where eo is a fixed version of the local time over [0,1] .
For all x the sequence (gk(X) , k E Z) is stationary and

Now the ergodic theorem of BIRKHOFF-KHINCHINE entails for all x


(6 .23) f~(x) -> g(x) a.s.
where g(x) = EIgo(x) and I is the a-algebra of invariant sets with respect to
T1 .
Since ergodicity of X does not imply triviality of I we have to show that
g(x) = lex) A 0 P a.e.
For this purpose we apply a variant of Fubini theorem : 9 being nonnegative
we have

1
g(x)dx = 1 EIgo(x)dx = EI 1 go (x)dx = 1 (a .s.)
152 CHAPTER 6. THE LOCAL TIME DENSITY ESTIMATOR

i.e. for almost all w, x ....... g(x) is a density.


Now Schefl"e lemma (cf. [BI2]) and (6.23) entail

(6.24) II I~ - 9 11 £1(>,)--+ 0 a.s.

which means that, almost surely, the empirical measure J-Ln converges in varia-
tion to the measure v with density g .
On the other hand the ergodic theorem implies

(6.25) --+ EIB(Xo) = J-L(B),


a.s.

for every B E BJR.


From (6.25) we deduce that if (B j ) is a sequence of Borel sets which generates
BJR there exists flo such that P(flo) = 1 and

Thus, with probability 1, v = J-L that is 9 = 1 .A 0 P a.e.


Hence (6.23) and (6.24) give (6 .18) and (6.19). (6.20) is a straightforward
consequence of (6.19) since II 10 - 1 1
1£1(>,):::: 2. Therefore (6.19) and Schefl"e
Lemma imply (6.21).
Finally if the local time is square integrable we have Elgo(x)12 < 00 a.e. Thus
(6 .23) implies El/n(x) - g(xW < 00 hence (6.22) since 9 = 1 .A 0 P a .e. •

Note that, in discrete time, (6.21) is not possible since the empirical mea-
sure J-Ln is orthogonal to J-L .

In order to obtain uniform convergence of I¥ we need an additional assump-


tion concerning f. T :

Let us recall that the modulus of continuity of a real function 9 is defined


as
wg(B , h) = sup Ig(y) - g(x)l, BE BJR, h > O.
x ,y E B
Ix - yl < h
Then our assumption is :

(L) limE [wlT([a , b],h)] =0; a<b; T>O .


h!O

Note that (L) implies the existence of a continuous version of f. T . An ex-


6.3. CONSISTENCY OF THE LT DENSITY ESTIMATOR 153

ample of a process satistying (L) is given by a Gaussian process, locally non-


deterministic in BERMAN's sense (cf. [BMJ) and such that for some 'Y > 0

For such a process there exists a version of fT and a positive constant CT


such that
EWlT(h) :s; cTh"Y
(cf. [GH2J proposition 25.11 and [IHJ tho 19 p. 372) .

THEOREM 6.4 If X is a strictly stationary ergodic process with a local time


satisfying (L), then f is continuous and lor each bounded interval [a, bJ we have

(6.26) II fr - f IIC([a,bJ) T~ 0 a.s.

Proof
Continuity of f comes directly from (L). Now let us consider E: > 0 and
a =Xo < Xl < . . . <XN = b such that Wj(fli' 6) < E: ; i = 0, . .. ,N - 1 ; where

,
fli = [Xi, XHlJ and 6 = max(xH1 - Xi).

Supposing that T is an integer we may write

1 n-1
II f~ - f IIC[a,bl= max sup - Lf(j)(x) - I(x)
0$t$N-1 XE~i n j=O

1 n-1
:s; E: + max sup - L f(j)(x) - f(Xi)
0$i$N-1 XE~i n j=O

1 n-1
where Tn = max
0$i$N-1
L
;;:
j=O
f(j)(Xi) - f(xd and where '(jl denotes local time

over [j,j + 1].


Hence

1 n-1
II in - I IIC[a,bl :S E: + max
O$i$N - 1 XE~i
sUP;;: L
j=O
Wl(j) (fli' 6) + Tn ·
154 CHAPTER 6. THE LOCAL TIME DENSITY ESTIMATOR

By Theorem 6.3, Tn --> 0 a.s . On the other hand the sequence (Wi{j) (t.i' 8) ;
j = 0,1, . . .) is stationary. Then the ergodic theorem gives
-.- 0 T
limn I fn - f IIC[a,b]::::: E + O~X},~-l E Wf(o) (t.i' 8) a.s.

Letting tend E to 0 and using our condition (L) we find


(6.27)
-
limn I 0
fn - f IIC[a,b]= 0 a.s.
Now if T is not an integer we may write

o
Ifr(x) 0
- f[T] (x)1 ::::: T I
1 tT(x) - t[T] (x ) I+ T[T] If[T](X)
0
- f(x) I + -T-f(x),
T - [T]

x E [a,b].

Since (L) entails that t([T]) is bounded over [a, b] we get

II f To - fO
[T]
I C[a,b] -< I t([T]) IIC[a ,b] + lifO _
[T] [T]
f II
C[a ,b]
+ II f IIC[a,b]
[T]
and finally (6.27) together with the ergodic theorem applied to ( II t(n) IIC[a,b])
give
II ff}. - f~] IIC[a,b] ---+ 0 a s. ., hence (6.26) . •
T-->oo

6.4 Rates of convergence


In this section we show that the LT estimator can reach parametric rates. We
always suppose that the Xt's have a common distribution but strict stationarity
is not used for obtaining consistency in mean square.

6.4.1 Quadratic error


As in Chapter 4 we set
9s ,t = fs ,t - f ® f.
The following lemma gives the exact variance of ff}. :
LEMMA 6.2 (Variance of ff}.)
Suppose that (A) and (B) hold, then

(6.28) V ff}.(x) = ;2 r J[O ,T j2


9s,t( X, x)dsdt, x ER
If in addition 9s ,t = 9It-.I , we have
6.4. RATES OF CONVERGENCE 155

Proof
Let K E K 1 , from Theorem 6.2 we get

E (Zf:(x))2 h-=-:E £~(x)


thus
E£~(x) = FT(x, x) = f fs,t(x, x)dsdt.
J[O,TF
On the other hand

(EZf( (X))2 --+ (E£T(X)? = T2 f(x) = f J2(x)dsdt


J[O,TJ2
hence (6.28).

Now, condition g.,t = g!t-.! implies

~ f
T J[O,TF
2
g.,t(x,x)dsdt = -T ((1 -
Jo
U
-T ) gu(x,x)du

hence (6.29) . •

The parametric rate appears in the following statement :


THEOREM 6.5 Suppose that (A) and (B) hold for all positive T then

1) If G(x) . Il
= hmsup -T
T-oo [O,TJ2
g.,t(x, x)dsdt < 00

we have
(6.30) lim sup TV f~(x) S G(x).
T ..... oo

2) If g.,t = g!t-.! and 1 00


Igu(x, x)ldu < 00

we have
+00

(6.31) TV f~(x) ---> / -00 gu(x, x)du.

Proof: clear . •

(6.31) is analogous to (4.43) : f~ and h have the same asymptotic effi-


ciency. Note however that f~ converges at parametric rate for a class of pro-
cesses larger than h . In fact, whatever be hT 1 0, it is possible to construct a
continuous density fo associated with a suitable process X such that
TEx(fT(X) - fo(x)) ---> +00
156 CHAPTER 6. THE LOCAL TIME DENSITY ESTIMATOR

hence h does not reach the full rate at fo .


Now, in the particular but important case where X is a diffusion process
[cf. (6.7)] Y. KUTOYANTS has studied the unbiased estimator

(6.32) fT(X) = Ta;(x) faT l{x, <x}dXt ,

where a 2 (.) is supposed to be known.

MEYER and TAN AKA formula shows that

fT(X) = Ta~(x) [~LT(X) - (XT - x)- + (Xo - X)-]

and from (6.8) we get


IT(x) = I~(x) + ~T(X)
where
2
~T(X) = Ta 2(x) [(Xo - x)- - (XT - x)-j .

Note that E~}(x) = 0 (;2). Thus IT and I~ have the same asymptotic
efficiency and 4.48 and Theorem 4.5 show that I~ is asymptotically minimax.

It is preferable to use I~ since it allows to suppress the assumption "a 2 ( . )


known" .

We now give some converses of Theorem 6.2 and 6.5. First let us introduce
an additional condition concerning Is ,t :
(A') - (A) holds, Is,t is continuous at each point of D and there exists Xo E JR,
which does not depend on (s,t), s of- t, (s ,t) E DC n [O,T]2, such that

(6.33) max Is,t(Y, z ) = fs ,t(xo, xo).


(y ,z)EU

(A') is satisfied if, for example, X is Gaussian stationary with autocorrelation


P satisfying Ip(u)1 < 1, u of- O.
THEOREM 6.6 If (A ') holds, (B) and (C) are equivalent.
Pmof
Suppose that (C) is satisfied. We have for h small enough

E (Zf:(xO))2 = r
J[O,Tj2 x [If?
Kh(XO - y)Kh( XO- z)Is ,t(y,z)dsdtdydz .
6.4. RATES OF CONVERGENCE 157

Then Fatou lemma and (C) imply

but fs,t is continuous at (xo,xo) (cf. (A')) thus the liminf is a limit, hence

Ef}(xo):::: r
J10,T]2
fs,t(xo,xo)dsdt

consequently, (A') entails

FT(y, z) = r
J10,T)2
fs,t(Y, z)dsdt::; r
J10,T)2
fs,t(xo, J:o)dsdt < 00,

and continuity of FT follows from the dominated convergence theorem: (B) is


satisfied.

Conversely j if (B) holds, then (A') and (B) imply (C) by Theorem 6.2. •

In order to obtain equivalence for the parametric rate we now introduce a con-
dition stronger than (A') :
(A") - f is continuous and bounded. fu(Y, z) is defined for all u f= 0, symmetric
with respect to u, measurable with respect to (u , Y, z), continuous over D and
such that II fu 1100= fu(xo,xo) where xo does not depend on u.

We shall also use the "asymptotic independence" condition introduced in [e-


Ll:

(D) - 1 +00
To
II gu 1100 du < 00 for some positive To ·

In the following IC denotes the class of bounded symmetric densities such


that lim IYIK(y) = 0 and fF denotes the kernel estimator associated with
Iyl-+oo
K.

THEOREM 6.7 If X satisfies (A") and (D) the following conditions are
equivalent :

(1) 1 II
00
gu 1100 du < 00 (C.L. condition)

(2)
158 CHAPTER 6. THE LOCAL TIME DENSITY ESTIMATOR

(3) There exists K OE K. n K.l such that

T · Vfr(x) -+ e~Ko) , x E IR
T_""

(4) (C) holds and the local time estimator satisfies

T· Vf~(x) T-_oo
+ ex , x E IR

• Moreover :

e(K)
x = e(Ko)
x = ex
= u J +OO
9 (x , x)du ,
- 00
x E IR .

Proof

• (1) => (2) is proved in [C-LJ, theorem 3 and

• (2) => (3) is obvious.


c~K) = i: gu(x , x)du, x E IR.

• (3) => (4) : For T > To we have

T· Vfr(x) = 21 TO
(1-~) (CPT(U) -1j;T)du+IT
where
CPT( u) r K~T(X - y)K~T(X -
I/{2 z)f,,(y, z)dydz,

1j;T r K~T(X - y)K~T(X - z)f(y)f(z)dydz


lll.2
and

Ir = 2 ((1 -~) du r K~T(X - y)K~T(X - z )gu(y, z )dydz.


lTD T lll.2
By using D it is easy to see that

Now, by dominated convergence, (y, z ) -+ 1To


+00

gu(Y, z)du is continuous in

1
(x , x), hence
+00

Ir T~oo 2 g,,(x , x)du


To
6.4. RATES OF CONVERGENCE 159

hence
TViTK (x)-Jr-.c xK -2
O O jTo
+OO
gu(x,x)du=:cI

and, since t/JT --+ P(u), we get

But continuity of III. at (x,x) implies

rpT(U) T-=:'oo Iu(x, x) , u>O

then, from FATOU lemma we deduce that

2 foTO fu(x,x}du ~ c' + 2ToI2(x} , xER

thus
{TO
Jo Iu(xo,xo)du < 00,

and
{TO
Jo II gil. 1100 du < 00,

100 II
so that
gil. 1100 du < 00 .
This result, together with A" , imply A and B. So Theorem 6.2 entails C. Finally
(5 .12) is nothing but (5.8) . In particular Cx = [:00 gu(x, x)du, x E R .
• (4) => (1).
Since A" implies A' and C holds then B holds by Theorem 6.6. In particular

Now, if T > To , we have

r~ {~
Jo fu(xo , xo}du ~ (T - To) Jo fu(xo, xo}du
{To {T
~ Jo (T - u)fu(xo, xo)du ~ J o (T - u)fu(xo, xo)du < 00,
160 CHAPTER 6. THE LOCAL TIME DENSITY ESTIMATOR

hence
{TO (TO
Jo II gu 1100 du ~ Jo II fu 1100 du + To II f II~ < 00

1 II
and finally
00
gu 1100 du < 00 .
The proof of Theorem 6.7 is therefore complete . •

COROLLARY 6.1 Let X be a stationary Gaussian process satisfying (6.11)


and such that
r(t) rv aC'Y as t -+ 00

a> 0, "( > 0, where r denotes autocovariance ; then, as T -+ 00

1 1
if "(>-
T 2
LogT 1
(6.34) V h(O) rv if "(=-
T 2
T - 'Y 1
if " « -
2
Proof is straightforward and therefore omitted.

6.4.2 Asymptotic normality


In order to obtain asymptotic normality of f¥ we will use the following as-
sumptions

(a) X is strictly stationary and strongly mixing with

a(u) ~ au-{3, u >0

I:
where {3 > 1 and a > o.

(b) gu exists for u f= 0, is continuous over D and II gu 1100 du < 00.

. 2{3
(c) There eXIsts 6 > (3 _ 1 such that E£i(x) < 00, x ER

Note that (b) implies the existence of £T such that E£}(x) < 00. (c) is satisfied
by diffusion processes (see [BA-YR]) and more generally by Markov processes
(see lemma 6.3 below) under mild regularity conditions. If X is geometrically
strongly mixing, the condition for 6 becomes 6 > 2.
6.4. RATES OF CONVERGENCE 161

THEOREM 6.8 If X satisfies (a), (b) and (c) then

V
(6 .35) VT(f~(X1) - f(x1),. · ·, f~(xk) - f(xk)) --+ Nk rv N(O, r)
T->oo

r= [1+
-00
00
9U (Xi,Xj)dU]
1~i,j~k
.

Proof (sketch) :
As above we may suppose that T is an integer. On the other hand it suffices to
prove (6.35) for k = 1 since for k > 1 we can use the CRAMER-WOLD device
(cf. [BI2]). Finally theorem I. 7 gives the desired result . •

As a by-product of (6.35) we get the relation

1 +00

-00
9u(X,x)du = L
+00

k=-oo
COV(£1(X),£(k)(X)), X E R

6.4.3 Functional law of the iterated logarithm


Let us set

0::; t::; 1, n 2: 3, un(x) = ( 2nLogLogn 1 +00

-00 9u(X,X)du)
\ 1/2
, then we have

THEOREM 6.9 If X satisfies (a), (b), (c) and if J -DO


+(Xl

9u(X,x)du > 0 then

(6.36) lim d(Yn, S)


n->oo
=0 and c(Yn) =S
where d is the uniform distance, cO the set of limit points of (.) and S the

0,1
STRASSEN set defined as S = {rp : [0,1] -> R : rp absolutely continuous,
1
rp(O) = rp/2(t)dt::; I}.

Proof
(6.36) is an easy consequence of STRA88EN ([8]) and RlO ([R2]) results .

162 CHAPTER 6. THE LOCAL TIME DENSITY ESTIMATOR

6.4.4 Parametric rates of pointwise convergence and


uniform convergence
In order to get parametric rates in pointwise convergence we need the following
assumptions :

(i) X is a strongly mixing stationary process with

(6.37)

w here a > 0 and 0 < p < 1.


(ii) X admits a local time £1 which satisfies

(6 .38)

for some b > o.


Note that (6.37) is satisfied if X is a stationary diffusion process (see [DK1]) .
Concerning (6.38) we have the following


LEMMA 6.3 Let X = (Xt , t :::: 0) be a stationary Markov process such that
fs(Y , z) does exist for s > 0 and (y , z) ...... fs(y , z)ds < 00, (y,z) E 1R 2 for
some positive c > 0 and is continuous over 1R 2 . Then X satisfies (6.38) .
Proof is left to the reader.
We now state our theorem :
THEOREM 6.10 If (i) and (ii) hold and if x ...... E£T(x) is continuous then

(6.39) L
JT 0
T L L T1fr(x)-f(x)1 T----+ Da.s., XEIR
og . og og -> 00

where f denotes the continuous version of density.


LogT . LogLogT
In the following we will put CT = JT
Proof
First Ef!j.(x) = f(x) for all x. Now we may suppose that T is an integer
since
[T] 0 T- [T] 0
(6.40) T(f[T[(X) - f(x)) - -T-f(x) ~ fr(x)
and
o [T] + 1 0 f(x)
(6.41) fr(x) ~ -T-U[T[+I(x ) - f(x)) +T
6.4. RATES OF CONVERGENCE 163

thus c;;I(f~(X) - f(x)) -+ 0 implies cTI(f!f.(X) - f(x)) -+ O.


In order to establish (6.39) we apply theorem 1.6 to the stationary sequence
(£(n)(x) - f(x), n 2: 1). It suffices to notice that (ii) entails CRAMER's
conditions :
E£Zi)(X) ::; Ck - 2 k!E£ri) (x) < 00 j
i 2: 1, k 2: 3, for some c > 0 (see [A-Z]) .
Hence (1.42) implies (6.39) . •

We now turn to uniform convergence for which we need two additional condi-
tions :
(iii) inf EI£I(x) - f(x)12 > 0; (a, b) E 1R 2 , a < b
a~x~b

(iiii) Wl(I)([a,b],8)::; VI 8"1 , 8 > 0 where 'Y > 0 and where VI is an integrable
random variable which does not depend on 8.

THEOREM 6.11 If (i) ...... (iiii) hold and if x ...... E£T(a:) is continuous, then
for all (a,b) E ]R2, a < b

(6.42) LTV;: L T sup If!f.(x) - f(x)1 T--+ 0 a.s.


og . og og a~x~b -+ 00

where f denotes the continuous version of density.


Proof
We may and do suppose that [a, b] = [0,1] . On the other hand we may
suppose that T is an integer since (6.40) and (6.41) entail

II fr° - f II::; -[T]T+-1 II °


f[TJ+l - f II [T]
+T II °
f[TJ - f II 2
+T II f II
where II . II denotes the sup norm on e[O, 1] .
First, inequality (1.34) leads to the bound

(6.43) P(c n- Ilfo(x)


n -
f(x)l> 1)) <
-
CI(1))
nLogLogn

1) > 0, °: ; x ::; 1, n 2: 3, where CI does not depend of x since (iii) is satisfied.

Now let us choose 8n = [:13] where f3 > 2~ (cf. (iiii)). We have the decompo-
sition

+If(j8n) - f(x)l, j8n ::; x < (j + 1)8n , n 2: 1


164 CHAPTER 6. THE LOCAL TIME DENSITY ESTIMATOR

we treat each term separately.


First we have

By using (iiii) and the continuity of f~ we get

where V; is the r.v. associated with Wi(i) in (iiii). Note that such a V; does
! -'Yi3
exist since X is stationary. Now €;;:18J rv n L ------> 0 as n -4 00
Logn· ogLogn
1
and the ergodic theorem gives -
n
L V;
n
------>
a.s.
EV1 ·
;=1

Consequently

(6.45) €~1 max sup If~(x) - f~(j8n)1 ------> 0 a.s .


J j6n~x<(j+l)6n

On the other hand (6.43) gives

p (suP€~1If~(j8n)
j
- 2~) ,TJ > 0
f(j8n )1 > TJ) S n i3 nogogn

thus Borel-Cantelli lemma entails

(6.46) sup€~1If~(j8n) - f(jbn)1 ------> 0 a.s.


j

Now we have

hence
(6.47) E~l max sup If(x) - f(j8 n )1 ------> O.
J j6n~x~(j+l)6n

Finally (6.45), (6.46), (6.47) together with (6.44) imply €;;:l II f~ - f 11------> 0
a.s. which in turn implies (6.42) . •
6.5. DISCUSSION 165

6.5 Discussion
We have seen that f¥ has many interesting properties : in particular unbi-
asedness and asymptotic efficiency for a large class of processes. The kernel
estimateur ff is not unbiased and does not reach parametric rate for all con-
tinuous densities. Note that theorem 6.2 and definition of LT imply that

(6.48) L2 and a.s.

Clearly f~ and ff have a theoretical character since approximation is needed


in order to compute them. A good approximation for fjl should be "the em-
pirical local time" based on local time or crossings obtained by interpolating
discrete data. Another possibility is an approximation by the classical kernel
estimator associated with these data.

We now give a result in this direction. Let us set


1
=:;:;: L
n
[~(x) Kh" (x - X iT/ n ) , x E R
i=1

We have the following :


THEOREM 6.12 Let X = (Xt, t E R) be a real measurable strictly station-
ary process with a local time R.T and such that

(a)
EWiT(h):SCTh>', h>O
where WiT is the modulus of continuity of R.T and Cr > 0 and A > 0 are
constants.

(b)
EIXt - X,I :S dTlt - sl', (8, t) E [0, T]2
where dT > 0 and I > 0 are constants.

Suppose that K is a density with bounded derivative and such that L


lui>' K(u)du <
00, then

Consequently ifn -+ 00, T is fixed and h n ~ n-,/(2+>.) we have

(6.50)
166 CHAPTER 6. THE LOCAL TIME DENSITY ESTIMATOR

Proof
First we consider for all x

Since the bound does not depend on x we obtain

E I n I 00_
8(1) < II Th21100
K'
n
~
~
i=1
l iTln

(i-1)Tln
ElK.Tln - X tIdt
·

Now, by using stationarity and assumption (b) we get

thus
(6.51) E 118(1) I < dT II K'lloo~.
n 00_ 1+1 n'Yh~

We now turn to the study of

By using (6.3) and J Kh n = 1 we obtain

~ l Khn(X - Y)[£T(Y) - £T(x)]dy

~
T
rK(Z)[£T(X - hnz) - £T(x)]dz.
iIR
Hence
18~2)(x)1 ::; ~ l K(z)wiT(hnlzl)dz, x E lR

and assumption (a) entails

(6.52)
6.5. DISCUSSION 167

Finally (6.51) and (6.52) give (6.49) . (6.50) is clear . •

Theorem 6.12 shows that the local time point of view gives new insight on
this kernel estimator since the choice of bandwith will be influenced by this
approximation aspect.

Notes

The role of local time in density estimation has been first noticed by NGUYEN and
PHAM (1980). KUTOYANTS (1995,1997) has studied the unbiased estimator (6.32).
See also DOUKHAN and LEON (1994), BOSQ (1997). Concerning approximation
of local time we refer to AZAIS - FLORENS (1987) and DAVYDOV (1997, 1998)
among others. Apart from Theorem 6.1 results in this Chapter are new . They appear
in BOSQ-DAVYDOV (1998) except Theorem 6.12 which is original.
Chapter 7

Implementation of
nonparametric method and
numerical applications

In this final chapter we discuss practical implementation of kernel estima-


tors and predictors and we give numerical examples with some comments. We
only examine the case of discrete data.

Section 1 deals with implementation : we study stabilization of variance,


estimation or elimination of trend and seasonality, and construction of estima-
tors and predictors for stationary processes.

Numerical applications appear in Sections 2 and 3. Comparison between


parametric and non parametric predictors, via numerical results, is considered
in Section 2 when examples of specific applications to Finance and Economics
appear in Section 3. Figures and tables appear in the text and in Annex.

7.1 Implementation of nonparanletric method


In the current section we discuss the practical implementation of the kernel
estimators and predictors.

7.1.1 Stabilization of variance


If the observed process, say ((t) , possesses a marked trend characterized by
a non-constant variance, this one may be eliminated by using a preliminary
transformation of the data.

169
170 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

For positive (t 's an often used method is the so-called BOX and COX trans-
formation defined as

TA((t) = T
(A _ 1
, A> 0

To((t) = Log(t = limA-+o(+) TA((t)

where A has to be estimated (cf. [GU]).

If the variance of (t is known to be proportional to the mean (respectively


the square of the mean) then A = % (respectively A = 0) is adequate.

If the variability of ((t) is unknown one can estimate A by minimizing


n
L (TA((t) - (n)2
t=1

1
L (t·
n
where (n =-
n t=1

7.1.2 Trend and seasonality


Let ('TIt) be a real process with constant variance. It may be represented by the
classical decomposition model

(7.1) 1]t = /Jt + O't + ~t , t E Z

where (/Jt) is a slowly varying function (the "trend component"), (O't) a pe-
riodic function with known period 7 (the "seasonal component") and (~t) a
stationary zero mean process.

If /J and 0' have a parametric form, their estimation may be performed using
least square method. Suppose for instance that

(7.2)

and that
(7.3)

where
O'kt = l{t=k(modr)}i k = 1, . . . ,7.
7.1 . IMPLEMENTATION OF NONPARAMETRlC 171

L
T

Since O"kt = 1 it is necessary to introduce an additional condition which


k=l
should ensure the identifiability of the model. A natural condition is

(7.4)

which expresses the compensation of seasonal effects over a period.

Now, given the data 1]1, . .. ,1]n, the least square estimators of a1,' .. ,
a p , C1, ... ,CT are obtained by minimizing
n

L(1]t - J.tt - O"t)2


t=l

under the constraint (7.4).

The elimination of J.tt and O"t is an alternative technique which seems prefer-
able to estimation because it is more flexible :

In absence of seasonality the trend may be approximated by smoothing


considering for instance the moving average

1 q
(7.5) Pt = 2 + 1 L 1]t+j , q + 1 ::; t ::; n- q
q j=-q

and then eliminated by constructing the artificial data

tt = 1]t - Pt ,q + 1 ::; t ::; n - q.

Another method of elimination is differencing:


Let us consider the first difference operator V and its powers defined as

V1]t = 1]t - 7]t-1

and
Vk7]t = V (V k - l 1]t) ,k ~ 1,
then if J.tt has the polynomial form (7.2) we get

(7.6)

and consequently (V P 17t) is a stationary process with mean p!a p .


172 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

In the general case where both trend and seasonality appear, the first step
is to approximate the trend by using a moving average which eliminates the
seasonality. If the period T is even, one may set q = ~ and put

(7 .7) f.1.; = ~ (~1]t-q + 1]t-q+1 + . .. +1]t+q-1 + ~1]t+q) ,

q ~t~n - q. If Tis odd, one may use (7.5) with q = T; 1.


Now, in order to approximate the seasonal component one may consider
1
L
T
Ck = Vk - -:; Vj j k = 1, . . . ,T
j=1

where Vj denotes the average of the quantities 1]j+iT-ilj+iT , q < j+iT ~ n-q.
Then, considering the artificial data 1]t = Ct (where Ct = Ck if t = k(mod
T)), one obtains a model with trend and without seasonality which allows the
use of (7.5) .

Some details about the above method may be found in [BDl .

Note that differencing may also be used for seasonality. Here the difference
operator is given by
\lT1]t = 1]t - 1]t-T ·

Applying \l T one obtains the non-seasonal model.

Clearly all the above techniques suffer the drawback of perturbating the
data. Thus, if St = f.1.t + at does not vary too much the model (3 .36) may be
considered.

In that case a "cynical" method consists in ignoring St ! The discussion and


result in subsection 3.4.2 show that, in a nonparametric context, this method
turns to be effective.

7.1.3 Construction of nonparametric estimators for sta-


tionary processes
If the observed process, say (~t), is known to be stationary and if one wants
to estimate the marginal density f or the regression TO = E(m(~t+H) I
(~t, . . . ,~t-k+d = .),the construction of a kernel estimator requires a choice of
K and h n .
7.1. IMPLEMENTATION OF NONPARAMETRlC 173

Some theoretical results (cf. [EP]) show that the choice of K does not much
influence the asymptotic behaviour of fn or rn : the naive kernel, the normal
kernel and the Epanechnikov kernel are more or less equivalent.

On the contrary the choice of hn turns to be crucial for the estimator's


accuracy. A great deal of papers have been published on the subject. We
refer to BERLINET-DEVROYE (1994) for a comprehensive treatment and
an extensive bibliography concerning the density. For the regression one may
consults [GHSV] and the books by HARDLE ([HA-I] and [HA-2]). Here for the
sake of simplicity we only discuss the problem for one-dimensional densities.
The general case may be treated similarly.

a) Plug-in method
The best asymptotic choice of hn at a point x is given by (2.7) : if
hn = cn n- 1/ 5 where en -> c > 0 and if assumptions of Theorem 2.1 hold,
then

n 4/ 5E{fn(x) - f(x))2 -> c; 1"2(x) ( / 'lJ,2 K(U)dU) 2


(7.8)
+f~) / K2

thus, the best c at x is

f"2(X))-1/5 ((JU2K(U)dU?)-1/5
(7.9) eo(x) =( f(x) f K2
Now, it may be easily proved that

n 4 / 5E II fn - f 1I1,2(A) -> c; / 1"2 ( / U2K(U)dU) 2


(7.10)
f K2
+--
c
thus, the best c associated with the asymptotic Mean integrated square
error (MISE) is

(7.11)

The estimation of eo(x) and eo(f) requires the construction of prelimi-


nary estimates of f and f".
174 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

1 ,.2
For that purpose we may choose K(u) = /iCe- T and consider the case
y27r
1 x2
where f(x) = /iCe-2,;'I. Then co(f) may be approximated
ay27r
1 ) 1/2
by an = ( ;:;-~(~t - ~n)2 , and a convenient choice of hn is

(7.12)

An alternative choice of h n should be the more robust

(7.13)

where ~(1)' ... '~(n) denotes the order statistics associated with 6, ... '~n.

The above considerations lead to the preliminary estimate

(7.14)

and i~ may be taken as an estimate of f". Note that if the graph of


in is too erratic it should be useful to smooth it by using polynomial
interpolation before performing the derivation.

Now the final estimates f~ and f~* are constructed from in and i~ by
setting

(7.15)

and
(7.16) h~' = (2J?r) 1/5 (J J:?(x)) -1/5
n- 1/ 5

hence

(7.17)

and

(7.18)
**
f (x)
n
1 n
= - - ""'
nh**
n L
1
- - exp (
t=l
1 x - ~t
--(- -
'27r
V"!;7r
2 h**
n
)2) '
x E JR.
7.1. IMPLEMENTATION OF NONPARAMETRlC 175

b) Cross-validation
If the regularity of f is unknown one can employ an empirical maximum
likelihood method for the determination of h.

Let us suppose again that K is the normal kernel and consider the em-
pirical likelihood
n
L(h) = II fn,h(~t) ,h > 0
t=l
where

f n,h (x) = :h ~ K ( x ~ ~s ) ,x E lR

we have sUPh>O L(h) = +00 since

L(h) ~ (:~)) n h~+OO .


It is possible to remove that difficulty by using a leave-out procedure.
Let us set
n
Lv(h) = II fl~l,h(~t)
t=l

-
where
(t) () 1 ~ (~t ~s)
fn - 1,h ~t = (n _ 1)h ~K ---,;- .
s,pt

We now have

and

Then the empirical maximum likelihood estimate iin does exist, hence
the estimate

(7.19) _
fn(x} 1 L
= --- n
-v'27r 1 (x
1 exp (-- _~i 2)
--_. )
,xER
nhn t=l 2 hn

The above methode being based on the maximum likelihood is not robust
and difficult to manipulate when data are correlated.
176 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

Let us now suppose that the observed process (et) is a-mixing. We intend
to specify an optimal hn with respect to the measure of accuracy

ISE(h) = jUn,h(X) - f(x))zm(x)dx

where m is a positive weight function.

For this purpose we define the cross validation criterion

with
t ( ) 1 ~ (x -
fn x = h;:y(t) ~ K - h -
e8 ) ,(t - s)

, is a given function such that ,(0) = 0, ,(x) = 1 if x> I/n , 0 :=:; ,(x) :=:; 1 if
n
x:=:; I/n , where I/n is a positive integer and ;:y(t) = L ,(t - s). Here, defines
8=1
a smooth leave-out procedure.

Now if f has r continuous derivatives we define hn as

hn = arg min C(h)


hE Hn

where Hn is a finite subset of [eln-a, C2n-b] with Cl > 0, Cz > 0, 0 < b :=:;
1 1
--<a<--.
2r + 1 2r + ~
Under some regularity conditions HART and VIEU have proved that hn is
asymptotically optimal in the following sense :

ISE(h n ) ~ 1
infhEHn ISE
we refer to [GHSV] for a complete proof of this result.

Conclusion
Note first that other interesting methods are discussed in [BE-DE], partic-
ularly the double kernel method.
Now, the comparison between the various methods is somewhat difficult. It
should be noticed that the normal kernel (or the EPANECHNIKOV kernel)
and hn = O'nn- 1 / 5 are commonly used in practice and that they provide good
results in many cases for constructing f n or r n '
7.2. PARAMETRIC AND NONPARAMETRIC PREDICTORS 177

7.1.4 Prediction
The nonparametric predictor comes directly from the regression estimator
where (K, h n ) is chosen as indicated in the previous Section.

It remains to choose k : if (~d is known to be a kth order Markov process,


the predictor is given by (3.19). In particular if k = 1 and if the data ~l' ... , ~N
are real the nonparametric predictor of ~N+H has the simple form

(7.20)

In the general case it is necessary to find a suitable k (or kN, see 3.4.1). For
convenience we suppose that (~t) is a real process and H = 1. Now let us
consider
2
L
N
(7.21 ) b.N(k) = (~t -ft(k)) , 1 ~k~ ko
t=no

where no and ko are given and ft(k) stands for the predictor of ~t based on the
data 6, ... , ~t-l and associated with the regression E( ~t I (~t-l' ... , ~t-k) = .).

Minimization of b.N(k) gives a suitable k, say kN .

We finally obtain the predictor defined by (3.34). Note that the above
method remains valid if the process is not stationary provided the data should
be of the form (3.36). Otherwise one can stationarize the process by using the
methods indicated in the previous Sections of the current Chapter.

It is noteworthy that the presence of exogenous variables does not modify


the predictor since these ones can be integrated in the nonparametric model.

7.2 Comparison between parametric and non-


parametric predictors
7.2.1 Parametric predictors
The popular BOX and JENKINS method is based on the ARMA (p, q)
model. Recall that a real process (~t, t E Z) is said to be ARMA (p, q) if it
178 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

satisfies a relation of the form

(~t - m) - (h(~t-l - m) - ... - ¢p(~t-p - m)


(7.22)
= €t - 81 €t-l - . .. - 8q€t - q

where (€t) is a white noise (Le. the €t'S are i.i.d. and such that 0 < (}"2 = E€~ <
00, E€t = 0) and m ; ¢1, "" ¢p ; 8 1, .. " 8q are real parameters.

If the polynomials

¢(z) =1- ¢lZ - .. . - ¢pzp

and
8(z) = 1 - 81 z - ... - 8qz q
have no common zeroes and are such that ¢p8q # 0 and ¢(z)8(z) # 0 for Izl :s: 1
then (7.22) admits a unique stationary solution.

Now the BOX and JENKINS (BJ) method mainly consists of the following
steps:

1. Elimination of trend and seasonality by differencing.

2. Identification of (p,q).

3. Estimation of m ; 81 " " , 8q ; ¢I, . . . ,¢p ; (}"2 .

4. Construction of the predictor by using the estimated model.

For details we refer to [B-J], [G-M] and [B-D].

Improvement of the BJ method are obtained by introducing nonlinear


models, in particular the ARCH model has been considered by ENGLE (1982).
An extensive bibliography concerning nonlinear models appears in GUEGAN
(1994) .

7.2.2 Parametric versus Nonparametric


A systematic comparison between parametric (BJ) and nonparametric (NP)
forecasts has been performed by CARBON and DELECROIX (1993) . They
have considered 17 series constructed by simulation or taken from engineering,
economics and physics.

Let us now specify these series :


7.2. PARAMETRlC AND NONPARAMETRlC PREDICTORS 179

• Series 1 and 2 come from simulated AR(I) processes of the form


et - m = </Jr(et-l - m) + Ct , t EZ
with 1>1 = 0,9 (resp. 0,99) and m = 1000. These processes are "limit
stationary" since 1>1 is close to 1.
• Series 3, 4, 5 and 6 come from various simulated ARMA processes.
• Series 7, 8 and 9 are simulations of ARMA processes with contamination
or perturbation.
• Series 10 to 15 are data sets respectively from: profit margin, cigar
consumption, change in business, inventories, coal , yields from a batch
chemical process, chemical process concentration readings.
• Finally series 16 and 17 are generated by simulated AR processes with Ct
uniform over [-49, +49J .
Here the construction of nonparametric predictor is carried out as follows
• k = k is chosen by using (7.21).
• K(x) = (27r)-k/2 exp (_" ~112), X E IRk .

• h n= ann- 1/(4+k) where an = [~t (et - ~n)2] 1/2 and n =N - H.


t=1
In order to quantify the prediction error two criteria are utilized:
• The EMO defined as

EMO =~ t
t=n-k+I
Iet et- ft I
• The EMP defined as

EMP= ~
k
t
t=n-k+l
i§ii&
where ft is the predictor of et constructed from the data 6, ·· . ,et-l
and £it the empirical quantile associated with the theoretical quantile qt
defined by

The numerical tables 1 to 17 appear in the Annex.

The NP-predictor is better than BJ 12 times out of 1'1 for the EMO and 14
times out of 17 for the EMP.
180 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

7.2.3 ARCH versus robust Nonparametric


A.C.M. ROSA (1993) has compared parametric predictors with some robust
non parametric predictors.
The parametric model she uses is the ARMA model and the ARMA model
with GARCH effect (BOLLERSLEV (1986)). This model is defined by (7.22)
but here (et) is not a white noise; it satisfies the conditions

and

with
q p'

a; = ao + I: aieLi + I: {3ja;_j ,t E Z
;=1 j=l

where ao, aI, ... , a q" {31, ... , {3p' are real positive parameters such that
q p'

I: ai + I: {3j < l.
i=l j=l
If the conditional distribution of et given et-1, et-2, . .. is gaussian, (~t) is
strictly stationary.
Concerning the robust nonparametric predictors they are based on a-truncated
mean and estimators of the conditional median and the conditional mode.
Here we only describe the conditional mode predictor. It is constructed
by considering a kernel estimator of the conditional density of ~t given ~~~)1 =
(~t-1' ... ,~t-k)' namely
n
I: h;;I Ko (h;;l(y - ~t)) K1 (h;;l(x - ~I~)l))

(7.23)
!n(Ylx) = t=k+l n
I: Kl (h;;l(X_~~~)l))
t=k+1
Y E IR, x E IRk

where Ko and Kl are strictly positive continuous kernels respectively defined


over IR and IRk. Now the conditional mode predictor is defined as

(7.24) ~ ( Y I ~n(k)) .
X n• +1 = argm:xfn

The method for choosing parameters is the same as in 7.2.3. The comparisons
appear in figures 2 and 3. Parametric (resp. theoretical) and nonparametric
forecasts are more or less equivalent.
7.2. PARAMETRIC AND NONPARAMETRIC PREDICTORS 181

4S0 I
400
II
BJ prediction~
350 data
)00
I
250

200

150 NP predictions
100

133 134 11S 136 131 138 09 14() 141 142 14) 144

Life insurance
BJ : SARIMA (3,0,0) (0,1, 1h2
NP : Conditional mode k = 6
Figure 2

1008 T
1006 . theoretical
1004 predictions

996

992

1
990
994

988

986 + - - - t - I- - - + 1 - - + 1- - t - I- - - i l - - + I - - t - I- - - i l - - - - I I
91 n m ~ ~ % n fl 99 ~

AR(3) with GARCH(l,l) effect


~t= 100 + O.4~t-l + 0 . 3~t-2 + 0.2~t-3 + C:t
o} = 2 + o.lc:Ll + 0.80r-l
Figure 3
182 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

7.3 Some applications to Finance and Economics


7.3.1 An application to Finance
The use of Statistical models in Finance has become very popular during
the last decade.

These models are, in general, parametric ; a typical example should be


the famous BLACK-SCHOLES option price formula based on the stochastic
differential equation.

(7.25)

where m =I 0 and (J' > 0 are constants and where (Wt ) is a standard Wiener
process. The initial condition Xo is supposed to be constant and strictly posi-
tive.

The solution of (7.23) (cf. [C-W]) is

(7.26)

and the obtained Statistical model depends only on (m, (J'2).

Parametric models are useful but the nonlinear character and the complex-
ity of financial evolutions allow to think that, in some situations, nonparametric
methods are well adapted to their analysis.

As an example we give some indications about a recent work by B. MIL-


CAMPS (1995) which is devoted to the analysis of european intermarkets cor-
relations from generic yields and to the construction of forecasts for these yields.

The author shows that these correlations are not well explained by linear
correlation coefficients, Principal Component Analysis or ARMA models.

This leads him to consider a nonparametric approach by using the tau-b


KENDALL coefficient, rank tests and non parametric predictors.

Here we only give the results concerning the variations of french ten years
yields.
The nonparametric predictor is constructed with the Gaussian kernel (figure
1) or the EPANECHNIKOV kernel (figure 2), kn = 14 (or 15) and hn is chosen
7.3. SOME APPLICATIONS TO FINANCE AND ECONOMICS 183

2l

2{1
r \ ,.. ....\
Il I' '~J
,\ " \~;- - - ".,1 \" . . '
\
' ...... ,
10 " \

·l

· 10

·11

·1 ~ ~ . ~ ---- ~-----~ -~----

May-95 Jun-95 Jul-95 Au-95

Forecasting of french ten years yields with confidence interval


(Gaussian kernel, k = 14)
Figure 4

II

10

-l

- 10

~ Il

· ;20 - •.- .~ __ , ___ ._ . ___ ,..---_


'~--~--~-~-~,--~--~-

May-95 Jun-95 Jul-95 Au-95

Forecasting of french ten years yields


(Epanechnikov kernel, k = 15)
Figure 5
184 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

by using a combination of (7.12) and (7.13) adapted to kn, namely

(7.27)

(7.25) is recommended in [B-D].


The results are quite satisfactory. For a comparison with the BJ-predictor
we refer to p. 2 of the current book.

7.3.2 Others applications


POGGI (1994) has carried out a very complete and precise work about
prediction of global french electricity consumption.
In order to take in account nonstationarity POGGI has made affine trans-
formations of the data before using the nonparametric predictor. The results
are very good (cf. figures 8 and 9).
Another work concerning french car registration shows the good behaviour
of the NP-predictor when the horizon increases (cf. table 18, figures 6 and 7).
Among other works let us indicate [DC-OP-TH] where the air pollution in
Paris is studied by nonparametric methods (including exogenous variables).
The quality of all these predictors may be explained by the robustness
pointed out in 3.4.2. In fact, the non parametric method uses the information
supplied by the history of the process (including seasonality) while the para-
metric technic needs to eliminate trend and seasonality before the construction
of a stationary model.

Notes
Among the great quantity of methods for implementing non parametric estimators
and predictors we have chosen those which are often used in practice.
Subsections concerning stationarization are inspired by GUERRE (1995) and
BROCKWELL and DAVIS (1991) . (7.12) appears in DEHEUVELS and HOMINAL
(1980) when (7.13) comes from BERLINET and DEVROYE (1994)
The smooth leave out procedure is in GYORFI, HARDLE, SARDA and VIEU
(1989) . A discussion concerning cross validation may be found in BRONIATOWSKI
(1993). See also MARRON (1991).
The numerical applications presented here appear in CARBON and DELECROIX
(1993), ROSA (1993), POGGI (1994), MILCAMPS (1995).
7.4. ANNEX 185

7.4 Annex

AR1 Xt = 0.9Xt-l + 1000 + €t €t "-'> N(O, 5)


n = 100, H = 5
B.J. (p, d, q) EMO EMP
(1,0,0) 0.089* 0.136·
(0,0,2) 0.135 0.141
(1,0,1) 0.089· 0.140·
N.P.p
1 0.098 0.128
5 0.077 0.120
10 0.085 0.116·
P = 19 0.062· 0.1213

Table 1

AR1 (limit) Xt = 0.99Xt-l + 1000 + € t € t "-'> N(0 , 5)


n = 100, H = 5

B.J. (p,d , q) EMO EMP

(0,1,1) 0.014 0.022·


(0,1,2) 0.013 0.022·
(0 , 1, 3) 0.012· 0.026
N .P. (d = l)p
1 0.011 0.023
2 0.007 0.023
p=6 0 .003* 0.023
10 0.007 0.019*

Table 2
186 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

MA6 Xt = 6t - 2.8486t_1 + 2.68856t_2 - 1.646456t_3


+2.97261_4 - 2.14926t_5 + 0.677166t_661 rv> N(0 , 5)
n = 100, H = 5
B.J. (p,d , q) EMO EMP
(0,0,6) 3.03 5.33
(0,0, 7) 2.78* 5.31
(1,0,2) 3.01 5.30'
N.P . p
1 3.03 5.28
p=2 2.77* 5.16'
5 3.14 5.62
10 4.44 6.18

Table 3

AR2 Xt = 0.7Xt_l + 0.2Xt-2 + 1000 + 6t 6t rv> N(O, 5)


n = 100, H = 5
B.J. (p, d, q) EMO EMP
(2,0, 0) 0.012' 0.138*
(1 , 0, 0) 0.026 0.154
(3,0,0) 0.013 0.144
(0,1,3) 0.026 0.148
N.P.p
1 0.019 0.136
2 0.014' 0 .143
10 0.024 0.146
p= 30 0 .015 0.074*

Table 4
7.4. ANNEX 187

ARMA(1,1) Xt = 0.8Xt-l + €t + 0.2€t-l + 1000


€t rv> N(0,5) n = 100, H=5
B.J. (p,d,q) EMO EMP
(2, 0,0) 0.177 0.294
(1 , 0,0) 0.149 0.282*
(1,0,1) 0.123* 0.290
(1,0,2) 0.170 0.296
N.P.p
5 0.149 0.326
10 0.098 0.313
20 0.099 0.316
P = 30 0.074* 0.186*

Table 5

AR1 Xt = 0.8Xt_l + 1000 + €t


€t rv> exp(1/300) n = 100, H = 5
B.J. (p,d , q) EMO EMP
(1,0,0) 1.60' 11.0'
(0,0,2) 3.75 11.7
(0,0,1) 4.23 11.5
(2,0,0) 1.60' 11.3
N.P.p
1 2.45 9.65
5 1.77* 12.52
p=7 4.92 12.42
30 2.55 6.55*

Table 6
188 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

AR1 (contaminated) Yt = 0.5(1 - Ot)Yt-l + (1 - 30U4)ct


P(Ot = 1) = P(Ot = 0) = 1/ 2, P(O: = 0) = 2/3 P(O~ = 1) = 1/3
Ct ""> N(O , 1) n = 100, H = 5
B.J. (p, d, q) EMO EMP
(1 , 0,0) 153.8 218.0
(3 , 0, 0) 152.5 214.0
(7, 0,0) 146.5 213.5
(10, 0, 0) 137.8* 198.8*
N .P.p
5 80.1 272.2
p= 10 51.9" 219.4"
20 86.6 288.5
30 64.5 320.6

Table 7

AR1 (contaminated) Zt = Yt + 100


B.J. (p,d ,q) EMO EMP
(1,0, 0) 1.63 3.57"
(0,0, 3) 1.49* 3.58
(7,0, 0) 1.82 3.61
(0 , 0,1) 1.62 3.57"
N .P.p
5 1.99 3.54"
P = 10 1.71 4.12
20 1.69* 6.41
30 1.85 4.51

Table 8
7.4. ANNEX 189

Perturbated sinusoid Xt = 3000 sin(1rt/15) + Ct


Ct rv exp(I/300) n = 200, H=5
B.J. (p,d,q)(P,D,Q)l EMO EMP
(2,1,0)(2,1,0)"U 21.88* 47.10
(2,1,1)(2,1,1)30 25.08 40.70*
N .P.p
P= 15 7.81* 32.76
30 9.82 29.50*
60 13.33 34.35

Table 9

Profit margin (A. Pankratzz)SARIMA


n = 80, H = 5
B.J. (p,d,q)(P,D,Q)1 EMO EMP
(1 , 0,0)(2,1,4)4 4.85' 24.10'
N .P . p
4 8.40 17.41
8 6.92 16.89
12 4.98 17.23
P = 24 1.17' lU9'

Table 10
190 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

Cigar consumption (A. Pankratzz) SARIMA


n = 96, H = 6
B.J. (p,d,q)(P,D,Q)l EMG EMP
(1,1,0)(1,2,0)1~ 13.07 42.70
(2,1,0)(1,1,0)12 8.76* 23.1 *
N.P.p
4 12.26 32.73
17 = 12 5.70* 24.95
24 7.83 24.63*

Table 11

Change in business inventories (A. Pankratzz)


n = 60, H = 10
B.J. (p,d,q) EMG EMP
(1,0,0) 37.0 156.9
(2,0,0) 36.6* 156.5*
(3,0,0) 39.1 172.3
N.P.p
17 = 1 65.7 165.0
10 28.8- 81.5
20 32.8 59.4*

Table 12
7.4. ANNEX 191

Coal (A. Pankratzz) n = 90, H = 10


B.J. (p, d, q) EMO EMP
(1,0,0) 3.83 23.60·
(2,0,0) 3.42 24.20
(1,0,1) 3.52 23.90
(1,0,2) 3.47 24.32
(1,0,3) 3.11" 24.06
N.P.p
1 2.94* 19.53
2 3.14 19.84
p=3 3.22 19.39*
5 4.04 19.63
10 3.50 22.51

Table 13

Yields from a batch chemical process


(G. Box, G. Jenkins) n = 70, H = 5
B.J. (p,d,q) EMO EMP
(1,0,1) 26.75 42.90"
(2,0,0) 26.31 43.14
(0,0,1) 26.26 43.19
(0,0, 2) 25.70* 43.02
N.P.p
p=2 17,88* 44.38"
5 23.10 47.12
10 29.88 55.13
20 35.01 50.18

Table 14
192 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

Chemical process concentration readings


(G. Box, G. Jenkins) n = 197, H = 10
B.J. (p,d,q) EMO EMP
(1,0,1) 2.48 4.01
(1,0,2) 2.38 3.96*
(0,1,1) 1.85' 4.17
N .P . (p,d)
(p = 2,0) 2.72 3.89*
(5,0) 2.71 4.06
(p=I,I) 2.11 4.33
(5,1) 3.07 4.55
(10, 1) 1.91* 4.63

Table 15

ARI Xt = 0.9Xt-l + Ct Ct rv> uniform on [-49,49]


(M. Kendall, A. Stuart) n = 100, H = 5
B.J. (p, d, q) EMO EMP
(1 , 0,1) 41.8 286*
(0 , 0,2) 30.6* 365
(0, 0,3) 43.8 333
(0, 0,4) 36.6 343
N.P. p
1 137.2 445
5 456.0 1274
10 63.5* 47*
P= 18 312 869

Table 16
7.4. ANNEX 193

AR3 Xt = 0.2Xt-l + Xt-2 - 0.3Xt-3 + €t


€t "'> uniform on [-49, 49J (M. Kendall, A. Stuart)
n = 100, H = 5
B.J. (p, d, q) EMO EMP
(1 , 0,0) 52.9 197*
(0, 0,3) 44.6- 462
(1,0,2) 95.9 207
(2,0,0) 53.2 222
(3,0,0) 70.1 204
N.P.p
1 115 .7 ~H6

5 137.3 ~{62

10 529.2 630
20 44.5- fi6'
ft = 25 49.0 fi6'

Table 17
French car registrations (april 1987 - september 1988)

t Xt Xt BJ Xt NPk =36
1 192.1 183.1 197.1
2 156.7 173.8 179.1
3 151.2 170.5 180.7
4 195.9 161.9 167.9
5 146.1 136.3 138.7
6 129.6 134.4 144.1
7 232.3 189.1 195.9
8 197.8 190.0 192.5
9 208.9 193.9 204.6
10 160.6 148.6 156.3
11 160.0 153.9 164.5
12 218.0 206.2 221.3
13 189.0 188.6 197.1
14 184.0 181.3 185.6
15 141.6 179.8 195.5
16 210.0 163.6 160.3
17 157.1 133.9 135.4
18 146.7 134.4 141.1

Table 18
194 CHAPTER 7. IMPLEMENTATION AND APPLICATIONS

-- data
- predictions BJ
. . . predictions NP

240

I'
.:,
, \
l~
\ -
/.'
I
I' ,
"\
.
\
\
\ ....
120 L-~ _____ ~~ ___ ~ ___ ~

French car registrations


Figure 6

- errors BJ
- - - errors NP

150

...
.. ,
, ,, ,
-dO "

French car registrations (cumulated prediction errors)


Figure 7
7.4. ANNEX 195

- data
predictions
- - - ± 3 standard deviation

5~X~1~0__~____r -__~____~____~____r -___r ____' -____r -__~

<1.5
~
ro;: <1
'"g' 3.5 \ ... /

~ - .. " /1
, ' - I I
3

2 . 50L---l~O---2~O---3~O---<1~0--~5~0---5~O~-~7~0~-~5~0~-~9~0~-~100

Prediction : 4th and 5th february 1991


Relative error : 0.8986 % French electricity consumption
Figure 8

- data
predictions
- - - ± 3 standard deviation

/
, / ...
..... - , / " I \,

I ,' ....... , , , I ~~"-""""_


I ~""'"'
' ,- ~ \ / ' I / .. ... .... .. '
/' " "f."".. \ Ii " ,"
/ - - ... . .. .. ......... ~. \~
... \ '/ / - ~ "' ... · .. · · .. .. .. r.
I / ........ _ . .... _)1. I ___ . . _
I I ..... _ - .... 1 '.... I .... '
3 \ \ I I " I
... ...
... I , J

.... - ."
10 20 30 <10 50 60 70 50 90 100

Prediction: 26 th and 27th august 1991


Relative error: 2.159 % French electricity consumption
Figure 9
References

[AD 1ADLER R.J. (1990). An introduction to continuity, extrema, and Re-


lated topics for general Gaussian processes. Inst. of Math. Statist.,
Hayward, Californi

[AN-PO 1 AN'NGOZE P. and PORTIER B. (1994). Estimation of the density and


of the regression functions of an absolutely regular stationary process.
Publ. ISUP 38, 59-87.

[A-O J ANTONIADIS A. and OPPENHEIM G. (1995) editors. Wavelets and


statistics. Springer Verlag.

[A-Z J ARAK T. and ZAIZSEV A. (1988). Uniform limit theorems for sums
of independent random variables. Publ. Steklov math. institute, 1.

[A-G JASH R.B. and GARDNER M.F. (1975). Topics in stochastic processes.
Academic Press.

[AS J AzMs J.M. (1990). Conditions for convergence of number of crossings


to the local time. Probab. and Math. Stat., 11, 1, 19-36.

[A-F J AZAIS J.M. and FLORENS D. (1987). Approximation du temps local


des processus Gaussiens stationnaires par regularisation des trajectoires.
Probab. theory and reI. fields 76, 121-132.

[BA J BANON G. (1978). Nonparametric identification for diffusion processes.


Siam J. Control and Optimisation V16, 380-395.

[BA-NG1 1BANON G. and NGUYEN H.T. (1978). Sur l'estimation recurrente de


la densite et de sa derivee pour un processus de Markov, C.R. Acad. Sci.
Paris t. 286, ser. A, 691-694.

[BA-NG2 J BANON G. and NGUYEN H.T. (1981). Recursive estimation in diffu-


sion model. Siam J. Control and optimisation VIO, 676-685.

[BA-Y J BARLOW M.T. and YOR M. (1981). (Semi) Martingales inequalities


and local times. Z. Fur Wahrscheinlichkeit. und Geb. 55, 237-254.

[BW-PR J BASAWA I.V. and PRAKASA RAO B.1.S. (1980) . Statistical inference
for stochastic processes. Academic Press.

[BE 1 BENNETT G. (1962). Probability inequalities for sum of independent


random variables. J. Amer. Statis. Assoc. 57, 33-45
[BB] BERBEE H.C.P. (1979). Random walks with stationary increments and renewal theory. Math. Centre Tracts, Amsterdam.

[BL-DV] BERLINET A. and DEVROYE L. (1994). A comparison of kernel density estimates. Publ. ISUP Vol. 38, 3, 3-59.

[BM] BERMAN S.M. (1983). Local nondeterminism and local times of general stochastic processes. Ann. Inst. H. Poincaré, 19, 189-207.

[BI-MA] BIRGE L. and MASSART P. (1995). From model selection to adaptive estimation. Preprint 95.41, Univ. Paris Sud.

[BI1] BILLINGSLEY P. (1965). Ergodic theory and information. Wiley.

[BI2] BILLINGSLEY P. (1968). Convergence of probability measures. Wiley.

[BI3] BILLINGSLEY P. (1986). Probability and measure (2nd edition). Wiley.

[BK1] BLANKE D. (1997). Estimation non paramétrique de la densité pour des processus à temps continu : vitesse minimax. C.R. Acad. Sci. Paris t. 325, sér. I, p. 527-530.

[BK2] BLANKE D. (1997). Ph.D. thesis, University of Paris 6.

[BK-BO1] BLANKE D. and BOSQ D. (1997). Accurate rates of density estimators for continuous time processes. Stat. and Proba. Letters 33, 2, 185-191.

[BK-BO2] BLANKE D. and BOSQ D. (1998). A family of minimax rates for density estimators in continuous time. 32 p., to appear.

[BT-F] BOENTE G. and FRAIMAN R. (1995). Asymptotic distribution of data driven smoothers in density and regression under dependence. The Canadian Journ. of Statist. 23, 4, 383-397.

[BV] BOLLERSLEV T. (1986). Generalized autoregressive conditional heteroskedasticity. J. of Econometrics 31, 307-327.

[BO1] BOSQ D. (1989). Nonparametric estimation of a nonlinear filter using a density estimator with zero-one "explosive" behaviour in ℝ^d. Statistics and Decisions 7, 229-241.

[BO2] BOSQ D. (1991). Modelization, nonparametric estimation and prediction for continuous time processes. In "Nonparametric functional estimation and related topics", ed. G. ROUSSAS, NATO ASI Series, 509-529.

[BO3] BOSQ D. (1991). Nonparametric prediction for unbounded almost stationary processes. In "Nonparametric functional estimation and related topics", ed. G. ROUSSAS, NATO ASI Series Vol. 335, 389-404.
[BO4] BOSQ D. (1993). Vitesses optimales et superoptimales des estimateurs fonctionnels pour un processus à temps continu. C.R. Acad. Sci. Paris 317, sér. I, 1075-1078.

[BO5] BOSQ D. (1993). Optimal and superoptimal quadratic error of functional estimators for continuous time processes. Preprint, Univ. Paris VI.

[BO6] BOSQ D. (1993). Bernstein-type large deviation inequalities for partial sums of strong mixing processes. Statistics, 24, 59-70.

[BO7] BOSQ D. (1995). Sur le comportement exotique de l'estimateur à noyau de la densité marginale d'un processus à temps continu. C.R. Acad. Sci. Paris t. 320, sér. I, 369-372.

[BO8] BOSQ D. (1995). Optimal asymptotic quadratic error of density estimators for strong mixing or chaotic data. Statistics and Probab. Letters 22, 339-347.

[BO9] BOSQ D. (1997). Parametric rates of nonparametric estimators and predictors for continuous time processes. Annals of Statistics, 25, 3, 982-1000.

[BO10] BOSQ D. (1997). Temps local et estimation sans biais de la densité en temps continu. C.R. Acad. Sci. Paris t. 325, sér. I, 527-530.

[BO-C] BOSQ D. and CHEZE-PAYAUD N. (1995). Optimal asymptotic quadratic error of nonparametric regression function estimates for a continuous-time process from sampled data. To appear.

[BO-D] BOSQ D. and DAVYDOV Y. (1998). Local time and density estimation in continuous time. Publ. IRMA, Univ. Lille 1, Vol. 44, n° III.

[BO-L] BOSQ D. and LECOUTRE J.P. (1987). Théorie de l'estimation fonctionnelle. Economica.

[BMP] BOSQ D., MERLEVEDE F. and PELIGRAD M. (1997). Asymptotic normality for kernel estimators of densities in discrete and continuous time. To appear.

[BO-S] BOSQ D. and SHEN J. (1998). Estimation of an autoregressive nonlinear model with exogenous variable. Accepted by Journ. of Stat. Inference.

[B-J] BOX G. and JENKINS G. (1970). Time series analysis: forecasting and control. Holden Day.

[BR1] BRADLEY R. (1983). Approximation theorems for strongly mixing random variables. Michigan Math. J. 30, 69-81.
[BR2] BRADLEY R. (1986). Basic properties of strong mixing conditions. In E. Eberlein and M.S. Taqqu, editors, Dependence in Probability and Statistics, p. 165-192. Birkhäuser.

[B-D] BROCKWELL P.J. and DAVIS R.A. (1991). Time series: theory and methods. Springer Verlag.

[BRO] BRONIATOWSKI M. (1993). Cross validation methods in kernel nonparametric density estimation: a survey. Publ. ISUP, 38, 3-4, 3-28.

[C] CARBON M. (1993). Une nouvelle inégalité de grandes déviations. Applications. Publ. IRMA, Vol. 32, n° II.

[C-DE] CARBON M. and DELECROIX M. (1993). Nonparametric forecasting in time series: a computational point of view. Applied Stoch. Models and Data Analysis 9, 3, 215-229.

[C-L] CASTELLANA J.V. and LEADBETTER M.R. (1986). On smoothed probability density estimation for stationary processes. Stoch. Proc. Appl., 21, 179-193.

[CP] CHEZE-PAYAUD N. (1994). Régression, prédiction et discrétisation des processus à temps continu. Thesis, Univ. Paris 6.

[CH-W] CHUNG K.L. and WILLIAMS R.J. (1990). Introduction to stochastic integration (2nd ed.), 276 p., Birkhäuser (Boston).

[CO] COLLOMB G. (1981). Estimation non paramétrique de la régression. Revue bibliographique. Int. Statist. Rev. 49, 75-93.

[DC-OP-TH] DACUNHA-CASTELLE D., THOMASSONE R. and OPPENHEIM G. (1995). Prévision des pointes d'ozone à Paris (preprint).

[D1] DAVYDOV YA. (1968). Convergence of distributions generated by stationary stochastic processes. Theor. Probab. Appl. 13, 691-696.

[D2] DAVYDOV Y. (1976). Local times of random processes. Th. Probab. and Appl. 21, 1, p. 172-179.

[D3] DAVYDOV Y. (1996). Approximation du temps local des processus à trajectoires régulières. C.R. Acad. Sci. Paris, t. 322, sér. I, p. 471-474.

[D4] DAVYDOV Y. (1998). The rate of approximation of local time for random processes with smooth sample paths. Preprint Univ. Lille I.

[D-H] DEHEUVELS P. and HOMINAL P. (1980). Estimation automatique de la densité. Rev. Statist. Appl. 28, 25-55.
[DE1] DELECROIX M. (1980). Sur l'estimation des densités d'un processus stationnaire à temps continu. Publ. ISUP, XXV, 1-2, 17-39.

[DE2] DELECROIX M. (1987). Sur l'estimation et la prévision non paramétrique des processus ergodiques. Thèse de Doctorat d'Etat, Université de Lille.

[DE-RO] DELECROIX M. and ROSA A.C. (1995). Ergodic processes prediction via estimation of the conditional distribution function. Publ. ISUP, XXXIX, 2, 35-56.

[DV] DEVROYE L. (1987). A course in density estimation. Birkhäuser.

[DV-GY] DEVROYE L. and GYORFI L. (1985). Nonparametric density estimation. The L1 view. Wiley.

[DO] DOOB J.L. (1967). Stochastic processes (7th printing). Wiley.

[DK1] DOUKHAN P. (1991). Consistency of delta-sequence estimates of a density or a regression function for a weakly stationary sequence. Séminaire Univ. Orsay 1989-90, 121-141.

[DK2] DOUKHAN P. (1994). Mixing: properties and examples. Lecture Notes in Statistics. Springer Verlag.

[DK-LN] DOUKHAN P. and LEON J. (1994). Asymptotics for the local time of a strongly dependent vector-valued Gaussian random field. Université de Paris-Sud, Mathématiques 94.31.

[E] ENGLE R. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007.

[EP] EPANECHNIKOV V.A. (1969). Nonparametric estimation of a multidimensional probability density. Theory Probab. Appl. 14, 153-158.

[F1] FAN J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. Annals of Statist., 19, 1257-1272.

[F2] FAN J. (1991). Asymptotic normality for deconvolving kernel density estimators. Sankhya, Ser. A 53, 97-110.

[F3] FAN J. (1992). Deconvolution with supersmooth distributions. Canad. J. Statist. 20, 155-169.

[FA] FARRELL R. (1972). On the best obtainable asymptotic rates of convergence in estimation of a density function at a point. Ann. of Math. Stat. 43, 1, 170-180.
[G-H1] GEMAN D. and HOROWITZ J. (1973). Occupation times for smooth stationary processes. Annals of Probab. 1, 1, 131-137.

[G-H2] GEMAN D. and HOROWITZ J. (1980). Occupation densities. Annals of Probab. 8, 1, 1-67.

[G-M] GOURIEROUX C. and MONFORT A. (1983). Cours de séries temporelles. Economica.

[GR] GRENANDER U. (1981). Abstract inference. Wiley.

[GG] GUEGAN D. (1994). Séries chronologiques non linéaires à temps discret. Economica.

[GU] GUERRE E. (1995). The general behavior of estimators of the Box-Cox model for integrated time series. Preprint INSEE.

[GY] GYORFI L. (1997). How far one can learn probability law from data? Publ. ISUP, Vol. 41, Fasc. 3, p. 3-20.

[GL] GYORFI L. and LUGOSI G. (1992). Kernel density estimation for ergodic sample is not universally consistent. Comput. Statist. and Data Anal. 14, 437-442.

[GHSV] GYORFI L., HARDLE W., SARDA P. and VIEU P. (1989). Nonparametric curve estimation from time series. Lecture Notes in Statist. Springer Verlag.

[HA-1] HARDLE W. (1990). Applied nonparametric regression. Cambridge University Press.

[HA-2] HARDLE W. (1991). Smoothing techniques with implementation in S. Springer Verlag.

[HO] HOEFFDING W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13-30.

[IB] IBRAGIMOV I.A. (1962). Some limit theorems for stationary processes. Theor. Prob. Appl. 7, 349-382.

[IB-HA] IBRAGIMOV I.A. and HASMINSKII R.Z. (1981). Statistical estimation - Asymptotic theory. Springer-Verlag, New York.

[IB-RZ] IBRAGIMOV I.A. and ROZANOV Y.A. (1978). Gaussian random processes. Springer Verlag, New York.

[K-S] KENDALL M. and STUART A. (1976). The advanced theory of statistics. Vol. 3, C. Griffin and Co., third edition.
[KO-TS] KOROSTELEV A.P. and TSYBAKOV A.B. (1993). Minimax theory of image reconstruction. Springer Verlag.

[K] KUTOYANTS Yu.A. (1997). Some problems of nonparametric estimation by observations of ergodic diffusion processes. Stat. and Proba. Letters 32, 311-320.

[LM] LASOTA A. and MACKEY M.C. (1985). Probabilistic properties of deterministic systems. Cambridge Univ. Press.

[LE1] LEBLANC F. (1995). Ph.D. thesis, Univ. Paris VI.

[LE2] LEBLANC F. (1997). Density estimation for a class of continuous time processes. Math. Methods of Stat., 6, 2, 171-199.

[M] MAES J. (1994). Estimation non paramétrique de la fonction chaotique et des exposants de Lyapunov d'un système dynamique. C.R. Acad. Sci. Paris t. 319, sér. I, p. 1005-1008.

[MA] MARLE C.M. (1974). Mesures et probabilités. Hermann (Paris).

[MR] MARRON J.S. (1991). Root n bandwidth selection. In Nonparametric functional estimation and related topics, NATO ASI Series, Vol. 335, ed. G. ROUSSAS, 251-260.

[MS1] MASRY E. (1983). Probability density estimation from sampled data. IEEE Trans. Inform. Theory 29, 696-709.

[MS2] MASRY E. (1986). Recursive probability density estimation for weakly dependent stationary processes. IEEE Trans. Inform. Theory 32, 2, 254-267.

[MS3] MASRY E. (1988). Continuous-parameter stationary processes: statistical properties of joint density estimators. J. of Mult. Anal., 26, 133-165.

[MS4] MASRY E. (1991). Multivariate probability density deconvolution for stationary random processes. IEEE Trans. Inform. Theory 37, 1105-1115.

[MS5] MASRY E. (1993). Strong consistency and rates for deconvolution of multivariate densities of stationary processes. Stoch. Proc. and Applic. 47, 53-74.

[MS6] MASRY E. (1993). Asymptotic normality for deconvolution estimators of multivariate densities of stationary processes. J. Multivariate Analysis, 44, 47-68.

[MS7] MASRY E. (1994). Multivariate regression estimation with errors in variables for stationary processes. Nonparametric Statist. 3, in press.
[MI] MILCAMPS B. (1995). Analyse des corrélations inter-marchés. Mémoire ISUP (Paris).

[NA] NADARAJA E.A. (1964). On estimating regression. Theory Probab. Appl. 9, 141-142.

[NG] NGUYEN H.T. (1979). Density estimation in a continuous-time stationary Markov process. Annals of Statist., 7, 2, 341-348.

[NG-PH1] NGUYEN H.T. and PHAM D.T. (1980). Sur l'utilisation du temps local en statistique des processus. C.R. Acad. Sci. Paris, 290, A, 165-170.

[NG-PH2] NGUYEN H.T. and PHAM D.T. (1981). Nonparametric estimation in diffusion model by discrete sampling. Preprint.

[PA] PARZEN E. (1962). On estimation of a probability density function and mode. Ann. Math. Statist. 33, 1065-1076.

[P] PANKRATZ A. (1983). Forecasting with univariate Box-Jenkins models. Concepts and cases. Wiley.

[PH-T1] PHAM D.T. and TRAN L.T. (1985). Some strong mixing properties of time series models. Stoch. Proc. Appl. 19, 297-303.

[PH-T2] PHAM D.T. and TRAN L.T. (1991). Kernel density estimation under a locally mixing condition. In Nonparametric Functional Estimation and Related Topics, ed. G. ROUSSAS, NATO ASI Series V. 335, 419-430.

[PR1] PRAKASA RAO B.L.S. (1979). Nonparametric estimation for continuous time Markov processes via delta-families. Publ. ISUP, XXIV, 81-97.

[PR2] PRAKASA RAO B.L.S. (1983). Nonparametric functional estimation. Academic Press.

[PR3] PRAKASA RAO B.L.S. (1990). Nonparametric density estimation for stochastic processes from sampled data. Publ. ISUP, XXXV, 51-84.

[PO] POGGI J.M. (1994). Prévision non paramétrique de la consommation électrique. Rev. Statist. Appliq. XLII (4), 83-98.

[RA] RAO (1992). Probability theory. Acad. Press.

[RH] RHOMARI N. (1994). Filtrage non paramétrique pour les processus non markoviens. Applications. Ph.D. thesis, University Pierre et Marie Curie (Paris).

[RB] ROBINSON P.M. (1983). Nonparametric estimators for time series. J. Time Series Anal. 4, 185-207.
[RB-ST] ROBINSON P.M. and STOYANOV J.M. (1991). Semiparametric and nonparametric inference from irregular observations on continuous time stochastic processes. In "Nonparametric functional estimation and related topics", ed. G. ROUSSAS, NATO ASI Series, 553-558.

[RI1] RIO E. (1993). Covariance inequalities for strongly mixing processes. Ann. Inst. Henri Poincaré, 29, 4, 587-597.

[RI2] RIO E. (1995). The functional law of the iterated logarithm for stationary strongly mixing sequences. Annals of Probab. 23, 3, 1188-1203.

[RA] ROSA A.C.M. (1993). Prévision robuste sous une hypothèse ergodique. Ph.D. thesis, Univ. of Toulouse 1.

[RO1] ROSENBLATT M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27, 832-837.

[RO2] ROSENBLATT M. (1956). A central limit theorem and a strong mixing condition. Proc. Nat. Acad. Sci. USA 42, 43-47.

[RO3] ROSENBLATT M. (1985). Stationary sequences and random fields. Birkhäuser.

[RS1] ROUSSAS G. (1969). Nonparametric estimation in Markov processes. Ann. Inst. Stat. Math. 21, 73-87.

[RS2] ROUSSAS G. (1988). Nonparametric estimation in mixing sequences of random variables. J. Stat. Plan. and Inf. 18, 135-149.

[RS3] ROUSSAS G. (1990). Nonparametric regression estimation under mixing conditions. Stoch. Processes Appl. 36, 107-116.

[RS-IO] ROUSSAS G. and IOANNIDES D. (1987). Moment inequalities for mixing sequences of random variables. Stochastic Anal. Appl. 5(1), 61-120.

[RS-TR] ROUSSAS G. and TRAN L.T. (1992). Asymptotic normality of the recursive kernel regression estimate under dependence conditions. Annals of Statist. 20, 1, 98-120.

[RU] RUELLE D. (1989). Chaotic evolution and strange attractors. Cambridge Univ. Press.

[SA] SARDA P. (1993). Smoothing parameter selection for smooth distribution functions. J. Statist. Plan. Infer. 35, 65-75.

[SC] SCHIPPER M. (1997). Sharp asymptotics in nonparametric estimation. Ph.D. thesis, Utrecht.
[SI] SILVERMAN B.W. (1986). Density estimation for statistics and data analysis. Chapman and Hall.

[ST-TR] STONE C.J. and TRUONG Y.K. (1992). Nonparametric function estimation involving time series. Annals of Statist., 20, 1, 77-97.

[SU] STOUT W.F. (1974). Almost sure convergence. Academic Press.

[SR] STRASSEN V. (1964). An invariance principle for the law of the iterated logarithm. Z. für Wahrscheinlichkeitstheorie und verw. Gebiete 3, 211-226.

[SE] STUTE W. (1982). A law of the logarithm for kernel density estimators. Ann. Probab. 10, 414-422.

[TR1] TRAN L.T. (1989). The L1 convergence of kernel density estimates under dependence. The Canad. J. of Statist. 17, 2, 197-208.

[TR2] TRAN L.T. (1990). Kernel density and regression estimation for dependent random variables and time series. Techn. report, Univ. Indiana.

[TR3] TRAN L.T. (1993). Nonparametric function estimation for time series by local average estimators. Ann. Statist. 21(2), 1040-1057.

[T-S] TRUONG Y.K. and STONE C.J. (1992). Nonparametric function estimation involving time series. Ann. Statist. 20, 77-98.

[V] VERETENNIKOV A.YU. (1996). On a Castellana-Leadbetter's condition for a diffusion density estimation. Preprint Univ. du Maine.

[VI] VIEU P. (1994). Quelques résultats en estimation fonctionnelle. Mémoire d'Habilitation, Univ. P. Sabatier, Toulouse (France).

[WA] WATSON G.S. (1964). Smooth regression analysis. Sankhya Ser. A, 26, 359-372.

[WE] WENTZEL A.D. (1975). Stochastic processes. Nauka (in Russian), Moscow.

[Y] YAKOWITZ S. (1985). Markov flow models and the flood warning problem. Water Resources Research, 21, 81-88.

[YO] YOKOYAMA R. (1980). Moment bounds for stationary mixing sequences. Z. Wahrsch. Gebiete, 52, 45-57.

[YR] YOR M. (1995). Local times and excursions for Brownian motion. Univ. Central de Venezuela.
Index

A
Absolute regularity 18
Adaptive methods 14
Admissible sampling 14, 122, 140
α-mixing 7, 18
ARCH 1, 178
ARMA process 1, 177, 179, 180
Asymptotic normality 9, 11, 36, 54, 75, 80, 89, 118, 138, 160-161
Autoregressive processes (infinite dimensional) 14

B
β-mixing 18
Berbee's lemma 19
Bernstein's inequality 24
Billingsley's inequality 22
Black-Scholes formula 182
Bochner's lemma 44, 100
Borel-Cantelli lemma in continuous time 108
Box-Cox transformation 170
Box-Jenkins (method) 1, 177
Bradley's lemma 20

C
Càdlàg 90, 121
Cars registrations 184
Central limit theorem 36
Chaos 86
Chaotic data 57
Conditional mode predictor 180
Consistency of local time density estimator 150
Coupling 19
Covariance inequalities 20, 21, 22
Cramer's conditions 24
Cross validation 175-176
Cynical method 172

D
Davydov's inequality 21-22
Density kernel estimator 3, 42, 90
Deseasonalization 14, 170-172
Dichotomy 119, 140
Differencing 13
Differentiable sample paths 104
Diffusion process 101
Double kernel method 176
Dynamical system 86

E
Electricity consumption 184
Elimination of trend and seasonality 171-172
Empirical measure 12, 42, 68
Epanechnikov kernel 42, 176, 182
Ergodic 150, 153
Ergodic theorem 151, 152, 154
Errors in variables (processes with) 64-65, 86
Exogeneous variables 15, 177
Exponential type inequalities 7, 24-33

F
φ-mixing 18
φrev-mixing 80, 143
Forecasting : see Prediction
Full rate 101
G
GARCH 180
Gaussian process 10, 19, 74, 100, 104, 122, 153, 160
General stationary processes (prediction for) 81
Geometrically strongly mixing (GSM) processes 46, 90

H
Histogram 3
Hoeffding's inequality 24

I
Implementation of nonparametric methods 169-177
Intermediate rates 102-107, (minimaxity of) 107-108
Interpolator 85
Irregular sampling 121
Iterated logarithm (functional law of) 161

K
Kernel 39
Kernel of order (k, A) 90
Kolmogorov extension theorem 4
Kutoyants theorem 102

L
Large deviations inequalities 7, 24-33
Law of large numbers 34-35
Linear process 18, 46
Local time 145, 146
Local time estimator 149
Local time for semimartingales (existence) 146
Logistic trend 83-84

M
Markov process 76, 141
Markov process of order k 76
Martingale 82
m-dependent 19
Minimax 9, 46, 97, 101, 102, 107, 108
Minimaxity of intermediate rates 107-108
MISE 91
Mixing 7, 17
Mixture 4

N
Naive kernel 3, 40
Nonparametric predictor 1, 6, 76, 82, 141, 172, 177
Nonstationary process (prediction for) 82

O
Optimal rate 8, 93
Occupation measure 145
Ornstein-Uhlenbeck process 122
Outliers 84
P
p-adic process 58
Parametric rate 10
Parametric predictors 177, 180
Periodic 83
Plug-in method 173
Pollution 184

Pseudo-regression 84

Q
Quadratic error (asymptotic) 43, 69, 91, 125, 155

R
Rate 33
Regression kernel estimator 69, 130
Regression with error 86
Regressogram 5
Rio's inequality 20
Robust 5, 14, 180

S
Sampling 14, 15, 118, 140
SARIMA process 1
Seasonality 13, 170
Semiparametric 14
Similarity 13
Singular distribution 61
Stationary process 7
Statistical error of prediction 6
Superoptimal rate 10, 98, 104, 116, 136, 155, 157

T
Trend 170
Two-α-mixing 19

U
Unbiased density estimator 3, 12, 150, 156
Uniform convergence 8, 10, 11, 46, 72, 108, 139, 153, 162

V
Variance (stabilization of) 169

W
Wavelets 15, 127
Y

Yields 2, 182.