Journal of Nonparametric Statistics
Vol. 20, No. 1, January 2008, 3–18
DOI: 10.1080/10485250701541454

Asymptotic normality of a nonparametric estimator of the conditional mode function for functional data

M’hamed Ezzahrioui and Elias Ould-Saïd*

L.M.P.A. J. Liouville, Université du Littoral Côte d’Opale, Calais, France

(Received 23 November 2005; revised 7 February 2007; accepted 2 December 2007)

We consider the estimation of the conditional mode function when the covariables take values in some
abstract function space. It is shown that, under some regularity conditions, the kernel estimate of the
conditional mode is asymptotically normally distributed. From this, we derive the asymptotic normality
of a predictor and propose confidence bands for the conditional mode function. Simulations are carried out to show how our methodology can be implemented.

Keywords: asymptotic normality; conditional density function; kernel estimator; functional data; small ball probability

2000 AMS Subject Classification: Primary: 62G20; Secondary: 62M09

1. Introduction

The estimation of the mode and of the conditional mode function has received much attention in the past, and there exists an extensive literature for both independent and dependent data. Parzen [1] established weak consistency and asymptotic normality in the independent and identically distributed (i.i.d.) case. Eddy [2] stated the same results under weaker conditions. The corresponding multidimensional versions were obtained by Samanta [3] and Konakov [4]. Romano [5] investigated the asymptotic behaviour of a kernel estimate of the mode with data-dependent bandwidths and obtained results under weaker smoothness assumptions on the underlying density. In the conditional and i.i.d. case, Samanta and Thavaneswaran [6] showed that the kernel estimator of the mode is strongly consistent and asymptotically normally distributed.
In the dependent case, the strong consistency of the conditional mode estimator was obtained by Collomb et al. [7] and Ould-Saïd [8] under φ-mixing and α-mixing conditions, respectively. These results were then applied to process forecasting. The same result with rates was established by Quintela del Rio and Vieu [9]. The asymptotic normality, under α-mixing conditions, was established by Louani and Ould-Saïd [10]. In the general ergodic framework, a process prediction was described by Ould-Saïd [11] and a result of strong consistency was stated. In the censored case, Ould-Saïd and Cai [12] stated the uniform strong consistency with rates of the kernel estimator

*Corresponding author. Email: ouldsaid@lmpa.univ-littoral.fr


of the conditional mode function. All these results are derived when the covariables take values
in finite dimensional spaces, where the Lebesgue measure plays an important role.
In infinite dimensional spaces there is no Lebesgue measure, nor any analogue suitable to replace it. As a matter of fact, the invariance under translation of the Lebesgue measure plays an important role when we deal with kernel estimators. An analogous abstract measure with this property would be a Haar measure, but this does not generally exist in infinite dimensional spaces, not even in Hilbert spaces. It therefore becomes interesting to address the estimation problem in infinite dimensional spaces. Moreover, in the last few years, there has been increasing interest in estimation based on functional data. For an introduction to this field (in the parametric case), we refer the reader to the monographs of Ramsay and Silverman [13,14] and Bosq [15].
There are few results in the nonparametric case. Gasser et al. [16] gave an approach introducing a nonparametric estimation of the mode. They highlighted the issue of the curse of dimensionality for functional data and proposed methods to overcome the problem. Dabo-Niang [17] studied density estimation in Banach spaces, with an application to the estimation of the density of a diffusion process with respect to Wiener's measure. Kernel-type estimators of some characteristics of the conditional cumulative distribution function and of the successive derivatives of the conditional density were introduced by Ferraty et al. [18]. Some asymptotic properties were established, with a particular application to the conditional mode and conditional quantiles. An application to a chemometrical data set coming from the food industry is also considered. In the kernel regression estimation framework, nonparametric estimation was considered by Ferraty and Vieu [19–21] in the i.i.d. case, and Masry [22] established the asymptotic normality under strong mixing conditions. In this context we refer the reader to the recent monograph by Ferraty and Vieu [23]. It should be noted that such problems were already addressed in abstract metric spaces (see Geffroy [24] and Bertrand-Retali [25]).
Let $\{(X_i, Y_i),\ i \ge 1\}$ be a sequence of i.i.d. random vectors where $X_i$ takes values in some semi-metric space $(S, d(\cdot,\cdot))$ and $Y_i$ is real valued. In most practical applications, S is a normed space which can be of infinite dimension (e.g., a Hilbert or Banach space) with norm $\|\cdot\|$, so that $d(x, x') = \|x - x'\|$.

For $x \in S$, we denote by $g(\cdot|x)$ the conditional density function of $Y_1$ given $X_1 = x$. We assume that $g(\cdot|x)$ is unimodal and its conditional mode is denoted by $\Theta(x)$, which is defined by

$$ g(\Theta(x) \mid x) = \max_{y\in\mathbb{R}} g(y \mid x). \qquad (1) $$

A kernel estimator of $\Theta(x)$ is defined as the random variable $\Theta_n(x)$ which maximizes a kernel estimator $g_n(\cdot|x)$ of $g(\cdot|x)$, that is,

$$ g_n(\Theta_n(x) \mid x) = \max_{y\in\mathbb{R}} g_n(y \mid x). \qquad (2) $$

Here,
$$ g_n(y \mid x) = \frac{f_n(x,y)}{\ell_n(x)} \qquad (3) $$
where
$$ f_n(x,y) = \frac{1}{nh\phi(h)} \sum_{i=1}^{n} K\left(\frac{\|x - X_i\|}{h}\right) H^{(1)}\left(\frac{y - Y_i}{h}\right) \qquad (4) $$
and
$$ \ell_n(x) = \frac{1}{n\phi(h)} \sum_{i=1}^{n} K\left(\frac{\|x - X_i\|}{h}\right), \qquad (5) $$

where K is a real-valued kernel function, $h := h_n$ is a sequence of positive real numbers which goes to zero as n goes to infinity, and $\phi(\cdot)$ is a function supposed to be strictly positive, which we describe later. $H^{(1)}$ is the first derivative of a given distribution function H.

Remark that (4) and (5) are consistent estimators of $\alpha_1 g(y|x)\psi(x) =: \alpha_1 f(x,y)$ and $\alpha_1\psi(x)$, respectively, where $\alpha_1$ and $\psi(x)$ will be described later.

For j = 1, 2, we define the jth partial derivative with respect to the second component of $f_n(\cdot,\cdot)$ by†

$$ \frac{\partial^{j} f_n(x,y)}{\partial y^{j}} = f_n^{(j)}(x,y) = \frac{1}{n h^{j+1}\phi(h)} \sum_{i=1}^{n} K\left(\frac{\|x - X_i\|}{h}\right) H^{(j+1)}\left(\frac{y - Y_i}{h}\right). \qquad (6) $$

†The remarks previously made for Equations (4) and (5) can also be made for Equation (6).

Note that the estimate $\Theta_n(x)$ is not necessarily unique, and our results are valid for any choice satisfying (2). We point out that we can specify our choice by taking
$$ \Theta_n(x) = \inf\left\{ t \in \mathbb{R} \;:\; g_n(t \mid x) = \max_{y\in\mathbb{R}} g_n(y \mid x) \right\}. $$
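To make the construction concrete, here is a minimal sketch of the estimator (our own illustration, not the authors' code), using the kernel K and distribution H that the paper adopts in Section 5, and a grid search standing in for the supremum over $\mathbb{R}$; all names and the semi-metric d are assumptions of this sketch.

```python
import numpy as np

def conditional_mode_estimator(X, Y, x, d, h, y_grid):
    """Sketch of Theta_n(x) from Equations (2)-(5).

    X: list of n functional covariates; Y: numpy array of n real responses;
    x: point of interest; d: semi-metric d(., .); h: bandwidth; y_grid: grid
    standing in for the supremum over R. All names are illustrative only.
    """
    # Quadratic kernel K(u) = (3/2)(1 - u^2) on [0, 1), as in Section 5.
    K = lambda u: 1.5 * (1.0 - u ** 2) * ((0.0 <= u) & (u < 1.0))
    # H^(1)(t) = (15/4) t^2 (1 - t^2) on [-1, 1]: the density of H in Section 5.
    H1 = lambda t: 3.75 * t ** 2 * (1.0 - t ** 2) * (np.abs(t) <= 1.0)

    w = K(np.array([d(x, Xi) for Xi in X]) / h)  # weights of curves close to x
    if w.sum() == 0.0:
        raise ValueError("no curve falls in the ball B(x, h); increase h")
    # g_n(y|x) = f_n(x, y) / l_n(x): the factors 1/phi(h) cancel in the ratio
    # and l_n(x) does not depend on y, so the argmax only needs sum_i K * H^(1).
    g = np.array([(w * H1((y - Y) / h)).sum() for y in y_grid])
    return y_grid[np.argmax(g)]  # first (smallest) maximizer, matching the inf
```

In practice the grid should be fine relative to h, since the estimate is only located up to the grid resolution.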

An analogous estimator to Equation (4) was already given in Ferraty et al. [18] in the general
setting. As far as we know, there is no result on the conditional mode function for functional data
except in a recent work by Ferraty et al. [18] regarding the uniform strong consistency with rates.
Now, it is well known that the conditional mode provides an alternative prediction method to the
classical regression function approach. There are many cases of conditional densities for which
the regression function vanishes everywhere and is therefore of no use to address the prediction
problem. An example in finite dimensional spaces is given in Ould-Saïd [11] and a practical
example in infinite dimensional space is given in Ferraty et al. [18].
Our main goal is to establish the asymptotic normality of the estimator in Equation (2) after
suitable normalization. As a consequence we get the asymptotic normality of a predictor and
propose confidence bands for the conditional mode function.
The paper is organized as follows: in Section 2 we give the assumptions and the main result. The finite dimensional case is treated in Section 3. An application to prediction and confidence bands is given in Section 4, while a simulation study is presented in Section 5. Finally, the proof of the main result is relegated to Section 6, where auxiliary results are stated and proved.

2. Assumptions and main result

To formulate our assumptions, some additional notation is required. Let $B(x,h)$ be the ball of centre x and radius h, and let $W_i := \|x - X_i\|$ be a random variable such that $\mathbb{P}(X_i \in B(x,h)) = \mathbb{P}(W_i \le h) =: F_x(h)$, for all fixed $x \in S$. Let C be a compact set of $\mathbb{R}$ such that $\Theta(\cdot) \in \mathring{C}$, where $\mathring{C}$ denotes the interior of C. Furthermore, we assume that $\Theta(x)$ satisfies the following uniqueness condition: for any $\varepsilon > 0$ and any $\mu(x)$, there exists $\xi > 0$ such that $|\Theta(x) - \mu(x)| \ge \varepsilon$ implies $|g(\Theta(x)|x) - g(\mu(x)|x)| \ge \xi$.
Our assumptions are gathered here for easy reference.
A1 There exist three functions $\psi(\cdot)$, $\phi(\cdot)$ (supposed increasing, strictly positive and tending to zero as h goes to zero) and $\zeta_0(\cdot)$ such that
(i) $F_x(h) = \psi(x)\phi(h) + o(\phi(h))$;
(ii) for all $u \in [0,1]$, $\lim_{h\to 0} \phi(uh)/\phi(h) =: \lim_{h\to 0} \zeta_h(u) = \zeta_0(u)$.
A2 The kernel K is nonnegative, with compact support [0, 1], of class $C^{1}$ on [0, 1) with K(1) > 0, and its derivative K′ exists on [0, 1) with K′(t) < 0.
A3 The conditional density and its first two derivatives satisfy a Hölder condition with respect to each variable; that is, there exist constants $\beta > 0$ and $\nu \in (0,1]$ such that
$$ \forall (y_1,y_2)\in\mathbb{R}^{2},\ \forall (x_1,x_2)\in\vartheta(x)\times\vartheta(x),\qquad \left|g^{(j)}(y_1|x_1) - g^{(j)}(y_2|x_2)\right| \le C_x\left(|y_1-y_2|^{\nu} + \|x_1-x_2\|^{\beta}\right) $$
for j = 0, 1, 2, where $\vartheta(x)$ is a neighbourhood of x, with the convention $g^{(0)}(\cdot|\cdot) = g(\cdot|\cdot)$, and where $C_x$ is a constant depending on x. Furthermore, $g(\cdot|x)$ is differentiable up to order 3 and $\sup_y |g^{(3)}(y|x)| < +\infty$ uniformly in x.
A4 $H^{(1)}$ is twice differentiable and
(i) $\int_{\mathbb{R}} t^{2} H^{(1)}(t)\,dt < +\infty$ and $\int_{\mathbb{R}} t\,H^{(1)}(t)\,dt = 0$;
(ii) $\int_{\mathbb{R}} |t|^{\nu}\left(H^{(2)}(t)\right)^{2} dt < +\infty$, with the same ν as in A3;
(iii) $\forall (t_1,t_2)\in\mathbb{R}^{2}$, $|H^{(j)}(t_1) - H^{(j)}(t_2)| \le C\,|t_1-t_2|$ for j = 1, 3, and $H^{(j)}$ is bounded for j = 1, 2, 3.
A5 The bandwidth h satisfies
(i) $nh^{3}\phi(h)/\log n \longrightarrow \infty$ as $n \to \infty$;
(ii) $nh^{3+2\beta}\phi(h) \longrightarrow 0$ and $nh^{7}\phi(h) \longrightarrow 0$ as $n \to \infty$.
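As a quick check that A5 can be satisfied (a computation we add for the reader; it is not part of the original text), consider the fractal-type case $\phi(h) = h^{\gamma}$ of Remark 1 below with bandwidths $h = n^{-a}$. Then A5(i) reads $n^{1-a(3+\gamma)}/\log n \to \infty$, while A5(ii) reads $n^{1-a(3+2\beta+\gamma)} \to 0$ and $n^{1-a(7+\gamma)} \to 0$, so that
$$ \text{A5 holds} \iff \frac{1}{3+\min(2\beta,4)+\gamma} < a < \frac{1}{3+\gamma}, $$
an interval which is nonempty for every $\beta > 0$ and $\gamma > 0$.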

Remark 1 Assumption A1(i) plays an important role in our methodology. It is known (for small h) as the 'concentration property' in infinite dimensional spaces. For many examples, the small ball probability $\mathbb{P}(W_1 < h)$ can be approximated, around zero, as the product of two independent functions $\psi(x)$ and $\phi(h)$. This idea was adopted by Masry [22], who reformulated it from that of Gasser et al. [16] as a condition related to the functional distribution of $W_1$. The increasing assumption on $\phi(\cdot)$ implies that $\zeta_h(\cdot)$ is increasing, bounded by 1 and hence integrable (all the more so is $\zeta_0(\cdot)$).
This assumption, which has been used by many authors, considers $\psi(\cdot)$ as an infinite-dimensional analogue of a probability density, while $\phi$ may be interpreted as a volume parameter (usually no differentiability assumption is required on $\phi$). In the case of finite dimensional spaces, that is $S = \mathbb{R}^{d}$, taking $\psi$ as the density, it can be seen that $\phi(h) = C(d)h^{d}$, where C(d) is the volume of the unit ball in $\mathbb{R}^{d}$. Furthermore, there exist many examples fulfilling the decomposition mentioned above. We quote the following (which can be found in [18]):

(1) $F_x(h) = \psi(x)\,h^{\gamma}$ for some $\gamma > 0$;
(2) $F_x(h) = \psi(x)\,h^{\gamma}\exp\left(-C/h^{p}\right)$ for some $\gamma > 0$ and $p > 0$;
(3) $F_x(h) = \psi(x)/|\ln h|$.

Recall that the second case corresponds to the well-known Ornstein–Uhlenbeck, general diffusion and general Gaussian processes.
The function $\zeta_h(\cdot)$ is increasing for all fixed h. Its pointwise limit $\zeta_0$ plays a determinant role and enters the asymptotic distribution through the variance term. With simple algebra, we can specify this function in the above examples:

(1) $\zeta_0(u) = u^{\gamma}$;
(2) $\zeta_0(u) = \delta_1(u)$, where $\delta_1(\cdot)$ is the Dirac mass at 1;
(3) $\zeta_0(u) = \mathbb{1}_{]0,1]}(u)$.
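For instance, in case (1) the limit function follows from a one-line computation (included here for the reader's convenience): with $\phi(h) = h^{\gamma}$,
$$ \zeta_h(u) = \frac{\phi(uh)}{\phi(h)} = \frac{(uh)^{\gamma}}{h^{\gamma}} = u^{\gamma} \quad\text{for every } h > 0, \qquad\text{hence}\qquad \zeta_0(u) = u^{\gamma}. $$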
Journal of Nonparametric Statistics 7

Remark 2 Assumption A3 is the only condition involving the conditional probability density of
Y given X. It means that g(·|·) and its derivatives satisfy the Hölder condition with respect to
each variable. It is sufficiently weak and we do not need to introduce the notion of density for the
functional random variable X. Therefore, the concentration condition A1 plays an important role.
Here we point out that our assumptions are very usual in the estimation problem for functional
regressors (see, e.g., Ferraty et al. [18]).

Remark 3 Assumptions A2, A4 and A5 are classical in functional estimation for finite or infinite
dimension spaces. Observe that Assumption A4 (i) implies that H (1) has finite moment of order ν.

By the definition of the conditional mode function, we have
$$ g^{(1)}(\Theta(x) \mid x) = 0. $$
Similarly,
$$ g_n^{(1)}(\Theta_n(x) \mid x) = 0. $$
Furthermore, we assume that $g^{(2)}(\Theta(x)|x) \neq 0$ (and therefore $< 0$) and $g_n^{(2)}(\Theta_n(x)|x) \neq 0$. By a Taylor expansion of $g_n^{(1)}(\cdot|x)$ in the neighbourhood of $\Theta(x)$, we have
$$ \Theta_n(x) - \Theta(x) = -\frac{g_n^{(1)}(\Theta(x) \mid x)}{g_n^{(2)}(\Theta_n^{*}(x) \mid x)} $$
where $\Theta_n^{*}(x)$ is between $\Theta_n(x)$ and $\Theta(x)$. Using Equations (5) and (6), we can write
$$ \Theta_n(x) - \Theta(x) = -\frac{f_n^{(1)}(x, \Theta(x))}{f_n^{(2)}(x, \Theta_n^{*}(x))}, \qquad (7) $$
if the denominator does not vanish. In what follows, we use the notation $f^{(j)}(x,y) := \psi(x)\,g^{(j)}(y|x)$ for j = 0, 1, 2.
To prove our result, we show that the numerator, when suitably normalized, is asymptotically normally distributed, whereas the denominator converges in probability; Slutsky's theorem then permits us to conclude. In order to state our main result, we define the set $\Omega = \{x \in S : f(x, \Theta(x)) \neq 0\}$.

THEOREM 1 Under Assumptions A1–A5, we have for any $x \in \Omega$
$$ \left(\frac{nh^{3}\phi(h)\left(\alpha_1 f^{(2)}(x,\Theta(x))\right)^{2}}{\sigma^{2}(x,\Theta(x))}\right)^{1/2}\left(\Theta_n(x) - \Theta(x)\right) \xrightarrow{\ D\ } N(0,1) \quad\text{as } n \to \infty, \qquad (8) $$
where $\xrightarrow{\ D\ }$ denotes convergence in distribution,
$$ \sigma^{2}(x,\Theta(x)) = \alpha_2\, f(x,\Theta(x)) \int_{\mathbb{R}} \left(H^{(2)}(t)\right)^{2} dt $$
and
$$ \alpha_l = K^{l}(1) - \int_{0}^{1}\left(K^{l}(u)\right)'\zeta_0(u)\,du \quad\text{for } l = 1, 2. $$

3. Finite dimension

In the finite dimensional case, that is $S = \mathbb{R}^{d}$ (without loss of generality we set d = 1), if the random variable X has a probability density with respect to Lebesgue measure, denoted by $\psi(\cdot)$ and of class $C^{1}$, then $\phi(h) = h$. Here we point out that, by choosing a kernel which is a density, the quantities $\alpha_1$ and $\alpha_2$ become 1 and $\int_{\mathbb{R}} K^{2}(u)\,du$, respectively. In this case our result becomes as follows.

COROLLARY 1 Under Assumptions A2–A5, we have
$$ \left(\frac{nh^{4}\left(f^{(2)}(x,\Theta(x))\right)^{2}}{\sigma^{2}(x,\Theta(x))}\right)^{1/2}\left(\Theta_n(x) - \Theta(x)\right) \xrightarrow{\ D\ } N(0,1) \quad\text{as } n \to \infty, $$
where
$$ \sigma^{2}(x,\Theta(x)) = f(x,\Theta(x)) \int_{\mathbb{R}}\int_{\mathbb{R}} \left(K(u)\,H^{(2)}(t)\right)^{2} du\,dt $$
and $f(\cdot,\cdot)$ is the joint probability density of (X, Y).

This result is identical to that obtained by Samanta and Thavaneswaran [6]. We may notice here that our assumptions are slightly weaker than those imposed in the latter paper.
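The values of $\alpha_1$ and $\alpha_2$ quoted above follow from an integration by parts, a verification we include for the reader: for d = 1 one has $\zeta_0(u) = u$, so for a kernel K which is a density on [0, 1],
$$ \alpha_1 = K(1) - \int_0^1 K'(u)\,u\,du = K(1) - \big[u\,K(u)\big]_0^1 + \int_0^1 K(u)\,du = \int_0^1 K(u)\,du = 1, $$
and similarly $\alpha_2 = K^{2}(1) - \int_0^1 (K^{2})'(u)\,u\,du = \int_0^1 K^{2}(u)\,du$.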

4. Some perspectives for predictive confidence bands

4.1. Prediction

For $n \in \mathbb{N}^{*}$, let $X_i(t)$, $i = 1, \ldots, n$, be functional random variables with $t \in \mathbb{R}$. For each curve $X_i(t)$ we have a real response variable $Y_i$ which corresponds to some modality of our problem. The question is: given a new curve $X_{n+1}(t) =: x_{\mathrm{new}}(t)$, can we predict the corresponding response $y_{\mathrm{new}}$? This is a prediction problem for infinite dimensional explanatory random variables. This problem is usually addressed by means of regression. As mentioned before, the conditional mode gives an alternative approach. The predictor estimate is obtained by computing the quantity
$$ \hat{y}_{\mathrm{new}} := \Theta_n(x_{\mathrm{new}}) = \arg\max_{y}\, g_n(y \mid x_{\mathrm{new}}). $$
The theoretical predictor of the response associated to $x_{\mathrm{new}}$ is clearly defined by Equation (1), that is,
$$ y_{\mathrm{new}} := \Theta(x_{\mathrm{new}}). $$
Applying Theorem 1, we have the following.

COROLLARY 2 Under the assumptions of Theorem 1, we have
$$ \left(\frac{nh^{3}\phi(h)\left(\alpha_1 f^{(2)}(x_{\mathrm{new}},\Theta(x_{\mathrm{new}}))\right)^{2}}{\sigma^{2}(x_{\mathrm{new}},\Theta(x_{\mathrm{new}}))}\right)^{1/2}\left(\Theta_n(x_{\mathrm{new}}) - \Theta(x_{\mathrm{new}})\right) \xrightarrow{\ D\ } N(0,1) \quad\text{as } n \to \infty. $$

4.2. Predictive confidence bands

Consistent estimators of the normalization constants $\alpha_l$, l = 1, 2, are given by
$$ \alpha_{l,n} = \frac{1}{n\hat{F}_x(h)} \sum_{i=1}^{n} K^{l}\left(\frac{\|x - X_i\|}{h}\right) $$
where $\hat{F}_x(h) = (1/n)\sum_{i=1}^{n} \mathbb{1}_{\{W_i \le h\}}$. A plug-in estimate for the asymptotic variance $\sigma^{2}(x,\Theta(x))$ is readily obtained using the estimators $\Theta_n(\cdot)$, $f_n(\cdot,\cdot)$ and $\alpha_{2,n}$ of $\Theta(x)$, $f(\cdot,\cdot)$ and $\alpha_2$, respectively, that is,
$$ \sigma_n^{2}(x,\Theta_n(x)) = \alpha_{2,n}\, f_n(x,\Theta_n(x)) \int_{\mathbb{R}} \left(H^{(2)}(t)\right)^{2} dt. $$
Clearly, $\sigma_n^{2}(x,\Theta_n(x))$ is a consistent estimator of $\sigma^{2}(x,\Theta(x))$. Then using the above estimator and the estimates $\alpha_{1,n}$ and $f_n^{(2)}(\cdot,\cdot)$ of $\alpha_1$ and $f^{(2)}(\cdot,\cdot)$, respectively, yields the following asymptotic $(1-\zeta)$-confidence bands for $\Theta(x)$:
$$ \Theta_n(x) \pm \mu_{1-\zeta/2} \times \left(\frac{\sigma_n^{2}(x,\Theta_n(x))}{nh^{3}\phi(h)\left(\alpha_{1,n}\, f_n^{(2)}(x,\Theta_n(x))\right)^{2}}\right)^{1/2}, $$
where $\mu_{1-\zeta/2}$ denotes the $(1-\zeta/2)$-quantile of the standard normal distribution.
Here we point out that, using Equations (4) and (6), the above formula does not depend on $\phi(h)$.
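To illustrate how the plug-in band can be computed, here is a sketch (our own hedged illustration; all helper names are assumptions of this sketch) with the kernel K and distribution H of Section 5. Since $\phi(h)$ cancels in the ratio, as remarked above, the code works with $\phi(h)f_n$ and $\phi(h)f_n^{(2)}$ throughout.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def confidence_band(X, Y, x, d, h, y_grid, zeta=0.05):
    """Sketch of the plug-in (1 - zeta) band above; names are ours.

    phi(h) cancels (see the remark in the text), so we only ever form
    phi(h)*f_n and phi(h)*f_n^(2).
    """
    K  = lambda u: 1.5 * (1.0 - u ** 2) * ((0.0 <= u) & (u < 1.0))
    H1 = lambda t: 3.75 * t ** 2 * (1.0 - t ** 2) * (np.abs(t) <= 1.0)   # H^(1)
    H3 = lambda t: 3.75 * (2.0 - 12.0 * t ** 2) * (np.abs(t) <= 1.0)     # H^(3)
    # Integral of (H^(2))^2, with H^(2)(t) = (15/4)(2t - 4t^3) on [-1, 1].
    IH2, _ = quad(lambda t: (3.75 * (2.0 * t - 4.0 * t ** 3)) ** 2, -1.0, 1.0)

    n = len(Y)
    W = np.array([d(x, Xi) for Xi in X])
    w = K(W / h)
    nFx = max((W <= h).sum(), 1)                    # n * F_x-hat(h)
    a1, a2 = w.sum() / nFx, (w ** 2).sum() / nFx    # alpha_{1,n}, alpha_{2,n}

    f_t  = lambda y: (w * H1((y - Y) / h)).sum() / (n * h)        # phi(h)*f_n
    f2_t = lambda y: (w * H3((y - Y) / h)).sum() / (n * h ** 3)   # phi(h)*f_n^(2)

    theta = y_grid[np.argmax([f_t(y) for y in y_grid])]           # Theta_n(x)
    var_hat = a2 * f_t(theta) * IH2 / (n * h ** 3 * (a1 * f2_t(theta)) ** 2)
    half = norm.ppf(1.0 - zeta / 2.0) * np.sqrt(var_hat)
    return theta - half, theta + half
```

Note that $f_n^{(2)}$ involves $H^{(3)}$, the third derivative of H, in accordance with Equation (6).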

5. Simulation study

In this section, we implement our methodology and examine how accurate the confidence bands obtained from the asymptotic normality property are. To this purpose, we consider the classical nonparametric functional regression model
$$ Y = R(X) + \varepsilon, $$
where ε is a centred r.v. independent of X.
We consider a sample of stochastic processes defined on [0, 1] by $X_t(W) = \cos(W - \pi(2t-1))$, where W is uniformly distributed on [0, 1]. We carry out the simulation with curves X discretized at 100 points, as follows. We generate 200 uniformly distributed random variables $W_i =: W(i)$, $i = 1, \ldots, 200$; then, for each W(i), we construct the vector $X(i,j) = \cos\left(W(i) - \pi\left(\frac{2j}{100} - 1\right)\right)$ for $j \in \{1, 2, \ldots, 100\}$ as the discretized version of the function $X_t(W_i) = \cos(W_i - \pi(2t-1))$, which is represented in Figure 1.

The regression function is given by $R(x) = \frac{1}{2\pi}\int_{1/2}^{3/4} x^{2}(t)\,dt$. For the response sample, we generate $\varepsilon_i \sim N(0,1)$ and then
$$ Y_i = \frac{1}{2\pi}\int_{1/2}^{3/4} \cos^{2}\left(W_i - \pi(2t-1)\right)dt + \varepsilon_i, \qquad i = 1, \ldots, 200. $$
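For concreteness, the data-generating step can be sketched as follows (our illustrative code, with the integral defining R approximated by a Riemann sum on the 100-point grid; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Grid t_j = j/100, j = 1, ..., 100, and the 200 discretized curves X(i, j).
tj = np.arange(1, 101) / 100
W = rng.uniform(0.0, 1.0, size=200)
X = np.cos(W[:, None] - np.pi * (2.0 * tj[None, :] - 1.0))  # shape (200, 100)

# Responses Y_i = R(X_i) + eps_i; the integral over [1/2, 3/4] is
# approximated by a Riemann sum with step dt = 1/100.
mask = (tj >= 0.5) & (tj <= 0.75)
R = (X[:, mask] ** 2).sum(axis=1) * 0.01 / (2.0 * np.pi)
Y = R + rng.standard_normal(200)
```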

Figure 1. Curves of $X_i(t) =: X(i,j)$, $i = 1, \ldots, 200$, $j = 1, \ldots, 100$.

We consider the quadratic kernel defined by
$$ K(x) = \frac{3}{2}\left(1 - x^{2}\right)\mathbb{1}_{[0,1)}(x) \quad\text{and}\quad K(1) > 0, $$
and the distribution function H(·) defined by
$$ H(x) = \int_{-\infty}^{x} \frac{15}{4}\, t^{2}\left(1 - t^{2}\right)\mathbb{1}_{[-1,1]}(t)\,dt. $$

We choose our optimal bandwidth by minimizing the mean square error between the predicted and the true values over a set of known bandwidth values. We point out that H(·) satisfies our hypotheses. In practice, the choice of semi-metric is based on the regularity of the curves X(·) under study. In our case the semi-metric is defined by the $L^{2}$-distance between the second derivatives of the curves. In order to construct conditional confidence bands we proceed by the following algorithm (a sketch implementing Steps 1–4 is given after the list).

Step 1 We split our data into randomly chosen subsets:
• $(X_j, Y_j)_{j\in J}$: training sample;
• $(X_i, Y_i)_{i\in I}$: test sample.
Step 2 We calculate the estimator $\Theta_n(X_j)$ for all $j \in J$.
Step 3 For each $X_i$ in the test sample, we set $i^{*} := \arg\min_{j\in J} d(X_i, X_j)$.
Step 4 For all $i \in I$ we define the confidence bands by
$$ \left[\Theta_n(X_{i^{*}}) \pm \mu_{0.975} \times \left(\frac{\sigma_n^{2}\left(X_{i^{*}}, \Theta_n(X_{i^{*}})\right)}{|J|\,h^{3}\phi(h)\left(\alpha_{1,n}\, f_n^{(2)}\left(X_{i^{*}}, \Theta_n(X_{i^{*}})\right)\right)^{2}}\right)^{1/2}\right], $$
where $\mu_{0.975}$ is the 97.5% quantile of the standard normal distribution.
Step 5 We present our results by plotting the extremities of the predicted values versus the true values in Figure 2. The values appearing on the ordinate axis are multiplied by $10^{-1}$.
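The following compact sketch implements Steps 1–4 (again an illustration of ours, reusing the arrays X, Y generated above and the confidence_band helper sketched in Section 4.2; the finite-difference semi-metric and the bandwidth value are our own choices, whereas the paper tunes h by minimizing the prediction mean square error):

```python
# Semi-metric: L2-distance between second derivatives, via finite
# differences on the grid (dt = 1/100), mirroring the choice in the text.
def d(x1, x2):
    dd = np.diff(x1 - x2, n=2) / 0.01 ** 2
    return np.sqrt((dd ** 2).sum() * 0.01)

# Step 1: random split into training (J) and test (I) index sets.
perm = rng.permutation(200)
J, I = perm[:150], perm[150:]

h = 5.0  # illustrative bandwidth on the scale of d; tune so B(x, h) is non-empty
y_grid = np.linspace(Y.min() - 1.0, Y.max() + 1.0, 300)
for i in I:
    # Step 3: closest training curve to X_i in the semi-metric d.
    i_star = J[np.argmin([d(X[i], X[j]) for j in J])]
    # Steps 2 and 4: mode estimate and 95% band at X_{i*} (zeta = 0.05).
    lo, hi = confidence_band([X[j] for j in J], Y[J], X[i_star], d, h, y_grid)
    print(f"true Y_{i} = {Y[i]:+.2f}, band = [{lo:+.2f}, {hi:+.2f}]")
```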

Figure 2. The 95% conditional predictive bands. The solid curve connects the true values. The crossed curve joins the predicted values. The dashed curves connect the lower and upper predicted values.

We see clearly that our predicted values fit the real values very well, and the latter all lie within the confidence intervals.

6. Proofs

The proof of our main result is split up into several lemmas. The first one plays the role of the classical Bochner lemma in finite dimension.

LEMMA 1 Suppose that Assumptions A1 and A2 hold. For all fixed x, we have
$$ \frac{1}{\phi(h)}\,\mathbb{E}\left[K^{l}\left(\frac{\|x - X_i\|}{h}\right)\right] \longrightarrow \alpha_l\,\psi(x) \quad\text{as } n \longrightarrow \infty, \quad\text{for } l = 1, 2. $$

Proof Integrating by parts and using A1(i) and A2, we have
$$ \frac{1}{\phi(h)}\,\mathbb{E}\left[K^{l}\left(\frac{\|x-X_1\|}{h}\right)\right] = \frac{1}{\phi(h)}\int_{0}^{h} K^{l}\left(\frac{u}{h}\right) d\mathbb{P}_{\|X_1-x\|}(u) = \frac{K^{l}(1)\,F_x(h)}{\phi(h)} - \frac{1}{\phi(h)}\int_{0}^{1}\left(K^{l}(u)\right)' F_x(hu)\,du = \left[K^{l}(1) - \int_{0}^{1}\left(K^{l}(u)\right)'\zeta_h(u)\,du\right]\left[\psi(x) + o(1)\right]. \qquad (9) $$
From Assumption A1(ii) it follows that the right-hand side of Equation (9) converges to $\alpha_l\,\psi(x)$ as $n \to \infty$. □

The following lemma deals with the asymptotic behaviour of the bias term of the first derivative when suitably normalized.

LEMMA 2 Under Assumptions A1, A2, A3, A4(i) and A5(ii), we have
$$ \sqrt{nh^{3}\phi(h)}\;\mathbb{E}\left[f_n^{(1)}(x,\Theta(x))\right] \longrightarrow 0, \quad\text{as } n \longrightarrow \infty. $$

Proof We have
$$ \mathbb{E}\left[f_n^{(1)}(x,\Theta(x))\right] = \frac{1}{h^{2}\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right)H^{(2)}\left(\frac{\Theta(x)-Y_1}{h}\right)\right] = \frac{1}{h^{2}\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right)\mathbb{E}\left[H^{(2)}\left(\frac{\Theta(x)-Y_1}{h}\right)\,\Big|\,X_1\right]\right]. $$
An integration by parts, together with a change of variables, leads to
$$ \mathbb{E}\left[H^{(2)}\left(\frac{\Theta(x)-Y_1}{h}\right)\,\Big|\,X_1\right] = \int_{\mathbb{R}} H^{(2)}\left(\frac{\Theta(x)-z}{h}\right) g(z|X_1)\,dz = h^{2}\int_{\mathbb{R}} H^{(1)}(t)\left[g^{(1)}(\Theta(x)-th\,|X_1) - g^{(1)}(\Theta(x)-th\,|x)\right]dt + h^{2}\int_{\mathbb{R}} H^{(1)}(t)\,g^{(1)}(\Theta(x)-th\,|x)\,dt =: I_1 + I_2. $$
By a second-order Taylor expansion of $g^{(1)}(\Theta(x)-th\,|x)$ around $\Theta(x)$, and the fact that $\Theta(x)$ is the conditional mode (so that $g^{(1)}(\Theta(x)|x) = 0$), we get
$$ \mathbb{E}\left[f_n^{(1)}(x,\Theta(x))\right] = \frac{1}{h^{2}\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right) I_2\right] + \frac{1}{h^{2}\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right) I_1\right] = \frac{1}{\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right)\int_{\mathbb{R}} H^{(1)}(t)\left(-th\,g^{(2)}(\Theta(x)\,|x) + \frac{t^{2}h^{2}}{2}\,g^{(3)}\left(\Theta^{*}(x)\,|x\right)\right)dt\right] + \frac{1}{h^{2}\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right) I_1\right], $$
where $\Theta^{*}(x)$ is between $\Theta(x)$ and $\Theta(x)-th$. Then, by Assumption A4(i),
$$ \sqrt{nh^{3}\phi(h)}\;\mathbb{E}\left[f_n^{(1)}(x,\Theta(x))\right] = \sqrt{nh^{7}\phi(h)}\;\frac{1}{2\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right)\int_{\mathbb{R}} t^{2}H^{(1)}(t)\,g^{(3)}\left(\Theta^{*}(x)\,|x\right)dt\right] + \sqrt{nh^{3}\phi(h)}\;\frac{1}{h^{2}\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right) I_1\right]. $$
On the other hand, by A3 and the fact that $H^{(1)}$ is a probability density, we have
$$ \left|\frac{1}{h^{2}\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right) I_1\right]\right| \le \frac{1}{h^{2}\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right) |I_1|\right] \le \frac{1}{\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right)\int_{\mathbb{R}} H^{(1)}(t)\,\|X_1-x\|^{\beta}\,dt\right] = \frac{h^{\beta}}{\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right)\right]. $$
Making use of the last part of A3, A4(i), A5(ii) and Lemma 1 with l = 1, we get
$$ \sqrt{nh^{3}\phi(h)}\;\mathbb{E}\left[f_n^{(1)}(x,\Theta(x))\right] = o(1), $$
which gives the result. □

LEMMA 3 Under Assumptions A1, A2, A3 and A4(i), for j = 1, 2, we have
$$ \mathbb{E}\left[f_n^{(j)}(x,y)\right] \longrightarrow \alpha_1 f^{(j)}(x,y) \quad\text{as } n \to \infty. $$

Proof As the proof is identical for j = 1 and j = 2, we give only the first case. Here and in what follows, we suppose that $X_1 \in B(x,h)$ (otherwise the kernel factor vanishes). As previously, we have
$$ \mathbb{E}\left[H^{(2)}\left(\frac{y-Y_1}{h}\right)\,\Big|\,X_1\right] = \int_{\mathbb{R}} H^{(2)}\left(\frac{y-z}{h}\right) g(z|X_1)\,dz = h^{2}\int_{\mathbb{R}} H^{(1)}(t)\,g^{(1)}(y-th\,|X_1)\,dt = h^{2}\int_{\mathbb{R}} H^{(1)}(t)\left[g^{(1)}(y-th\,|X_1) - g^{(1)}(y\,|x)\right]dt + h^{2}\int_{\mathbb{R}} H^{(1)}(t)\,g^{(1)}(y\,|x)\,dt = h^{2}\int_{\mathbb{R}} H^{(1)}(t)\left[g^{(1)}(y-th\,|X_1) - g^{(1)}(y\,|x)\right]dt + h^{2}\,g^{(1)}(y\,|x), $$
using the fact that $H^{(1)}(\cdot)$ is a probability density. Hence, from Equation (6),
$$ \mathbb{E}\left[f_n^{(1)}(x,y)\right] = \frac{1}{\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right)\int_{\mathbb{R}} H^{(1)}(t)\left[g^{(1)}(y-th\,|X_1) - g^{(1)}(y\,|x)\right]dt\right] + \frac{1}{\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right)\right] g^{(1)}(y\,|x). \qquad (10) $$
Using A3 we get
$$ \left|\int_{\mathbb{R}} H^{(1)}(t)\left[g^{(1)}(y-th\,|X_1) - g^{(1)}(y\,|x)\right]dt\right| \le C_x\int_{\mathbb{R}} H^{(1)}(t)\left(|t|^{\nu}h^{\nu} + \|X_1-x\|^{\beta}\right)dt \le C_x\,h^{\nu}\int_{\mathbb{R}} |t|^{\nu} H^{(1)}(t)\,dt + C_x\,h^{\beta}. $$
Now by Lemma 1 and A4(i) the first term on the right-hand side of Equation (10) goes to zero as n goes to infinity, and by Lemma 1 again one gets
$$ \frac{1}{\phi(h)}\,\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right)\right] g^{(1)}(y\,|x) \longrightarrow \alpha_1\,\psi(x)\,g^{(1)}(y\,|x). $$
Finally,
$$ \mathbb{E}\left[f_n^{(1)}(x,y)\right] \longrightarrow \alpha_1 f^{(1)}(x,y), \quad\text{as } n \to \infty. \qquad\square $$
The following result gives the uniform almost complete convergence of $f_n^{(j)}(x,y)$ for j = 0, 2. We point out that this result and the continuity of $f^{(2)}(\cdot,\cdot)$ give the convergence in probability of the denominator in Equation (7) to $\alpha_1 f^{(2)}(x,\Theta(x))$. Furthermore, this result in conjunction with Lemma 1 improves that of Ferraty et al. [18].

LEMMA 4 Under Assumptions A1, A2, A3, A4(i), (iii) and A5(i), for j = 0, 2 and n large enough, we have
$$ \sup_{y\in C}\left|f_n^{(j)}(x,y) - \alpha_1 f^{(j)}(x,y)\right| = O\left(h^{\beta}\right) + O\left(h^{\nu}\right) + O\left(\sqrt{\frac{\log n}{nh^{j+1}\phi(h)}}\right), \quad\text{a.co.} $$

Proof Proceeding as in Lemma 3, we get
$$ \sup_{y\in C}\left|\mathbb{E}\left[f_n^{(j)}(x,y)\right] - \alpha_1 f^{(j)}(x,y)\right| = O(h^{\beta}) + O(h^{\nu}). \qquad (11) $$
Now we deal with the fluctuation term, written out in the case j = 2 (the proof is similar for j = 0). Consider a cover of C by a finite number of intervals $C_k = (y_k - h^{\eta},\, y_k + h^{\eta})$, $k = 1, \ldots, l_n$, where η > 4 (η > 2 for j = 0). Since C is bounded, there exists M > 0 such that $l_n \le M h^{-\eta}$. Clearly, we have
$$ \sup_{y\in C}\left|f_n^{(2)}(x,y) - \mathbb{E}\left[f_n^{(2)}(x,y)\right]\right| \le \max_{1\le k\le l_n}\,\sup_{y\in C_k}\left|f_n^{(2)}(x,y) - f_n^{(2)}(x,y_k)\right| + \max_{1\le k\le l_n}\left|f_n^{(2)}(x,y_k) - \mathbb{E}\left[f_n^{(2)}(x,y_k)\right]\right| + \max_{1\le k\le l_n}\,\sup_{y\in C_k}\left|\mathbb{E}\left[f_n^{(2)}(x,y_k)\right] - \mathbb{E}\left[f_n^{(2)}(x,y)\right]\right| =: J_1 + J_2 + J_3. \qquad (12) $$
Now, as $J_1$ and $J_3$ can be treated in the same manner, we deal only with the former. Making use of A4(iii) we get
$$ J_1 \le \frac{1}{nh^{3}\phi(h)}\,\sup_{y\in C}\sum_{i=1}^{n}\left|H^{(3)}\left(\frac{y-Y_i}{h}\right) - H^{(3)}\left(\frac{y_k-Y_i}{h}\right)\right| K\left(\frac{\|x-X_i\|}{h}\right) \le C\,h^{\eta-4}\,\frac{1}{n\phi(h)}\sum_{i=1}^{n} K\left(\frac{\|x-X_i\|}{h}\right) =: C\,h^{\eta-4}\,\ell_n(x). $$
Writing $\ell_n(x) = \left(\ell_n(x) - \mathbb{E}[\ell_n(x)]\right) + \mathbb{E}[\ell_n(x)]$, it follows by Lemma 1 that the expectation term is bounded, whereas the centred term converges almost completely to zero by Hoeffding's inequality under A5(i). Finally, we get
$$ J_1 = o(1). \qquad (13) $$
For $J_2$, first observe that each of the n centred i.i.d. summands of $f_n^{(2)}(x,y_k) - \mathbb{E}[f_n^{(2)}(x,y_k)]$ is bounded in absolute value by $2\|K\|_{\infty}\|H^{(3)}\|_{\infty}/(nh^{3}\phi(h))$, where $\|K\|_{\infty}$ and $\|H^{(3)}\|_{\infty}$ are the upper bounds of K and $H^{(3)}$, respectively. Then, from Hoeffding's inequality (see Shorack and Wellner [26], p. 855), we have, for any ε > 0,
$$ \mathbb{P}\left\{\left|f_n^{(2)}(x,y_k) - \mathbb{E}\left[f_n^{(2)}(x,y_k)\right]\right| > \varepsilon\right\} \le \exp\left(-\frac{\varepsilon^{2} n^{2} h^{6}\phi^{2}(h)}{2\left(\|K\|_{\infty}\|H^{(3)}\|_{\infty}\right)^{2}}\right). $$
Consequently,
$$ \mathbb{P}\left\{J_2 > \varepsilon\right\} \le l_n\,\exp\left(-\frac{\varepsilon^{2} n^{2} h^{6}\phi^{2}(h)}{2\left(\|K\|_{\infty}\|H^{(3)}\|_{\infty}\right)^{2}}\right). \qquad (14) $$
Taking $\varepsilon = \varepsilon_0\sqrt{\log n/(nh^{3}\phi(h))}$ with $\varepsilon_0 > 0$ large enough, the exponent equals $-\varepsilon_0^{2}\,nh^{3}\phi(h)\log n\,/\,2\left(\|K\|_{\infty}\|H^{(3)}\|_{\infty}\right)^{2}$; since $l_n \le Mh^{-\eta}$ and $nh^{3}\phi(h)/\log n \to \infty$ by A5(i), the right-hand side of Equation (14) is then the general term of a convergent series. Therefore, by the Borel–Cantelli lemma, we get
$$ J_2 = O\left(\sqrt{\frac{\log n}{nh^{3}\phi(h)}}\right) \quad\text{a.co.} $$
In conjunction with Equations (11) and (13), this concludes the proof of the lemma. □

The following lemma deals with the asymptotic behaviour of the variance term of $f_n^{(1)}(x,y)$.

LEMMA 5 Under Assumptions A1, A2, A3 and A4, we have
$$ nh^{3}\phi(h)\,\mathrm{Var}\left[f_n^{(1)}(x,y)\right] \longrightarrow \alpha_2\, f(x,y)\int_{\mathbb{R}}\left(H^{(2)}(u)\right)^{2} du. $$

Proof As the variables are i.i.d., one gets from Equation (6)
$$ nh^{3}\phi(h)\,\mathrm{Var}\left[f_n^{(1)}(x,y)\right] = \frac{1}{h\phi(h)}\,\mathbb{E}\left[K^{2}\left(\frac{\|x-X_1\|}{h}\right)\left(H^{(2)}\left(\frac{y-Y_1}{h}\right)\right)^{2}\right] - \frac{1}{h\phi(h)}\,\mathbb{E}^{2}\left[K\left(\frac{\|x-X_1\|}{h}\right)H^{(2)}\left(\frac{y-Y_1}{h}\right)\right] =: A_n - B_n. \qquad (15) $$
By simple algebraic calculation, we get
$$ B_n = h^{3}\phi(h)\left[\frac{\mathbb{E}\left[K\left(\frac{\|x-X_1\|}{h}\right)H^{(2)}\left(\frac{y-Y_1}{h}\right)\right]}{h^{2}\phi(h)}\right]^{2} = h^{3}\phi(h)\left(\mathbb{E}\left[f_n^{(1)}(x,y)\right]\right)^{2} \longrightarrow 0 \quad\text{as } n \to \infty, $$
by Lemma 3. Now,
$$ A_n = \frac{1}{h\phi(h)}\,\mathbb{E}\left[K^{2}\left(\frac{\|x-X_1\|}{h}\right)\mathbb{E}\left[\left(H^{(2)}\left(\frac{y-Y_1}{h}\right)\right)^{2}\,\Big|\,X_1\right]\right]. \qquad (16) $$
We have
$$ \mathbb{E}\left[\left(H^{(2)}\left(\frac{y-Y_1}{h}\right)\right)^{2}\,\Big|\,X_1\right] = h\int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}\left[g(y-th\,|X_1) - g(y\,|x)\right]dt + h\,g(y|x)\int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}dt. \qquad (17) $$
We get, under A3,
$$ \left|\int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}\left[g(y-th\,|X_1) - g(y\,|x)\right]dt\right| \le C_x\int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}\left(|t|^{\nu}h^{\nu} + \|X_1-x\|^{\beta}\right)dt \le C_x\,h^{\nu}\int_{\mathbb{R}} |t|^{\nu}\left(H^{(2)}(t)\right)^{2}dt + C_x\,h^{\beta}\int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}dt. $$
By A4(ii), we get
$$ \int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}\left[g(y-th\,|X_1) - g(y\,|x)\right]dt \longrightarrow 0, \quad\text{as } n \longrightarrow \infty. \qquad (18) $$
Therefore, applying Lemma 1, we obtain from Equations (16), (17) and (18)
$$ \lim_{n\to\infty} A_n = \lim_{n\to\infty}\frac{\mathbb{E}\left[K^{2}\left(\frac{\|x-X_1\|}{h}\right)\right]}{\phi(h)}\;g(y|x)\int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}dt = \alpha_2\,\psi(x)\,g(y|x)\int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}dt = \alpha_2\, f(x,y)\int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}dt, $$
which gives the result. □

Now, for any $(x,y) \in S\times\mathbb{R}$, let us consider the following centred i.i.d. random variables:
$$ Z_{in}(x,y) = \frac{1}{\sqrt{nh\phi(h)}}\left[K\left(\frac{\|x-X_i\|}{h}\right)H^{(2)}\left(\frac{y-Y_i}{h}\right) - \mathbb{E}\left[K\left(\frac{\|x-X_i\|}{h}\right)H^{(2)}\left(\frac{y-Y_i}{h}\right)\right]\right], \quad 1\le i\le n. $$
Simple algebra shows that
$$ \sum_{i=1}^{n} Z_{in}(x,y) = \sqrt{nh^{3}\phi(h)}\left(f_n^{(1)}(x,y) - \mathbb{E}\left[f_n^{(1)}(x,y)\right]\right). $$
Hence, from Lemma 5, we have
$$ \mathrm{Var}\left(\sum_{i=1}^{n} Z_{in}(x,y)\right) = nh^{3}\phi(h)\,\mathrm{Var}\left[f_n^{(1)}(x,y)\right] \longrightarrow \alpha_2\, f(x,y)\int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}dt. \qquad (19) $$
From Lemmas 2, 4 and 5, all that is left to be shown is that $\sum_{i=1}^{n} Z_{in}(x,y)$ satisfies the Lindeberg condition, which is given by the following lemma.

LEMMA 6 Under the assumptions of Lemma 5 and A5(i), for $y \in \{t : g(t|x) \neq 0\}$, we have
$$ \forall\varepsilon>0,\qquad \sum_{i=1}^{n}\int_{\left\{Z_{in}^{2}(x,y) \,>\, \varepsilon^{2}\mathrm{Var}\left(\sum_{i=1}^{n} Z_{in}(x,y)\right)\right\}} Z_{in}^{2}\;d\mathbb{P}_{(X_i,Y_i)} \longrightarrow 0, \quad\text{as } n \to \infty. $$

Proof On the one hand, we have
$$ Z_{in}^{2} \le \frac{2}{nh\phi(h)}\,K^{2}\left(\frac{\|x-X_i\|}{h}\right)\left(H^{(2)}\left(\frac{y-Y_i}{h}\right)\right)^{2} + \frac{2}{nh\phi(h)}\,\mathbb{E}^{2}\left[K\left(\frac{\|x-X_i\|}{h}\right)H^{(2)}\left(\frac{y-Y_i}{h}\right)\right]. \qquad (20) $$
Note from Lemma 3 that the second term on the right-hand side of Equation (20) goes to zero as n goes to infinity. On the other hand, by Equation (19) there exists $n_0 \in \mathbb{N}^{*}$ such that for all $n \ge n_0$ we have
$$ \mathrm{Var}\left(\sum_{i=1}^{n} Z_{in}(x,y)\right) \ge \frac{\alpha_2\, f(x,y)}{2}\int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}dt. \qquad (21) $$
Now denote
$$ A(X_i,Y_i) := K^{2}\left(\frac{\|x-X_i\|}{h}\right)\left(H^{(2)}\left(\frac{y-Y_i}{h}\right)\right)^{2} + \mathbb{E}^{2}\left[K\left(\frac{\|x-X_i\|}{h}\right)H^{(2)}\left(\frac{y-Y_i}{h}\right)\right]; $$
clearly we have
$$ Z_{in}^{2}(x,y) \le \frac{2\,A(X_i,Y_i)}{nh\phi(h)}. $$
Using Equation (21) and setting $\varepsilon' := \varepsilon^{2}\,\frac{\alpha_2 f(x,y)}{4}\int_{\mathbb{R}}\left(H^{(2)}(t)\right)^{2}dt$, we have for $n \ge n_0$
$$ \left\{Z_{in}^{2}(x,y) > \varepsilon^{2}\,\mathrm{Var}\left(\sum_{i=1}^{n} Z_{in}(x,y)\right)\right\} \subset \left\{Z_{in}^{2}(x,y) > 2\varepsilon'\right\} \subset \left\{A(X_i,Y_i) > \varepsilon'\,nh\phi(h)\right\} $$
$$ \subset \left\{K^{2}\left(\frac{\|x-X_i\|}{h}\right)\left(H^{(2)}\left(\frac{y-Y_i}{h}\right)\right)^{2} > \frac{\varepsilon'\,nh\phi(h)}{2}\right\} \cup \left\{\mathbb{E}^{2}\left[K\left(\frac{\|x-X_i\|}{h}\right)H^{(2)}\left(\frac{y-Y_i}{h}\right)\right] > \frac{\varepsilon'\,nh\phi(h)}{2}\right\} =: \Gamma_{1n} \cup \Gamma_{2n}. $$
It is easy to see that, for n large enough, $\Gamma_{2n}$ is empty under A5(i). In the same way, since K and $H^{(2)}$ are bounded, we have for n large enough that $\Gamma_{1n}$ is empty under A5(i). Therefore $\left\{Z_{in}^{2} > \varepsilon^{2}\,\mathrm{Var}\left(\sum_{i=1}^{n} Z_{in}\right)\right\}$ is empty for n large enough, which completes the proof. □

Proof of Theorem 1 Note first that, by simple algebra, we can show that
$$ \left|g(\Theta_n(x)\,|x) - g(\Theta(x)\,|x)\right| \le 2\sup_{y\in C}\left|g_n(y|x) - g(y|x)\right|. \qquad (22) $$
The uniqueness hypothesis on the conditional mode function, Lemma 4 and simple algebra permit us to state the almost complete convergence of $\Theta_n(x)$ to $\Theta(x)$.
Now, we have
$$ \sqrt{nh^{3}\phi(h)}\left(\Theta_n(x) - \Theta(x)\right) = -\frac{\sqrt{nh^{3}\phi(h)}\left(f_n^{(1)}(x,\Theta(x)) - \mathbb{E}\left[f_n^{(1)}(x,\Theta(x))\right]\right)}{f_n^{(2)}(x,\Theta_n^{*}(x))} - \frac{\sqrt{nh^{3}\phi(h)}\;\mathbb{E}\left[f_n^{(1)}(x,\Theta(x))\right]}{f_n^{(2)}(x,\Theta_n^{*}(x))}. \qquad (23) $$
By Lemma 4 for j = 2, we get $f_n^{(2)}(x,\Theta_n^{*}(x)) \xrightarrow{\ \mathbb{P}\ } \alpha_1 f^{(2)}(x,\Theta(x))$. Then, by Lemma 2, we conclude that the last term on the right-hand side of Equation (23) goes to zero as n goes to infinity. Now, Lemma 6 permits us to conclude. □

Acknowledgements
We are grateful to three anonymous referees and an associate editor, whose careful reading gave us the opportunity to
improve the quality of the paper. Many thanks to one of the three referees whose appropriate comment led us to clarify
some assumptions.

References

[1] E. Parzen, On the estimation of a probability density function and mode, Ann. Math. Statist. 33 (1962), pp. 1065–1076.
[2] W.F. Eddy, The asymptotic distribution of kernel estimators of the mode, Z. Wahrsch. Verw. Gebiete 59 (1982), pp. 279–290.
[3] M. Samanta, Nonparametric estimation of the mode of a multivariate density, South African Statist. J. 7 (1973), pp. 109–117.
[4] V.D. Konakov, On the asymptotic normality of the mode of multidimensional distributions, Theory Probab. Appl. 19 (1974), pp. 794–799.
[5] J.P. Romano, On weak convergence and optimality of kernel density estimates of the mode, Ann. Statist. 16 (1988), pp. 629–647.
[6] M. Samanta and A. Thavaneswaran, Non-parametric estimation of conditional mode, Comm. Statist. Theory Methods 16 (1990), pp. 4515–4524.
[7] G. Collomb, W. Härdle, and S. Hassani, A note on prediction via conditional mode estimation, J. Statist. Plann. Inference 15 (1987), pp. 227–236.
[8] E. Ould-Saïd, Estimation nonparamétrique du mode conditionnel. Application à la prévision, C. R. Acad. Sci. Paris Sér. I 316 (1993), pp. 943–947.
[9] A. Quintela del Rio and P. Vieu, A nonparametric conditional mode estimate, J. Nonparametr. Statist. 8 (1997), pp. 253–266.
[10] D. Louani and E. Ould-Saïd, Asymptotic normality of kernel estimators of the conditional mode under strong mixing hypothesis, J. Nonparametr. Statist. 11 (1999), pp. 413–442.
[11] E. Ould-Saïd, A note on ergodic processes prediction via estimation of the conditional mode function, Scand. J. Statist. 24 (1997), pp. 231–239.
[12] E. Ould-Saïd and Z. Cai, Strong uniform consistency of nonparametric estimation of the censored conditional mode function, J. Nonparametr. Statist. 17 (2005), pp. 797–806.
[13] J. Ramsay and B. Silverman, Applied Functional Data Analysis: Methods and Case Studies, Springer-Verlag, New York, 2002.
[14] J. Ramsay and B. Silverman, Functional Data Analysis, 2nd ed., Springer-Verlag, New York, 2005.
[15] D. Bosq, Linear Processes in Function Spaces, Lecture Notes in Statistics, vol. 149, Springer-Verlag, New York, 2000.
[16] T. Gasser, P. Hall, and B. Presnell, Nonparametric estimation of the mode of a distribution of random curves, J. Roy. Statist. Soc. Ser. B 60 (1998), pp. 681–691.
[17] S. Dabo-Niang, Kernel density estimator in an infinite dimensional space with a rate of convergence in the case of diffusion process, Appl. Math. Lett. 17 (2004), pp. 381–386.
[18] F. Ferraty, A. Laksaci, and P. Vieu, Estimating some characteristics of the conditional distribution in nonparametric functional models, Statist. Inference Stoch. Process. 9 (2006), pp. 47–76.
[19] F. Ferraty and P. Vieu, The functional nonparametric model and application to spectrometric data, Comput. Statist. 17 (2002), pp. 545–564.
[20] F. Ferraty and P. Vieu, Curves discrimination: a nonparametric functional approach, Comput. Statist. Data Anal. 44 (2003), pp. 161–173.
[21] F. Ferraty and P. Vieu, Nonparametric models for functional data, with application in regression, time-series prediction and curve discrimination, J. Nonparametr. Statist. 16 (2004), pp. 111–127.
[22] E. Masry, Nonparametric regression estimation for dependent functional data: asymptotic normality, Stoch. Process. Appl. 115 (2005), pp. 155–177.
[23] F. Ferraty and P. Vieu, Nonparametric Functional Data Analysis: Theory and Practice, Springer, New York, 2006.
[24] J. Geffroy, Sur l'estimation d'une densité dans un espace métrique, C. R. Acad. Sci. Paris Sér. A 278 (1974), pp. 1449–1452.
[25] M. Bertrand-Retali, Convergence uniforme d'un estimateur de la densité par la méthode du noyau, Publ. Inst. Statist. Univ. Paris 22 (1977), pp. 1–42.
[26] G.R. Shorack and J.A. Wellner, Empirical Processes with Applications to Statistics, Wiley, New York, 1986.
