Journal of Statistical Planning and Inference 140 (2010) 335–352

Rate of uniform consistency for nonparametric estimates with functional variables

Frédéric Ferraty^a, Ali Laksaci^b, Amel Tadj^c,*, Philippe Vieu^a

^a Université Paul Sabatier, Toulouse, France
^b Université Djillali Liabes, Sidi Bel Abbes, Algeria
^c Université de Sidi Bel Abbès, BP 89, Sidi Bel Abbès 22000, Algeria

ARTICLE INFO

Article history:
Received 16 April 2008
Received in revised form 21 July 2009
Accepted 21 July 2009
Available online 6 August 2009

Keywords:
Uniform almost complete convergence
Kernel estimators
Functional data
Entropy
Semi-metric space

ABSTRACT

In this paper we investigate nonparametric estimation of some functionals of the conditional distribution of a scalar response variable Y given a random variable X taking values in a semi-metric space. These functionals include the regression function, the conditional cumulative distribution, the conditional density and some others. The literature on nonparametric functional statistics so far concerns only pointwise consistency results, and our main aim is to prove the uniform almost complete convergence (with rate) of the kernel estimators of these nonparametric models. Unlike in standard multivariate cases, the gap between pointwise and uniform results is not immediate, so suitable topological considerations are needed, implying changes in the rates of convergence which are quantified by entropy considerations. These theoretical uniform consistency results are (or will be) key tools for many further developments in functional data analysis.

© 2009 Elsevier B.V. All rights reserved.

Contents

1. Introduction
2. Topological considerations
   2.1. Kolmogorov's entropy
   2.2. Some examples
3. Estimation of the regression function
4. Conditional cumulative distribution estimation
5. Conditional density estimation
6. Some direct consequences
   6.1. Estimation of the conditional hazard function
   6.2. Conditional mode estimation
7. Comments
   7.1. Impact of the results
   7.2. General comments on the hypotheses
   7.3. Comments on convergence rates
Acknowledgements
Appendix A. Proofs
References

∗ Corresponding author.
E-mail address: ameltdz@yahoo.fr (A. Tadj).

0378-3758/$ - see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2009.07.019

1. Introduction

Studying the link between a scalar response variable Y and a new value of an explanatory variable X is an important subject in nonparametric statistics, and there are several ways to describe this link: for example, through the conditional expectation, the conditional distribution, the conditional density or the conditional hazard function. The purpose of this paper is to contribute to the nonparametric estimation of these various conditional quantities when the explanatory variable is functional. This investigation is motivated by the increasing number of examples, coming from different fields of applied science, in which the data are curves, and by the many nonparametric statistical problems which occur in the functional setting (see Ferraty and Vieu, 2006, for an extensive discussion of nonparametric statistics for functional data). Note that the modelling of functional variables has become more and more popular since the publication of the monograph of Ramsay and Silverman (1997) on functional data analysis. The first results concerning nonparametric models (mainly the regression function) were obtained by Ferraty and Vieu (2000). They established the almost complete pointwise consistency¹ of kernel estimators of the regression function when the explanatory variable is functional and the observations are i.i.d. Their study was extended to nonstandard regression problems such as time series prediction (see Ferraty et al., 2002). Dabo-Niang and Rhomari (2003) stated the convergence in Lp norm of the kernel estimator of this model, and Delsol (2007) gives the exact asymptotic expression of the Lp errors. The asymptotic normality of the same estimator in the strong mixing case was obtained by Masry (2005) and extended by Delsol (2009). Kernel-type estimation of some characteristics of the conditional cumulative distribution function and of the successive derivatives of the conditional density was introduced by Ferraty et al. (2006), who established almost complete consistency for i.i.d. observations. The strong mixing case was studied by Ferraty et al. (2005). Recently, Ferraty et al. (2007) gave the asymptotic expansion of the mean squared error of the kernel estimator of the regression function. Pointwise asymptotic properties of a kernel estimate of the conditional hazard function were investigated by Ferraty et al. (2008a). Among the many papers concerning nonparametric models related to the conditional distribution of a real variable given a random variable taking values in an infinite-dimensional space, we only refer to Dabo-Niang and Laksaci (2007) and Ezzahrioui and Ould-Saïd (2008).
While this literature concerns only pointwise asymptotics, our interest in this paper is to establish the uniform almost complete convergence of the nonparametric estimates of the various conditional quantities mentioned above. Uniform consistency results have been used successfully in the standard nonparametric setting to derive asymptotic properties of data-driven bandwidth choices, additive modelling or multi-step estimation. So it is natural, in this setting of functional data analysis (FDA), to investigate uniform consistency properties in a systematic way. Indeed, because of the youth of the FDA topic, one can expect that in the near future all these results will be useful in numerous functional statistical methodologies such as data-driven procedures or additive modelling (see Section 7 for more detailed motivations and related bibliography). Moreover, this work completes the results obtained in Ferraty and Vieu (2006), where the pointwise almost complete consistency with rate of these models is given. It is worth noting that the uniform convergence is not a direct extension of the previous pointwise results. Indeed, it requires additional topological conditions, expressed here in terms of Kolmogorov's entropy. We will see that, unlike in standard nonparametric statistics, these infinite-dimensional topological considerations may lead in some cases (see, for instance, Example 3 in Section 2.2) to rates which are slower for uniform than for pointwise results. Finally, all these asymptotic results are established under conditions related to concentration properties expressed in terms of small ball probabilities of the underlying explanatory variable. We note that our hypotheses and results unify the cases of finite- and infinite-dimensional regressors, which makes it possible to overcome the curse of dimensionality. Section 2 focuses on topological considerations via Kolmogorov's entropy, whereas Section 3 deals with a general regression model. Section 4 studies the conditional cumulative distribution, and conditional density estimation is developed in Section 5. In Section 6, we emphasize the consequences of the previous results for the estimation of the conditional mode and the conditional hazard function. Finally, in Section 7, we comment on the obtained results and their potential impact on the statistical literature as key tools for many further advances in FDA.

Throughout this paper, we consider a sample of independent pairs (X_i, Y_i)_{1≤i≤n}, identically distributed as (X, Y), a random pair valued in F × ℝ, where F is a semi-metric space with semi-metric d, and we will use the notation

B(x, h) = {x′ ∈ F : d(x′, x) ≤ h}.

2. Topological considerations

2.1. Kolmogorov's entropy

The purpose of this section is to emphasize the topological components of our study. Indeed, as indicated in Ferraty and Vieu (2006), all the asymptotic results in nonparametric statistics for functional variables are closely related to the concentration properties of the probability measure of the functional variable X. Here, we must moreover take into account the uniformity aspect. To this end, let S_F be a fixed subset of F; we consider the following assumption:

(H1) ∀x ∈ S_F, 0 < C φ(h) ≤ P(X ∈ B(x, h)) ≤ C′ φ(h) < ∞.


¹ Let (z_n)_{n∈ℕ*} be a sequence of real random variables; we say that z_n converges almost completely (a.co.) to zero if and only if, ∀ε > 0, Σ_{n=1}^∞ P(|z_n| > ε) < ∞. Moreover, let (u_n)_{n∈ℕ*} be a sequence of positive real numbers; we say that z_n = O(u_n) a.co. if and only if ∃ε > 0, Σ_{n=1}^∞ P(|z_n| > ε u_n) < ∞. This kind of convergence implies both almost sure convergence and convergence in probability.

We can say that the first contribution of the topological structure of the functional space appears through the function φ controlling the concentration of the probability measure of the functional variable on a small ball. Moreover, for uniform consistency, where the main tool is to cover the subset S_F with a finite number of balls, one introduces another topological concept, defined as follows:

Definition 1. Let S be a subset of a semi-metric space F, and let ε > 0 be given. A finite set of points x_1, x_2, …, x_N in F is called an ε-net for S if S ⊂ ∪_{k=1}^N B(x_k, ε). The quantity ψ_S(ε) = log(N_ε(S)), where N_ε(S) is the minimal number of open balls in F of radius ε necessary to cover S, is called Kolmogorov's ε-entropy of the set S.

This concept was introduced by Kolmogorov in the mid-1950s (see Kolmogorov and Tikhomirov, 1959); it represents a measure of the complexity of a set, in the sense that high entropy means that much information is needed to describe an element with accuracy ε. Therefore, the choice of the topological structure (in other words, the choice of the semi-metric) plays a crucial role when one looks at uniform (over some subset S_F of F) asymptotic results. More precisely, we will see below that a good semi-metric can increase the concentration of the probability measure of the functional variable X as well as minimize the ε-entropy of the subset S_F. In an earlier contribution (see Ferraty et al., 2006) we highlighted the phenomenon of concentration of the probability measure of the functional variable by computing the small ball probabilities in various standard situations. Section 2.2 is devoted to discussing the behaviour of Kolmogorov's ε-entropy in these standard situations. Finally, we invite readers interested in these two concepts (entropy and small ball probabilities) and/or in the use of Kolmogorov's ε-entropy in dimensionality reduction problems to refer to Kuelbs and Li (1993) and Theodoros and Yannis (1997), respectively.
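As an aside (not part of the original paper), the covering number N_ε(S), and hence the entropy ψ_S(ε) = log N_ε(S), can be approximated numerically for a finite collection of curves by a greedy covering; in the following minimal sketch, the greedy routine, the L² semi-metric and the simulated curves are all illustrative assumptions.

```python
import numpy as np

def greedy_cover_size(points, d, eps):
    # Greedy covering: repeatedly pick an uncovered point as a ball center
    # and discard every point within distance eps of it. The number of
    # centers used is an upper bound on the covering number N_eps.
    uncovered = list(range(len(points)))
    n_balls = 0
    while uncovered:
        center = points[uncovered[0]]
        uncovered = [i for i in uncovered if d(points[i], center) > eps]
        n_balls += 1
    return n_balls

# Illustrative semi-metric: discretized L2 distance between curves.
def d_l2(x1, x2):
    return np.sqrt(np.mean((x1 - x2) ** 2))

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
curves = np.array([a * np.sin(2 * np.pi * (t + s))
                   for a, s in zip(rng.uniform(0.5, 1.5, 500), rng.uniform(0, 1, 500))])

for eps in (0.5, 0.25, 0.1):
    N = greedy_cover_size(curves, d_l2, eps)
    print(f"eps={eps}: N_eps <= {N}, entropy estimate psi = {np.log(N):.2f}")
```

One can check on such simulations that the entropy estimate grows as ε decreases, in line with the examples of Section 2.2.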

2.2. Some examples

We start (Example 1) by recalling how this notion behaves in the non-functional case (that is, when F = ℝ^p). Then, Examples 2 and 3 cover special cases of functional processes. More interesting (from a statistical point of view) is Example 4, since it allows one to construct, in any case, a semi-metric with reasonably “small” entropy.

Example 1 (Compact subset in a finite-dimensional space). A standard theorem of topology guarantees that, for each compact subset S of ℝ^p and each ε > 0, there is a finite ε-net, and we have for any ε > 0:

ψ_S(ε) ≤ C_p log(1/ε).

More precisely, Chate and Courbage (1997) have shown that, for any ε > 0, the regular polyhedron in ℝ^p with length r can be covered by ([2r√p/ε] + 1)^p balls of radius ε, where [m] is the largest integer less than or equal to m. Thus, Kolmogorov's ε-entropy of a polyhedron P_r in ℝ^p with length r is

∀ε > 0, ψ_{P_r}(ε) ∼ p log(2r√p/ε + 1).

Example 2 (Closed ball in a Sobolev space). Kolmogorov and Tikhomirov (1959) obtained many upper and lower bounds for the ε-entropy of several functional subsets. A typical result is given for the class of functions f(t) on T = [0, 2π) with periodic boundary conditions and

(1/2π) ∫_0^{2π} f²(t) dt + (1/2π) ∫_0^{2π} (f^{(m)}(t))² dt ≤ r².

The ε-entropy of this class, denoted W_2^m(r), satisfies

ψ_{W_2^m(r)}(ε) ≤ C (r/ε)^{1/m}.

Example 3 (Unit ball of the Cameron–Martin space). Recently, van der Vaart and van Zanten (2007) characterized the Cameron–Martin space associated with a Gaussian process, viewed as a map in C[0,1], whose spectral measure μ satisfies

∫ exp(|λ|) μ(dλ) < ∞,

by

H = { t ↦ Re ∫ e^{−iλt} h(λ) dμ(λ) : h ∈ L²(μ) },

and they show that Kolmogorov's ε-entropy of the unit ball B_CMW of this space with respect to the supremum norm ‖·‖_∞ behaves as

ψ_{B_CMW, ‖·‖_∞}(ε) ∼ (log(1/ε))² as ε → 0.

Example 4 (Compact subset in a Hilbert space with a projection semi-metric). Projection-based semi-metrics are constructed in the following way. Assume that H is a separable Hilbert space, with inner product ⟨·,·⟩ and orthonormal basis {e_1, …, e_j, …}, and let k be a fixed integer, k > 0. As shown in Lemma 13.6 of Ferraty and Vieu (2006), a semi-metric d_k on H can be defined as follows:

d_k(x, x′) = √( Σ_{j=1}^k ⟨x − x′, e_j⟩² ). (1)

Let Π be the operator defined from H into ℝ^k by

Π(x) = (⟨x, e_1⟩, …, ⟨x, e_k⟩),

let d_eucl be the Euclidean distance on ℝ^k, and let us denote by B_eucl(·,·) an open ball of ℝ^k for the associated topology. Similarly, let us denote by B_k(·,·) an open ball of H for the semi-metric d_k. Because Π is a continuous map from (H, d_k) into (ℝ^k, d_eucl), for any compact subset S of (H, d_k), Π(S) is a compact subset of ℝ^k. Therefore, for each ε > 0 we can cover Π(S) with d_ε balls of centers z_i ∈ ℝ^k:

Π(S) ⊂ ∪_{i=1}^{d_ε} B_eucl(z_i, ε) with d_ε ε^k = C for some C > 0. (2)

For i = 1, …, d_ε, let x_i be an element of H such that Π(x_i) = z_i. The solution of the equation Π(x) = z_i is not unique in general, but one can just take x_i to be one of these solutions. Because of (1), we have that

Π^{-1}(B_eucl(z_i, ε)) = B_k(x_i, ε). (3)

Finally, (2) and (3) are enough to show that Kolmogorov's ε-entropy of S is

ψ_S(ε) ≈ C k log(1/ε).
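As a small illustration (ours, not the authors'), the projection semi-metric (1) is easy to implement once curves are discretized on a grid; here the cosine basis, the grid-based approximation of the inner products and all names are assumptions made only for the example.

```python
import numpy as np

def projection_semimetric(x1, x2, basis):
    # d_k of Eq. (1): Euclidean distance between the first k coefficients
    # <x, e_j>, the inner products being approximated by sums on the grid.
    return np.linalg.norm(basis @ x1 - basis @ x2)

m, k = 200, 5
t = np.linspace(0, 1, m)
# Approximately orthonormal cosine functions on the discrete grid.
basis = np.array([np.sqrt(2.0 / m) * np.cos(np.pi * j * t) for j in range(1, k + 1)])

x1 = np.sin(2 * np.pi * t)
x2 = np.sin(2 * np.pi * t) + 0.1 * np.cos(12 * np.pi * t)
# Small d_k: the perturbation lives at a frequency ignored by the first k projections.
print(projection_semimetric(x1, x2, basis))
```

The design choice mirrors the discussion above: by tuning k (and the basis), the statistician controls both the concentration of the small ball probabilities and the entropy of S_F.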


3. Estimation of the regression function

In this section, we consider the problem of estimating a generalized regression function defined as follows:

m_θ(x) = E[θ(Y)|X = x], ∀x ∈ F, (4)

where θ is a known real-valued Borel function. Model (4) has been widely studied when the explanatory variable X is real and θ(Y) = Y, while Deheuvels and Mason (2004) provide recent advances for a general function θ. This model covers and includes many important nonparametric models such as the classical regression function, the conditional distribution, etc.

Following the kernel estimator of the classical regression function (see Ferraty and Vieu, 2006), we propose the estimate m̂_θ(x) of m_θ(x) defined as

m̂_θ(x) = Σ_{i=1}^n K(h_K^{-1} d(x, X_i)) θ(Y_i) / Σ_{i=1}^n K(h_K^{-1} d(x, X_i)), ∀x ∈ F,

where K is a kernel function and h_K = h_{K,n} is a sequence of positive real numbers which goes to zero as n goes to infinity.
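To fix ideas, here is a minimal numerical sketch (not taken from the paper) of this estimator, with the quadratic kernel K(u) = (1 − u²) 1_{[0,1)}(u), θ = identity and an L² semi-metric, all of which are illustrative assumptions.

```python
import numpy as np

def kernel_K(u):
    # Quadratic kernel on [0, 1): bounded, Lipschitz, with K(1) = 0.
    return np.where((u >= 0) & (u < 1), 1.0 - u ** 2, 0.0)

def m_hat(x, X, Y, d, h_K, theta=lambda y: y):
    # Functional kernel estimator of m_theta(x) = E[theta(Y) | X = x].
    w = kernel_K(np.array([d(x, Xi) for Xi in X]) / h_K)
    if w.sum() == 0:          # no observed curve falls into the ball B(x, h_K)
        return np.nan
    return np.sum(w * theta(Y)) / w.sum()

# Tiny simulation: curves a*sin(2*pi*t) with scalar response close to a.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)
a = rng.uniform(0, 2, 300)
X = np.array([ai * np.sin(2 * np.pi * t) for ai in a])
Y = a + 0.05 * rng.normal(size=300)
d_l2 = lambda u, v: np.sqrt(np.mean((u - v) ** 2))
print(m_hat(np.sin(2 * np.pi * t), X, Y, d_l2, h_K=0.3))  # expected: close to 1
```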
Our aim is to establish the uniform almost complete convergence of m̂_θ on some subset S_F of F. To do so, we denote by C and/or C′ strictly positive generic constants, and we assume that:

(H2) There exists b > 0 such that ∀x_1, x_2 ∈ S_F, |m_θ(x_1) − m_θ(x_2)| ≤ C d^b(x_1, x_2).
(H3) ∀m ≥ 2, E(|θ(Y)|^m | X = x) ≤ δ_m(x) < C < ∞ with δ_m(·) continuous on S_F.
(H4) K is a bounded and Lipschitz kernel on its support [0,1), and if K(1) = 0, the kernel K has to fulfill the additional condition −∞ < C < K′(t) < C′ < 0.
(H5) The functions φ and ψ_{S_F} are such that:
  (H5a) ∃C > 0, ∃η_0 > 0, ∀η < η_0, φ′(η) < C, and if K(1) = 0, the function φ(·) has to fulfill the additional condition:

    ∃C > 0, ∃ε_0 > 0, ∀0 < ε < ε_0, ∫_0^ε φ(u) du > C ε φ(ε).

  (H5b) For n large enough,

    (log n)²/(n φ(h_K)) < ψ_{S_F}(log n/n) < n φ(h_K)/log n.

(H6) Kolmogorov's ε-entropy of S_F satisfies

  Σ_{n=1}^∞ exp{(1 − β) ψ_{S_F}(log n/n)} < ∞ for some β > 1.

Conditions (H2)–(H4) are very standard in the nonparametric setting. Concerning (H5a), the boundedness of the derivative of φ around zero allows one to treat φ as a Lipschitz function. In addition, from a theoretical point of view, one has to separate the case when K(·) is a continuous kernel (i.e. K(1) = 0) from the case when K(·) is not continuous (which contains, for instance, the uniform kernel). The case when K(1) = 0 is more delicate, and one has to introduce an additional assumption acting on the behaviour of φ around zero (see the proof in the Appendix). Hypothesis (H5b) deals with topological considerations by controlling the entropy of S_F. For a radius not too large, one requires that ψ_{S_F}(log n/n) be neither too small nor too large. Moreover, (H5b) implies that ψ_{S_F}(log n/n)/(n φ(h_K)) tends to 0 as n tends to +∞. As remarked in Section 2, in some “usual” cases one has ψ_{S_F}(log n/n) ∼ C log n, and (H5b) is then satisfied as soon as (log n)² = O(n φ(h_K)). In a different way, assumption (H6) acts on Kolmogorov's ε-entropy of S_F. However, if one considers the same particular case as previously, it is easy to see that (H6) is verified as soon as β > 2.

Theorem 2. Under hypotheses (H1)–(H6), we have

sup_{x∈S_F} |m̂_θ(x) − m_θ(x)| = O(h_K^b) + O_{a.co.}( √( ψ_{S_F}(log n/n) / (n φ(h_K)) ) ). (5)

4. Conditional cumulative distribution estimation

In this section, we assume that the regular version of the conditional probability of Y given X exists, and we study the uniform almost complete convergence of a kernel estimator of the conditional cumulative distribution function, denoted by F^x. A straightforward way to estimate the function F^x is to treat it as the particular case of m_θ obtained with θ(t) = 1_{]−∞,y]}(t) (for y ∈ ℝ). Thus, we estimate F^x by

F̂^x(y) = Σ_{i=1}^n W_{ni}(x) 1_{{Y_i ≤ y}}, ∀y ∈ ℝ, ∀x ∈ F,

where

W_{ni}(x) = K(h_K^{-1} d(x, X_i)) / Σ_{j=1}^n K(h_K^{-1} d(x, X_j)).
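In code (a sketch with the same illustrative conventions as the regression snippet of Section 3), F̂^x(y) only requires replacing the responses θ(Y_i) by the indicators 1_{{Y_i ≤ y}}:

```python
import numpy as np

def F_hat(x, y, X, Y, d, h_K, kernel_K):
    # Kernel estimator of the conditional c.d.f. F^x(y) = P(Y <= y | X = x).
    w = kernel_K(np.array([d(x, Xi) for Xi in X]) / h_K)
    if w.sum() == 0:
        return np.nan
    W = w / w.sum()               # the weights W_ni(x)
    return np.sum(W * (Y <= y))   # plug in the indicators 1{Y_i <= y}
```

By construction this estimate is a genuine distribution function in y (nondecreasing, with values in [0, 1]).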

The estimation of the conditional cumulative distribution function has been investigated, in the real case, by several authors (see Roussas, 1969; Samanta, 1989, among others). In the functional case, Ferraty et al. (2006) established the almost complete convergence of a double kernel estimator of the conditional cumulative distribution function.

Clearly, the result stated in Section 3 allows one to conclude the almost complete convergence of F̂^x uniformly in the functional argument x. Indeed, it suffices to apply Theorem 2 to get:

Corollary 3. Under hypotheses (H1), (H2) and (H4)–(H6), we have

sup_{x∈S_F} |F̂^x(y) − F^x(y)| = O(h_K^b) + O_{a.co.}( √( ψ_{S_F}(log n/n) / (n φ(h_K)) ) ).

But, in order to derive the uniform consistency with respect to both arguments (functional and real), we fix a compact subset S_R of ℝ and we consider the following additional assumptions:

(H2′) ∀(y_1, y_2) ∈ S_R × S_R, ∀(x_1, x_2) ∈ S_F × S_F, |F^{x_1}(y_1) − F^{x_2}(y_2)| ≤ C (d^{b_1}(x_1, x_2) + |y_1 − y_2|^{b_2}).
(H6′) Kolmogorov's ε-entropy of S_F satisfies

  Σ_{n=1}^∞ n^{1/(2b_2)} exp{(1 − β) ψ_{S_F}(log n/n)} < ∞ for some β > 1.

Theorem 4. Under hypotheses (H1), (H2′), (H4), (H5) and (H6′), we have

sup_{x∈S_F} sup_{y∈S_R} |F̂^x(y) − F^x(y)| = O(h_K^{b_1}) + O_{a.co.}( √( ψ_{S_F}(log n/n) / (n φ(h_K)) ) ). (6)

5. Conditional density estimation

In this section, similar results are derived for the kernel estimator of the conditional density of Y given X. We assume that the conditional probability of Y given X is absolutely continuous with respect to the Lebesgue measure on ℝ, and we denote by f^x the conditional density of Y given X = x. We define the kernel estimator f̂^x of f^x as follows:

f̂^x(y) = h_H^{-1} Σ_{i=1}^n K(h_K^{-1} d(x, X_i)) H(h_H^{-1}(y − Y_i)) / Σ_{i=1}^n K(h_K^{-1} d(x, X_i)), ∀y ∈ ℝ, ∀x ∈ F, (7)

where the functions K and H are kernels and h_K = h_{K,n} (resp. h_H = h_{H,n}) is a sequence of positive real numbers. Note that a similar estimate was already introduced, in the special case when X is a real random variable, by Rosenblatt (1969) and by Youndjé (1996), among other authors.
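A direct transcription of (7), as a hedged sketch: we take H to be the Gaussian density, a common bounded Lipschitz choice compatible with (H7); the surrounding names follow the earlier illustrative snippets.

```python
import numpy as np

def f_hat(x, y, X, Y, d, h_K, h_H, kernel_K):
    # Double-kernel estimator (7) of the conditional density f^x(y).
    H = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel H
    w = kernel_K(np.array([d(x, Xi) for Xi in X]) / h_K)
    if w.sum() == 0:
        return np.nan
    return np.sum(w * H((y - Y) / h_H)) / (h_H * w.sum())
```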
In order to establish the uniform almost complete convergence of this estimator, we consider the following additional assumptions:

(H2″) ∀(y_1, y_2) ∈ S_R × S_R, ∀(x_1, x_2) ∈ S_F × S_F, |f^{x_1}(y_1) − f^{x_2}(y_2)| ≤ C (d^{b_1}(x_1, x_2) + |y_1 − y_2|^{b_2}).
(H5b′) For some α ∈ (0, 1), lim_{n→+∞} n^α h_H = ∞, and for n large enough:

  (log n)²/(n^{1−α} φ(h_K)) < ψ_{S_F}(log n/n) < n^{1−α} φ(h_K)/log n.

(H6″) Kolmogorov's ε-entropy of S_F satisfies

  Σ_{n=1}^∞ n^{(3α+1)/2} exp{(1 − β) ψ_{S_F}(log n/n)} < ∞ for some β > 1.

(H7) H is a bounded Lipschitz continuous function such that ∫ |t|^{b_2} H(t) dt < ∞ and ∫ H²(t) dt < ∞.

Theorem 5. Under hypotheses (H1), (H2″), (H4), (H5a), (H5b′), (H6″) and (H7), we have

sup_{x∈S_F} sup_{y∈S_R} |f̂^x(y) − f^x(y)| = O(h_K^{b_1}) + O(h_H^{b_2}) + O_{a.co.}( √( ψ_{S_F}(log n/n) / (n^{1−α} φ(h_K)) ) ). (8)

6. Some direct consequences

6.1. Estimation of the conditional hazard function

This section is devoted to the almost complete convergence of the kernel estimator of the conditional hazard function of Y given X, uniformly on a fixed subset S_F × S_R of F × ℝ. We refer to Ferraty et al. (2008a) for the pointwise almost complete convergence of this model in the functional case. Recall that the conditional hazard function is defined by

h^x(y) = f^x(y) / (1 − F^x(y)), ∀y such that F^x(y) < 1, ∀x ∈ F.

Naturally, the conditional hazard function estimator is closely linked to the conditional survival function estimate. Consider the kernel estimates of the functions F^x and f^x defined in the previous sections, and adopt the kernel estimator ĥ^x(y) of the conditional hazard function h^x defined by

ĥ^x(y) = f̂^x(y) / (1 − F̂^x(y)).
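Given estimates of F^x and f^x as in the previous sketches, the plug-in hazard estimator is immediate; the helper names F_hat and f_hat below refer to our earlier illustrative snippets, not to functions from the paper.

```python
def h_hat(x, y, X, Y, d, h_K, h_H, kernel_K):
    # Plug-in estimator of the conditional hazard h^x(y) = f^x(y) / (1 - F^x(y)).
    F = F_hat(x, y, X, Y, d, h_K, kernel_K)        # conditional c.d.f. estimate
    f = f_hat(x, y, X, Y, d, h_K, h_H, kernel_K)   # conditional density estimate
    return f / (1.0 - F) if F < 1.0 else float("nan")  # guard: F-hat(y) = 1
```

The guard mirrors (H8), which keeps the true survival function 1 − F^x(y) bounded away from zero on S_F × S_R.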

In addition to the previous assumptions, used to establish the convergence rates (6) and (8), the following is needed:

(H8) ∃δ_1 > 0, ∃δ_2 > 0 such that ∀x ∈ S_F, ∀y ∈ S_R, F^x(y) ≤ 1 − δ_1 < 1 and f^x(y) ≤ δ_2.

Theorem 6. Under the hypotheses of Theorem 5 and if (H6′) and (H8) hold, then

sup_{x∈S_F} sup_{y∈S_R} |ĥ^x(y) − h^x(y)| = O(h_K^{b_1}) + O(h_H^{b_2}) + O_{a.co.}( √( ψ_{S_F}(log n/n) / (n^{1−α} φ(h_K)) ) ). (9)

6.2. Conditional mode estimation

Let us now study the almost complete convergence of the kernel estimator of the conditional mode of Y given X = x, denoted by Θ(x), uniformly on a fixed compact subset S_F of F. To this end, we assume that Θ(x) satisfies on S_F the following uniform uniqueness property (see Ould-Saïd and Cai, 2005, for the multivariate case):

(H9) ∀ε_0 > 0, ∃δ > 0, ∀r : S_F → S_R,

  sup_{x∈S_F} |Θ(x) − r(x)| ≥ ε_0 ⇒ sup_{x∈S_F} |f^x(r(x)) − f^x(Θ(x))| ≥ δ.

Moreover, we suppose that there exists some integer j > 1 such that, ∀x ∈ S_F, the function f^x is j times continuously differentiable with respect to y on S_R, and that

(H10) f^{x(l)}(Θ(x)) = 0 if 1 ≤ l < j, and f^{x(j)}(·) is uniformly continuous on S_R with |f^{x(j)}(Θ(x))| > C > 0,

where f^{x(j)} denotes the jth order derivative of the conditional density f^x.

We estimate the conditional mode Θ(x) with a random variable Θ̂(x) such that

Θ̂(x) = arg sup_{y∈S_R} f̂^x(y).
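In practice Θ̂(x) is obtained by maximizing f̂^x over a fine grid of S_R; a minimal sketch under the same illustrative conventions as before (f_hat is the density snippet of Section 5):

```python
import numpy as np

def mode_hat(x, X, Y, d, h_K, h_H, kernel_K, y_grid):
    # Grid search for Theta-hat(x) = arg sup_{y in S_R} f-hat^x(y).
    dens = np.array([f_hat(x, y, X, Y, d, h_K, h_H, kernel_K) for y in y_grid])
    return y_grid[np.nanargmax(dens)]   # NaNs (empty balls) are ignored
```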

From Theorem 5 we derive the following corollary.

Corollary 7. Under the hypotheses of Theorem 5 and if the conditional density f^x satisfies (H9) and (H10), we have

sup_{x∈S_F} |Θ̂(x) − Θ(x)|^j = O(h_K^{b_1}) + O(h_H^{b_2}) + O_{a.co.}( √( ψ_{S_F}(log n/n) / (n^{1−α} φ(h_K)) ) ).

7. Comments

7.1. Impact of the results

This paper has stated uniform consistency results in the functional setting. They are not only nice extensions of pointwise results, but they also have great impact (from both theoretical and practical points of view). First of all, the natural practical interest of uniform consistency is prediction. Look, for instance, at Theorem 2. Being able to state results on the quantity

sup_{x∈S_F} |m̂_θ(x) − m_θ(x)|

directly yields results on the quantity

|m̂_θ(X) − m_θ(X)|,

where X is a new random functional element valued in S_F. The same kind of remark applies to the other problems treated in Theorems 4, 5 and 6 (i.e. the conditional cumulative distribution, the conditional density and the conditional hazard function). More generally, as in multivariate statistics, uniform consistency can be useful for estimating the solution of general equations, with applications to detecting peaks, valleys or change points (see, for instance, Boularan et al., 1995). Secondly, and this is maybe the main point,

uniform consistency results are indispensable tools for the study of more sophisticated models in which multi-stage procedures are involved. This occurs in a very wide scope of situations in standard multivariate nonparametric analysis. In the functional setting, the need for uniform consistency results has recently been pointed out in some multi-stage models such as additive modelling (Ferraty and Vieu, 2009), the partial functional linear model (Aneiros Perez and Vieu, 2006) and single functional index models (Aït Saïdi et al., 2009). Other functional data methodologies also use such uniform tools, for instance data-driven bandwidth choice (Benhenni et al., 2007) or bootstrapping (Ferraty et al., 2008b). The scope of functional applications of our theoretical uniform results will increase in the near future, following the progress of FDA.

7.2. General comments on the hypotheses

In addition to the comments given in Section 3, we complete this discussion by comparing the structural assumptions of the uniform convergence to those of the pointwise one studied by Ferraty and Vieu (2006).

On the functional variable: Unlike the pointwise case, uniform consistency requires a concentration property of the probability measure holding uniformly over S_F (see (H1)). So, it is important to give here general situations in which such an assumption is fulfilled. From a probabilistic point of view, this can be done by introducing the Onsager–Machlup function (see Onsager and Machlup, 1953) defined as

∀(x, z) ∈ S_F × S_F, F_X(x, z) = log( lim_{h→0} P(X ∈ B(x, h)) / P(X ∈ B(z, h)) ).

Then, (H1) is verified if the Onsager–Machlup function of the probability measure of the functional variable is such that

∀x ∈ S_F, |F_X(x, 0)| ≤ C < ∞. (10)

The Onsager–Machlup function has been intensively studied in the literature, as have the quantities P(X ∈ B(0, h)). Their respective explicit expressions for several continuous-time processes can be found in Bogachev (1999, p. 186), which thereby produces examples of subsets and functional variables that satisfy (H1). This purely probabilistic point of view focuses on small ball probabilities with standard topologies. But, from a statistical point of view, the practitioner can choose the semi-metric. In particular, Example 4 in Section 2 gives an interesting family of semi-metrics allowing (H1) to be fulfilled for a large set of functional variables (see Ferraty and Vieu, 2006, Lemma 13.6). In fact, for a statistician, an important task consists in building a semi-metric adapted to the functional variable. Here, the word “adapted” can mean that this semi-metric allows (H1) to be satisfied, but other properties can be required of the semi-metric, and this issue will certainly be investigated in further works.
On the regularity constraints of the model: Regularity-type conditions on the functional objects to be estimated are given via assumptions (H2), (H2′) and (H2″). In comparison with the pointwise case (see Ferraty et al., 2006), we do not allow the constants to depend on the conditioning point. Moreover, these assumptions are sufficient but not necessary. For instance, one can replace (H2) by a new one based on the function

L_x(z) = E[θ(Y) − m_θ(x) | X = z].

Consider F as a semi-normed vector space and assume that there exists a linear operator A_x such that

L_x(z) = A_x(z − x) + o(‖x − z‖),

where A_x is bounded uniformly over S_F (which amounts to assuming that L_x is differentiable, since L_x(x) = 0). From an asymptotic point of view, the only change appears in the bias. So, by using the first-order expansion of L_x, we can also get under this alternative assumption that

sup_{x∈S_F} |E ĝ(x) − m_θ(x)| = O(h).

As a conclusion, there are several ways to introduce regularity constraints in functional nonparametric models. The alternative assumption used here preserves the same rate of convergence. However, one has to keep in mind that considering other smoothness conditions on the model can lead to different convergence rates for the bias.

7.3. Comments on convergence rates

It is well known that in the finite-dimensional case (that is, F = ℝ^p), the uniform rates of convergence (over compact sets) are the same as the pointwise ones. The main point of this paper is to show that this is not so obvious in functional settings. To fix ideas, let us just look at the regression case (the other ones could be discussed similarly).

For a fixed point x ∈ F, the pointwise result (see, e.g., Ferraty and Vieu, 2006, Theorem 6.11) is stated (for θ ≡ identity) as

m̂_θ(x) − m_θ(x) = O(h_K^b) + O( √( log n / (n φ(h_K)) ) ) a.co.,

while in Theorem 2 of this paper we have stated a result of the form

sup_{x∈S_F} |m̂_θ(x) − m_θ(x)| = O(h_K^b) + O( √( ψ_{S_F}(log n/n) / (n φ(h_K)) ) ) a.co.

To see how the uniform point of view leads to a deterioration of the rates of convergence, one may look at the special examples discussed in Section 2.2. For instance, looking at Example 3, if we have a standard Gaussian process with the usual metric topology, the loss is a log n factor (inside the square root), since we have

ψ_{S_F}(log n/n) = O((log n)²).

However, if we look at Example 4, we can see the interest of semi-metric modelling. Indeed, with a suitable projection semi-metric, one arrives at an entropy function satisfying

ψ_{S_F}(log n/n) = O(log n).

So, such a new topological choice (as described in Example 4) allows one to avoid the deterioration of the rates of convergence.
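To summarize the comparison in one display (a rephrasing of the two cases above, not a new result), plugging the two entropy behaviours into the uniform rate of Theorem 2 gives:

```latex
% Gaussian process with the usual (sup-norm) topology:
% \psi_{S_F}(\log n/n) = O((\log n)^2), hence
\sqrt{\frac{(\log n)^2}{n\,\phi(h_K)}}
  \;=\; \sqrt{\log n}\,\sqrt{\frac{\log n}{n\,\phi(h_K)}}
  \qquad \text{(pointwise rate deteriorated by a } \sqrt{\log n} \text{ factor);}
% Projection semi-metric of Example 4:
% \psi_{S_F}(\log n/n) = O(\log n), hence
\sqrt{\frac{C\,\log n}{n\,\phi(h_K)}}
  \qquad \text{(pointwise rate recovered, up to a constant).}
```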

Acknowledgements

The authors would like to thank both referees, whose comments and suggestions have significantly improved the presentation of this work. All the participants of the working group STAPH on Functional and Operatorial Statistics in Toulouse are also thanked for their continuous support and comments.

Appendix A. Proofs

In the following, we denote, for all i = 1, …, n,

K_i(x) = K(h_K^{-1} d(x, X_i)) and H_i(y) = H(h_H^{-1}(y − Y_i)).

First of all, according to (H1) and (H4), it is clear that if K(1) > C > 0,

∀x ∈ S_F, ∃ 0 < C < C′ < ∞, C φ(h_K) < E[K_1(x)] < C′ φ(h_K). (11)

In the situation when K(1) = 0, the combination of (H1) and (H5a) allows one to get the same result (see Ferraty and Vieu, 2006, p. 44, Lemma 4.4). From now on, in order to simplify the notation, we set ε = log n/n.

Proof of Theorem 2. We consider the decomposition:

m̂_θ(x) − m_θ(x) = (1/f̂(x)) [ĝ(x) − E ĝ(x)] + (1/f̂(x)) [E ĝ(x) − m_θ(x)] + (m_θ(x)/f̂(x)) [1 − f̂(x)], (12)

where

f̂(x) = (1/(n E[K(h_K^{-1} d(x, X_1))])) Σ_{i=1}^n K(h_K^{-1} d(x, X_i))

and

ĝ(x) = (1/(n E[K(h_K^{-1} d(x, X_1))])) Σ_{i=1}^n K(h_K^{-1} d(x, X_i)) θ(Y_i).

Therefore, Theorem 2 is a consequence of the following intermediate results. □

Lemma 8. Under hypotheses (H1) and (H4)–(H6), we have

sup_{x∈S_F} |f̂(x) − 1| = O_{a.co.}( √( ψ_{S_F}(log n/n) / (n φ(h_K)) ) ).

Corollary 9. Under the hypotheses of Lemma 8, we have

Σ_{n=1}^∞ P( inf_{x∈S_F} f̂(x) < 1/2 ) < ∞.

Lemma 10. Under hypotheses (H1), (H2) and (H4)–(H6), we have

sup_{x∈S_F} |E ĝ(x) − m_θ(x)| = O(h_K^b).

Lemma 11. Under the assumptions of Theorem 2, we have

sup_{x∈S_F} |ĝ(x) − E ĝ(x)| = O_{a.co.}( √( ψ_{S_F}(log n/n) / (n φ(h_K)) ) ).

Proof of Lemma 8. Let x_1, …, x_{N_ε(S_F)} be an ε-net for S_F (see Definition 1) and, for all x ∈ S_F, set k(x) = arg min_{k∈{1,2,…,N_ε(S_F)}} d(x, x_k). Consider the following decomposition:

sup_{x∈S_F} |f̂(x) − E f̂(x)| ≤ sup_{x∈S_F} |f̂(x) − f̂(x_{k(x)})| + sup_{x∈S_F} |f̂(x_{k(x)}) − E f̂(x_{k(x)})| + sup_{x∈S_F} |E f̂(x_{k(x)}) − E f̂(x)|
                              =: F_1 + F_2 + F_3.

• Let us study F_1. By using (11) and the boundedness of K, one can write

F_1 ≤ sup_{x∈S_F} (1/n) Σ_{i=1}^n | K_i(x)/E[K_1(x)] − K_i(x_{k(x)})/E[K_1(x_{k(x)})] |
    ≤ (C/φ(h_K)) sup_{x∈S_F} (1/n) Σ_{i=1}^n |K_i(x) − K_i(x_{k(x)})| 1_{B(x,h_K)∪B(x_{k(x)},h_K)}(X_i).

Let us first consider the case K(1) = 0. Because K is then Lipschitz on [0,1], it comes

F_1 ≤ sup_{x∈S_F} (C/n) Σ_{i=1}^n Z_i with Z_i = (ε/(h_K φ(h_K))) 1_{B(x,h_K)∪B(x_{k(x)},h_K)}(X_i),

with, uniformly in x,

Z_1 = O(ε/(h_K φ(h_K))), E Z_1 = O(ε/h_K) and Var(Z_1) = O(ε²/(h_K² φ(h_K))).

A standard inequality for sums of bounded random variables (see Ferraty and Vieu, 2006, Corollary A.9) together with (H5b) allows one to get

F_1 = O(ε/h_K) + O_{a.co.}( (ε/h_K) √( log n/(n φ(h_K)) ) ),

and it suffices to combine (H5a) and (H5b) to get

F_1 = O_{a.co.}( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ).

Now, let K(1) > C > 0. In this situation K is Lipschitz on [0,1). One has to decompose F_1 into three terms as follows:

F_1 ≤ C sup_{x∈S_F} (F_{11} + F_{12} + F_{13}),

with

F_{11} = (1/(n φ(h_K))) Σ_{i=1}^n |K_i(x) − K_i(x_{k(x)})| 1_{B(x,h_K)∩B(x_{k(x)},h_K)}(X_i),
F_{12} = (1/(n φ(h_K))) Σ_{i=1}^n K_i(x) 1_{B(x,h_K)∩B̄(x_{k(x)},h_K)}(X_i),
F_{13} = (1/(n φ(h_K))) Σ_{i=1}^n K_i(x_{k(x)}) 1_{B̄(x,h_K)∩B(x_{k(x)},h_K)}(X_i),

where B̄(·,·) denotes the complement of the corresponding ball. One can follow the same steps as in the case K(1) = 0 for studying F_{11}, and one gets the same result:

F_{11} = O_{a.co.}( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ).

Following the same ideas for F_{12}, one can write

F_{12} ≤ (C/n) Σ_{i=1}^n W_i with W_i = (1/φ(h_K)) 1_{B(x,h_K)∩B̄(x_{k(x)},h_K)}(X_i),

and by using again (H5a) and the same inequality for sums of bounded random variables, one has

F_{12} = O(ε/φ(h_K)) + O_{a.co.}( √( ε log n/(n φ(h_K)²) ) ).

Similarly, one can state the same rate of convergence for F_{13}. To end the study of F_1, it suffices to put together all the intermediate results and to use again (H5b), getting

F_1 = O_{a.co.}( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ).

• Now, concerning F_2, we have, for all η > 0,

P( F_2 > η √( ψ_{S_F}(ε)/(n φ(h_K)) ) ) = P( max_{k∈{1,…,N_ε(S_F)}} |f̂(x_k) − E f̂(x_k)| > η √( ψ_{S_F}(ε)/(n φ(h_K)) ) )
  ≤ N_ε(S_F) max_{k∈{1,…,N_ε(S_F)}} P( |f̂(x_k) − E f̂(x_k)| > η √( ψ_{S_F}(ε)/(n φ(h_K)) ) ).

Let

Δ_{ki} = (1/E[K_1(x_k)]) (K_i(x_k) − E[K_i(x_k)]).

One shows, under (H1) and (H4), that ∀k = 1, …, N_ε(S_F), ∀i = 1, …, n,

Δ_{ki} = O(φ(h_K)^{-1}) and also Var(Δ_{ki}) = O(φ(h_K)^{-1}).

So, one can apply a Bernstein-type inequality (see Ferraty and Vieu, 2006, Corollary A.9), which gives directly

P( |f̂(x_k) − E f̂(x_k)| > η √( ψ_{S_F}(ε)/(n φ(h_K)) ) ) = P( (1/n) |Σ_{i=1}^n Δ_{ki}| > η √( ψ_{S_F}(ε)/(n φ(h_K)) ) )
  ≤ 2 exp{−C η² ψ_{S_F}(ε)}.

Thus, by using the fact that ψ_{S_F}(ε) = log N_ε(S_F) and by choosing η such that C η² = β, we have

N_ε(S_F) max_{k∈{1,…,N_ε(S_F)}} P( |f̂(x_k) − E f̂(x_k)| > η √( ψ_{S_F}(ε)/(n φ(h_K)) ) ) ≤ C′ N_ε(S_F)^{1−β}. (13)

Because Σ_{n=1}^∞ N_ε(S_F)^{1−β} < ∞, we obtain that

F_2 = O_{a.co.}( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ).

• For F_3, it is clear that F_3 ≤ E( sup_{x∈S_F} |f̂(x) − f̂(x_{k(x)})| ), and a proof similar to the one used for studying F_1 gives

F_3 = O( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ). □

Proof of Corollary 9. It is easy to see that

inf_{x∈S_F} |f̂(x)| ≤ 1/2 ⇒ ∃x ∈ S_F such that 1 − f̂(x) ≥ 1/2 ⇒ sup_{x∈S_F} |1 − f̂(x)| ≥ 1/2.

We deduce from Lemma 8 that

P( inf_{x∈S_F} |f̂(x)| ≤ 1/2 ) ≤ P( sup_{x∈S_F} |1 − f̂(x)| ≥ 1/2 ).

Consequently,

Σ_{n=1}^∞ P( inf_{x∈S_F} |f̂(x)| < 1/2 ) < ∞. □

Proof of Lemma 10. One has

|E ĝ(x) − m_θ(x)| = | (1/(n E[K_1(x)])) E[ Σ_{i=1}^n K_i(x) θ(Y_i) ] − m_θ(x) |
  ≤ | (1/E[K_1(x)]) E[K_1(x) θ(Y_1)] − m_θ(x) |
  ≤ (1/E[K_1(x)]) E[ K_1(x) |m_θ(X_1) − m_θ(x)| ].

Hence, we get

∀x ∈ S_F, |E ĝ(x) − m_θ(x)| ≤ (1/E[K_1(x)]) E[ K_1(x) |m_θ(X_1) − m_θ(x)| ].

Thus, with hypotheses (H1), (H2) and (11), we have

∀x ∈ S_F, |E ĝ(x) − m_θ(x)| ≤ C (1/E[K_1(x)]) E[ K_1(x) 1_{B(x,h_K)}(X_1) d^b(X_1, x) ] ≤ C h_K^b.

This last inequality yields the proof, since C does not depend on x. □

Proof of Lemma 11. This proof follows the same steps as the proof of Lemma 8; we keep the same notation and use the following decomposition:

sup_{x∈S_F} |ĝ(x) − E ĝ(x)| ≤ sup_{x∈S_F} |ĝ(x) − ĝ(x_{k(x)})| + sup_{x∈S_F} |ĝ(x_{k(x)}) − E ĝ(x_{k(x)})| + sup_{x∈S_F} |E ĝ(x_{k(x)}) − E ĝ(x)|
                            =: G_1 + G_2 + G_3.

Condition (H1) and result (11) allow one to write directly, for G_1:

G_1 = sup_{x∈S_F} | (1/(n E[K_1(x)])) Σ_{i=1}^n K_i(x) θ(Y_i) − (1/(n E[K_1(x_{k(x)})])) Σ_{i=1}^n K_i(x_{k(x)}) θ(Y_i) |
    ≤ sup_{x∈S_F} (1/(n φ(h_K))) Σ_{i=1}^n |θ(Y_i)| |K_i(x) − K_i(x_{k(x)})| 1_{B(x,h_K)∪B(x_{k(x)},h_K)}(X_i).

Now, as for F_1, one first considers the case K(1) = 0 (i.e. K Lipschitz on [0,1]) and one gets

G_1 ≤ (C/n) Σ_{i=1}^n Z_i with Z_i = (ε |θ(Y_i)|/(h_K φ(h_K))) sup_{x∈S_F} 1_{B(x,h_K)∪B(x_{k(x)},h_K)}(X_i).

The main difference with the study of F_1 is that one uses here an exponential inequality for unbounded variables. Note that one has

E[|θ(Y)|^m] = E[ E[|θ(Y)|^m | X] ] = ∫ δ_m(x) dP_X(x) ≤ C < ∞,

which implies that

E(|Z_1|^m) ≤ C ε^m / (h_K^m φ(h_K)^{m−1}).

So, by applying Corollary A.8 in Ferraty and Vieu (2006) with a² = ε/(h_K φ(h_K)), one gets

G_1 = O_{a.co.}( √( ε log n / (n h_K φ(h_K)) ) ).

Now, (H5b) allows one to get

G_1 = O_{a.co.}( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ). (14)

If one considers the case K(1) > C > 0 (i.e. K Lipschitz on [0,1)), one has to split G_1 into three terms as for F_1, and similar arguments yield the same rate of almost complete convergence. Similar steps allow one to get

G_3 = O( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ). (15)

For G_2, similarly to the proof of Lemma 8, we have, ∀η > 0,

P( G_2 > η √( ψ_{S_F}(ε)/(n φ(h_K)) ) ) = P( max_{k∈{1,…,N_ε(S_F)}} |ĝ(x_k) − E ĝ(x_k)| > η √( ψ_{S_F}(ε)/(n φ(h_K)) ) )
  ≤ N_ε(S_F) max_{k∈{1,…,N_ε(S_F)}} P( |ĝ(x_k) − E ĝ(x_k)| > η √( ψ_{S_F}(ε)/(n φ(h_K)) ) ).

The rest of the proof is based on the exponential inequality given by Corollary A.8-ii in Ferraty and Vieu (2006). Indeed, let

Δ_{ki} = (1/E[K_1(x_k)]) [ K_i(x_k) θ(Y_i) − E[K_i(x_k) θ(Y_i)] ].

The same arguments as those invoked for proving Lemma 6.3 in Ferraty and Vieu (2006, p. 65) can be used to show that E|Δ_{ki}|^m = O(φ(h_K)^{−m+1}), which gives, by applying the exponential inequality mentioned above, for all η > 0,

P( |ĝ(x_k) − E ĝ(x_k)| > η √( ψ_{S_F}(ε)/(n φ(h_K)) ) ) ≤ 2 N_ε(S_F)^{−C η²}.

Therefore, by a suitable choice of η > 0, we have

N_ε(S_F) max_{k∈{1,…,N_ε(S_F)}} P( |ĝ(x_k) − E ĝ(x_k)| > η √( log N_ε(S_F)/(n φ(h_K)) ) ) ≤ C′ N_ε(S_F)^{1−β}.

As Σ_{n=1}^∞ N_ε(S_F)^{1−β} < ∞, we obtain that

G_2 = O_{a.co.}( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ). (16)

Now, Lemma 11 can be easily deduced from (14)–(16). □

Proof of Theorem 4. Similarly to (12), we have

F̂^x(y) − F^x(y) = (1/f̂(x)) [ (F̂_N^x(y) − E F̂_N^x(y)) − (F^x(y) − E F̂_N^x(y)) ] + (F^x(y)/f̂(x)) [E f̂(x) − f̂(x)], (17)

where

F̂_N^x(y) = (1/(n E[K(h_K^{-1} d(x, X_1))])) Σ_{i=1}^n K(h_K^{-1} d(x, X_i)) 1_{{Y_i ≤ y}}.

Then, Theorem 4 can be deduced from the following intermediate results, together with Lemma 8 and Corollary 9. □

Lemma 12. Under hypotheses (H2′) and (H4), one has

sup_{x∈S_F} sup_{y∈S_R} |F^x(y) − E F̂_N^x(y)| = O(h_K^{b_1}).

Lemma 13. Under the assumptions of Theorem 4, we have

sup_{x∈S_F} sup_{y∈S_R} |F̂_N^x(y) − E F̂_N^x(y)| = O_{a.co.}( √( ψ_{S_F}(log n/n) / (n φ(h_K)) ) ).

Proof of Lemma 12. It is clear that (H4) implies, ∀(x, y) ∈ S_F × S_R,

E[F̂_N^x(y)] − F^x(y) = (1/E[K_1(x)]) E[ (K_1(x) 1_{B(x,h_K)}(X_1)) (F^{X_1}(y) − F^x(y)) ]. (18)

The Lipschitz condition (H2′) allows us to write

∀(x, y) ∈ S_F × S_R, 1_{B(x,h_K)}(X_1) |F^{X_1}(y) − F^x(y)| ≤ C h_K^{b_1},

hence

∀(x, y) ∈ S_F × S_R, |E[F̂_N^x(y)] − F^x(y)| ≤ C h_K^{b_1}. □

Proof of Lemma 13. We keep the notation of Lemma 8 and use the compactness of S_R: we can write that, for some t_1, t_2, …, t_{z_n} ∈ S_R,

S_R ⊂ ∪_{j=1}^{z_n} (t_j − l_n, t_j + l_n),

with l_n = n^{−1/(2b_2)} and z_n ≤ n^{1/(2b_2)}. Take

j(y) = arg min_{j∈{1,2,…,z_n}} |y − t_j|.

Thus, we have the following decomposition:

sup_{x∈S_F} sup_{y∈S_R} |F̂_N^x(y) − E F̂_N^x(y)| ≤ sup_x sup_y |F̂_N^x(y) − F̂_N^{x_{k(x)}}(y)| + sup_x sup_y |F̂_N^{x_{k(x)}}(y) − E F̂_N^{x_{k(x)}}(y)| + sup_x sup_y |E F̂_N^{x_{k(x)}}(y) − E F̂_N^x(y)|
  =: F′_1 + F′_2 + F′_3.

Concerning F′_1 and F′_3, by following the same lines as for the terms F_1 and F_3, it comes

F′_1 = O_{a.co.}( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ) and F′_3 = O( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ). (19)

Concerning F′_2, the monotonicity of the functions E[F̂_N^x(·)] and F̂_N^x(·) permits us to write, for all j ≤ z_n and all x ∈ S_F,

E F̂_N^{x_{k(x)}}(t_j − l_n) ≤ sup_{y∈(t_j−l_n, t_j+l_n)} E F̂_N^{x_{k(x)}}(y) ≤ E F̂_N^{x_{k(x)}}(t_j + l_n),
F̂_N^{x_{k(x)}}(t_j − l_n) ≤ sup_{y∈(t_j−l_n, t_j+l_n)} F̂_N^{x_{k(x)}}(y) ≤ F̂_N^{x_{k(x)}}(t_j + l_n). (20)

Next, we use Hölder's condition on F^x and we show that, for any y_1, y_2 ∈ S_R and all x ∈ S_F,

|E F̂_N^x(y_1) − E F̂_N^x(y_2)| = | (1/E[K_1(x)]) E[K_1(x) F^{X_1}(y_1)] − (1/E[K_1(x)]) E[K_1(x) F^{X_1}(y_2)] | ≤ C |y_1 − y_2|^{b_2}. (21)
Then, by (20) and (21) and because l_n = n^{−1/(2b_2)}, we get

F′_2 ≤ F′_4 + O( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ),

where

F′_4 = max_{k∈{1,…,N_ε(S_F)}} max_{1≤j≤z_n} max_{s_j∈{t_j−l_n, t_j+l_n}} |F̂_N^{x_k}(s_j) − E F̂_N^{x_k}(s_j)|.

Thus, it remains to study F′_4. By using similar arguments as those invoked for studying F_2, combined with (H6′), one has F′_4 = O_{a.co.}( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ), which implies that

sup_{x∈S_F} sup_{y∈S_R} |F̂_N^{x_{k(x)}}(s_{j(y)}) − E F̂_N^{x_{k(x)}}(s_{j(y)})| = O_{a.co.}( √( ψ_{S_F}(ε)/(n φ(h_K)) ) ). (22)

So, Lemma 13 can be easily deduced from (19) and (22). □

Proof of Theorem 5. The proof is based on the following decomposition:

f̂^x(y) − f^x(y) = (1/f̂(x)) [ (f̂_N^x(y) − E f̂_N^x(y)) − (f^x(y) − E f̂_N^x(y)) ] + (f^x(y)/f̂(x)) [E f̂(x) − f̂(x)], (23)

where

f̂_N^x(y) = (1/(n h_H E[K(h_K^{-1} d(x, X_1))])) Σ_{i=1}^n K(h_K^{-1} d(x, X_i)) H(h_H^{-1}(y − Y_i)).

Theorem 5 can be deduced from the following intermediate results, together with Lemma 8 and Corollary 9. □

Lemma 14. Under hypotheses (H2″), (H4) and (H7), we have

sup_{x∈S_F} sup_{y∈S_R} |f^x(y) − E f̂_N^x(y)| = O(h_K^{b_1}) + O(h_H^{b_2}).

Lemma 15. Under the assumptions of Theorem 5, we have

sup_{x∈S_F} sup_{y∈S_R} |f̂_N^x(y) − E f̂_N^x(y)| = O_{a.co.}( √( ψ_{S_F}(log n/n) / (n^{1−α} φ(h_K)) ) ).

Proof of Lemma 14. One has

E f̂_N^x(y) − f^x(y) = (1/(n h_H E K_1(x))) E[ Σ_{i=1}^n K_i(x) H_i(y) ] − f^x(y)
                    = (1/E K_1(x)) E( K_1(x) [ h_H^{-1} E(H_1(y)|X_1) − f^x(y) ] ).

Moreover, by a change of variable,

h_H^{-1} E(H_1(y)|X_1) = (1/h_H) ∫_ℝ H((y − z)/h_H) f^{X_1}(z) dz = ∫_ℝ H(t) f^{X_1}(y − h_H t) dt,

so we arrive at

|h_H^{-1} E(H_1(y)|X_1) − f^x(y)| ≤ ∫_ℝ H(t) |f^{X_1}(y − h_H t) − f^x(y)| dt.

Finally, the use of (H2″) implies that

|h_H^{-1} E(H_1(y)|X_1) − f^x(y)| ≤ C ∫_ℝ H(t) (h_K^{b_1} + |t|^{b_2} h_H^{b_2}) dt.

This inequality is uniform in (x, y) on S_F × S_R, and the use of (H7) proves Lemma 14. □

Proof of Lemma 15. Let us keep the definition of k(x) (resp. j(y)) as in Lemma 8 (resp. Lemma 13). The compactness of S_R permits us to write

S_R ⊂ ∪_{j=1}^{z_n} (t_j − l_n, t_j + l_n),

with l_n = n^{−(3α/2)−1/2} and z_n ≤ C n^{(3α/2)+1/2}. We have the following decomposition:

sup_x sup_y |f̂_N^x(y) − E f̂_N^x(y)| ≤ sup_x sup_y |f̂_N^x(y) − f̂_N^{x_{k(x)}}(y)| + sup_x sup_y |f̂_N^{x_{k(x)}}(y) − f̂_N^{x_{k(x)}}(t_{j(y)})| + sup_x sup_y |f̂_N^{x_{k(x)}}(t_{j(y)}) − E f̂_N^{x_{k(x)}}(t_{j(y)})|
  + sup_x sup_y |E f̂_N^{x_{k(x)}}(t_{j(y)}) − E f̂_N^{x_{k(x)}}(y)| + sup_x sup_y |E f̂_N^{x_{k(x)}}(y) − E f̂_N^x(y)|
  =: T_1 + T_2 + T_3 + T_4 + T_5,

all suprema being over x ∈ S_F and y ∈ S_R. Similarly to the study of the term F_1, replacing (H5b) with (H5b′), it comes

T_1 = O_{a.co.}( √( ψ_{S_F}(ε)/(n^{1−α} φ(h_K)) ) ) and T_5 = O( √( ψ_{S_F}(ε)/(n^{1−α} φ(h_K)) ) ). (24)

Concerning the term T_2, by using the Lipschitz condition on the kernel H, one can write

|f̂_N^{x_{k(x)}}(y) − f̂_N^{x_{k(x)}}(t_{j(y)})| ≤ (C/(n h_H φ(h_K))) Σ_{i=1}^n K_i(x_{k(x)}) |H_i(y) − H_i(t_{j(y)})|
  ≤ (C/n) Σ_{i=1}^n Z_i,

where Z_i = l_n K_i(x_{k(x)})/(h_H² φ(h_K)). Once again, a standard exponential inequality for a sum of bounded variables allows us to write

f̂_N^{x_{k(x)}}(y) − f̂_N^{x_{k(x)}}(t_{j(y)}) = O(l_n/h_H²) + O_{a.co.}( (l_n/h_H²) √( log n/(n φ(h_K)) ) ).

Now, the facts that lim_{n→+∞} n^α h_H = ∞ and l_n = n^{−(3α/2)−1/2} imply that

T_2 = O_{a.co.}( √( ψ_{S_F}(ε)/(n^{1−α} φ(h_K)) ) ) and T_4 = O( √( ψ_{S_F}(ε)/(n^{1−α} φ(h_K)) ) ). (25)

By using analogous arguments as for Lemma 8, we can show, for all η > 0,

P( T_3 > η √( ψ_{S_F}(ε)/(n h_H φ(h_K)) ) ) = P( max_{j∈{1,2,…,z_n}} max_{k∈{1,…,N_ε(S_F)}} |f̂_N^{x_k}(t_j) − E f̂_N^{x_k}(t_j)| > η √( ψ_{S_F}(ε)/(n h_H φ(h_K)) ) )
  ≤ z_n N_ε(S_F) max_{j∈{1,2,…,z_n}} max_{k∈{1,…,N_ε(S_F)}} P( |f̂_N^{x_k}(t_j) − E f̂_N^{x_k}(t_j)| > η √( ψ_{S_F}(ε)/(n h_H φ(h_K)) ) ).

Let

Δ_i = (1/(h_H φ(h_K))) [ K_i(x_k) H_i(t_j) − E(K_i(x_k) H_i(t_j)) ]

and apply the Bernstein exponential inequality (see Ferraty and Vieu, 2006, Corollary A.9). To this end, we must evaluate the asymptotic behaviour of E|Δ_i| and E Δ_i². Firstly, it follows from the boundedness of the kernels K and H that E|Δ_i| ≤ C (h_H φ(h_K))^{-1}. Secondly, the same analytic arguments as in Lemma 14 allow us to get

lim_{n→∞} (1/h_H) E[H_1²(y)|X_1] = f^{X_1}(y) ∫_ℝ H²(t) dt,

which implies that

E|Δ_i|² ≤ C/(h_H φ(h_K)).

Thus, we are now in position to apply the Bernstein exponential inequality and we get

∀j ≤ z_n, P( |f̂_N^{x_k}(t_j) − E f̂_N^{x_k}(t_j)| > η √( ψ_{S_F}(ε)/(n h_H φ(h_K)) ) ) ≤ 2 exp{−C η² ψ_{S_F}(ε)}.

Therefore, since z_n = O(l_n^{-1}) = O(n^{(3α/2)+1/2}), by choosing η such that C η² = β, one has

z_n N_ε(S_F) max_{j∈{1,2,…,z_n}} max_{k∈{1,…,N_ε(S_F)}} P( |f̂_N^{x_k}(t_j) − E f̂_N^{x_k}(t_j)| > η √( ψ_{S_F}(ε)/(n h_H φ(h_K)) ) ) ≤ C′ z_n N_ε(S_F)^{1−β}.

By using the fact that lim_{n→+∞} n^α h_H = ∞ and (H6″), one obtains

T_3 = O_{a.co.}( √( ψ_{S_F}(ε)/(n^{1−α} φ(h_K)) ) ). (26)

So, Lemma 15 can be easily deduced from (24)–(26). □

Proof of Corollary 16. It is clear that

inf_{x∈S_F} inf_{y∈S_R} |1 − F̂^x(y)| ≤ (1 − sup_{x∈S_F} sup_{y∈S_R} F^x(y))/2
  ⇒ sup_{x∈S_F} sup_{y∈S_R} |F̂^x(y) − F^x(y)| ≥ (1 − sup_{x∈S_F} sup_{y∈S_R} F^x(y))/2,

which implies that

Σ_{n=1}^∞ P( inf_{x∈S_F} inf_{y∈S_R} |1 − F̂^x(y)| ≤ (1 − sup_{x∈S_F} sup_{y∈S_R} F^x(y))/2 )
  ≤ Σ_{n=1}^∞ P( sup_{x∈S_F} sup_{y∈S_R} |F̂^x(y) − F^x(y)| ≥ (1 − sup_{x∈S_F} sup_{y∈S_R} F^x(y))/2 ).

We deduce from Theorem 4 that

Σ_{n=1}^∞ P( inf_{x∈S_F} inf_{y∈S_R} |1 − F̂^x(y)| ≤ (1 − sup_{x∈S_F} sup_{y∈S_R} F^x(y))/2 ) < ∞.

The proof is achieved by taking δ = (1 − sup_{x∈S_F} sup_{y∈S_R} F^x(y))/2, which is strictly positive by (H8). □

Proof of Theorem 6. The proof is based on the same kind of decomposition as (23):

|ĥ^x(y) − h^x(y)| ≤ (1/|1 − F̂^x(y)|) { |f̂^x(y) − f^x(y)| + (|f^x(y)|/|1 − F^x(y)|) |F̂^x(y) − F^x(y)| }.

Consequently, Theorem 6 is deduced from Theorems 4 and 5 and from the next result, which is a consequence of Theorem 4. □

Corollary 16. Under the conditions of Theorem 6, we have

∃δ > 0 such that Σ_{n=1}^∞ P( inf_{x∈S_F} inf_{y∈S_R} |1 − F̂^x(y)| < δ ) < ∞.

Proof of Corollary 7. By a simple manipulation, we show that

|f^x(Θ̂(x)) − f^x(Θ(x))| ≤ 2 sup_{y∈S_R} |f̂^x(y) − f^x(y)|. (27)

We use the following Taylor expansion of the function f^x:

f^x(Θ̂(x)) = f^x(Θ(x)) + (1/j!) f^{x(j)}(Θ*(x)) (Θ̂(x) − Θ(x))^j

for some Θ*(x) between Θ(x) and Θ̂(x). Clearly, it follows from (H9), (27) and Theorem 5 that

sup_{x∈S_F} |Θ̂(x) − Θ(x)| → 0 a.co.

Moreover, by means of (H10), we obtain that

sup_{x∈S_F} |f^{x(j)}(Θ*(x)) − f^{x(j)}(Θ(x))| → 0 a.co.

Hence, as for Corollary 9, we can find δ > 0 such that

Σ_{n=1}^∞ P( inf_{x∈S_F} |f^{x(j)}(Θ*(x))| < δ ) < ∞,

and we have

sup_{x∈S_F} |Θ̂(x) − Θ(x)|^j ≤ C sup_{x∈S_F} sup_{y∈S_R} |f̂^x(y) − f^x(y)| a.co.

By combining this result with Theorem 5, we obtain the claimed result. □

References

Aït Saïdi, A., Ferraty, F., Kassa, R., Vieu, P., 2009. Cross-validated estimations in the single-functional index model. Statistics 42, 475–494.
Aneiros Perez, G., Vieu, P., 2006. Semi-functional partial linear regression. Statist. Probab. Lett. 76, 1102–1110.
Benhenni, K., Ferraty, F., Rachdi, M., Vieu, P., 2007. Local smoothing regression with functional data. Comput. Statist. 22, 353–370.
Bogachev, V.I., 1999. Gaussian Measures. Math Surveys and Monographs, vol. 62. American Mathematical Society, Providence, RI.
Boularan, J., Ferré, L., Vieu, P., 1995. Location of particular points in nonparametric regression analysis. Austral. J. Statist. 37, 161–168.
Chate, H., Courbage, M., 1997. Lattice systems. Physica D 103, 1–612.
Dabo-Niang, S., Rhomari, N., 2003. Estimation non paramétrique de la régression avec variable explicative dans un espace métrique. C. R. Math. Acad. Sci. Paris
336, 75–80.
Dabo-Niang, S., Laksaci, A., 2007. Estimation non paramétrique du mode conditionnel pour variable explicative fonctionnelle. C. R. Math. Acad. Sci. Paris 344,
49–52.
Deheuvels, P., Mason, D., 2004. General asymptotic confidence bands based on kernel type function estimators. Statist. Inference Stochastic Processes 7, 225–277.
Delsol, L., 2007. Régression nonparamétrique fonctionnelle: expression asymptotique des moments. Ann. L'ISUP LI (3), 43–67.
Delsol, L., 2009. Advances on asymptotic normality in nonparametric functional time series analysis. Statistics 43 (1), 13–33.
Ezzahrioui, M., Ould-Saïd, E., 2008. Asymptotic normality of nonparametric estimator of the conditional mode for functional data. Journal of Nonparametric Statistics 20 (1), 3–18.
Ferraty, F., Vieu, P., 2000. Dimension fractale et estimation de la régression dans des espaces vectoriels semi-normés. C. R. Math. Acad. Sci. Paris 330, 139–142.
Ferraty, F., Goia, A., Vieu, P., 2002. Functional nonparametric model for time series: a fractal approach for dimension reduction. TEST 11, 317–344.
Ferraty, F., Rabhi, A., Vieu, P., 2005. Conditional quantiles for functionally dependent data with application to the climatic El Niño phenomenon. Sankhya 67, 378–399.
Ferraty, F., Laksaci, A., Vieu, P., 2006. Estimating some characteristics of the conditional distribution in nonparametric functional models. Statist. Inference Stochastic Processes 9, 47–76.
Ferraty, F., Vieu, P., 2006. Nonparametric Functional Data Analysis. Theory and Practice. Springer, Berlin.
Ferraty, F., Mas, A., Vieu, P., 2007. Nonparametric regression on functional data: inference and practical aspects. Austral. New Zealand J. Statist. 49, 267–286.
Ferraty, F., Rabhi, A., Vieu, P., 2008a. Estimation non-paramétrique de la fonction de hasard avec variable explicative fonctionnelle. Rom. J. Pure Appl. Math. 53,
1–18.
Ferraty, F., Van Keilegom, I., Vieu, P., 2008b. On the validity of the bootstrap in nonparametric functional regression. Preprint.
Ferraty, F., Vieu, P., 2009. Additive prediction and boosting for functional data. Comput. Statist. Data Anal. 53, 1400–1413.
Kolmogorov, A.N., Tikhomirov, V.M., 1959. ε-entropy and ε-capacity. Uspekhi Mat. Nauk 14, 3–86 (Engl. Transl.: Amer. Math. Soc. Transl. Ser. 2 (1961) 277–364).
Kuelbs, J., Li, W., 1993. Metric entropy and the small ball problem for Gaussian measures. J. Funct. Anal. 116, 133–157.
Masry, E., 2005. Nonparametric regression estimation for dependent functional data: asymptotic normality. Stochastic Process. Appl. 115, 155–177.
Onsager, L., Machlup, S., 1953. Fluctuations and irreversible processes, I–II. Phys. Rev. 91, 1505–1512 and 1512–1515.
Ould-Saïd, E., Cai, Z., 2005. Strong uniform consistency of nonparametric estimation of the censored conditional mode function. Nonparametric Statist. 17, 797–806.
Ramsay, J.O., Silverman, B.W., 1997. Functional Data Analysis. Springer, New York.
Rosenblatt, M., 1969. Conditional probability density and regression estimators. In: Krishnaiah, P.R. (Ed.), Multivariate Analysis II. Academic Press, New York,
London.
Roussas, G., 1969. Nonparametric estimation of the transition distribution function of a Markov process. Ann. Math. Statist. 40, 1386–1400.
Samanta, M., 1989. Nonparametric estimation of conditional quantiles. Statist. Probab. Lett. 7, 407–412.
Theodoros, N., Yannis, G.Y., 1997. Rates of convergence of estimates, Kolmogorov entropy and the dimensionality reduction principle in regression. Ann. Statist. 25 (6), 2493–2511.
van der Vaart, A.W., van Zanten, J.H., 2007. Bayesian inference with rescaled Gaussian process priors. Electron. J. Statist. 1, 433–448.
Youndjé, E., 1996. Propriétés de convergence de l'estimateur à noyau de la densité conditionnelle. Rev. Roumaine Math. Pures Appl. 41, 535–566.
