
On the Approximation of a Conditional Expectation


TOMMASO LANDO, SERGIO ORTOBELLI
Dipartimento di Scienze aziendali, economiche e metodi quantitativi
University of Bergamo
Via dei Caniana 2, Bergamo
ITALY
tommaso.lando@unibg.it, sergio.ortobelli@unibg.it

Abstract: In this paper, we discuss how to approximate the conditional expectation of a random variable $Y$ given a random variable $X$, i.e. $E(Y|X)$. We propose and compare two different non-parametric methodologies to approximate $E(Y|X)$. The first approach (namely the OLP method) is based on a suitable approximation of the $\sigma$-algebra generated by $X$. The second procedure is based on the well-known kernel non-parametric regression method. We analyze the convergence properties of the OLP estimator and compare the two approaches with a simulation study.

Key-Words: Conditional Expectation, Kernel, Non-Parametric Methods, Regression, Approximation, Simulation.

1 Introduction

This paper discusses different methods to estimate the conditional expected value. Let $Y$ be a random variable with finite mean, and let $X$ be some other random variable defined on the same probability space. The conditional expectation $E(Y|X)$ is a random variable and a function of $X$. In particular, $E(Y|X)$ can be intuitively interpreted as the function of $X$ that "best" approximates $Y$, in that it represents the best approximation of the value of $Y$, given only the value of the random variable $X$. On the one hand, several well-known methods are aimed at estimating the regression function $g(x) = E(Y|X=x)$, which represents just a realization of $E(Y|X)$: parametric regression methods, and semi-parametric and non-parametric regression methods such as kernel regression [1],[2], smoothing splines [3], or various generalizations of these models, see e.g. [4]. On the other hand, in many real-world problems (see e.g. [5]) we are interested in approximating the random variable $g(X) = E(Y|X)$ and estimating its distribution function. Provided that $g(X)$ has a density with respect to the Lebesgue measure, a method to estimate the density function of $g(X)$ has been recently introduced in the literature by Steckey and Henderson [6] (see also [7]). In particular, this method is based on a sort of conditional sampling which consists in i) sampling $X$; and ii) sampling $Y$ from the conditional distribution of $Y$ given $X = x$. Then, it is possible to estimate the density of $g(X)$ with the kernel method. Nevertheless, we observe that in several situations it is not possible to satisfy these sampling assumptions, as we only have available a bivariate random sample from $(X,Y)$. Therefore, in this paper we attempt to estimate the distribution of $g(X)$ simply using a random sample of independent observations $(x_1,y_1),\dots,(x_n,y_n)$ from the bi-dimensional variable $(X,Y)$. Obviously, if we had available a sample of independent and identically distributed random variables from $g(X)$, then it would be trivial to estimate its distribution. Hence, our idea is to use the observations $(x_1,y_1),\dots,(x_n,y_n)$ from $(X,Y)$ in order to generate a vector of outcomes $(g_1,\dots,g_n)$ which approximates the realizations of $g(X)$. For this aim, we propose to use two different methods, namely the OLP method, recently introduced by [8], and the well-known kernel method, as recently suggested by [9]. The OLP method consists in approximating the sigma-algebra generated by $X$ (denoted by $\sigma(X)$) with a sigma-algebra generated by a suitable partition of the sample space, according to a given number ($k-1$) of percentiles of $X$. Hence, by averaging the observed values of $Y$ over the intervals so defined, we can approximate the random variable $g(X)$ and thereby its distribution function. Differently, the kernel non-parametric regression (see [1] and [2]) makes it possible to estimate $E(Y|X=x)$ as a locally weighted average, based on the choice of an appropriate kernel function.
Therefore, by applying the kernel method to each observation of $X$ ($X = x_1$, $X = x_2$, etc.), we obtain $n$ outcomes, which can similarly be used to estimate the distribution of the random variable $g(X)$. In this paper we compare the two methods with a simulation analysis. Then we study the properties of the OLP estimator and propose some practical rules to enhance its performance. In particular, while it is well known that the kernel method depends on the choice of the kernel function and the bandwidth parameter, the OLP method depends on the choice of the number of intervals $k$ used for approximating $\sigma(X)$. The choice of $k$ is crucial in order to obtain an accurate approximation. First, we propose a rule for determining $k$ under general assumptions, and then we compare the kernel and OLP methods with a simulation study under the assumption of normality. Indeed, if we know the joint distribution of $(X,Y)$ and the true distribution $F$ of $E(Y|X)$ (which, for instance, can be easily computed in the Gaussian case), then we can generate a bivariate random sample from $(X,Y)$ and finally investigate which estimated distribution better fits $F$. In the Gaussian case, the performance of the kernel method can be optimized quite easily (in terms of kernel density function and bandwidth parameter). Thus, we compare the "optimal" kernel method with the OLP method, where the number of intervals $k$ is determined without using any information on the joint distribution. The results show that, even in this "adverse" situation for the OLP method, the two methods provide comparable outputs. We then study further properties of the OLP estimator in the Gaussian case. In particular, we argue that the performance of the OLP estimator can be further enhanced if the number of intervals used for approximating $\sigma(X)$ is determined according to the correlation between the variables. In the last section we briefly summarize the paper and propose some possible financial applications.

2 Problem Formulation

Let $Y$ be an integrable random variable on the probability space $(\Omega, \mathcal{F}, P)$ and let $\mathcal{F}'$ be a sub-sigma-algebra of $\mathcal{F}$ (i.e. $\mathcal{F}' \subseteq \mathcal{F}$). The conditional expectation of $Y$ given $\mathcal{F}'$ is the unique ($P$-a.s.) random variable $E(Y|\mathcal{F}')$ such that:
i) $E(Y|\mathcal{F}')$ is $\mathcal{F}'$-measurable;
ii) $\forall A \in \mathcal{F}'$, $\int_A E(Y|\mathcal{F}')\,dP = \int_A Y\,dP$.
Let $X\colon \Omega \to \mathbb{R}$ and $Y\colon \Omega \to \mathbb{R}$ be integrable random variables on the probability space $(\Omega, \mathcal{F}, P)$. When $\mathcal{F}' = \sigma(X)$ is the sigma-algebra generated by $X$, we write $E(Y|\sigma(X)) = E(Y|X) = g(X)$. Generally, the distribution of $g(X)$ is unknown, unless the joint distribution of the random vector $(X,Y)$ follows some special distribution, e.g. the Gaussian distribution or the multivariate $t$ distribution (note that, except for trivial textbook examples, an exact expression for $g(X)$ is rare). However, if we assume that $X$ and $Y$ are jointly normally distributed, i.e. $(X,Y) \sim N(\mu, \Sigma)$, where $\mu = (\mu_X, \mu_Y)$ is the vector of the means and $\Sigma = ((\sigma_X^2, \rho_{XY}\sigma_X\sigma_Y), (\rho_{XY}\sigma_X\sigma_Y, \sigma_Y^2))$ is the variance-covariance matrix¹, we can obtain the distribution of the random variable $E(Y|X)$ quite easily. Indeed, it is well known that $g(x) = E(Y|X=x) = \mu_Y + \rho_{XY}\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$, and thus

$E(Y|X) = \mu_Y + \rho_{XY}\frac{\sigma_Y}{\sigma_X}(X - \mu_X)$   (1)

is still Gaussian distributed, with mean $\mu_Y$ and variance $\rho_{XY}^2\sigma_Y^2$. Clearly, when the correlation of a couple of random variables $(X,Y)$ is $\rho_{XY} = \pm 1$, then $Y = \mu_Y + \rho_{XY}\frac{\sigma_Y}{\sigma_X}(X - \mu_X)$ $P$-almost surely, and equation (1) holds for any joint distribution of the vector $(X,Y)$. Equation (1) also holds for a jointly Student's $t$ bivariate vector, as pointed out by [10]. Basically, if the bivariate random vector $(X,Y)$ is Gaussian or $t$-distributed, we also know the general form of the distribution of $g(X)$, and therefore we can estimate it quite easily. For instance, we could approximate $g(X)$ by estimating the unknown parameters $\mu_Y$, $\sigma_{XY}$ and $\sigma_X^2$ respectively with the sample mean, the sample covariance and the sample variance.

¹ For simplicity, in this paper we write, with a little abuse of notation, the dispersion matrix of $(X,Y)$ as $\Sigma = \begin{pmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{YX} & \sigma_Y^2 \end{pmatrix} = ((\sigma_X^2, \sigma_{XY}), (\sigma_{YX}, \sigma_Y^2))$.
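As a quick numerical illustration of equation (1) (a minimal sketch of ours, not part of the original paper; all names are hypothetical), one can simulate a jointly Gaussian sample and check that $g(X)$ has mean $\mu_Y$ and standard deviation $|\rho|\sigma_Y$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_x, mu_y, s_x, s_y, rho = 0.0, 2.0, 1.0, 1.0, 0.7
cov = [[s_x**2, rho * s_x * s_y],
       [rho * s_x * s_y, s_y**2]]

# Simulate (X, Y) jointly Gaussian and evaluate g(X) = E(Y|X) via eq. (1).
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=100_000).T
g = mu_y + rho * (s_y / s_x) * (x - mu_x)

print(g.mean())  # ~ mu_Y = 2.0, since E(E(Y|X)) = E(Y)
print(g.std())   # ~ |rho| * sigma_Y = 0.7
```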
Unfortunately, in most cases we do not know the form of the distribution of $g(X)$, and thus we cannot estimate it with parametric methods. Moreover, we cannot even use non-parametric methods, unless we have available a random sample drawn from the random variable $g(X)$, which is surely uncommon and difficult to obtain. For these reasons, the aim of this paper is to provide a method for estimating $g(X)$ and its distribution using a "standard" bivariate random sample $(x_1,y_1),\dots,(x_n,y_n)$ of independent observations from the bi-dimensional variable $(X,Y)$. For this purpose, we use two different methods, namely the OLP method, recently introduced by [8], and the kernel non-parametric regression method.

2.1 The OLP Method

The OLP method has been recently introduced by [8] (see also [11]) to approximate the conditional expectation, based on an appropriate partition of the sample space. The method, as defined in [8], requires the knowledge of the probability measure $P$: in this paper we do not rely on this assumption, and we propose an "estimator" for the random variable $g(X)$.
Denote by $\sigma(X)$ the $\sigma$-algebra generated by $X$ (that is, $\sigma(X) = X^{-1}(\mathcal{B}) = \{X^{-1}(B)\colon B \in \mathcal{B}\}$, where $\mathcal{B}$ is the Borel $\sigma$-algebra on $\mathbb{R}$). Observe that the regression function is just a "pointwise" realization of the random variable $E(Y|\sigma(X))$. The following methodology is aimed at approximating $E(Y|X)$ rather than estimating $g(x)$. The $\sigma$-algebra $\sigma(X)$ can be approximated by a $\sigma$-algebra generated by a suitable partition of $\Omega$. In particular, for any $k \in \mathbb{N}$, we consider the partition $\{A_j\}_{j=1}^k = \{A_1,\dots,A_k\}$ of $\Omega$ in $k$ subsets, described as follows:

$A_1 = \{\omega\colon X(\omega) \le F_X^{-1}(\tfrac{1}{k})\}$,
$A_h = \{\omega\colon F_X^{-1}(\tfrac{h-1}{k}) < X(\omega) \le F_X^{-1}(\tfrac{h}{k})\}$, for $h = 2,\dots,k-1$,
$A_k = \Omega - \bigcup_{j=1}^{k-1} A_j = \{\omega\colon X(\omega) > F_X^{-1}(\tfrac{k-1}{k})\}$.

The partition $\{A_j\}_{j=1}^k$ is practically determined by a number ($k-1$) of percentiles of $X$. Furthermore, note that, by definition of percentile, each interval $A_j$ has equal probability, that is, $P(A_j) = 1/k$, for $j = 1,\dots,k$. Starting with the trivial sigma-algebra $\mathcal{F}_1 = \{\emptyset, \Omega\}$, we can obtain a sequence of sigma-algebras generated by these partitions, for different values of $k$. Generally:

$\mathcal{F}_k = \sigma\big(\{A_j\}_{j=1}^k\big)$, $k \in \mathbb{N}$.   (2)

Hence, it is possible to approximate the random variable $E(Y|\sigma(X))$ by

$E(Y|\mathcal{F}_k)(\omega) = \sum_{j=1}^k \frac{1_{A_j}(\omega)}{P(A_j)} \int_{A_j} Y\,dP = \sum_{j=1}^k E(Y|A_j)\,1_{A_j}(\omega)$,   (3)

where $1_A(\omega) = 1$ if $\omega \in A$ and $1_A(\omega) = 0$ if $\omega \notin A$. Indeed, by definition of the conditional expectation, observe that $E(Y|\mathcal{F}_k)$ is an $\mathcal{F}_k$-measurable function such that, for any set $A \in \mathcal{F}_k$ (which can be seen as a union of disjoint sets, in particular $A = \bigcup_{A_j \subseteq A} A_j$), we obtain the equality

$\int_A E(Y|\mathcal{F}_k)\,dP = \int_A Y(\omega)\,dP(\omega)$.   (4)

It is proved in [8] that $E(Y|\mathcal{F}_k)$ converges almost certainly to the random variable $E(Y|X)$, that is:

$\lim_{k\to\infty} E(Y|\mathcal{F}_k) = E(Y|X)$ a.s.   (5)

Hence, if we approximate $E(Y|\mathcal{F}_k)$, then we also approximate $g(X)$, for sufficiently large $k$. However, in practical situations we do not know the probability measure $P$ used to compute $E(Y|A_j)$ in (3). Hence, in this paper, we propose to approximate the random variable $E(Y|\mathcal{F}_k)$, which in turn approximates $E(Y|X)$, based on the observations of a random sample. Let $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$ be a random sample of independent observations from the bi-dimensional variable $(X,Y)$. First, as we generally do not know the marginal distribution of $X$, we can determine the partition $\{\hat{A}_j\}_{j=1}^k$ using the percentiles of the empirical distribution, obtained from the observations $(x_1,\dots,x_n)$. The number of intervals $k$ should basically be an increasing function of the number of observations $n$, as discussed below. Then, if we assume to know the probability $p_i$ corresponding to the $i$-th outcome $y_i$, we obtain:

$E(Y|\hat{A}_j) = \sum_{x_i \in \hat{A}_j} y_i p_i \big/ P(\hat{A}_j)$.   (6)

Otherwise, we can give uniform weight to each observation, and thus we can use the following estimator of $E(Y|A_j)$:

$\hat{a}_j = \frac{1}{n_{\hat{A}_j}} \sum_{x_i \in \hat{A}_j} y_i$,   (7)

where $n_{\hat{A}_j}$ is the number of observations in $\hat{A}_j$, that is, $n_{\hat{A}_j} = \#\{x_i\colon x_i \in \hat{A}_j,\ i = 1,\dots,n\} \approx n/k$ (to clarify, for $k = 4$ we obtain the three quartiles, and therefore $n_{\hat{A}_j} \approx n/4$, and similarly $P(A_j)$ can be estimated by $1/4$).
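To make the construction concrete, here is a minimal sketch of the estimator (7) (our illustration; the function name `olp_estimate` and its interface are not from the paper). The sample is split by the $k-1$ empirical percentiles of the $x_i$'s, and each observation is mapped to the average $\hat{a}_j$ of the $y_i$'s falling in its interval $\hat{A}_j$:

```python
import numpy as np

def olp_estimate(x, y, k):
    """OLP outcomes: map each x_i to the average a_j of the y's whose x
    falls in the same percentile interval A_j (see eqs. (7) and (9))."""
    x, y = np.asarray(x), np.asarray(y)
    # The k-1 empirical percentiles of x determine the partition A_1,...,A_k.
    edges = np.quantile(x, np.arange(1, k) / k)
    # Interval index j (0,...,k-1) of each observation; intervals are
    # right-closed, matching A_h = (F^{-1}((h-1)/k), F^{-1}(h/k)].
    idx = np.searchsorted(edges, x, side="left")
    # Conditional averages a_j; assumes k << n, so that no interval is empty.
    a_hat = np.array([y[idx == j].mean() for j in range(k)])
    return a_hat[idx]
```

Each of the $k$ distinct values returned (with frequency roughly $n/k$) corresponds to one $\hat{a}_j$.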
Note that, for fixed $k$, as the number of observations $n$ grows, $P(\hat{A}_j) \to 1/k$ and $\hat{a}_j$ is an asymptotically unbiased estimator of $E(Y|A_j)$:

$E(\hat{a}_j) = \frac{1}{n_{\hat{A}_j}} \sum_i E\big(Y_i 1_{X_i \in \hat{A}_j}\big) \xrightarrow{n\to\infty} \frac{\int_{X \in \hat{A}_j} y\,dP}{1/k} = E(Y|\hat{A}_j)$.   (8)

Therefore, we are always able to approximate $E(Y|\mathcal{F}_k)$, and thereby the conditional expectation $E(Y|X)$, by using the following estimator:

$\hat{g}_{OLP}(X) = \sum_{j=1}^k 1_{X \in \hat{A}_j} \frac{1}{n_{\hat{A}_j}} \sum_{x_i \in \hat{A}_j} y_i = \sum_{j=1}^k 1_{X \in \hat{A}_j}\,\hat{a}_j$,   (9)

where $X$ is assumed independent from the i.i.d. observations $(x_i,y_i)$. Note that $\hat{g}_{OLP}$ is a simple $\mathcal{F}_k$-measurable function, and it is conceptually different from the classical estimators, which are generally aimed at estimating an unknown parameter rather than a random variable. A further property of the OLP estimator is that $E(\hat{g}_{OLP}(X)) = E(Y)$:

$E(\hat{g}_{OLP}(X)) = \sum_{j=1}^k E\big(1_{X \in \hat{A}_j}\,\hat{a}_j\big) = P(\hat{A}_j) \sum_{j=1}^k E(\hat{a}_j) = P(\hat{A}_j) \sum_{j=1}^k E(Y|\hat{A}_j) = E(Y)$,   (10)

because it satisfies the basic property of the conditional expectation, that is, $E(E(Y|\mathcal{F}_k)) = E(Y)$.
Observe that, given a bivariate sample of size $n$, the OLP estimator yields $k$ distinct values for $\hat{g}_{OLP}(x_i)$, i.e. the $\hat{a}_j$'s, each one with frequency $n_{\hat{A}_j} \approx n/k$, for $j = 1,\dots,k$. These outcomes can be used to estimate the unknown distribution function of $g(X)$.

2.2 The Kernel Method

The kernel method, typically used to estimate the probability density of an unknown random variable (see, for instance, [11]), can also be applied to estimate the regression function $g(x) = E(Y|X=x)$. In particular, if we do not know the general form of $g(x)$, except that it is a continuous and smooth function, then we can consider the following kernel estimator:

$\hat{g}_n(x) = \frac{\sum_{i=1}^n y_i\,K\!\left(\frac{x-x_i}{h(n)}\right)}{\sum_{i=1}^n K\!\left(\frac{x-x_i}{h(n)}\right)}$,   (11)

where $K(x)$, called the kernel, is a density function (typically unimodal and symmetric around zero) such that i) $K(x) < C < \infty$; ii) $\lim_{x\to\pm\infty} |xK(x)| = 0$ (see, among others, [1] and [2]). Moreover, $h(n)$ is the smoothing parameter, often referred to as the bandwidth of the kernel; it is a positive number such that $h(n) \to 0$ as $n \to \infty$. When the kernel $K$ is the probability density function of a standard normal distribution, the bandwidth is its standard deviation. It was proved in [1] that, if $Y$ is quadratically integrable (see also [12]), then $\hat{g}_n(x)$ is a consistent estimator of $g(x)$. In particular, observe that, if we denote by $f(x,y)$ the joint density of $(X,Y)$, the denominator of (11) converges to the marginal density of $X$, while the numerator converges to $\int_{-\infty}^{+\infty} y\,f(x,y)\,dy$. As a consequence, we know that $\hat{g}_n(X) \to g(X)$ a.s.
From a practical point of view, if we apply the kernel estimator to the bivariate random sample $(x_1,y_1),\dots,(x_n,y_n)$, we obtain the vector $(g_1,\dots,g_n) = (\hat{g}_n(x_1),\dots,\hat{g}_n(x_n))$. In other words, each value $g_i$ is a weighted average of kernels centered at each sample observation $x_i$. Since we know that $g_i \to E(Y|X=x_i)$ when $n \to \infty$, we can also estimate the distribution function of $g(X)$ (say $F(x) = P(g(X) \le x)$) with any parametric or non-parametric method, based on the outcomes $(g_1,\dots,g_n)$.
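A direct transcription of (11) with a Gaussian kernel might look as follows (again a sketch of ours, with hypothetical names; for large $n$, the $n \times n$ weight matrix should be replaced by a batched or binned evaluation):

```python
import numpy as np

def kernel_estimate(x, y, h):
    """Nadaraya-Watson estimator (11) with Gaussian kernel K and
    bandwidth h, evaluated at every sample point x_i."""
    x, y = np.asarray(x), np.asarray(y)
    u = (x[:, None] - x[None, :]) / h   # (x_i - x_j) / h for all pairs
    w = np.exp(-0.5 * u**2)             # Gaussian kernel; its normalizing
                                        # constant cancels in the ratio
    return (w * y).sum(axis=1) / w.sum(axis=1)
```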
3 A Simulation Comparison

In this section, we compare the OLP and the kernel method with a simulation study. It is worth noting that the comparison between these methods is not really balanced. Indeed, the main difference between the two procedures is that the OLP method generates $k$ distinct outputs, each one with frequency $n/k$, while the kernel method generally yields $n$ different outputs, one for each observation of $X$. Hence, if the kernel density and the bandwidth parameter are suitably chosen, the kernel method should outperform the OLP method in terms of accuracy. On the other hand, the OLP estimator yields a set of distinct outputs $(g_1,\dots,g_k)$ where each value $g_j$ is a conditional average over the set $\hat{A}_j$; therefore, the values $g_j$ are generally robust estimates.
As pointed out in section 2, if we know that the random vector $(X,Y)$ is jointly Gaussian (or $t$-distributed), then we also know the true distribution of $g(X)$. The main motivation of this study is that, if we show that a method provides a good estimate in the normal case, when the distribution of $g(X)$ is known, then we may argue that it provides similar results in many other cases, when the distribution of $g(X)$ is unknown. This is especially true for the OLP method, which does not depend on any particular specification except for the choice of the number of intervals $k$. Hence, assuming that $(X,Y) \sim N(\mu,\Sigma)$, where $\mu = (\mu_X,\mu_Y)$ and $\Sigma = ((\sigma_X^2, \rho\sigma_X\sigma_Y),(\rho\sigma_X\sigma_Y, \sigma_Y^2))$, we propose to simulate a bivariate random sample from $(X,Y)$ and to apply the OLP and the kernel methods to the observed data, just as described in section 2, in order to evaluate which method yields a better approximation of the distribution of $g(X)$. Indeed, in both cases we obtain $n$ outcomes $(g_1,\dots,g_n)$, which are used to estimate the probability distribution $F(x) = P(g(X) \le x)$ of the r.v. $g(X)$ (i.e. the Gaussian distribution $N(\mu_Y, |\rho|\sigma_Y)$ in this particular case). For this purpose, we simply apply the empirical distribution function to the vector $(g_1,\dots,g_n)$. The empirical distribution is actually the natural consistent estimator of $F$ (see e.g. [13]), and it is defined by

$\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^n 1_{\{g_i \le x\}}$,   (12)

where $1_A$ is the indicator function of the set $A$. In this paper we propose to use $\hat{F}_n$ as a non-parametric estimator of the distribution of $g(X)$. Obviously, according to how the outcomes $(g_1,\dots,g_n)$ are generated (OLP or kernel), we obtain two different empirical distributions $\hat{F}_n$, which approximate the true distribution, given by $N(\mu_Y, |\rho|\sigma_Y)$. Then, in order to evaluate which method provides the best-fitting distribution, we compute two different descriptive measures of fit based on probability distances, namely the Kolmogorov-Smirnov (or uniform) metric, defined by

$D\big(\hat{F}_n, F\big) = \sup_{x\in\mathbb{R}} \big|\hat{F}_n(x) - F(x)\big|$,   (13)

and the Kantorovich metric [14] (i.e. the $L_1$ metric for distribution functions), defined by

$K\big(\hat{F}_n, F\big) = \int_{-\infty}^{\infty} \big|\hat{F}_n(x) - F(x)\big|\,dx$.   (14)

Generally, provided that both methods can capture the shape of $F$, we would expect the kernel method to yield a better fit, because of its larger number of distinct outcomes $g_i$: indeed, a continuous distribution is typically better estimated by a large number of distinct observations. However, it should be stressed that the OLP method has several other advantages compared to the kernel method. While the OLP method only requires that $Y$ is an integrable random variable, the kernel method is suitable only for continuous random variables and also requires the assumption of finite variance. Moreover, for the OLP method we only need to specify how to determine the number of intervals $k$, while, for the kernel method, we have to choose the "best" kernel density and bandwidth parameter. In particular, for the proposed analysis we used the following specifications (a code sketch of the whole comparison follows this list).
i) OLP - number of intervals. Obviously, the selected number of intervals $k$ can vary between 1 and $n$ and, in order to improve the accuracy of the estimate, it must generally be an increasing function of $n$ (we shall discuss this point in the next section). Note that, for $k = 1$ we approximate the random variable $g(X)$ with a number, i.e. the sample mean $\bar{y}$, which is obviously not appropriate. On the other hand, for $k = n$ we approximate $g(X)$ with the marginal distribution of $Y$, given by $y_1,\dots,y_n$, which is also generally inappropriate. Hence, in order to maximize both i) the number of intervals and ii) the number of observations in each interval ($n_{\hat{A}_j}$), in this analysis we propose to use

$k = \lceil\sqrt{n}\rceil$,   (15)

where $\lceil x\rceil$ denotes the integer part of $x$. By doing so, we obtain $k$ intervals containing (approximately) $k$ observations. If we do not have any information about the dependence between $X$ and $Y$, this method is actually the most robust, in that it provides the largest possible number of conditional averages $\hat{a}_j$, where each $\hat{a}_j$ is computed based on the largest possible number of values ($n_{\hat{A}_j}$). In the next section we prove that the rule identified by (15) is actually appropriate and yields a consistent estimator. Moreover, in section 4 we propose a new rule for the choice of $k$, based on the correlation between $X$ and $Y$, that might further enhance the performance of the OLP estimator.
ii) Kernel - density and bandwidth. Generally, the choice of the kernel density, and especially of the bandwidth parameter, can be really troublesome, and this could be a drawback of the method. Indeed, there are several sophisticated techniques to choose the optimal bandwidth, which is still an open problem in the literature (see e.g. [15]). Since we know that $(X,Y)$ is jointly normally distributed, we simply use the normal kernel, which is obviously the most appropriate choice in this particular case. As for the bandwidth parameter, we can use Sturges' or Scott's rule (see [17] and [16]), which are especially suitable under normality assumptions (see also [18]). In particular, in what follows we show just the results obtained by applying Scott's rule, because it provided better approximations of $F$ in our analyses. We recall that the optimal bandwidth, according to Scott's rule, is given by $3.5\,\sigma_X n^{-1/3}$.
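Putting the pieces together, the comparison described in this section can be sketched as follows (our code, reusing the hypothetical `olp_estimate` and `kernel_estimate` from section 2; the metrics are evaluated on a grid, which approximates (13) and (14)):

```python
import numpy as np
from math import erf, sqrt

def ks_and_kantorovich(g, true_cdf, grid):
    """Grid approximations of the metrics (13) and (14) between the
    empirical distribution (12) of the outcomes g and a reference CDF."""
    g = np.sort(g)
    emp = np.searchsorted(g, grid, side="right") / len(g)
    diff = np.abs(emp - true_cdf(grid))
    return diff.max(), np.trapz(diff, grid)

rng = np.random.default_rng(1)
n, rho = 500, 0.5
x, y = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n).T

g_olp = olp_estimate(x, y, k=int(np.sqrt(n)))               # rule (15)
g_ker = kernel_estimate(x, y, h=3.5 * x.std() * n**(-1/3))  # Scott's rule

# True distribution of g(X) here: N(0, |rho|), i.e. Phi(t / |rho|).
true_cdf = lambda t: 0.5 * (1 + np.vectorize(erf)(t / (abs(rho) * sqrt(2))))
grid = np.linspace(-3, 3, 2001)
print("OLP    (D, K):", ks_and_kantorovich(g_olp, true_cdf, grid))
print("Kernel (D, K):", ks_and_kantorovich(g_ker, true_cdf, grid))
```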
Note that, in the Gaussian case, the distribution of the conditional expectation, that is, $g(X) \sim N(\mu_Y, |\rho|\sigma_Y)$, mainly depends on the correlation between the variables. Hence, we generated several random samples of different sizes from $(X,Y) \sim N(\mu,\Sigma)$ (where we assume the marginals to be standard normal random variables, i.e. $\mu_X = \mu_Y = 0$, $\sigma_X = \sigma_Y = 1$), for different (positive) values of $\rho$, and analyzed the results accordingly. Table 1 shows the results in terms of the probability metrics defined above. First, note that the K-S distance is quite high (about 0.5) for $\rho = 0$. The obvious reason is that $\rho = 0$ yields $g(X) =_d E(Y)$, which is a degenerate distribution (at 0, in this case), and therefore the Kolmogorov-Smirnov metric $D\big(\hat{F}_n, 1_{\{x \ge 0\}}\big)$ is generally close to 0.5, while the Kantorovich metric, which is based on the area between the functions, better captures the distance between the distributions in this particular case. Note also that the consistency of both methods is apparent from Tables 1 and 2, in that, increasing the sample size (from $n = 500$ in Table 1 to $n = 10^5$ in Table 2), the distance between the true and the estimated distribution approaches zero, for any fixed value of $\rho$ (except for the Kolmogorov-Smirnov distance at $\rho = 0$, as explained above). However, we observe that the kernel method generally outperforms the OLP method, although in several cases the results are similar (the OLP method is better only in some rare cases), as expected and discussed above. In particular, note that the kernel method generally provides more accurate estimates than the OLP for small or large values of $\rho$: indeed, the value of $\rho$ will be critical for the optimal choice of $k$, as discussed in the next section. Nevertheless, if we consider that the kernel method has been calibrated just to provide the best possible estimates under the assumption of normality, the results of the OLP method are surprisingly good.

         Kernel            OLP
ρ        D       K         D       K
0        0.784   0.059     0.565   0.207
0.09     0.894   0.070     0.609   0.203
0.18     0.255   0.044     0.284   0.088
0.27     0.227   0.073     0.122   0.060
0.36     0.225   0.105     0.177   0.085
0.45     0.059   0.031     0.118   0.085
0.54     0.076   0.057     0.111   0.069
0.63     0.111   0.103     0.169   0.122
0.72     0.103   0.106     0.072   0.078
0.81     0.098   0.129     0.094   0.091
0.9      0.067   0.132     0.077   0.082

Table 1: simulations with $n = 500$ from a multivariate normal distribution

         Kernel            OLP
ρ        D       K         D       K
0        0.601   0.009     0.502   0.045
0.09     0.067   0.008     0.062   0.015
0.18     0.038   0.011     0.030   0.009
0.27     0.034   0.013     0.033   0.011
0.36     0.017   0.008     0.029   0.010
0.45     0.017   0.009     0.017   0.010
0.54     0.012   0.010     0.017   0.013
0.63     0.013   0.010     0.015   0.013
0.72     0.009   0.010     0.014   0.012
0.81     0.007   0.009     0.015   0.013
0.9      0.005   0.008     0.014   0.014

Table 2: simulations with $n = 10^5$ from a multivariate normal distribution

Similarly, we performed a simulation analysis for the Student's $t$ distribution. We generated several random samples of different sizes from $(X,Y) \sim St(\mu,\Sigma,4)$ (with $\mu = (0,2)$ and $\sigma_X = \sigma_Y = 1$): in this case, we know that $E(Y|X) \sim St(2, |\rho|, 4)$. The results confirm what was discussed above, as shown in Table 3 for $n = 10^5$. In particular, observe that the OLP estimator outperforms the kernel estimator for values of $\rho$ between 0.18 and 0.45, but in the other cases the kernel estimator is more accurate.
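The Student's $t$ samples used in this comparison can be drawn with the standard normal/chi-square mixture representation of the multivariate $t$ (a sketch under that standard construction, not the paper's code; names are ours):

```python
import numpy as np

def bivariate_t(mu, cov, nu, size, rng):
    """Draw from a bivariate Student's t with location mu, dispersion cov
    and nu degrees of freedom: T = mu + Z / sqrt(W / nu), where
    Z ~ N(0, cov) and W ~ chi^2(nu) are independent."""
    z = rng.multivariate_normal([0.0, 0.0], cov, size=size)
    w = rng.chisquare(nu, size=size)
    return np.asarray(mu) + z / np.sqrt(w / nu)[:, None]

rng = np.random.default_rng(2)
xy = bivariate_t([0.0, 2.0], [[1.0, 0.3], [0.3, 1.0]],
                 nu=4, size=10_000, rng=rng)
x, y = xy.T  # feed into olp_estimate / kernel_estimate as before
```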
Finally, Fig. 1 and Fig. 2 show that the OLP estimator captures the shape of the distribution well, but approximates it with a simple function which has a smaller number of addends. Differently, in Fig. 3 we observe that the kernel method yields more accurate results for small values of $\rho$.

         Kernel            OLP
ρ        D       K         D       K
0        0.670   0.023     0.590   0.057
0.09     0.079   0.015     0.069   0.017
0.18     0.056   0.019     0.039   0.014
0.27     0.025   0.019     0.027   0.016
0.36     0.021   0.017     0.020   0.015
0.45     0.019   0.018     0.021   0.017
0.54     0.011   0.013     0.025   0.019
0.63     0.014   0.018     0.017   0.024
0.72     0.011   0.016     0.024   0.023
0.81     0.005   0.010     0.012   0.022
0.9      0.006   0.012     0.013   0.023

Table 3: simulations with $n = 10^5$ from a multivariate $t$ distribution

[Figure omitted.] Fig. 1. $(X,Y) \sim N(\mu,\Sigma)$, $\mu = (0,2)$, $\sigma_X = \sigma_Y = 1$ and $\rho = 0.8$. Green = true distribution; red = estimated distribution (kernel); blue = estimated distribution (OLP).

[Figure omitted.] Fig. 2. $(X,Y) \sim St(\mu,\Sigma,4)$, $\mu = (0,2)$, $\sigma_X = \sigma_Y = 1$ and $\rho = 0.5$, $n = 10000$. Green = true distribution; red = estimated distribution (kernel); blue = estimated distribution (OLP).

[Figure omitted.] Fig. 3. $(X,Y) \sim St(\mu,\Sigma,4)$, $\mu = (0,2)$, $\sigma_X = \sigma_Y = 1$ and $\rho = 0.1$, $n = 1000$. Green = true distribution; red = estimated distribution (kernel); blue = estimated distribution (OLP).

4 On the Optimal Number of Intervals

The simulation comparison in section 3 was apparently "rigged" in favor of the kernel method. Indeed, we used the information about the distributional assumptions (normality) to improve the results of the kernel method as much as possible, i.e. by using the normal kernel and the optimal bandwidth. On the other hand, this information was not used to enhance the performance of the OLP estimator. However, even in this adverse situation, the OLP estimator yields surprising results. Differently, in this section we propose to use the information about the joint distribution in order to further improve the OLP estimator, under some particular conditions.
As discussed in section 3, the number of intervals $k$ for the OLP method can vary between 1 and $n$.
However, if we assume that the random vector $(X,Y)$ is jointly normally distributed, then we know that, for $\rho = 0$, we obtain $g(X) =_d E(Y)$, and therefore the distribution of $g(X)$ is better estimated by a number, that is, the sample mean $\bar{y}$. Hence, in this particular case, the optimal value of $k$ is exactly 1, rather than $\lceil\sqrt{n}\rceil$. On the other hand, for $|\rho| = 1$ (i.e. $Y = a + bX$), we obtain $g(X) = E(a + bX|X) = a + bX = Y$, and therefore the distribution of $g(X)$ can be estimated with the marginal observations of $Y$, $(y_1,\dots,y_n)$. Thus, when $|\rho| = 1$ the optimal value of $k$ is exactly $n$: in this case we would get the maximum possible number ($n \ge \lceil\sqrt{n}\rceil$) of distinct observations.
From these considerations, we argue that, also in the case $|\rho| \in (0,1)$, the dependence between $X$ and $Y$ should influence the choice of the number of intervals $k$. In particular, $k$ should be chosen according to the mean-dependence structure between the random variables. We recall that, generally, stochastic independence implies mean-independence, which in turn implies uncorrelation (nevertheless, if $(X,Y)$ is jointly Gaussian the three conditions are equivalent). In the following proposition we derive the formula of the mean squared error (MSE) between the OLP estimator and the conditional expectation $g(X)$ in the Gaussian case. It should be stressed that the MSE is generally intended as the expectation of the squared error between an estimator, that is, a random variable, and a number; interestingly, in this special case the MSE is based on the difference between two random variables. We show that the MSE is a mathematical function of the correlation coefficient, and therefore we can provide a simple rule of thumb for determining the optimal number of intervals $k$, also in the case $|\rho| \in (0,1)$. Without loss of generality, we focus on the special case $(X,Y) \sim N(0,\Sigma)$ with $\sigma_X = \sigma_Y = 1$, to simplify the computation. Obviously, if $\rho = 0$ then $E(Y|X) = 0$ and $k = 1$ is the optimal choice; if $|\rho| = 1$ then $E(Y|X) = \pm X = Y$ and $k = n$ is the optimal choice, as discussed above. For the proof of the following proposition we assume to know the true percentiles of $X$, and thereby the true intervals $A_j$.

Proposition 1. Let $(X,Y)$ be a bivariate Gaussian vector, $(X,Y) \sim N(0,\Sigma)$ with $\Sigma = ((1,\rho),(\rho,1))$ (i.e. $\sigma_X = \sigma_Y = 1$), and let $(x_1,y_1),\dots,(x_n,y_n)$ be a random sample of independent observations from $(X,Y)$. Assume to know the $k-1$ percentiles $F_X^{-1}(\tfrac{j}{k})$ of $X$, for $j = 1,\dots,k-1$, and thereby the true intervals $A_j$, $j = 1,\dots,k$. Then, the mean squared error of the OLP estimator is given by:

$E\Big[\big(\hat{g}_{OLP}(X) - g(X)\big)^2\Big] = \frac{k}{n} + \rho^2\left\{1 - k\sum_{j=1}^k \left[f_X\Big(F_X^{-1}\Big(\frac{j-1}{k}\Big)\Big) - f_X\Big(F_X^{-1}\Big(\frac{j}{k}\Big)\Big)\right]^2\left(1 + \frac{1}{n}\right)\right\}$,   (16)

where $f_X$ is the (Gaussian) marginal density of $X$, with the convention that $f_X(F_X^{-1}(0)) = f_X(F_X^{-1}(1)) = 0$.
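Formula (16) is easy to evaluate numerically. The sketch below (ours, not from the paper; it uses `scipy.stats.norm` for the Gaussian density and quantiles, an assumption on tooling) compares it with a crude Monte Carlo estimate built on the hypothetical `olp_estimate` from section 2.1. The agreement is only approximate, since the proposition assumes known percentiles and an evaluation point independent of the sample, while the simulation reuses the sample points and empirical percentiles:

```python
import numpy as np
from scipy.stats import norm

def mse_formula(n, k, rho):
    """Evaluate eq. (16); the density terms at F^{-1}(0) and F^{-1}(1)
    are set to 0, since the normal density vanishes in the tails."""
    q = norm.ppf(np.arange(k + 1) / k)     # q_0 = -inf, ..., q_k = +inf
    dens = np.zeros(k + 1)
    finite = np.isfinite(q)
    dens[finite] = norm.pdf(q[finite])
    delta_sq = (dens[:-1] - dens[1:]) ** 2  # squared density differences
    return k / n + rho**2 * (1 - k * (1 + 1 / n) * delta_sq.sum())

def mse_monte_carlo(n, k, rho, reps, rng):
    """Average squared error between g_OLP(X) and g(X) = rho * X."""
    errs = []
    for _ in range(reps):
        x, y = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], n).T
        errs.append(np.mean((olp_estimate(x, y, k) - rho * x) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(3)
print(mse_formula(1000, 31, 0.7))
print(mse_monte_carlo(1000, 31, 0.7, reps=200, rng=rng))
```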
Proof. We know that $E(Y|X) = \rho X \sim N(0, |\rho|)$. Moreover,

$E\Big[\big(\hat{g}_{OLP}(X) - g(X)\big)^2\Big] = E\big[(\hat{g}_{OLP}(X) - \rho X)^2\big] = E\big[\hat{g}_{OLP}(X)^2\big] + \rho^2 - 2\rho\,E\big(X\hat{g}_{OLP}(X)\big)$,

where

$E\big[\hat{g}_{OLP}(X)^2\big] = E\Big[\Big(\sum_{j=1}^k 1_{X\in A_j}\hat{a}_j\Big)^2\Big] = E\Big[\sum_{j=1}^k 1_{X\in A_j}\hat{a}_j^2\Big] = \frac{1}{k}\sum_{j=1}^k E\big(\hat{a}_j^2\big)$,

because $\sum_{i\ne j}\hat{a}_i\hat{a}_j 1_{X\in A_i\cap A_j} = 0$. Note that

$E\big(\hat{a}_j^2\big) = \Big(\frac{k}{n}\Big)^2 E\Big[\Big(\sum_i Y_i 1_{X_i\in A_j}\Big)^2\Big] = \Big(\frac{k}{n}\Big)^2\Big(n\,E\big(Y^2 1_{X\in A_j}\big) + \sum_{i\ne h} E\big[(Y_i 1_{X_i\in A_j})(Y_h 1_{X_h\in A_j})\big]\Big) = \Big(\frac{k}{n}\Big)^2\left(n\,E\big(Y^2 1_{X\in A_j}\big) + n(n-1)\Big[\int_{A_j} f_X(x)\int_{-\infty}^{+\infty} y\,\frac{f_{XY}(x,y)}{f_X(x)}\,dy\,dx\Big]^2\right)$,

where the equality $E\big[(Y_i 1_{X_i\in A_j})(Y_h 1_{X_h\in A_j})\big] = \big[E(Y 1_{X\in A_j})\big]^2$ holds by the independence of the observations. Since $\int_{-\infty}^{+\infty} y\,\frac{f_{XY}(x,y)}{f_X(x)}\,dy = E(Y|X=x) = \rho x$ and $\int x e^{-x^2/2}\,dx = -e^{-x^2/2}$, we obtain

$E\big(\hat{a}_j^2\big) = \Big(\frac{k}{n}\Big)^2\left(n\,E\big(Y^2 1_{X\in A_j}\big) + n(n-1)\,\frac{\rho^2}{2\pi}\left[\exp\Big\{-\tfrac{1}{2}\Big(F_X^{-1}\Big(\frac{j-1}{k}\Big)\Big)^2\Big\} - \exp\Big\{-\tfrac{1}{2}\Big(F_X^{-1}\Big(\frac{j}{k}\Big)\Big)^2\Big\}\right]^2\right)$.

Observe also that $\sum_{j=1}^k E\big(Y^2 1_{X\in A_j}\big) = E(Y^2) = V(Y) = 1$. Then:

$\frac{1}{k}\sum_{j=1}^k E\big(\hat{a}_j^2\big) = \frac{k}{n}\left(1 + (n-1)\,\frac{\rho^2}{2\pi}\sum_{j=1}^k\left[\exp\Big\{-\tfrac{1}{2}\Big(F_X^{-1}\Big(\frac{j-1}{k}\Big)\Big)^2\Big\} - \exp\Big\{-\tfrac{1}{2}\Big(F_X^{-1}\Big(\frac{j}{k}\Big)\Big)^2\Big\}\right]^2\right)$.

Furthermore, since

$E(\hat{a}_j) = \frac{k\rho}{\sqrt{2\pi}}\left(\exp\Big\{-\tfrac{1}{2}\Big(F_X^{-1}\Big(\frac{j-1}{k}\Big)\Big)^2\Big\} - \exp\Big\{-\tfrac{1}{2}\Big(F_X^{-1}\Big(\frac{j}{k}\Big)\Big)^2\Big\}\right)$,

and $E(X 1_{X\in A_j})$ equals the same difference of exponentials divided by $\sqrt{2\pi}$, we obtain

$E\big(X\hat{g}_{OLP}(X)\big) = \sum_{j=1}^k E(\hat{a}_j)\,E\big(X 1_{X\in A_j}\big) = \frac{k\rho}{2\pi}\sum_{j=1}^k\left(\exp\Big\{-\tfrac{1}{2}\Big(F_X^{-1}\Big(\frac{j-1}{k}\Big)\Big)^2\Big\} - \exp\Big\{-\tfrac{1}{2}\Big(F_X^{-1}\Big(\frac{j}{k}\Big)\Big)^2\Big\}\right)^2$,

which yields the thesis.

Obviously, in practical cases we do not know the true percentiles, but we estimate them with the empirical distribution: these estimates are consistent, that is, for fixed $k$ and $n \to \infty$ the sample percentiles converge to the true ones (as an obvious consequence of the law of large numbers). Nevertheless, we also need $k \to \infty$ in order to obtain a consistent estimator of $g(X)$; therefore, if the number of estimands grows as fast as $n$ does, the MSE of the OLP method will not converge to 0. In view of Proposition 1, it is apparent that, in the case $|\rho| \ne 1$, the necessary and sufficient conditions for the convergence of the OLP estimator are: i) $k(n) \to \infty$; ii) $k/n \to 0$. This is stated in the following corollary, which is a straightforward consequence of Proposition 1.

Corollary 1. Let $(X,Y)$ be a bivariate Gaussian vector, i.e. $(X,Y) \sim N(\mu,\Sigma)$, where $|\rho| \in (0,1)$, and let $(x_1,y_1),\dots,(x_n,y_n)$ be a random sample of independent observations from $(X,Y)$. A necessary and sufficient condition for $MSE\big(\hat{g}_{OLP}(X)\big) \to 0$ is that $k \to \infty$ and $k/n \to 0$.

In other words, the general rule is that the number of percentiles (which have to be estimated) must grow more slowly than the number of observations. Corollary 1 gives necessary and sufficient conditions for the consistency of the OLP estimator in the Gaussian case. We argue that the same rule holds also if the joint distribution is not normal. Obviously, the rule $k = \lceil n^a\rceil$ with $a \in (0,1)$ (e.g. $k = \lceil n^{0.5}\rceil$, used in the simulation analysis) satisfies the conditions of Corollary 1. In order to further increase the convergence rate of the OLP estimator, we argue that, when $|\rho|$ is close to 0, the exponent $a$ should be close to 0, and when $|\rho|$ is close to 1, the exponent $a$ should be close to 1. Indeed, observe that the second term in the MSE expression approaches zero only as $k$ tends to infinity, but it can be negligible for small values of $|\rho|$, while in this case the asymptotic behavior of the first term is critical. On the other hand, we obtain exactly the opposite situation when the variables are highly correlated; thus, in this case, a larger value of $k$ increases the convergence rate. Hence, as a rule of thumb, we propose to use

$k = k^* = \big\lceil n^{|r|^{2/3}} \big\rceil$   (17)

(where $r$ is the value of the empirical correlation between the data), which yields $k = 1$ for $|r| \approx 0$, $k \approx n$ for $|r| \approx 1$, and ensures that $MSE\big(\hat{g}_{OLP}(X)\big) \to 0$ for any other value of $\rho$. The possible usefulness of this simple rule is well described by the following examples.

Example. Let $(X,Y) \sim N(\mu,\Sigma)$ with $\mu = (0,2)$, $\sigma_X = \sigma_Y = 1$ and $\rho = 0.7$, which yields $g(X) \sim N(2, 0.7)$. We generate a bivariate sample of size 1000 from the random vector $(X,Y)$ and estimate the distribution function $F$ of $g(X)$ with the OLP method, using the following values of $k$: 1) $k = k^* = \lceil n^{|r|^{2/3}}\rceil = 223$ (where $r = 0.69$); 2) $k = \lceil n^{0.3}\rceil = 7$; 3) $k = \lceil\sqrt{n}\rceil = 31$; and 4) $k = \lceil n^{0.95}\rceil = 707$. Fig. 4 below shows that the best performance of the OLP estimator is obtained in case 1), $k = k^*$, as in this case the estimated distribution fits the true one remarkably well. The other choices surely provide inferior performances. Indeed, the estimated distributions yielded by 2) and 3) seem to capture the shape of the reference distribution $F$ (that is, $N(2,0.7)$), but they approximate it with a "lower resolution", in that the number of intervals (i.e. of distinct values $\hat{a}_j$) is quite small (especially in case 2)). Conversely, in 4) we have a larger number of distinct values $\hat{a}_j$, and consequently a "higher resolution" in the plot, but the estimated distribution has an apparently different shape from the true one.
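In code, the rule (17) amounts to one line on top of the OLP estimator; the following sketch (ours, with the same hypothetical `olp_estimate`) reproduces the setting of the example under the stated assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, rho = 1000, 0.7
x, y = rng.multivariate_normal([0, 2], [[1, rho], [rho, 1]], size=n).T

# Rule (17): k* = integer part of n ** (|r| ** (2/3)),
# with r the empirical correlation of the sample.
r = np.corrcoef(x, y)[0, 1]
k_star = int(n ** (abs(r) ** (2.0 / 3.0)))

g_olp = olp_estimate(x, y, k_star)  # compare with k = 7, 31, 707 (Fig. 4)
print(k_star)                       # ~ 220 for r close to 0.7
print(g_olp.mean(), g_olp.std())    # location ~ 2, dispersion near 0.7
```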
Nevertheless, these results also confirm that with $k = \lceil\sqrt{n}\rceil$ we generally obtain a good compromise between closeness to the true distribution and the number of distinct observations $\hat{a}_j$.

[Figure omitted.] Fig. 4. Estimated (blue) and true (red) distribution functions for different values of $k$ in the Gaussian case (cases 1-4).

Furthermore, we repeat the same experiment for a bivariate $t$ distribution, that is, $(X,Y) \sim St(\mu,\Sigma,5)$ with $\mu = (0,2)$, $\sigma_X = \sigma_Y = 1$ and $\rho = 0.3$. In this case we obtain $g(X) \sim St(2, 0.3, 5)$. We generate a bivariate sample of size 10000 from the random vector $(X,Y)$ and estimate the distribution function $F$ of $g(X)$ with the OLP method, using the following values of $k$: 1) $k = k^* = 26$ (where $r = 0.32$); 2) $k = \lceil n^{0.1}\rceil = 1$; 3) $k = \lceil\sqrt{n}\rceil = 100$; and 4) $k = \lceil n^{0.7}\rceil = 125$. Fig. 5 shows that in cases 1) and 3) we apparently obtain the best approximations. However, the Kantorovich metric ($0.168$ for case 1 and $0.18$ for case 3) confirms that the best choice is $k^*$.

[Figure omitted.] Fig. 5. Estimated (blue) and true (red) distribution functions for different values of $k$ in the Student's $t$ case (cases 1-4).

We finally argue that the proposed rule can be appropriate for dealing with distributions which are approximately Gaussian, and can therefore be usefully applied to several kinds of data, because of the central limit theorem for random vectors. Indeed, the multidimensional central limit theorem states that the standardized sum of i.i.d. random vectors (with finite variance) converges to a multivariate normal distribution. Therefore, just based on the assumption that $(X,Y)$ have a joint distribution (i.e. they are defined on the same probability space), we can generally use the empirical correlation between the observations of $X$ and $Y$ in order to get information about the dependence between the variables, and thereby to determine the optimal number of intervals for the OLP estimator.

5 Conclusion

In this paper, we proposed two different procedures (OLP and kernel) to estimate the distribution function of a conditional expectation, based on a bivariate random sample. In particular, the properties of the OLP estimator have been studied thoroughly. It has been shown that the method can provide a consistent approximation of the random variable $E(Y|X)$, based on a suitable choice of the parameter $k$. Both the OLP and kernel methods make it possible to estimate the distribution function of $E(Y|X)$ non-parametrically. Our simulation results show that the methods are comparable. However, it should be stressed that the OLP method presents several advantages compared to the kernel, in that it does not require any particular assumption in order to be applied. Conversely, the kernel method relies on more restrictive conditions, such as continuity and finite variance, and its performance depends on a suitable choice of the kernel function and the bandwidth parameter. Finally, since the performance of the OLP estimator depends just on the chosen number of intervals $k$, we provided some general criteria for the choice of $k$ under normal assumptions. Consequently, we proposed a practical rule in order to optimize the performance of the method.
Both estimators (kernel and OLP) can be used in optimization procedures as required in several financial applications. In particular, we can use the conditional expectation estimators to i) order the investors' choices or ii) evaluate and exercise arbitrage strategies in the market (see [10]). In the first case, we know that any non-satiable risk-averse investor prefers the future wealth $W_T$ at time $T$ with respect to the wealth $W_t$ at time $t$ ($t < T$) only if $E(W_t|W_T) \le W_T$ a.s.
Thus, by using the proposed approximation of the conditional expected value, we can attempt to order and optimize the choices of non-satiable risk-averse investors, as suggested by [8]. In the second case, as a consequence of the fundamental theorem of arbitrage, we know that there exists no arbitrage opportunity in the market if there exists a risk-neutral martingale measure under which the discounted price process is a martingale. So, if we consider the augmented filtration $\{\mathcal{F}_t\}_{t\ge 0}$ associated to the Markov price process $\{X_t\}_{t\ge 0}$, then we obtain, $\forall s \le t$, that $E(X_t|\mathcal{F}_s) = E(X_t|X_s)$. Therefore, the conditional expected value estimator and the fundamental theorem of arbitrage can be used to estimate the risk-neutral measure and to optimize arbitrage strategies in the market.

Acknowledgements

This paper has been supported by the Italian funds ex MURST 60% 2014, 2015 and MIUR PRIN MISURA Project, 2013-2015, and the ITALY project (Italian Talented Young researchers). The research was supported through the Czech Science Foundation (GACR) under project 15-23699S and through SP2015/15, an SGS research project of VSB-TU Ostrava, and furthermore by the European Social Fund in the framework of CZ.1.07/2.3.00/20.0296.

References:
[1] E. A. Nadaraya, On estimating regression, Theory of Probability and its Applications, Vol. 9, No. 1, 1964, pp. 141-142.
[2] G. S. Watson, Smooth regression analysis, Sankhya, Series A, Vol. 26, No. 4, 1964, pp. 359-372.
[3] D. Ruppert, M. P. Wand, R. J. Carroll, Semiparametric Regression, Cambridge University Press, 2003.
[4] G. Mzyk, Generalized kernel regression for the identification of Hammerstein series, International Journal of Applied Mathematics and Computer Science, Vol. 17, No. 2, 2007, pp. 189-197.
[5] S. Ortobelli, F. Petronio, T. Lando, Portfolio problems based on returns consistent with the investor's preferences, Advances in Applied and Pure Mathematics, ISBN: 978-960-474-380-3, 2014, pp. 340-347.
[6] S. G. Steckey, S. G. Henderson, A kernel approach to estimating the density of a conditional expectation, Proceedings of the 2003 Winter Simulation Conference, S. Chick, P. J. Sánchez, D. Ferrin, and D. J. Morrice, eds., 2003, pp. 383-391.
[7] A. N. Avramidis, A cross-validation approach to bandwidth selection for a kernel-based estimate of the density of a conditional expectation, Proceedings of the 2011 Winter Simulation Conference, S. Jain, R. R. Creasey, J. Himmelspach, K. P. White, and M. Fu, eds., 2011, pp. 439-443.
[8] S. Ortobelli, F. Petronio, T. Lando, A portfolio return definition coherent with the investors preferences, under revision in IMA Journal of Management Mathematics.
[9] M. Roth, On the multivariate t distribution, Technical report, Automatic Control, Linköpings universitet, 2013.
[10] S. Ortobelli, T. Lando, On the use of conditional expectation estimators, New Developments in Pure and Applied Mathematics, ISBN: 978-1-61804-287-3, 2015, pp. 244-246.
[11] W. Härdle, M. Müller, S. Sperlich, A. Werwatz, Nonparametric and Semiparametric Models, Springer Verlag, Heidelberg, 2004.
[12] V. A. Epanechnikov, Non-parametric estimation of a multivariate probability density, Theory of Probability and its Applications, Vol. 14, No. 1, 1969, pp. 153-158.
[13] T. Lando, L. Bertoli-Barsotti, Statistical functionals consistent with a weak relative majorization ordering: applications to the minimum divergence estimation, WSEAS Transactions on Mathematics, Vol. 13, 2014, pp. 666-675.
[14] S. T. Rachev, Probability Metrics and the Stability of Stochastic Models, Wiley, New York, 1991.
[15] V. C. Raykar, R. Duraiswami, Fast optimal bandwidth selection for kernel density estimation, Proceedings of the Sixth SIAM International Conference on Data Mining, 2006, pp. 522-526.
[16] D. W. Scott, On optimal and data-based histograms, Biometrika, Vol. 66, No. 3, 1979, pp. 605-610.
[17] H. A. Sturges, The choice of a class interval, Journal of the American Statistical Association, Vol. 21, No. 153, 1926, pp. 65-66.
[18] E. Bura, A. Zhmurov, V. Barsegov, Nonparametric density estimation and optimal bandwidth selection for protein unfolding and unbinding data, The Journal of Chemical Physics, Vol. 130, 2009, article no. 015102.