Journal of Applied Statistics

Functional Boxplots based on epigraphs and hypographs.

Journal: Journal of Applied Statistics
Manuscript ID: CJAS-2013-0638.R1
Manuscript Type: Original Article
Date Submitted by the Author: n/a
Complete List of Authors:
Martin-Barragan, Belen; The University of Edinburgh, Business School
Lillo, Rosa; Universidad Carlos III de Madrid, Dpto. de Estadistica
Romo, Juan; Universidad Carlos III de Madrid, Dpto. de Estadistica
Keywords: Functional data, Box and Whisker Plots, Data Visualization, Functional Data Orderings, Functional quartiles
2010 Mathematics Subject Classification (http://www.ams.org/mathscinet/msc/msc2010.html): 62-09, 62G30, 62-07
URL: http://mc.manuscriptcentral.com/cjas

Functional Boxplots based on epigraphs and hypographs.

Belen Martin-Barragan
University of Edinburgh Business School
email: Belen.Martin@ed.ac.uk

Rosa E. Lillo
Universidad Carlos III de Madrid
email: rosaelvira.lillo@uc3m.es

Juan Romo
Universidad Carlos III de Madrid
email: juan.romo@uc3m.es

July 9, 2014

Abstract

The functional boxplot is an attractive technique for visualizing data that come from functions. We propose an alternative to the functional boxplot based on depth measures. Our proposal generalizes the usual construction of the boxplot in one dimension, which relies on the down-upward ordering of the data, by considering two intuitive pre-orders in the functional context. These orderings are based on the epigraphs and hypographs of the data and allow a new definition of functional quartiles that is more robust to shape outliers. Simulated and real examples show that this proposal provides a convenient visualization technique with great potential for analyzing functional data, and illustrate its usefulness for detecting outliers that other procedures do not detect.

Keywords: Functional data, Box and Whisker plot, Functional quartiles, Functional Data Orderings, Data visualization

1 Introduction

Visualization techniques are very useful in data analysis. Their aim is to summarize information in a graph or a plot. A natural visualization tool for univariate data is the boxplot, which summarizes a set of statistical measurements by plotting the quartiles, the range and the outliers. As is well known, the data have to be ordered to build the boxplot. Therefore, a meaningful way of sorting the data is required to extend the concept of boxplot to complex data such as functional data, where the observed units are functions [15, 17, 7]. Functional data analysis is being applied in many fields, including biology, meteorology, medicine and speech recognition. A wide range of applications and techniques can be found in [16]. Visualization methods for functional data have been proposed in recent years. In particular, different extensions of the univariate boxplot have already been introduced [10, 20].

Two different ideas for adapting univariate boxplots to functional data are addressed in [10]: functional bagplots and functional highest density region boxplots. Both techniques require reducing the data to the first two principal components, and the visualization is based only on this information. To overcome this limitation, [20] proposed a visualization tool based on the notion of depth; that is, given a data set, the depth of an observation measures its centrality with respect to the remaining data. Thus, a depth measure provides a center-outward ordering of the data. Different notions of depth have been proposed for multivariate data [22, 11, 19, 12, 3, 18, 2], among others, and for functional data [8, 4, 3, 13, 14]. The functional boxplot proposed in [20] sorts the functions according to the modified band-based depth (MBD) [13]. The band-based depth (BD) considers a function to be deep if it is contained in many of the bands that can be formed with functions of the sample. MBD is a variant of BD that considers the proportion of the curve lying in the band. The main characteristics of the univariate boxplot are reflected in the functional boxplot: a central function that represents the median, a central 50% region, and a fence to detect outliers. This procedure differs from the univariate case, where the data are sorted from the smallest to the largest and the first and third quartiles are needed to define the box. The fences to detect outliers are obtained by inflating the central box by 1.5 times the range of the box. The factor 1.5 comes directly from the common rule in the univariate case, but it can be adjusted depending on the characteristics of the data, as pointed out in [21], where a simulation-based method is proposed to choose the inflating factor.

Since a natural order does not exist in the functional context, we propose two pre-orders to define functional quartiles, different from the depth measures that provide a center-outward ordering. These orderings are based either on epigraphs or on hypographs. Roughly speaking, the epigraph of a function is the area above its graph, see e.g. [9, 14]. The index of a function is low if the function is contained in the epigraph of many functions of the sample. An analogous index can be defined using the hypograph. We show how the combination of both indexes allows us to define quartiles and, therefore, a boxplot that is robust to outliers (specifically shape outliers). This is one of the contributions of the paper, as we will show in Section 5. Hence our proposal avoids one of the weaknesses of the boxplot defined in [20], which may be affected by shape outliers, mainly in the visualization of the central box. Robustness is one of the main desired properties of boxplots in the univariate case, as the boxplot is commonly used to get a first glimpse of the data and detect outliers. The robustness of our proposal is shown on both real and simulated data, as well as its usefulness for detecting outliers.

The article is organized as follows. The ordering indexes are introduced in Section 2 and analyzed in Section 3. Both are combined in Section 4 to define the functional quartiles and construct a boxplot. Sections 5 and 6 illustrate the new boxplot and compare it with the options based on depth measures, using simulations and real data. Finally, some conclusions are given in Section 7.

2 Ordering functional data

In functional data analysis, the observations are real functions $y_i(t)$, $i = 1, 2, \ldots, n$, $t \in I$, where $I$ is an interval in $\mathbb{R}$. Once we fix a criterion to order the sample curves, $y_{[i]}$ will denote the $i$-th lowest curve. There is a natural ordering for univariate data, but the choice is not unique for multivariate or functional data.

The importance and difficulties of ordering functional data are highlighted, for instance, in [10]. Similar difficulties exist for multivariate data. A four-fold classification of possible ordering principles is proposed in [1]: marginal ordering, reduced (aggregate) ordering, partial ordering and conditional (sequential) ordering. Each of these multivariate orderings has its own weaknesses. For instance, marginal orderings lose multivariate information about correlations and data structure. A special kind of multivariate ordering has attracted a lot of attention since Barnett’s paper. The idea is to sort the data from the most central to the most outlying [22]. These notions are better known as data depths and play an important role in concepts that involve centrality, such as the median. See e.g. [11, 19, 12] and [3] and the reviews [18] and [2]. Extensions of data depth to functional data have been analyzed over the last decade [8, 4, 3]. A new definition in which the graph of a function plays a key role is proposed in [13].

In this paper we are interested in a functional data ordering that is not based on a center-outward order, but on a down-upward order. Hence, we need an appropriate index that expresses this ordering. A simple choice could be, for instance, the integral of the absolute value of the function over the interval. However, this definition does not take into account the form of the other functions in the sample. Our proposal is inspired by [13], where a natural notion of depth is based on bands of curves, and we adapt the ideas proposed in [14, 9] to define an index based on epigraphs and hypographs of curves. The graph of a function $f$ in $I$ is

$$G(f) = \{(t, f(t)),\ t \in I\}.$$

The epigraph and hypograph of a function $f$ are defined, respectively, as

$$\mathrm{epi}(f) = \{(t, y) \in I \times \mathbb{R},\ y \geq f(t)\}, \tag{1}$$

$$\mathrm{hyp}(f) = \{(t, y) \in I \times \mathbb{R},\ y \leq f(t)\}. \tag{2}$$

Let $X(\cdot)$ denote the indicator function. We propose two indexes to order functional data:

• Epigraph index:

$$EI(f) = 1 - BE(f) = 1 - \frac{\sum_{i=1}^{n} X\big(G(y_i) \subseteq \mathrm{epi}(f)\big)}{n}$$

A function $f$ has a high index if few functions are contained in its epigraph $\mathrm{epi}(f)$. Note that $BE(f)$ is the proportion of functions in the sample that are contained inside the epigraph of $f$.

Figure 1: Epigraph (left) and hypograph (right) for function f.

• Hypograph index:

$$HI(f) = \frac{\sum_{i=1}^{n} X\big(G(y_i) \subseteq \mathrm{hyp}(f)\big)}{n}$$

A function $f$ has a high index if many functions are contained in its hypograph $\mathrm{hyp}(f)$. Note that $HI(f)$ is the proportion of functions in the sample that are contained inside the hypograph of $f$.
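
As a minimal sketch of how the two indexes could be evaluated in practice, assume that every curve is observed on a common grid of points, so that containment of a graph in an epigraph or hypograph reduces to a pointwise comparison. The function names, the grid and the toy curves below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def epigraph_index(f, Y):
    """EI(f) = 1 - BE(f): one minus the share of sample curves lying entirely in epi(f).

    f: array of shape (m,), the reference curve on the grid (assumed discretization).
    Y: array of shape (n, m), the sample curves on the same grid.
    """
    f = np.asarray(f, dtype=float)
    Y = np.asarray(Y, dtype=float)
    in_epigraph = np.all(Y >= f, axis=1)   # y_i(t) >= f(t) at every grid point
    return 1.0 - in_epigraph.mean()

def hypograph_index(f, Y):
    """HI(f): share of sample curves lying entirely in hyp(f)."""
    f = np.asarray(f, dtype=float)
    Y = np.asarray(Y, dtype=float)
    in_hypograph = np.all(Y <= f, axis=1)  # y_i(t) <= f(t) at every grid point
    return in_hypograph.mean()

# Hypothetical toy sample: two curves above f everywhere, one crossing f.
t = np.linspace(0.0, 1.0, 101)
f = np.zeros_like(t)
Y = np.vstack([t + 1.0, 2.0 - t, np.sin(2.0 * np.pi * t)])

ei = epigraph_index(f, Y)    # 1 - 2/3
hi = hypograph_index(f, Y)   # 0/3
```

With these toy curves, two of the three sample curves lie entirely above `f` and none lies entirely below it, so the epigraph index is 1 - 2/3 and the hypograph index is 0.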

Figure 1 shows a simple example with $n = 3$ curves in the sample and another curve $f$. The yellow area in the left graphic is the epigraph of $f$, whereas the one in the right graphic is its hypograph. Two of the three sample curves are contained in $\mathrm{epi}(f)$; hence $EI(f) = 1 - \frac{2}{3} = 1 - 0.67 = 0.33$. No function is contained in the hypograph, so $HI(f) = \frac{0}{3} = 0$. Inspired by [13], one could also define modified versions of these measures, the modified epigraph index (MEI) and the modified hypograph index (MHI), analogously to the MBD, taking into account the proportion of $I$ where curve $y_i$ is inside $\mathrm{epi}(f)$ (respectively, $\mathrm{hyp}(f)$). The MEI of curve $f$ also accounts for how long $y_3$ stays in the epigraph, namely an interval of length 0.25. Hence $MEI(f) = 1 - \frac{2 + 0.25}{3} = 0.25$. Analogously, $MHI(f) = \frac{0.75}{3} = 0.25$. Note that $MEI(f)$ and $MHI(f)$ only differ in the case in which function $f$ coincides with $y_i$ on some subset of $I$ with positive Lebesgue measure, for some $i = 1, 2, \ldots, n$.
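
The arithmetic of the modified indexes can be checked with a short sketch. It assumes curves sampled on a uniform grid, so that the proportion of $I$ spent inside the epigraph or hypograph is approximated by the fraction of grid points; the toy curves are hypothetical and only mimic the proportions quoted in the example (two curves always above $f$, a third one above $f$ on a quarter of the interval).

```python
import numpy as np

def modified_epigraph_index(f, Y):
    """MEI(f): one minus the average fraction of the grid where y_i(t) >= f(t)."""
    f = np.asarray(f, dtype=float)
    Y = np.asarray(Y, dtype=float)
    frac_in_epi = (Y >= f).mean(axis=1)   # per-curve share of grid points inside epi(f)
    return 1.0 - frac_in_epi.mean()

def modified_hypograph_index(f, Y):
    """MHI(f): average fraction of the grid where y_i(t) <= f(t)."""
    f = np.asarray(f, dtype=float)
    Y = np.asarray(Y, dtype=float)
    frac_in_hyp = (Y <= f).mean(axis=1)   # per-curve share of grid points inside hyp(f)
    return frac_in_hyp.mean()

# Hypothetical curves on a uniform grid of 100 points in [0, 1).
t = np.arange(100) / 100.0
f = np.zeros_like(t)
y1 = np.full_like(t, 1.0)                 # always above f
y2 = np.full_like(t, 2.0)                 # always above f
y3 = np.where(t < 0.75, -1.0, 1.0)        # above f on a quarter of the grid only
Y = np.vstack([y1, y2, y3])

mei = modified_epigraph_index(f, Y)       # 1 - (1 + 1 + 0.25) / 3
mhi = modified_hypograph_index(f, Y)      # (0 + 0 + 0.75) / 3
```

Both values equal 0.25 here because no sample curve coincides with `f` at any grid point.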

Figure 2: Example of 5 curves among 100 generated by Model 1.

3 Analysis of the indexes EI and HI

This section describes the behavior of the indexes defined in Section 2. We consider both EI and HI, and generate the data from a model previously considered in [20], [13] and [8]. Following [20], Model 1 is described as follows:

• “Model 1 is $X_i(t) = g(t) + e_i(t)$, $i = 1, 2, \ldots, n$, with mean $g(t) = 4t$, $t \in [0, 1]$, and where $e_i(t)$ is a stochastic Gaussian process with zero mean and covariance function $\gamma(s, t) = \exp(-|s - t|)$.”
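
A minimal sketch of how curves from Model 1 could be simulated is given below. It assumes the process is evaluated on a finite grid, so that each curve is a draw from a multivariate normal distribution with mean $4t$ and covariance matrix $\exp(-|s - t|)$; the grid of 50 equally spaced points, the seed and the function name are assumptions made for illustration.

```python
import numpy as np

def simulate_model1(n, t, rng):
    """Draw n curves X_i(t) = 4t + e_i(t) on the grid t (assumed discretization)."""
    t = np.asarray(t, dtype=float)
    mean = 4.0 * t                                    # g(t) = 4t
    cov = np.exp(-np.abs(t[:, None] - t[None, :]))    # gamma(s, t) = exp(-|s - t|)
    return rng.multivariate_normal(mean, cov, size=n)

rng = np.random.default_rng(0)        # arbitrary seed, for reproducibility only
t = np.linspace(0.0, 1.0, 50)         # assumed grid on [0, 1]
X = simulate_model1(100, t, rng)      # array of shape (100, 50), one curve per row
```

Each row of `X` is one simulated curve; a finer grid only changes the size of the covariance matrix.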

A set of $n = 100$ curves is generated using Model 1. The simulated functions can be seen in Figure 2. Five randomly selected curves are shown in color, whereas the remaining ones are shown in grey. The value of each index, EI and HI, is given in Table 1 (columns 3-4). Among the five selected curves, the green and the cyan, represented as solid curves, are the most extreme, since their indexes are the highest and the lowest for both EI and HI. For the other three curves, the graphics suggest that no curve can be said to be higher or lower than the others. Indeed, each index is very similar for the three functions and each one yields a different ordering for them. Although the behavior of these three functions is coherent with the fact that there exists no unique order for functions, the behavior of the green and the cyan curves illustrates that the proposed indexes are adequate for sorting the functions.

color    form     EI       HI       BD       MBD
red      dotted   0.9495   0.1818   0.0382   0.4431
green    solid    0.8889   0.0404   0.0289   0.4547
blue     dotted   0.9798   0.1313   0.0253   0.4496
purple   dotted   0.9899   0.1313   0.0226   0.3201
cyan     solid    1.0000   0.6162   0.0200   0.1219

Table 1: Values of the proposed indexes and the band-based depths for five curves randomly selected among the curves generated by Model 1.

The last two columns of Table 1 show the values of the band-based depths BD and MBD. By chance, the green function seems to be very central, so MBD gives it the maximal value. For BD, the red function is the most central one. In general, sorting the functions by depth (using either BD or MBD) gives a very different result from sorting them by one of the proposed indexes EI or HI; that is, the ordering based on the proposed indexes sorts the functions in a down-up sense, whereas the depth measures sort them in a center-outward sense.

Figure 3 shows all the curves in different shades of gray depending on the proposed indexes. The 10% lowest are in black, the next 10% in very dark gray, the next 10% in slightly lighter gray, and so on. Hence, a light tone means high EI (respectively HI). These plots make clear that the higher the curve, the higher its index. This is true for both EI and HI, showing that they are useful for sorting the functions in a down-up sense.

4 Functional boxplots based on epigraphs and hypographs.

This section describes the construction of the functional boxplot based on the indexes defined in Section 2, which rely on the epigraphs and hypographs of the functions. In the univariate case, the quartiles are essential elements in the construction of a boxplot, since they both define the central box and are crucial for the fences that indicate outliers. In [20], the central box, taken to be a central region, plays the role of the interquartile range of the univariate boxplot. We propose a definition of functional quartile that conveniently combines the down-upward orderings defined in Section 2. The definition of functional quartiles is carefully designed to be robust.

Figure 3: Data generated with Model 1 (left panel: Epigraph Index; right panel: Hypograph Index). The tone of gray indicates the index. The 10% of functions with the lowest index are black, the next 10% are dark gray, and so on.

The main elements of a univariate boxplot are: the median; the first and third quartiles, which define the limits of the box; the outliers, represented by isolated points; and the minimum and maximum among the nonoutlying data, which limit the whiskers. Once the data are ordered, the first (respectively, third) quartile is computed as the datum that leaves 25% of the observations in the sample below (resp. above) it. The construction of the functional boxplot proposed by [20] has a structure similar to the univariate case: the functional median, the central box, the outliers, and the minimum and maximum non-outliers. The main difference between our approach and [20] is how the central box is constructed. The central box that we propose is based on the functional quartiles instead of the central region provided by the depth measure.

An example of a functional boxplot is illustrated in Figure 4. The first and third functional quartiles are represented as solid black curves. They define a box that is represented in cyan. Once the functional quartiles are computed, our outlier detection step is a direct extension of the univariate case. Note that in the functional case, the first and third quartiles are functions $q_1(t)$ and $q_3(t)$. The IQR is a function $IQR(t) = q_3(t) - q_1(t)$. Following the 1.5 IQR empirical criterion, widely used in univariate data analysis, two fences are computed to find the outliers. The lower fence is computed as $f_1(t) = q_1(t) - 1.5\,IQR(t)$, and the upper fence $f_3$ is computed analogously. Outliers are detected as the functions $g$ for which there exists $t \in I$ such that $g(t) < f_1(t)$ or $g(t) > f_3(t)$, i.e. the functions that go outside the fences. In Figure 4, outliers are represented as dotted red curves. Finally, the minimal (resp. maximal) nonoutlying function is computed as the lower (resp. upper) envelope of all the functions that are not outliers. They are represented in Figure 4 as solid blue curves and correspond to the whiskers of the univariate plot. Code for computing the functional boxplot and the indexes proposed in Section 2 is available upon request.

Figure 4: Data generated with Model 2. Construction of functional boxplot based on epigraphs and hypographs. The labeled curves are the fences f1 and f3 and the quartiles q1 and q3.
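
The fence, outlier and whisker rules can be sketched as follows, assuming the quartile functions are already available on a common grid. The upper fence is taken as $q_3(t) + 1.5\,IQR(t)$, which is how "analogously" is read here; the function names and the toy data are illustrative assumptions.

```python
import numpy as np

def fences(q1, q3, factor=1.5):
    """Pointwise fences: f1 = q1 - factor * IQR and f3 = q3 + factor * IQR."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

def flag_outliers(Y, f1, f3):
    """A curve is an outlier if it leaves the fences at one grid point or more."""
    return np.any((Y < f1) | (Y > f3), axis=1)

def whiskers(Y, is_outlier):
    """Lower and upper envelopes of the curves that are not outliers."""
    kept = Y[~is_outlier]
    return kept.min(axis=0), kept.max(axis=0)

# Hypothetical toy data on a grid of 5 points.
q1 = np.full(5, 1.0)
q3 = np.full(5, 2.0)
Y = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [1.5, 1.5, 1.5, 1.5, 1.5],
    [2.0, 2.0, 2.0, 2.0, 2.0],
    [1.0, 1.0, 5.0, 1.0, 1.0],   # a single peak above the upper fence
])

f1, f3 = fences(q1, q3)                  # f1 = -0.5 and f3 = 3.5 at every grid point
is_outlier = flag_outliers(Y, f1, f3)    # only the last curve is flagged
low, high = whiskers(Y, is_outlier)      # envelopes of the three remaining curves
```

The `factor` argument is exposed because the text notes that the value 1.5 may be adjusted to the data.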

The original proposal in [20] defines the central box as the convex envelope of the 50% deepest functions. As we have pointed out, our proposal is based on the functional quartiles provided by the orderings defined in Section 2, but the definitions are not obvious and deserve further attention. We now focus on the first quartile; the third quartile is analogous.

We have two indexes, HI and EI, that sort the functions down-upward. A first option would be to define the first quartile as the boundary of the convex envelope of the 25% lowest functions using either EI or HI. However, preliminary experiments show that the function defined by such a boundary is higher than what might be expected for a quartile. This can be observed in Figure 5-(a), where such a boundary is around the middle of the set of functions. This is true for both HI (dotted) and EI (solid). An alternative and symmetric proposal is to consider the convex envelope of the 75% highest functions. Analogous reasoning for the third quartile suggests using the 75% lowest functions.

Figure 5: Illustration of computing the first quartile.

We may use either the EI or the HI index to sort the functions. When used to compute the first quartile, EI is more sensitive than HI to shape outliers, whereas when used to compute the third quartile, HI is more sensitive. This is illustrated in Figure 5-(b). Let $f$ be an outlier such as the function represented by the thick dashed blue curve. The other functions in the sample are represented by thin black curves. The 75% of functions with highest HI are represented using solid lines, whereas the remaining 25% are represented using dashed lines. We can see that $f$ contains no other curve in its epigraph. Hence, $EI(f) = 1$. If we use the epigraph index to compute the first quartile, outlier $f$ would be part of the 75% highest functions used to compute such a quartile, and the quartile would be affected by it. On the contrary, if we use the hypograph index, the quartile would not be affected by the outlier $f$, because $HI(f) = 0$ ($\mathrm{hyp}(f)$ contains no other curve in the sample). This is true for shape outliers with $HI(f)$ equal or close to zero. However, other types of outliers may have HI equal to one. In this case, such an outlier $g$ belongs to the 75% highest functions, but it does not affect the quartile. Indeed, $HI(g) = 1$ means that all other functions in the sample belong to $\mathrm{hyp}(g)$. Consequently, $g$ is equal to the point-wise maximum of all the functions of the dataset. Since the first quartile is computed as the lower boundary of the envelope, the maximum of all the functions does not cause distortions in such a boundary.

An analogous argument can be followed for the third quartile, yielding that the epigraph index should be used instead of the hypograph index. Based on these arguments, the indexes defined in Section 2 can be used to overcome irregularities in the boxplot due to shape outliers. As a consequence of the previous discussion, the functional quartiles are defined as follows.

• The first functional quartile is the lower boundary of the envelope determined by the 75% of functions with highest HI.

• The third functional quartile is defined as the upper boundary of the envelope determined by the 75% of functions with lowest EI.

Note that these definitions use both HI and EI. Definitions using only one index might be more intuitive but easily fail to detect shape outliers, leading to irregularities in the boxplot.
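
The two definitions above can be sketched as follows for curves observed on a common grid. Several choices here are assumptions rather than part of the source: the lower and upper boundaries of the envelope are taken as pointwise minimum and maximum, the 75% subset has size $\lceil 0.75\,n \rceil$, ties in an index are broken by sample order, and each index is computed against the whole sample including the curve itself (which shifts all values equally and so does not change the ranking).

```python
import numpy as np

def sample_indexes(Y):
    """EI and HI of every sample curve with respect to the sample itself."""
    Y = np.asarray(Y, dtype=float)
    # in_epi[j, i] is True when curve i lies entirely in epi(y_j)
    in_epi = np.all(Y[None, :, :] >= Y[:, None, :], axis=2)
    # in_hyp[j, i] is True when curve i lies entirely in hyp(y_j)
    in_hyp = np.all(Y[None, :, :] <= Y[:, None, :], axis=2)
    return 1.0 - in_epi.mean(axis=1), in_hyp.mean(axis=1)

def functional_quartiles(Y, share=0.75):
    """First and third functional quartiles as envelope boundaries (assumed reading)."""
    Y = np.asarray(Y, dtype=float)
    ei, hi = sample_indexes(Y)
    k = int(np.ceil(share * Y.shape[0]))                # size of the 75% subset (assumption)
    highest_hi = np.argsort(hi, kind="stable")[-k:]     # the k curves with highest HI
    lowest_ei = np.argsort(ei, kind="stable")[:k]       # the k curves with lowest EI
    q1 = Y[highest_hi].min(axis=0)                      # lower boundary of their envelope
    q3 = Y[lowest_ei].max(axis=0)                       # upper boundary of their envelope
    return q1, q3

# Hypothetical toy sample: four constant curves at heights 0, 1, 2, 3.
Y = np.repeat(np.arange(4.0)[:, None], 5, axis=1)
q1, q3 = functional_quartiles(Y)   # q1 is the curve at height 1, q3 the curve at height 2
```

The pairwise comparison uses an array of size n x n x m, which is fine for samples of a few hundred curves but would need a loop for much larger ones.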

5 Simulation studies

We first present experiments with data generated according to some models proposed in the literature. Since our proposal is compared with the one defined in [20], we use the same models as in that paper, which also appeared in [8] and [13].

Model 1 has already been defined in Section 3. This is the basic model without contamination. Models 2-4 are modifications of it that contain magnitude outliers, while Model 5 illustrates shape contamination. They are described as follows:

• “Model 2 includes a symmetric contamination: $Y_i(t) = X_i(t) + c_i \sigma_i K$, where $X_i(t)$ follows Model 1, $c_i$ is 1 with probability $q$ and 0 with probability $1 - q$, $K$ is a contamination size constant, and $\sigma_i$ is a sequence of random variables independent of $c_i$ taking values 1 and -1 with probability 1/2.

• Model 3 is partially contaminated: $Y_i(t) = X_i(t) + c_i \sigma_i K$, if $t \geq T_i$ and $Y_i(t) = X_i(t)$ otherwise, where $T_i$ is a random number generated from a uniform distribution on $[0, 1]$.

• Model 4 is contaminated by peaks: $Y_i(t) = X_i(t) + c_i \sigma_i K$, if $T_i \leq t \leq T_i + \ell$ and $Y_i(t) = X_i(t)$ otherwise, where $T_i$ is a random number generated from a uniform distribution on $[0, 1 - \ell]$.

• Model 5 considers shape contamination with different parameters in the covariance function $\gamma(s, t) = k \exp\{-c|t - s|^{\mu}\}$. The basic Model 1, $X_i(t) = g(t) + e_{1i}(t)$, has parameter values $k = 1$, $c = 1$, $\mu = 1$ for the covariance function of $e_{1i}$. To generate irregular curves, let $Y_i(t) = g(t) + e_{2i}(t)$, where $e_{2i}(t)$ is a Gaussian process with zero mean and covariance function parameters $k = 8$, $c = 1$, $\mu = 0.2$. The contaminated model is given by $Z_i(t) = (1 - c_i) X_i(t) + c_i Y_i(t)$, $i = 1, 2, \ldots, n$, where $c_i$ is 1 with probability $q$ and 0 with probability $1 - q$.”
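
The magnitude contaminations of Models 2 to 4 can be sketched as a single helper applied to curves already drawn from Model 1. The helper name, its signature and the grid are assumptions for illustration; Model 5 is not covered by this sketch.

```python
import numpy as np

def contaminate(X, t, model, q=0.1, K=8.0, ell=3/49, rng=None):
    """Apply the contamination of Model 2, 3 or 4 to base curves X given on the grid t.

    Returns the contaminated curves and the boolean flags c_i (True = contaminated).
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    n = X.shape[0]
    c = rng.random(n) < q                        # c_i = 1 with probability q
    sigma = rng.choice([-1.0, 1.0], size=n)      # sigma_i = +1 or -1, each with probability 1/2
    shift = (c * sigma * K)[:, None]             # c_i * sigma_i * K, one value per curve
    if model == 2:                               # whole curve shifted
        mask = np.ones((n, t.size), dtype=bool)
    elif model == 3:                             # shifted from T_i onwards
        T = rng.uniform(0.0, 1.0, size=n)[:, None]
        mask = t[None, :] >= T
    elif model == 4:                             # peak on [T_i, T_i + ell]
        T = rng.uniform(0.0, 1.0 - ell, size=n)[:, None]
        mask = (t[None, :] >= T) & (t[None, :] <= T + ell)
    else:
        raise ValueError("model must be 2, 3 or 4")
    return X + shift * mask, c

# Hypothetical usage on flat base curves, with q = 1 so that every curve is contaminated.
t = np.linspace(0.0, 1.0, 50)
X = np.zeros((5, t.size))
Y2, c2 = contaminate(X, t, model=2, q=1.0, rng=np.random.default_rng(1))
Y4, c4 = contaminate(X, t, model=4, q=1.0, rng=np.random.default_rng(2))
```

Returning the flags `c` alongside the curves keeps the ground truth needed later to count undetected outliers.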

In the simulation studies, $n = 100$ curves are generated with parameters $q = 0.1$, $K = 8$, and $\ell = 3/49$. We compare our proposal based on epigraphs and hypographs (hereafter EH-fb) with the functional boxplot proposed in [20], which uses the MBD to compute the central region. The use of MBD instead of BD is proposed in [20] because it is more flexible. We also consider an analogous version that uses BD. Hereafter we will refer to these two approaches as MBD functional boxplot (MBD-fb) and BD functional boxplot (BD-fb), respectively. The three approaches considered (EH-fb, BD-fb and MBD-fb) are shown in Figure 6 (first, second and third columns, respectively). Each row corresponds to a different data generating model. The number of false positives (FP), i.e. nonoutliers erroneously detected as outliers, and false negatives (FN), i.e. undetected outliers, are also shown.
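
The two error counts can be computed directly from the true contamination flags and the flags returned by a boxplot's outlier rule. The sketch below uses hypothetical flag vectors; the function name is an assumption.

```python
import numpy as np

def count_errors(is_contaminated, is_flagged):
    """Return (FP, FN): nonoutliers flagged as outliers, and outliers left unflagged."""
    is_contaminated = np.asarray(is_contaminated, dtype=bool)
    is_flagged = np.asarray(is_flagged, dtype=bool)
    fp = int(np.sum(is_flagged & ~is_contaminated))
    fn = int(np.sum(is_contaminated & ~is_flagged))
    return fp, fn

# Hypothetical example with five curves.
truth = [True, True, False, False, False]    # curves 0 and 1 are true outliers
flags = [True, False, False, True, False]    # the method flags curves 0 and 3
fp, fn = count_errors(truth, flags)          # one false positive, one false negative
```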

The three methods perform similarly for Model 1. In our simulation, BD-fb seems inadequate for Model 2. The central region is too large and, consequently, no outlier is detected. Indeed, a closer look at the picture reveals that one of the outliers actually belongs to the central region. A reason for this might be that it is completely inside many bands as compared with the non-outlying data. The irregularities of the curves make it unlikely for them to stay completely inside the bands formed by two other curves. MBD-fb, as proposed in [20], does not present this problem. However, in Models 3 and 4, MBD-fb erroneously considers one or several outliers as part of the central box; hence the boxplot has a strange form and is not able to detect the outliers. Since MBD takes into account the proportion of the interval $I$ where a curve is inside a band, outliers that differ from normal curves only on a small subinterval are difficult to detect. This happens, for instance, in Models 3 and 4, where EH-fb overcomes this problem. The three functional boxplots look very similar in Model 5. The central box seems a good representation of the data in the three cases, with MBD-fb providing a slightly thinner box. This makes EH-fb and BD-fb more conservative in outlier detection, giving 1 and 2 false negatives, respectively.

In this randomly generated example, EH-fb provides a reliable boxplot, whose central box resembles the form of the non-outlying data for the five models. The band-based versions of the functional boxplot fail to represent the central box in at least one of the models.

In Figure 6 we can only show the behavior of one random generation of the set of functions. In order to analyze whether the behavior shown in the graphics is common or not, we have repeated the simulation experiment 1000 times. We consider the worst possible scenario, that is, one of the outliers belongs to the central box. This behavior produces an important distortion in the central box, which, in these cases, becomes a bad representation of the form of the functions. Table 2 shows the percentage of runs where this behavior is present, i.e. at least one outlier is part of the central box. It is remarkable that, for Model 4, MBD-fb erroneously considers at least one outlier as part of the central box in 98.8% of the runs. This is the worst possible case. Two other very bad cases are the behavior of BD-fb for Model 2 and of MBD-fb for Model 3, where 71.3% and 57.7% of the runs, respectively, give this kind of bad representation of the central box. For EH-fb, the worst case is Model 3, where it happens in 39.4% of the runs. This analysis shows the potential of EH-fb as a robust variant of the band-based functional boxplots.


42
43 1 0% 0% 0%
44 2 0% 71% 0%
45 3 39% 12% 58%
ly
46 4 19% 5% 99%
47
48 5 13% 7% 0%
49
50
51 Table 2: Number of runs where at least one outlier erroneously belongs to
52 the central region.

Figure 6: EH-fb (left), BD-fb (center) and MBD-fb (right). Simulated data. Each row of panels corresponds to one of Models 1 to 5; the false positives (FP) and false negatives (FN) reported under the panels are:

model   EH-fb        BD-fb        MBD-fb
1       FP=0 FN=0    FP=0 FN=0    FP=0 FN=0
2       FP=0 FN=0    FP=0 FN=7    FP=0 FN=0
3       FP=0 FN=0    FP=0 FN=0    FP=0 FN=2
4       FP=0 FN=0    FP=0 FN=1    FP=0 FN=5
5       FP=0 FN=1    FP=0 FN=2    FP=0 FN=0

6 Real data

In this section, we analyze two datasets from weather applications: Rain Australia and US Precipitations. The first one comes from [5, 6] and contains series of rainfall in Australia. The second one was included in [20] and contains precipitation in different regions of the United States.

6.1 Rain Australia Dataset

In the first real example, Rain Australia, we consider 191 rainfall curves from different weather stations in Australia. This dataset has been used before in [5, 6] and can be downloaded from http://dss.ucar.edu/datasets/ds482.1. Each curve $y_i(t)$ represents the averaged rainfall at station $i$ on day $t$, with $t \in [1, 365]$. The raw data are observed daily and contain some missing values. We try to use the data with as little pre-processing as possible. For a missing value, the next available value contains the accumulated rainfall of the previous missed days. We have considered this accumulated rainfall to be equally distributed along the consecutive missed days.
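
The treatment of missing values described above can be sketched as follows. The sketch assumes that missing days are coded as NaN and that the accumulated amount reported on the next available day covers both the missed days and the reporting day itself; the source does not state whether the reporting day is included, so this is an assumption, as is the function name.

```python
import numpy as np

def spread_accumulated(rain):
    """Spread each accumulated reading equally over the preceding run of missing days.

    rain: 1-D sequence of daily values, with NaN marking missing days (assumed coding).
    The reading that follows a run of NaN is divided equally among the missing days
    and the reporting day (assumption). Trailing NaN values are left untouched.
    """
    rain = np.asarray(rain, dtype=float)
    out = rain.copy()
    run_start = None                       # index where the current run of NaN began
    for i, value in enumerate(rain):
        if np.isnan(value):
            if run_start is None:
                run_start = i
        elif run_start is not None:
            days = i - run_start + 1       # missing days plus the reporting day
            out[run_start:i + 1] = value / days
            run_start = None
    return out

# Hypothetical series: two missing days followed by an accumulated reading of 6.
filled = spread_accumulated([1.0, np.nan, np.nan, 6.0, 2.0])   # [1, 2, 2, 2, 2]
```

If the accumulated value should be spread over the missed days only, the slice and the divisor would exclude the reporting day.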

For each station and day, the averaged rainfall is computed over all the years in which the station was operative. The dataset contains data from 1840 until 1990. Over these years there have been changes in the location of the stations. Hence, none of the stations has been operative over the whole period. One particular station was operative for less than two years, whereas the others were operative for between 17 and 126 years. Since $y_i(t)$ represents the rainfall at station $i$ and day $t$ averaged over the years with available data, some of the curves are smoother than others.

Figure 7 provides a plot of the 191 curves (top-left graphic), and the EH-fb (top-right), BD-fb (bottom-left) and MBD-fb (bottom-right).

The proportion of detected outliers is given under each graphic. The three boxplots look very different. EH-fb detects two outliers. One of them corresponds to a station for which little information is available. For this station, the database only contains one complete year of data and two months of other years. Hence, the function of averaged daily rainfall is very different from that of other stations where more years are available. Indeed, it has one day where the curve almost doubles the maximum value of the remaining functions. The second outlier behaves differently from the remaining curves (see top-left graphic). The rainfall around September-October is much higher than at other stations. This station is situated in Queenstown, on the island of Tasmania.

BD-fb also detects two outliers. They correspond to stations with high rainfall in winter and low rainfall in summer. This behavior does not seem very different from that of the data plotted in the top-left graphic. The station where very few data are available and the Queenstown station are not detected as outliers, and the Queenstown station even belongs to the central box.


MBD-fb provides a completely different picture, since 14.66% of the curves are detected as outliers. In this case, MBD-fb detects so many outliers, and they are so similar to the rest of the data, that if they were plotted as in the other graphics they would cover the central box and make the boxplot difficult to see. In this graphic we have chosen to plot them behind the boxplot so that it is not masked. Note that we have not performed any smoothing on the curves. Smoothing the curves before plotting the functional boxplot could give very different results, especially for MBD-fb. We believe that functional boxplots, along with many other visualization techniques, are often used to get a first glance at the data, as a step prior to a more careful analysis. Hence, it is important to have good visualization methods that work well with as little preprocessing as possible. This example illustrates that EH-fb is a good option in such a setting.

6.2 US Precipitations Dataset

The last example, US Precipitations, corresponds again to rainfall observations. In this example we use a dataset where the curves have been previously smoothed. Each curve represents total annual rainfall from 1895 to 1997 at a station in the United States. The original data are provided by the Institute for Mathematics Applied to Geosciences (http://www.image.ucar.edu/Data/US.monthly.met/). A preprocessing step to smooth the curves is suggested in [20], and the data after this preprocessing can be found on Sun’s webpage (http://www.stat.tamu.edu/~sunwards/publication.html).¹ A functional boxplot is computed for each of the nine climatic regions defined by the National Climatic Data Center.

¹ More specifically, the data are in the zip file datasets under the name of the article Functional Boxplots. The files called sfitreg1cy.data and similar contain the curves for different regions.

[Figure 7: Comparing Boxplots. Historical daily precipitations in Australia. Panels: (a) Rain Australia; (b) EH functional boxplot, 1.05% outliers; (c) BD functional boxplot, 1.05% outliers; (d) MBD functional boxplot, 14.66% outliers. Vertical axis: averaged daily rainfall, from 0 to 500; horizontal axis ticks from 50 to 350.]

Figure 8 shows the curves, and Figures 9-11 show the EH-fb, BD-fb and MBD-fb, respectively. BD-fb produces boxplots that are very similar to those of EH-fb. The main difference is that in this example BD-fb is more conservative, as it detects a lower number of outliers.

It can be observed that, in general, EH-fb produces central boxes that are flatter than those produced by MBD-fb. The most relevant cases are the South and West regions, where the central box of MBD-fb has some peaks, whereas that of EH-fb is completely flat. The reason for this difference might be that functions that are low in the plot, but contain many ups and downs, are considered by MBD-fb as central functions, whereas EH-fb considers them as low functions. We have seen in previous examples that EH-fb is usually more conservative because it produces wider central boxes. However, in this example, despite having wider boxes, there are some outliers that are detected by EH-fb and not by MBD-fb. One of these cases is the East North Central region, where at a certain station the rainfall increased at the end of the period considered. The other example is the Central region, where a curve has a high peak around 1950 and some other peaks in other years. These two curves seem to behave differently from the other functions in their regions, and are detected by EH-fb but not by MBD-fb. Another interesting example is the South East region. The two graphics look very similar: both detect many outliers in the upper part of the graphic, above the central box. However, EH-fb also detects an outlying curve below the central box, whereas MBD-fb does not detect any curves in that part of the graphic. This is again because MBD-fb is mainly based on the depth of the curves, a concept that involves centrality, whereas EH-fb is based on an index that captures how low or high the curve is.
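
The contrast between a centrality-based depth and a down-up index can be made concrete with a small sketch. Both functions below are simplified illustrations written for this purpose, and their names are hypothetical: `down_up_index` averages over t the share of sample curves lying at or below a given curve (in the spirit of a modified hypograph-type index), and `band_depth_index` averages over t the share of pairs of curves whose band contains it (in the spirit of the modified band depth of [13]). They are not claimed to be the exact definitions used in Section 2 or in [20].

```python
from itertools import combinations


def down_up_index(curves, i):
    """Average over t of the share of sample curves at or below curve i."""
    n, m = len(curves), len(curves[i])
    total = 0.0
    for t in range(m):
        total += sum(1 for c in curves if c[t] <= curves[i][t]) / n
    return total / m


def band_depth_index(curves, i):
    """Average over t of the share of curve pairs whose band contains curve i."""
    m = len(curves[i])
    pairs = list(combinations(range(len(curves)), 2))
    total = 0.0
    for t in range(m):
        x = curves[i][t]
        inside = sum(
            1
            for j, k in pairs
            if min(curves[j][t], curves[k][t]) <= x <= max(curves[j][t], curves[k][t])
        )
        total += inside / len(pairs)
    return total / m


# Four flat curves at heights 0, 1, 2 and 3, observed at four time points.
sample = [[float(h)] * 4 for h in range(4)]
```

On this toy sample the depth-type quantity gives the same value to the lowest and the highest curve, since both are equally far from the centre, whereas the down-up index separates them. This is the kind of information that allows an index-based boxplot to flag a curve lying below the central box.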

Note that the percentage of outliers seems quite high for some of the regions. This is true for both MBD-fb and EH-fb. In [21] a method is proposed to adjust the factor 1.5 in such a way that the percentage of non-outliers wrongly detected as outliers resembles the univariate case. This method is applied to this database for the MBD-fb and can also be applied for the EH-fb. However, the peaks in the central box that can be found in regions Central and South do not depend on the factor 1.5, so this issue will not be solved with the simulation-based approach.
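
The role of the factor 1.5 is easiest to see in the univariate boxplot rule, sketched below. The function name `fence_outliers` is hypothetical, and the sketch covers only the univariate analogue: it does not reproduce the functional construction of Section 4 nor the simulation-based adjustment of [21].

```python
from statistics import quantiles


def fence_outliers(data, factor=1.5):
    """Return the values lying beyond the boxplot fences.

    The fences are Q1 - factor * IQR and Q3 + factor * IQR, where Q1 and Q3
    are the sample quartiles and IQR = Q3 - Q1.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [x for x in data if x < lower or x > upper]
```

Increasing `factor` widens the fences and lowers the number of flagged points, but it leaves the quartiles, and hence the central box, unchanged. This is consistent with the remark that peaks in the central box cannot be removed by tuning the factor.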

[Figure 8: Observed yearly precipitation curves over the nine climatic regions for the coterminous United States from 1895 to 1997. Panels: North East, East North Central, Central, South East, West North Central, South, South West, North West, West. Vertical axis from 0 to 4000; horizontal axis from 1895 to 1995.]

[Figure 9: EH-fb of observed yearly precipitation over the nine climatic regions for the coterminous United States from 1895 to 1997. Panels: North East (0.21% outliers), East North Central (0.12% outliers), Central (0.31% outliers), South East (2.83% outliers), West North Central (2.41% outliers), South (0.00% outliers), South West (3.77% outliers), North West (2.33% outliers), West (4.34% outliers). Vertical axis from 0 to 4000; horizontal axis from 1895 to 1995.]

[Figure 10: BD-fb of observed yearly precipitation over the nine climatic regions for the coterminous United States from 1895 to 1997. Vertical axis from 0 to 4000; horizontal axis from 1895 to 1995.]

[Figure 11: MBD-fb of observed yearly precipitation over the nine climatic regions for the coterminous United States from 1895 to 1997. Vertical axis from 0 to 4000; horizontal axis from 1895 to 1995.]

7 Conclusions

Visualization methods play an important role in science. Functional boxplots [20] and other related approaches for functional data [10] have been shown to be useful for such a purpose. The functional boxplot proposed in [20] is mainly based on data depth, which provides a center-outward ordering of the curves. A new version of the functional boxplot based on a down-upward ordering of the data has been proposed in this paper. This type of ordering provides a natural extension of the univariate boxplot.

The simulated examples show that EH-fb is a good alternative to its depth-based counterparts, being able to give a robust representation of the central box in cases where either BD-fb or MBD-fb fails. In our simulated examples, the EH-fb approach was less affected than BD-fb by irregularities in the data, a case in which MBD-fb would work properly. However, when outliers were different in small subintervals, the MBD-fb approach easily failed to detect them, whereas EH-fb did detect them. Our real examples illustrate that EH-fb detects different outliers than depth-based approaches. In particular, it has been shown to be a convenient tool when data have yet to be preprocessed. Thus, EH-fb is a promising visualization technique for functional data.

An extension of EH-fb to spatio-temporal data, dealing with surfaces instead of curves, is straightforward, as has been noted by [20] for depth-based functional boxplots. In general, these approaches are not direct for multivariate data, unless an intuitive index makes sense for a particular application. Extensions to other complex data, such as images or networks, look interesting for further research. In recent years, internet-based social networks have become very popular. The information captured by those networks brings up new challenges for data analysis and visualization techniques.

8 Acknowledgment

This research was partially supported by projects MTM2012-36163-C06-03 and ECO2011-25706 of Ministerio de Educación y Ciencia (Spain).

References

[1] V. Barnett. The ordering of multivariate data. Journal of the Royal Statistical Society, Series A, 139(3):318–355, 1976.

[2] I. Cascos. Data depth: multivariate statistics and geometry. In I. Cascos, W.S. Kendall, and I. Molchanov, editors, New Perspectives in Stochastic Geometry, pages 398–423. Oxford University Press, 2010.

[3] J.A. Cuesta-Albertos and A. Nieto-Reyes. The random Tukey depth. Computational Statistics & Data Analysis, 52:4979–4988, 2008.

[4] A. Cuevas, M. Febrero, and R. Fraiman. Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics, 22(3):481–496, 2007.

[5] A. Delaigle and P. Hall. Defining probability density for a distribution of random functions. The Annals of Statistics, 38:1171–1193, 2010.

[6] A. Delaigle and P. Hall. Achieving near perfect classification for functional data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), pages no–no, 2011.

[7] F. Ferraty and P. Vieu. Nonparametric functional data analysis: theory and practice. Springer, 2006.

[8] R. Fraiman and C. Muñiz. Trimmed means for functional data. Test, 10(2):419–440, 2001.

[9] A.M. Franco-Pereira, R. Lillo, and J. Romo. Extremality for functional data. In F. Ferraty, editor, Recent Advances in Functional Data Analysis and Related Topics, chapter 20, pages 131–134. Springer, 2011.

[10] R.J. Hyndman and H.L. Shang. Rainbow plots, bagplots, and boxplots for functional data. Journal of Computational and Graphical Statistics, 19:29–45, 2010.

[11] R.Y. Liu. On a notion of data depth based on random simplices. Annals of Statistics, 18(1):405–414, 1990.

[12] R.Y. Liu, J.M. Parelius, and K. Singh. Multivariate analysis by data depth: descriptive statistics, graphics and inference. Annals of Statistics, 27(3):783–858, 1999.

[13] S. Lopez-Pintado and J. Romo. On the concept of depth for functional data. Journal of the American Statistical Association, 104(486):718–734, 2009.

[14] S. Lopez-Pintado and J. Romo. A half-region depth for functional data. Computational Statistics and Data Analysis, 55(46):1679–1695, 2011.

[15] J.O. Ramsay and C.J. Dalzell. Some tools for functional data analysis (with discussion). J. R. Stat. Soc. Ser. B, 53:539–572, 1991.

[16] J.O. Ramsay and B.W. Silverman. Applied Functional Data Analysis. New York: Springer-Verlag, 2002.

[17] J.O. Ramsay and B.W. Silverman. Functional Data Analysis. New York: Springer-Verlag, 2005.

[18] R. Serfling. Depth functions in nonparametric multivariate inference. In R.Y. Liu, R. Serfling, and D.L. Souvaine, editors, Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications, pages 1–16. American Mathematical Society, 2006.

[19] R. Serfling and Y. Zuo. General notions of statistical depth function. Annals of Statistics, 28(2):461–482, 2000.

[20] Y. Sun and M.G. Genton. Functional boxplots. Journal of Computational and Graphical Statistics, 20(2):316–334, 2011.

[21] Y. Sun and M.G. Genton. Adjusted functional boxplots for spatio-temporal data visualization and outlier detection. Environmetrics, 23:54–64, 2012.

[22] J.W. Tukey. Mathematics and the picturing of data. In R.D. James, editor, Proceedings of the International Congress of Mathematicians, August, 21-29, 1974, volume 2, pages 523–531. Vancouver: Canadian Mathematical Society, 1975.
