J. Space Weather Space Clim. 3 (2013) A29
DOI: 10.1051/swsc/2013051
B.A. Laken et al., Published by EDP Sciences 2013
ABSTRACT
The composite (superposed epoch) analysis technique has been frequently employed to examine a hypothesized link between solar activity and the Earth's atmosphere, often through an investigation of Forbush decrease (Fd) events (sudden high-magnitude decreases in the flux of cosmic rays impinging on the upper atmosphere, lasting up to several days). This technique is useful for isolating low-amplitude signals within data where background variability would otherwise obscure detection. The application of composite analyses to investigate the possible impacts of Fd events involves a statistical examination of time-dependent atmospheric responses to Fds, often from aerosol and/or cloud datasets. Despite the publication of numerous results within this field, clear conclusions have yet to be drawn, and much ambiguity and disagreement remain. In this paper, we argue that the conflicting findings of composite studies within this field relate to methodological differences in the manner in which the composites have been constructed and analyzed. Working from an example, we show how a composite may be objectively constructed to maximize signal detection, robustly identify statistical significance, and quantify the lower-limit uncertainty related to hypothesis testing. We also demonstrate how a seemingly significant false positive may be obtained from non-significant data by minor alterations to methodological approaches.
Key words. statistics and probability – cosmic ray – cloud
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Fig. 1. (a) Fictional time series Xi, comprised of deterministic (Di), stochastic (Ni), and low-amplitude repeating signal (Si) components. Di and Ni range from 0.0 to 1.0, while Si has an amplitude of 0.3. (b) Fi, a 21-point smooth (box-car running mean) of Xi. Subtracting Fi from Xi produces a high-pass filtered dataset Ai. (c) A composite matrix of events from Ai, where rows = n, the number of repeating signals in Si (a composite event), and columns = t, the number of days since the composite event. (d) Ct, the composite means of Ai, ±SEM.
be highly difficult to objectively determine the presence of a signal at t0. However, by averaging the 10 events of the matrix together to create a composite mean,

$C_t = \frac{1}{n}\sum_{j=1}^{n} M_{jt}$,   (1)

the noise is reduced by the square root of the number of events,

$\Delta C_t = \frac{\sigma}{\sqrt{n}}$,   (2)

where $\Delta C_t$ indicates the standard error of the mean (SEM) at t and $\sigma$ is the sample standard deviation of Ct. As the noise diminishes, the signal has an increased chance of being identified. For each time step t, the mean and SEM of the matrix may be calculated using Equations 1 and 2, respectively; these data are presented in Figure 1d, and indeed show the successful isolation of a signal of approximately 0.3 amplitude centered on t0. This example and all those presented in this manuscript relate only to one particular statistic, the sample mean; however, any other statistic may also be used. While the composite approach appears straightforward, inconsistencies in the design, creation, and evaluation of composites can strongly affect the conclusions of composite studies and are the focus of this paper.

Numerous composite studies have been published within the field of solar-terrestrial physics, relating to a hypothesized connection between small changes in solar activity and the Earth's atmosphere (e.g., Tinsley et al. 1989; Tinsley & Deen 1991; Pudovkin & Veretenenko 1995; Stozhkov et al. 1995; Egorova et al. 2000; Fedulina & Laštovička 2001; Todd & Kniveton 2001, 2004; Kniveton 2004; Harrison & Stephenson 2006; Bondur et al. 2008; Kristjánsson et al. 2008; Sloan & Wolfendale 2008; Troshichev et al. 2008; Svensmark et al. 2009; Dragić et al. 2011; Harrison et al. 2011; Laken & Čalogović 2011; Okike & Collier 2011; Laken et al. 2012a; Mironova et al. 2012; Svensmark et al. 2012; Artamonova & Veretenenko 2011; Dragić et al. 2013). However, despite extensive research in this area, clear conclusions regarding the validity of a solar-climate link from composite studies have yet to be drawn. Instead, the numerous composite analyses have produced widely conflicting results: some studies have shown positive statistical associations between the CR flux and cloud properties (e.g., Tinsley & Deen 1991; Pudovkin & Veretenenko 1995; Todd & Kniveton 2001, 2004; Kniveton 2004; Harrison & Stephenson 2006; Svensmark et al. 2009, 2012; Dragić et al. 2011, 2013; Harrison et al. 2011; Okike & Collier 2011), while others find no clearly significant relationships (e.g., Lam & Rodger 2002; Kristjánsson et al. 2008; Sloan & Wolfendale 2008; Laken et al. 2009; Laken & Čalogović 2011; Laken et al. 2011; Laken et al. 2012a; Čalogović et al. 2010), or even identify significant correlations of a negative sign (e.g., Wang et al. 2006; Troshichev et al. 2008). We suggest that these ambiguities may result from seemingly minor methodological differences between the composites (e.g., relating to the filtering or normalization of data), which are capable of producing widely divergent results. When the dataset is not suited to a particular method of statistical analysis, incorrect conclusions regarding the significance (i.e., the probability p-value associated with composite means at given times) of the composites may be reached, which is part of the reason why these aforementioned studies have presented conflicting results.
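The reduction of the composite matrix M to the composite mean Ct and its SEM (Eqs. 1 and 2) can be sketched as follows; the noise-only series and event dates here are arbitrary stand-ins used to show the 1/sqrt(n) noise reduction:

```python
import numpy as np

def composite_mean(series, events, half_width=20):
    """Build the composite matrix M (rows j = events, columns t = days
    since event), then reduce it to C_t (Eq. 1) and the standard error
    of the mean, sigma / sqrt(n) (Eq. 2)."""
    M = np.array([series[e - half_width : e + half_width + 1] for e in events])
    n = len(events)
    C = M.sum(axis=0) / n                      # Eq. 1
    sem = M.std(axis=0, ddof=1) / np.sqrt(n)   # Eq. 2
    return C, sem

# Noise-only example: with n events, the SEM (and hence the noise level
# of C_t) shrinks roughly as 1/sqrt(n).
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 2000)
C, sem = composite_mean(x, events=rng.integers(20, 1980, 50))
```

With unit-variance noise and n = 50 events, the SEM at each t is close to 1/sqrt(50) ≈ 0.14, illustrating why larger composites can expose weaker signals.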
B.A. Laken & J. Čalogović: Composite analysis with MC methods
In autocorrelated data the value at any given point in time is affected by preceding values. As a result, sequential data points are no longer independent, and so there is less sample variation within the dataset than there would otherwise be if the points were independent. If the variance and the standard deviation ($\sigma$) are calculated using the usual formulas, they will be smaller than the true values. Instead, the number of independent data points (effective sample size) must be used to scale the sample variance (Wilks 1997). This adjustment will produce confidence intervals that are larger than when autocorrelations are ignored, making it less likely that a fluctuation in the data will be interpreted as statistically significant at any given p-value. In composite analyses, the number of independent data points (i.e., the effective length of the composite sequences) is equivalent to the effective sample size (Forbush et al. 1983): effective sample sizes may be calculated by the methods detailed in Ripley (2009) and Neal (1993). Despite the well-established nature of these methods, they are not widely applied within the solar-terrestrial community, with numerous studies implementing significance testing that simplistically assumes that the data are independent in the t-dimension.

In Section 3.4 we present an additional method of significance testing for composite studies, based on Monte Carlo analysis approaches, that is highly robust. This method does not rely on comparing composite means at given times over the composite to evaluate significance, but instead evaluates the p-value of each t-point individually, using probability density functions (PDFs) constructed from large numbers of randomly generated samples (with a unique PDF at each t-point).

3.2. Generating anomalies: the importance of static means in composite analyses

To effectively evaluate changes in daily timescale data, variations at timescales beyond the scope of the hypothesis testing should be removed from the dataset. Forbush et al. (1982, 1983) showed that bias can be introduced into traditional statistical tests if fluctuations at longer timescales are not removed. The removal of annual variations alone (which normally dominate geophysical data) will not remove fluctuations that cannot be eliminated by linear de-trending, such as those relating to the effects of planetary-to-synoptic scale systems. Such systems may extend over thousands of kilometers in area, and their influence on atmospheric variability may last from periods of days to several weeks; the random inclusion of their effects in composites may produce fluctuations at timescales shorter than seasonal considerations, yet greater than those concerning our hypothesis testing (so-called intermediate timescales), which may not be removed by linear de-trending. Consequently, when intermediate timescale fluctuations are not accounted for prior to an analysis, it may result in periods of shifted mean values and high autocorrelation being inadvertently included into composites. Suitable filters should be applied to the data to remove longer timescale variability, but to retain variations at timescales that concern the hypothesis testing. In the case of a CR flux-cloud hypothesis, a 21-day running mean (high-pass filter) may be suitable, as the microphysical effects of ionization on aerosol/cloud microphysics are expected to occur at timescales of <1 week (Arnold 2006).

Although filtering data has the benefit of potentially reducing noise, it should be applied with caution, as it may also introduce artifacts, and reduce or even destroy potential signals altogether. Figure 3a–c shows the influence of a 21-day smooth filter on three idealized symmetrical disturbances of differing length. These disturbances are centered on t0; each has an amplitude of 100 units in the y-axis dimension and spans time periods of (a) 3 days, (b) 7 days, and (c) 11 days. From each time series, a smooth filter of 21 days (i.e., a box-car running mean) has been subtracted from the disturbance (the filters are shown as black lines). The resulting anomalies are shown as the red lines of each figure. As the duration of the idealized disturbance increases, so too does the appearance of overshoot artifacts. From Figure 3a–c, the original (peak-to-trough) signal of 0 to 100 has been shifted by overshoot effects by: (a) +9.6, (b) +19.0, and (c) +27.8. For (a) and (b), the amplitude of the original disturbance is fully preserved; however, as the timescales of the disturbance increase further, the amplitude is reduced. For example, for the 11-day disturbance the amplitude is diminished by 0.8% compared to the unfiltered signal, while for a 15-day disturbance (not shown) the amplitude decreases by 9%. The amount of signal attenuation will increase as the deviations approach the width of the smooth filter.

Fig. 3. (a–c) The differences between symmetrical idealized disturbances of increasing duration (3-day, 7-day, and 11-day) with an amplitude of 100 units (blue lines) centered on t0, and the results of subtracting a 21-day smooth (box-car running mean) from the data, which acts as a high-pass filter (black lines). The resulting filtered data (referred to as anomalies) are shown as red lines, and display overshoot effects. (d) The effect of subtracting the smooth filter from the CR flux (units in normalized counts). The anomaly has been offset by 5.32 counts in (d) for plotting purposes.
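The overshoot and attenuation effects of a box-car high-pass filter can be reproduced in a few lines. This sketch uses a square (top-hat) disturbance as an assumed stand-in for the idealized symmetrical disturbances of Figure 3, so the exact overshoot and attenuation values will differ from the figure's; it illustrates the mechanism only:

```python
import numpy as np

def highpass(series, width=21):
    """High-pass filter: subtract a width-day box-car running mean."""
    return series - np.convolve(series, np.ones(width) / width, "same")

# An 11-day, 100-unit square disturbance centered on day 100.
x = np.zeros(201)
x[95:106] = 100.0
anom = highpass(x)

inside = anom[95:106].max()   # filtered amplitude inside the disturbance
outside = anom[:95].min()     # negative overshoot just outside it
```

The anomaly dips below zero on either side of the disturbance (the overshoot artifact), and the in-disturbance amplitude is attenuated as the disturbance width approaches the filter width.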
Fig. 4. (a) One random instance of n = 20 events, for both linearly detrended data (black line) and an anomaly from a 21-day moving average (blue line). Differences between these two curves show the possible influences of intermediate timescales on the data. (b) The differences between these curves at t0 for 10,000 random instances of n = 20. The histogram shows a 1σ value of 0.1%, a non-trivial difference which may influence the outcome of a composite analysis.
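A minimal version of the experiment behind this figure might look like the following; the synthetic daily series (noise plus seasonal and ~30-day cycles) is an assumed stand-in for the real ISCCP cloud data, and the simulation count is reduced:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic daily series: noise plus seasonal and ~30-day (intermediate
# timescale) cycles, standing in for the cloud cover data.
d = np.arange(3000)
cloud = (rng.normal(0, 1, d.size)
         + 2.0 * np.sin(2 * np.pi * d / 365.25)
         + 0.5 * np.sin(2 * np.pi * d / 30))

def linear_detrend(x):
    i = np.arange(x.size)
    return x - np.polyval(np.polyfit(i, x, 1), i)

def runmean_anomaly(x, width=21):
    return x - np.convolve(x, np.ones(width) / width, "same")

# Difference between the two anomaly methods at t0, over many random
# n = 20 composites (1,000 here rather than the figure's larger count).
a_lin, a_rm = linear_detrend(cloud), runmean_anomaly(cloud)
diffs = []
for _ in range(1000):
    ev = rng.choice(np.arange(40, d.size - 40), 20, replace=False)
    diffs.append(a_lin[ev].mean() - a_rm[ev].mean())
one_sigma = np.std(diffs)
```

The spread of `diffs` quantifies how much intermediate-timescale variability left behind by linear detrending can shift a composite mean at t0.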
For deviations at timescales of approximately 1/3rd the width of the smooth filter, the attenuation is negligible. Given this, filters should be applied with care to minimize noise while preserving potential signals. Following the application of the filter in this work, we have constrained the reliability of our analysis to time periods of <7 days.

Figure 3d shows a composite of n = 44 events, centered on the CR flux minimum of Fd events. CR flux data (blue line) are from the combined daily mean counts of the neutron monitors (NM) of Moscow and Climax, normalized by subtracting the mean of the 1950–2010 period (as in Fig. 2a). A 21-day smooth of these data is shown (black line). The normalized data show a peak-to-trough reduction between t−3 and t0 of 4.17%, and a 21-day smooth (high-pass filter) of these data is also shown (black line). Following the removal of a 21-day high-pass filter from the normalized data, the resulting anomaly (red line) shows a slightly smaller peak-to-trough change over the same period of 4.05%. This indicates that 97.1% of the signal has been preserved following the application of the 21-day (high-pass) smooth filter. Overshoot distortion effects are also observable in the CR flux anomaly, and notably this produced an artificial enhancement of the increase in the CR flux prior to and following the Fd events.

Despite the limitations associated with data filtering, the noise reduction which may be gained by minimizing the influence of autocorrelated structures within data may be non-negligible. To exemplify the potential impacts of intermediate timescale structures on composites and the benefit of removing them, we have generated a random composite of cloud cover data with n = 20 events, over an analysis period of t±40 days. The n = 20 events upon which the composite was based were selected using a random number generator, ensuring that the chances of any one date (event) being selected for the composite were equal, and that once selected, a date could not re-occur in the composite. All of the random composites used throughout this work were created in this manner. The composite is presented in Figure 4a, with the cloud anomalies calculated by two methods. Firstly, through a simple removal of a linear trend from the cloud data (black line); this approach will remove linear bias from the composite resulting from seasonality. Secondly, we have also created an anomaly by subtracting a 21-day running mean (high-pass filter) from cloud cover (blue line). In this latter method, all variations (linear and non-linear) at timescales greater than 21 days are accounted for, including both seasonality and influences from intermediate timescale fluctuations. A period of 21 days was selected as this is three times greater than the upper limit response time predicted for a CR flux-cloud relationship (Arnold 2006), and so subtracting a running mean of this length from the data should not diminish the amplitude of any potential signals, yet it may still be useful in minimizing influences from longer term variations which are not the subject of the hypothesis testing. A comparison of these two approaches (Fig. 4a) gives an example of how the failure to remove non-linear intermediate timescale variations influences the composite, with noticeable deviations between the two methods. However, it is difficult to objectively assess the bias expressed as the difference between these two methods with only one random example. With this in mind, the histogram of Figure 4b is presented, displaying the results of 100,000 randomly generated composites, showing the difference between the linear-fit-removed data and the 21-day smooth cloud anomaly at t0. The histogram shows that the 1σ difference between these anomalies (i.e., the impact of not accounting for intermediate timescale variations) is approximately 0.1%. We note this value will vary depending on the size of n and the length of the running mean applied, and some differences between the two filtering methods may also be attributed to overshoot errors. As we shall demonstrate in later sections, unaccounted-for biases of such amplitudes may have a non-negligible impact on the estimated statistical significance of composited anomalies.

3.3. Using signal-to-noise-ratio as a constraint when designing a composite

The data we have presented in Figure 2b are global in extent and utilize all pressure levels of the ISCCP dataset (1000–50 mb). However, composite studies often use subsets of data
for analysis, with varying spatio-temporal restrictions: e.g., focusing on stratospheric layers over Antarctica (Todd & Kniveton 2001, 2004; Laken & Kniveton 2011); areas over isolated Southern Hemisphere oceanic environments at various tropospheric levels (Kristjánsson et al. 2008); low clouds over subtropical-tropical oceans (Svensmark et al. 2009); indicators of the relative intensity of pressure systems (vorticity area index) over regions of the Northern Hemisphere mid-latitude zone, during differing seasons (Tinsley et al. 1989; Tinsley & Deen 1991); total ozone at latitudes between 40° N and 50° N, during periods of high solar activity and eastern phases of the quasi-biennial oscillation (Fedulina & Laštovička 2001); variations in atmospheric pressure and formation of cyclones/anticyclones over the North Atlantic (Artamonova & Veretenenko 2011); and diurnal temperature ranges over areas of Europe (Dragić et al. 2011; Dragić et al. 2013; Laken et al. 2012a). In addition to spatial/seasonal restrictions, it has been frequently proposed that perhaps a CR-cloud effect is only readily detectable during the strongest Fd events (e.g., Harrison & Stephenson 2006; Svensmark et al. 2009, 2012; Dragić et al. 2011). Although such propositions are rational, high-magnitude Fd events are quite rare. For example, of the 55 original Fd events between 1984 and 1995 listed in the NOAA resource (discussed in Sect. 2), the Fd intensity (IFd, the daily absolute percentage change of background cosmic ray intensity as observed by the Mt. Washington neutron monitor) is as follows: 28 weak events (IFd < 5%), 21 moderate events (5% ≤ IFd < 10%), and 6 strong events (IFd ≥ 10%). Consequently, composites focusing only on intense Fd events are invariably small in size, and therefore highly restricted by noise (Laken et al. 2009; Čalogović et al. 2010; Harrison & Ambaum 2010).

While there is often a good theoretical basis for restricting the sampling area, such constraints considerably alter the potential detectability of a signal. Restricting either the spatial area (hereafter a) of the sampling region or the number of sample events (composite size, hereafter n) will increase the background variability (noise) in composites. This relationship is quantitatively demonstrated in Figure 5, which shows how noise levels in random composites constructed from cloud cover anomalies (21-day running mean subtracted) vary for differing area a and composite size n. Using MC simulations this is calculated for composites varying over an a of 0.1–100% of the global surface, and n of 10–1,000. For each combination of variables a and n, a probability density function (PDF) of 10,000 randomly generated composite means at t0 is generated and the 97.5th percentile of the distribution is then presented in Figure 5. For each corresponding a and n, any potential CR flux-induced cloud variations (which we shall refer to as a signal) must overcome the values shown in Figure 5 in order to be reliably detected; so these values represent the upper confidence level. A nearly symmetrical 2.5 percentile lower confidence level (not shown) is also calculated for every a and n combination, and together these upper and lower confidence intervals constitute the p = 0.05 significance threshold.

The noise associated with composites of differing a and n sizes defines the lower-limit relationship which can be detected at a specified confidence level. Thus, for a specific magnitude of CR flux reduction, the minimum necessary efficiency at which a CR-cloud relationship must operate in order to be detected at a specified p-value can be calculated. For example, if composites were constructed with n = 10 samples from data with an area of only 1% of the Earth's surface, then a change in cloud cover would have to exceed approximately ±6.3% in order to be classified as statistically significant at the p = 0.05 two-tailed level.

Based on such calculations, composites can be constructed from which a response can realistically be identified. For example, the most favorable study of a CR-cloud link, by Svensmark et al. (2009), finds using a sample of n = 5 Fd events that following an 18% reduction in the CR flux a 1.7% decrease in cloud cover occurs. Regardless of the criticisms of this study (given in Laken et al. 2009 and Čalogović et al. 2010), we use this observation to calculate the most favorable upper limit (UL) sensitivity of cloud cover changes to the CR flux (1.7/18, giving a factor of 0.094). From a consideration of this UL sensitivity and the relationship of composite noise to area a and composite size n in Figure 5, we may then surmise that a CR flux-induced change in cloud cover (a signal) could almost …
… 1995; Todd & Kniveton 2001). This is because the UL sensitivity suggests that the CR flux would at most produce a cloud sig…

Evaluating the statistical significance of composited geophysical data by a comparison to randomly obtained samples is not a novel idea. One of the first applications of this method within the field of solar-terrestrial physics appears in Schuurmans & Oort (1969) (hereafter SO69), who investigated the impacts of solar flares on Northern Hemisphere tropospheric pressures over a composite of 81 flare events. They constructed their composite over a three-dimensional (longitude, latitude, height) grid and examined changes over a 48 h period. They then evaluated the significance of their observed pressure variations at the p = 0.05 significance level (calculated from the standard deviation of their data). SO69 noted that due to autocorrelation in their data, the number of significant grid points beyond the p = 0.05 level may exceed the false discovery rate without there necessarily being a significant solar-related signal in their data (a point which we shall discuss further later in this work).

To assess the potential impact of autocorrelation on their data and more accurately gauge how unusual their result was, SO69 constructed three random composites of equal size to their original composite (n = 81), and compared the random results to their original findings. Although SO69 were limited by the lack of computing power available to them, and consequently their number of random composites was not sufficient to precisely identify the statistical significance of their flare composite, our method, which we will shortly describe in detail, is based upon the same principles as those of SO69. In essence, we will use populations of randomly generated composites and identify threshold significance values using probability density functions (PDFs) of these values, from which we may precisely evaluate the statistical significance of variations in the composite mean over t.

To achieve this, we can use the MC-based method of estimating noise used to calculate Figure 5 to provide a simple, yet powerful method of evaluating the statistical significance of composite means. From MC-generated PDFs we may test how likely it is that we can randomly obtain composite means of a given magnitude by chance. Events are selected at random from the data being scrutinized, and composites of equal sample sizes to the original composite are constructed. The data of the MC must be treated in exactly the same manner as that of the composite being evaluated with respect to the calculation of anomalies, or application of any normalization procedures (i.e., if the composite is based on an anomaly calculated from a 21-day running mean, then the MC-generated composites must also be based on data with this treatment). Following this, the composite mean at each t is calculated. If enough available data exist, this procedure can be repeated many times, making it possible to build up large PDFs of composite means for each t. From these, confidence intervals at specific p-values may then be identified as percentiles of the calculated distributions. We note that this procedure may fail in cases where insufficient data exist to build up distributions which are representative of the parent population from which the samples are drawn.

Fig. 6. Distribution of daily anomalies for the entire 1983–2010 data period for: (a) cosmic ray flux (%), and (b) cloud cover (%) data. For comparison an idealized Gaussian distribution is shown on the black lines. The mean ±1.96σ, and 2.5/97.5th percentile values are displayed on the red dashed and blue solid lines respectively.

Geophysical data do not often follow idealized Gaussian distributions; as a result, the two-tailed confidence intervals should be identified asymmetrically for optimum precision: i.e., the upper/lower p = 0.05 confidence interval should correspond to the 2.5th and 97.5th percentiles of the cumulative frequency, not simply to the ±1.96σ value. This is demonstrated in Figure 6, which shows a distribution of both the CR flux anomaly and cloud anomaly data, comprised of all data points from 1983–2010 (in this instance the data presented are the entire population of daily values rather than composite values). The two-tailed p = 0.05 thresholds are calculated from both percentiles (blue lines) and from the ±1.96σ level around the mean (red dashed lines). Under an ideal Gaussian distribution (displayed on the black lines) the 2.5/97.5th percentile and ±1.96σ values would be both equivalent and symmetrical around the mean value. However, due to the departures from the ideal Gaussian distribution there are differences between these significance …
Fig. 8. Composites of (n = 44) Forbush decrease events over a t±40 day period, not coincident within a t±7 day period of GLEs, from 1983 to 1995 for: (a) the CR flux (%), and (b) cloud cover (%). The mean anomalies (solid black lines) are plotted together with the ±1.96 SEM values (blue dashed lines) for each of the t±40 days of the composite. Based on PDFs of 10,000 MC simulations at each t value, two-tailed confidence intervals are calculated at the p = 0.05 and p = 0.01 levels (dashed and dotted red lines respectively).
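The per-t Monte Carlo confidence intervals described in this caption can be sketched as below; the event-eligibility rules of the paper (GLE exclusion, anomaly pre-treatment, etc.) are omitted for brevity, so this is an illustrative sketch rather than the authors' exact procedure:

```python
import numpy as np

def mc_intervals(series, n_events, n_sims=10_000, half=40,
                 p_levels=(0.05, 0.01), seed=0):
    """At each lag t in [-half, +half], build a PDF of composite means
    from n_sims random composites of n_events dates each, then return
    two-tailed percentile confidence intervals for each p-value."""
    rng = np.random.default_rng(seed)
    lags = np.arange(-half, half + 1)
    valid = np.arange(half, series.size - half)
    sims = np.empty((n_sims, lags.size))
    for i in range(n_sims):
        ev = rng.choice(valid, n_events, replace=False)
        sims[i] = series[ev[:, None] + lags].mean(axis=0)  # composite mean
    return {p: (np.percentile(sims, 100 * p / 2, axis=0),
                np.percentile(sims, 100 * (1 - p / 2), axis=0))
            for p in p_levels}
```

A composite mean at lag t is then flagged as significant at a given p-value when it falls outside the corresponding interval; because each lag has its own PDF, the method needs no assumption of independence in the t-dimension.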
… to deviations of −2.5 ± 2.4σ and −2.1 ± 2.1σ respectively. The p-value associated with these mean anomalies is p = 0.01 (with upper/lower SEM values of p = 0.91/p < 0.00) and p = 0.04 (p < 0.00/p = 0.96) respectively. All other variations shown in Figure 8, including those observed following the Fd-induced CR flux reduction, were of a smaller magnitude. The second largest negative anomaly observed occurred at t+5 and warrants discussion (for reasons which will become apparent in the next paragraph): the anomaly possesses a mean of −0.37 ± 0.41%, corresponding to a deviation of −2.4 ± 2.6σ and a significance of p = 0.02 (with upper/lower SEM values of p = 0.84/p < 0.00).

It should be stressed that by analyzing 81 points (t±40) over a composite using the p = 0.05 significance level, there will be a false discovery rate (FDR) of approximately 4 days (81 × 0.05 = 4.05), and we can expect that the mean anomaly will exceed the p = 0.05 level by chance on approximately four occasions over an 81-day period. Indeed, in Figure 8b we observe mean anomalies of p < 0.05 in six instances over the composite, in line with the expected FDR. In contrast, the CR flux is observed to be significant at the p = 0.05 level at 26 days out of 81, as a result of the influence of Fd and GLE events. We stress that the FDR noted here relates to the frequency with which the mean anomaly will appear as significant, and not the error around the mean anomaly (SEM). These error intervals of course relate to how accurately the mean value is known, so in our case we have presented intervals which show with a ±95% accuracy the range in which the mean value is likely to occur. Consequently, the SEM ranges are regularly seen to pass the p = 0.05 threshold. Evaluating the statistical significance of both the mean and its error range will give a more robust indication of statistical significance than can be obtained from the mean value alone. For example, the t−3 and t0 CR flux anomalies are unambiguously statistically significant (SEM are p < 0.00); however, although the t26 cloud anomalies show a mean and lower SEM which are statistically significant (p < 0.05), almost all of the upper SEM values exist at p > 0.05, indicating that this is not a robust result, as should be expected from a data point with its significance attributed to the FDR. Considered together, these results clearly show that there are no unusual variations during or following statistically significant variations in the CR flux, and consequently support the rejection of H1 and the acceptance of H0.

From a combination of the observed reduction in CR flux of 3%, and the observation that no clear cloud changes occurred above the p = 0.01 significance level (which are equivalent to cloud anomalies of about 0.40%; see Fig. 8b), we may also conclude that if a CR-cloud relationship exists, then a 1% CR flux change is, at most, able to alter cloud by 0.13% (0.4%/3%). If a CR flux-cloud relationship were more efficient than this limit, we would be able to detect a statistically significant cloud response over daily timescales. Since we do not, our conclusions must be limited by the statistical noise of the dependent (cloud) dataset. However, we note that supporting lines of evidence suggest at least that the higher range of values associated with this upper-limit constraint is likely too large, as it implies that over a solar cycle, decadal oscillations in the CR flux on the order of 20% may induce cloud changes of 2.6% (20% × 0.13%, assuming a linear CR-cloud relationship), but no such periodic variations in cloud have been identified in either ISCCP or MODIS cloud data at any atmospheric level over the past 26 years of satellite observations (Kristjánsson et al. 2004; Agee et al. 2012; Laken et al. 2012b, 2012c).

Although the MC methods as shown in the previous examples have many advantages over classical statistical tests (like the Student's t-test), we reiterate that there are situations where MC methods could yield incorrect or imprecise estimates of significance levels. Specifically, this may occur in instances where: (1) there is a limited amount of data to generate unique random samples. The total number of unique samples which may be generated can be calculated as

$MC_{\mathrm{sims}} = \frac{m!}{n!\,(m-n)!}$,   (3)

where MCsims is the number of unique simulations, n is the subsample size (i.e., size of composites), and m is the size …
[...] the changes by comparing the post-Fd event values to the values of the normalization period. While these propositions are reasonable, and this procedure has appeared in numerous publications (e.g., Kniveton 2004; Svensmark et al. 2009, 2012; Dragić et al. 2011, 2013), this approach can have a considerable impact on the statistical significance of the anomalies, as we will now demonstrate.

[Fig. 9: composite of cloud cover anomalies (%); figure not reproduced, only axis labels were recoverable.]

In Figure 9 we present cloud cover anomalies (%) calculated [...] (the mean of a normalization period commencing at t−10 subtracted from the observed mean at each t). From the resulting confidence intervals we clearly demonstrate that the variations at t+5 are non-significant (p > 0.05). We note that methodological differences in generating the anomalies between Figures 8 and 9 (black lines in both figures) have resulted in the t+5 decrease being exaggerated by 0.05% in the latter figure. Despite this, the anomaly appeared to be weakly significant (p = 0.02) in Figure 8 but not in Figure 9 (p = 0.06). This is because the autocorrelation effects present in the analysis of Figure 9 have also resulted in relatively wider confidence intervals. This effect is combined with the different calculation of the t+5 anomalies, and as a result we obtain two differing p-values. So which is correct? This is an intentionally misleading question, as the two results are merely independent statements of the probability of obtaining values of an equal magnitude by chance when the data are treated in a certain manner, including, excluding, or ignoring various effects. When this result is interpreted objectively, examining both the mean and error as well as longer-term variations, and excluding as many impertinent aspects of the data as possible (e.g., by restricting influence from autocorrelations and minimizing noise), we may readily reach the conclusion that the t+5 anomaly is not unusual over the composite.

If we were to make the unsupported assumption that there exists a connection between the t+5 variation in cloud and the Fd events, then we may interpret the data to suggest that an approximate change in the CR flux of 1% may lead to global cloud changes of 0.12% (0.37/3.02, where 0.37 is the absolute cloud anomaly at t+5 in Fig. 8b). This response is even larger than our previously discussed upper-limit value of 0.09% based on Svensmark et al. (2009). Objectively considered, no evidence has yet been presented for accepting that the t+5 anomaly is more than stochastic variability, or for making the assumption of a connection between the Fd events and cloud anomalies at t+5. However, it may always be argued that a small CR-cloud signal is simply overwhelmed by the noise levels of the experiment. Indeed, experiments of this manner may only restrict the potential magnitude of such a signal by increasing experimental precision; they may never disprove it entirely.

To the previous point regarding autocorrelation in the data and its influence on the width and development of the confidence intervals over t, we add some further remarks. If the composited cloud data of Figure 9 were to be replaced by an imaginary time series absent of autocorrelation (pure white noise), and these data were treated in the same manner (with a normalization period of five days), the resulting MC-calculated confidence intervals (equivalent to the blue lines of Fig. 9) would also display some weak non-stationary characteristics similar to those of Figure 9: the confidence intervals would be relatively narrow around the normalization period and expand (to a nearly static value) outside of it. The presence of persistent noise enhances these characteristics, resulting in narrower confidence intervals during the normalization period, which then expand with increasing time from the normalization period. The duration of this expansion and the final width of the confidence intervals are related to the amount of persistent long-memory noise within the data.

We again reiterate that traditional statistical tests (such as the Student's t-test) may also be used to calculate significance after an adjustment in the assumed degrees of freedom (Forbush et al. 1982, 1983), which can be done by calculating the effective sample size (Neal 1993; Ripley 2009). We note that the MC approach of the red lines in Figure 8 and the blue lines of Figure 9 is identical, and because the confidence intervals are calculated for each time point individually, it is able to correctly account for the effects of restricted sample variability and autocorrelation present in the data and exacerbated by the normalization procedure.

3.7. A note on adding dimensions

The composites discussed thus far only concern area-averaged (one-dimensional) data. Such composites are usually constructed from either point measurements (e.g., Harrison et al. 2011) or area-averaged variations (e.g., Svensmark et al. 2009), with time as the considered dimension. A limitation of this approach is that it does not provide the capacity to differentiate between small changes over large areas and large changes over small areas: i.e., one-dimensional composites of this nature only enable an evaluation of integrated anomalies.

By considering the data at higher dimensions (i.e., over both temporal and spatial domains), differentiation between localized high-magnitude anomalies and low-magnitude large-area anomalies is possible. However, the increased complexity also requires increased caution, as a dataset with three dimensions (e.g., latitude, longitude, and time) is essentially a series of parallel, independent hypothesis tests (see the description in the auxiliary material of Čalogović et al. 2010). To such data, the methods of one-dimensional composite analyses previously described may be applied. Following proper construction of the composite anomalies, the area over which anomalies are statistically significant may then be evaluated; this may be done by identifying a threshold p-value and assessing the significance of the data points independently. Summing the number of statistically significant data points at each t of the composite gives a simple integer measure against which the normality of the anomalies may be evaluated: this is referred to as the field significance (for further details, see Storch 1982; Livezey & Chen 1983; Wilks 2006).

If autocorrelation effects and the presence of factors influencing the data are absent (i.e., for an idealized, sequentially independent random dataset), the percentage of significant pixels over the integrated field should only reflect the false discovery rate (FDR) associated with the chosen p-value. However, as previously noted, autocorrelation effects are a common feature of geophysical data. Cloud data show both spatial and temporal autocorrelation, and thus the number of significant data points at any given time is likely to be greater than expected from calculations of the FDR alone. The field significance may be effectively assessed via a variety of approaches, such as the Walker test (Wilks 2006). In addition, MC approaches similar to those previously described in this work could readily be adapted to calculate distributions of randomly generated field significance against which to calculate a p-value.

While integrated (one-dimensional) composites are only able to test the net amplitude of anomalies, multi-dimensional composites may identify the extent of significant anomalies by creating a series of independent parallel hypothesis tests. However, a limiting weakness of such an approach, as previously noted, is the constraint of small sample areas on the signal-to-noise ratio (Fig. 5); this makes it less likely that a low-amplitude signal may be reliably detected over a small area. Therefore, at least in the context of studies discussed in this
work, we suggest that multi-dimensional studies should be used in conjunction with one-dimensional time series analyses, e.g., to demonstrate that any statistical significance identified in one-dimensional tests does not result from localized high-amplitude noise.

3.8. Estimating statistical significance in subpopulations

The MC-based estimation of statistical significance we have described in this work has numerous advantages over traditional significance testing methods. However, it is not without limitations, as previously stated at the end of Section 3.5. To these limitations we add a further caveat: MC methods may provide inaccurate results where composites are constructed out of subpopulations of an original (parent) dataset. In the MC experiments presented in this work, we have assumed that the chances of any data point of a time series being included in a MC sample are equal. This is true, as the chance of a Fd event occurring is essentially random with respect to the Earth's atmosphere (although we note that Fd occurrence tends to cluster around the period of solar maximum). However, it is not unusual for composite samples to be further constrained, adding additional selection criteria to composite events. Consequently, the resulting composites are created from subpopulations of the parent dataset, and thus their significance may not effectively be assessed by drawing random samples from the parent dataset.

In Section 3.3 we describe several studies where composites were restricted by Fd intensity, a common analysis approach. To continue with this example, imagine we were to restrict the n = 55 Fd events described in Section 3.3 to the most intense (IFd ≥ 10%) n = 6 events, and we wished to identify the p-value of the t0 composite mean. To properly account for the sample restriction using a MC approach would require the creation of numerous n = 55 samples from the parent dataset (we shall refer to these samples as parent composites). Importantly, each of the parent composites needs to have the correct statistical properties, e.g., in this instance, a mean at t0 comparable to that of the composite prior to restriction (i.e., the n = 55 Fd composite at t0). MC populations would then need to be constructed from n = 6 subsamples drawn from the parent composites. We may then use PDFs of the MC results to identify the p-value of the t0 mean for the IFd ≥ 10% composite.

This change in methodology is necessary as the hypothesis test is no longer concerned with determining the chance of randomly obtaining a t0 mean from n = 6 events, but rather with determining the chance of obtaining a t0 mean from an n = 6 subpopulation when one begins with a parent composite of n = 55 with a specific mean value at t0. I.e., the question becomes: if a group of n = 55 events has a specific mean, what are the chances that a subsample of n = 6 of these events will have another specific mean?

4. Conclusions

Although numerous composite analyses have been performed to examine the hypothesized link between solar activity and climate, widely conflicting conclusions have been drawn. In this paper we have demonstrated that the cause of these conflicts may relate to differences in the various methodological approaches employed by these studies and in the evaluation of the statistical significance of their results. We find that numerous issues may affect the analyses, including: (1) issues of signal-to-noise ratios connected to spatio-temporal restrictions; (2) interference from variability in data at timescales greater than those concerning hypothesis testing, which may not necessarily be removed by accounting for linear trends over the composite periods; (3) normalization procedures, which affect both the magnitude of anomalies in composites and estimations of their significance; and (4) the application of statistical tests unable to account for autocorrelated data and biases imposed by the use of improper normalization procedures.

Statistical methods for correctly assessing significance in composites, taking into account effective sample sizes, have been previously established (Forbush et al. 1982, 1983; Singh 2006; Dunne et al. 2012). To these procedures, we add the composite construction outlined in this work and a further robust procedure for the estimation of significance based on a Monte Carlo approach.

Acknowledgements. We kindly thank Beatriz González Merino (Instituto de Astrofísica de Canarias), Thierry Dudock de Wit (University of Orléans), Jeffrey R. Pierce (Colorado State University), Joshua Krissansen-Totton (University of Auckland), Eimear M. Dunne (Finnish Meteorological Institute), and two anonymous referees for their helpful comments. The list of Forbush decreases and Ground Level Enhancements was obtained from http://www.ngdc.noaa.gov/stp/solar/cosmic.html. Cosmic ray data were obtained from the Solar Terrestrial Physics Division of IZMIRAN from http://helios.izmiran.rssi.ru. The ISCCP D1 data are available from the ISCCP Web site at http://isccp.giss.nasa.gov/, maintained by the ISCCP research group at the NASA Goddard Institute for Space Studies. This work received support from the European COST Action ES1005 (TOSCA).

References

Agee, E.M., K. Kiefer, and E. Cornett, J. Clim., 25 (3), 1057, 2012.
Arnold, F., Space Sci. Rev., 125 (1), 169–186, 2006.
Artamonova, I., and S. Veretenenko, J. Atmos. Sol.-Terr. Phy., 73 (2), 366–370, 2011.
Bondur, V., S. Pulinets, and G. Kim, Doklady Earth Sciences (Springer), 422, 1124–1128, 2008.
Čalogović, J., C. Albert, F. Arnold, J. Beer, L. Desorgher, and E. Flueckiger, Geophys. Res. Lett., 37 (3), L03802, 2010.
Cane, H.V., Space Sci. Rev., 93 (1), 55–77, 2000.
Chree, C., Philos. Trans. R. Soc. London Ser. A, 212, 75–116, 1913.
Chree, C., Philos. Trans. R. Soc. London Ser. A, 245–277, 1914.
Dickinson, R.E., Bull. Am. Meteorol. Soc., 56, 1240–1248, 1975.
Dragić, A., I. Anicin, R. Banjanac, V. Udovicic, D. Jokovic, D. Maletic, and J. Puzovic, Astrophys. Space Sci. Trans., 7, 315–318, 2011.
Dragić, A., N. Veselinović, D. Maletić, D. Joković, R. Banjanac, V. Udovičić, and I. Aničin, Further investigations into the connection between cosmic rays and climate [arXiv:1304.7879], 2013.
Dumbović, M., B. Vršnak, J. Čalogović, and R. Župan, A&A, 538, 2012.
Dunne, E.M., L.A. Lee, C.L. Reddington, and K.S. Carslaw, Atmos. Chem. Phys., 12 (23), 11573–11587, 2012.
Egorova, L.V., V.Y. Vovk, and O.A. Troshichev, J. Atmos. Sol.-Terr. Phy., 62, 955–966, 2000.
Fedulina, I., and J. Laštovička, Adv. Space. Res., 27 (12), 2003–2006, 2001.
Forbush, S.E., Phys. Rev., 54 (12), 975, 1938.
Forbush, S., S. Duggal, M. Pomerantz, and C. Tsao, Rev. Geophys. Space Phys., 20 (4), 971–976, 1982.
Forbush, S., M. Pomerantz, S. Duggal, and C. Tsao, Sol. Phys., 82 (1), 113–122, 1983.
Harrison, R.G., and M.H. Ambaum, J. Atmos. Sol.-Terr. Phy., 72 (18), 1408–1414, 2010.
Harrison, R.G., M.H. Ambaum, and M. Lockwood, Proc. R. Soc. London Ser. A, 467 (2134), 2777–2791, 2011.
Harrison, R.G., and D.B. Stephenson, Proc. R. Soc. London Ser. A, 462 (2068), 1221–1233, 2006.
Kniveton, D., J. Atmos. Sol.-Terr. Phy., 66 (13–14), 1135–1142, 2004.
Kristjánsson, J., J. Kristiansen, and E. Kaas, Adv. Space. Res., 34 (2), 407–415, 2004.
Kristjánsson, J.E., C.W. Stjern, F. Stordal, A.M. Fjæraa, G. Myhre, and K. Jonasson, Atmos. Chem. Phys., 8 (24), 7373–7387, 2008.
Laken, B., and J. Čalogović, Geophys. Res. Lett., 38 (24), L24811, 2011.
Laken, B., and D. Kniveton, J. Atmos. Sol.-Terr. Phy., 73 (2), 371–376, 2011.
Laken, B., A. Wolfendale, and D. Kniveton, Geophys. Res. Lett., 36, L23803, 2009.
Laken, B., D. Kniveton, and A. Wolfendale, J. Geophys. Res., 116, D09201, 2011.
Laken, B., J. Čalogović, T. Shahbaz, and E. Pallé, J. Geophys. Res., 10–1029, 2012a.
Laken, B., E. Pallé, and H. Miyahara, J. Clim., 2012b.
Laken, B., E. Pallé, J. Čalogović, and E. Dunne, J. Space Weather Space Clim., 2 (A18), 13, 2012c.
Lam, M., and A. Rodger, J. Atmos. Sol.-Terr. Phy., 64 (1), 41–45, 2002.
Livezey, R.E., and W. Chen, Mon. Weather Rev., 111, 46–59, 1983.
Lockwood, J.A., Space Sci. Rev., 12 (5), 658–715, 1971.
Mironova, I., I. Usoskin, G. Kovaltsov, and S. Petelina, Atmos. Chem. Phys., 12, 769–778, 2012.
Neal, R.M., Technical Report CRG-TR-93-1, 1–144, 1993.
Ney, E.P., Nature, 183, 1959.
Okike, O., and A.B. Collier, General Assembly and Scientific Symposium, 2011 XXXth URSI, IEEE, 1–4, 2011.
Pudovkin, M., and S. Veretenenko, J. Atmos. Terr. Phys., 57 (11), 1349–1355, 1995.
Pudovkin, M., S. Veretenenko, R. Pellinen, and E. Kyrö, Adv. Space. Res., 20 (6), 1169–1172, 1997.
Ripley, B.D., Stochastic Simulation (Wiley), 316, 2009.
Schuurmans, C., and A. Oort, Pure Appl. Geophys., 75 (1), 233–246, 1969.
Singh, Y.P., Statistical considerations in superposed epoch analysis and its applications in space research, J. Atmos. Sol.-Terr. Phy., 68 (7), 803–813, 2006.
Sloan, T., and A.W. Wolfendale, Environ. Res. Lett., 3 (2), 024001, 2008.
Storch, H.V., J. Atmos. Sci., 39, 187–189, 1982.
Stozhkov, Y.I., I. Martin, G. Pellegrino, H. Pinto, G. Bazilevskaya, P. Bezerra, V. Makhmutov, N. Svirzevsky, et al., Il Nuovo Cimento C, 18 (3), 335–341, 1995.
Svensmark, H., T. Bondo, and J. Svensmark, Geophys. Res. Lett., 36 (15), L15101, 2009.
Svensmark, J., M. Enghoff, and H. Svensmark, Atmos. Chem. Phys. Discuss., 12 (2), 2012.
Tinsley, B.A., G.M. Brown, and P.H. Scherrer, J. Geophys. Res., 94 (D12), 14783–14792, 1989.
Tinsley, B.A., and G.W. Deen, J. Geophys. Res., 96, 22283–22296, 1991.
Todd, M.C., and D.R. Kniveton, J. Geophys. Res., 106 (D23), 32031–32041, 2001.
Todd, M.C., and D.R. Kniveton, J. Atmos. Sol.-Terr. Phy., 66 (13), 1205–1211, 2004.
Troshichev, O., V. Vovk, and L. Egorova, J. Atmos. Sol.-Terr. Phy., 70 (10), 1289–1300, 2008.
Wang, M., L. He, and H. Jia, High Energy Phys. Nucl. Phys.-Beijing, 30 (1), 75, 2006.
Wilks, D., J. Clim., 10 (1), 65–82, 1997.
Wilks, D., J. Appl. Meteorol. Climatol., 45 (9), 1181–1189, 2006.
Cite this article as: Laken B & Čalogović J: Composite analysis with Monte Carlo methods: an example with cosmic rays and clouds.
J. Space Weather Space Clim., 2013, 3, A29.