
2019 IEEE International Conference on Systems, Man and Cybernetics (SMC)

Bari, Italy. October 6-9, 2019

Inferring the temporal structure of directed functional connectivity in neural systems: some extensions to Granger causality*

Lionel Barnett1† and Anil K. Seth1

*This work was supported by the Dr. Mortimer and Theresa Sackler Foundation.
1 Sackler Centre for Consciousness Science, Department of Informatics, University of Sussex, Falmer, Brighton, BN1 9QJ, United Kingdom
† Corresponding author: l.c.barnett@sussex.ac.uk

Abstract— Neural processes in the brain operate at a range of temporal scales. Granger causality, the most widely-used neuroscientific tool for inference of directed functional connectivity from neurophysiological data, is traditionally deployed in the form of one-step-ahead prediction regardless of the data sampling rate, and as such yields only limited insight into the temporal structure of the underlying neural processes. We introduce Granger causality variants based on multi-step, infinite-future and single-lag prediction, which facilitate a more detailed and systematic temporal analysis of information flow in the brain.
I. INTRODUCTION
Granger causality (henceforth GC) [1], [2] is a statistical, predictive notion of causal influence originally developed in econometrics, which may be inferred from time-series data, and intuitively interpreted as information flow [3], [4]. Over the past couple of decades it has rapidly become a popular tool for the inference, from neurophysiological time series data, of time-directed functional (i.e., statistical) relationships in the underlying neural dynamics.

Brain recording modes such as M/EEG, ECoG and fMRI may be characterised as the discrete, regular sampling of continuous-time analogue signals associated with underlying neural processes [5]. Due to variation in biophysical parameters such as axonal length, diameter, conduction velocity, myelination and synaptic delay [6], [7], [8], such processes typically feature signal propagation delays at a range of time scales. Typical application of GC, however, involves prediction only at the time scale of a single time step into the future with respect to the chosen sampling rate. Econometricians have long known that time aggregation of signals engendered by discrete subsampling can induce spurious GC inference [9], [10], [11], [12], [13], [14] and confound detection of actual GCs [15], [16], [17], [5].

There has, in addition, been an awareness that restriction to single-step prediction may obscure the temporal details of causal interactions within a system [18], [19], [20], potentially leading to misinterpretation of GC inferences. Dufour and Renault in particular [20] present a thorough analysis of GC based on multiple prediction horizons, deriving algebraic conditions for (non-)causality between constituent sub-processes at all time scales. Faes et al. [21], [22] present a distinct approach, where causal time scales are explored through causal filtering followed by downsampling. Here, building on [20], we present quantitative statistics for multi-step GC, and demonstrate how they may be estimated in sample and deployed for statistical inference. We also present an infinite-future GC statistic which summarises the total directed connectivity at all accessible predictive time scales, and a single-lag GC which forensically identifies individual causal feedback at specific time lags. The intention is that these statistical tools facilitate unpicking the rich multi-scale temporal details of directed functional interactions in complex neural dynamics.

II. WIENER-GRANGER CAUSALITY

Wiener-Granger causality [23], [1], [2] is premised on a notion of causation whereby cause (i) precedes effect, and (ii) contains unique information about the effect. Formally, we suppose we are given a discrete-time, n-dimensional vector1 stochastic process u = {u_t | t ∈ Z} representing the "universe of available information". We introduce the notation u_{t1:t2} for the range {u_t | t1 ≤ t ≤ t2}, so that in particular u_{−∞:t} = {u_s | s ≤ t} denotes the infinite history of u up to and including time t.

1 All vector quantities are taken to be column vectors.

Suppose now that u is partitioned into non-overlapping sub-processes, u_t = [x_t^T y_t^T z_t^T]^T, of dimension n_x, n_y, n_z respectively. We say that y does not Granger-cause x (at time t) iff

    P(x_{t+1} | u_{−∞:t}) = P(x_{t+1} | u^{[y]}_{−∞:t}),    (1)

where P(· | ·) denotes conditional distribution, and u^{[y]}_t = [x_t^T z_t^T]^T is the "reduced" universe of information with y omitted. Intuitively, (1) says that removing the influence of y from the historical information set makes no difference to the statistical distribution of x at the next time step, and we say that y Granger-causes x iff (1) does not obtain. Granger [1], [2] operationalised this definition in terms of (linear) prediction:

    y Granger-causes x iff the history of y improves prediction of the future of x beyond the extent to which x is already predicted by all other available historical information, including that of x itself.

Of course in practice, the "universe of available information" will be restricted to a specified set of accessible observables. Granger causality was subsequently quantified by Geweke [24], [25] as a log-likelihood ratio statistic, and more recently [3], [4] granted an information-theoretic (non-parametric) interpretation in terms of the closely-related transfer entropy [26], [27] (in fact, if all stochastic variables are jointly Gaussian [3], the concepts coincide). Under this interpretation—which we prefer—Granger causality represents a "flow of information" from the process y to the process x.



III. GRANGER-GEWEKE CAUSALITY

Suppose now that the process u is covariance-stationary, and without loss of generality we assume it to be zero-mean. Then by Wold's theorem [28], [29], u has a moving average (MA) representation

    u_t = \sum_{k=0}^{∞} B_k ε_{t−k},   or   u_t = B(z) ε_t,    (2)

where ε is a white noise process with nonsingular covariance matrix Σ = E[ε_t ε_t^T], z is the lag (backshift) operator2 (so that z·ε_t = ε_{t−1}, etc.), and the MA operator (transfer function) is given by B(z) = \sum_{k=0}^{∞} B_k z^k with B_k the MA coefficient matrices and B_0 = I (the identity matrix), so that B(z) is causal (does not reference the future). We also assume the minimum-phase condition that B(z) is nonsingular on the closed unit disc in the complex plane, so that the MA representation (2) may be inverted to yield a stable, causal autoregressive (AR) representation

    u_t = \sum_{k=1}^{∞} A_k u_{t−k} + ε_t,   or   A(z) u_t = ε_t,    (3)

where A(z) = B(z)^{−1} = I − \sum_{k=1}^{∞} A_k z^k is also nonsingular on the closed unit disc (see, e.g., [30], [28], [24]).

2 Note that in the literature, the lag operator is sometimes taken as z^{−1}. In the spectral domain, z may be viewed as residing on the unit circle in the complex plane: z = e^{−iω}, where ω is the phase angle in radians.

Granger considered prediction in the linear least-squares sense. The optimal linear prediction of u_{t+1} given its history u_{−∞:t} is the conditional expectation [31]

    E[u_{t+1} | u_{−∞:t}] = \sum_{k=1}^{∞} A_k u_{t+1−k},    (4)

with residual prediction error ε_{t+1}. Following [24], prediction error is quantified by the determinant |Σ| of the residuals covariance matrix, also known as the generalised variance [32], [33]. Considering now the partition u_t = [x_t^T y_t^T z_t^T]^T, the optimal linear prediction E[x_{t+1} | u_{−∞:t}] of x_{t+1} given the full history u_{−∞:t} has prediction error ε_{x,t+1} with generalised variance |Σ_{xx}| (here the subscript 'x' denotes the x-component). We contrast this with the optimal prediction E[x_{t+1} | u^{[y]}_{−∞:t}] of x_{t+1} on the reduced universe of historical information u^{[y]}_t = [x_t^T z_t^T]^T, where y is omitted; cf. (1). This is derived from the reduced AR representation

    u^{[y]}_t = \sum_{k=1}^{∞} A^{[y]}_k u^{[y]}_{t−k} + ε^{[y]}_t,   or   A^{[y]}(z) u^{[y]}_t = ε^{[y]}_t,    (5)

so that the optimal prediction of u^{[y]}_{t+1} on its own history is E[u^{[y]}_{t+1} | u^{[y]}_{−∞:t}] = \sum_{k=1}^{∞} A^{[y]}_k u^{[y]}_{t+1−k}, and the generalised variance for the optimal prediction E[x_{t+1} | u^{[y]}_{−∞:t}] is |Σ^{[y]}_{xx}|, where Σ^{[y]} = E[ε^{[y]}_t ε^{[y]T}_t]. Following [25], the Granger-Geweke causality statistic is defined as

    F_{y→x|z} = log( |Σ^{[y]}_{xx}| / |Σ_{xx}| )    (6)

and we have, in particular,

    F_{y→x|z} = 0 ⟺ A_{xy}(z) ≡ 0.    (7)

In finite sample, where the infinite histories are truncated at some model order p and models (3) and (5) are estimated by maximum likelihood (e.g., by OLS), the estimated generalised variances are proportional to the likelihoods, and the sample estimator F̂_{y→x|z} is a log-likelihood ratio statistic. In this scenario, (3, 5) are nested linear autoregression models, and the null hypothesis of vanishing Granger causality is

    H_0 : A_{k,xy} = 0,   k = 1, ..., p.    (8)

Thus, by the standard large-sample theory [34], [35], under the null hypothesis (8) the maximum-likelihood estimator F̂_{y→x|z}, scaled by sample size, converges in distribution to a central χ²(d) with degrees of freedom d = p n_x n_y, and to a non-central χ²(d; λ) with non-centrality parameter λ = F_{y→x|z} under the alternative hypothesis3. We note that F_{y→x|z} is strictly non-negative, and thus biased in sample.

3 We remark that in sample, the scaled lack-of-fit sum of squares [trace(Σ^{[y]}_{xx}) − trace(Σ_{xx})]/trace(Σ_{xx}) is asymptotically F-distributed under H_0, furnishing an alternative and more statistically powerful test for the null. However, the F-statistic lacks an information-theoretic interpretation, as well as some crucial invariance properties [33], [15] of the log-likelihood ratio form, and is thus less satisfactory as a measure of the magnitude of Granger-causal effect.
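The estimation recipe just described is straightforward to implement. The following is a minimal numerical sketch (it is not the authors' MVGC toolbox code): both the full regression (3) and the reduced regression (5) are truncated at order p and fit by OLS, and the scaled statistic is referred to the χ²(p n_x n_y) null distribution of (8). Function and variable names are illustrative, and the use of T − p as the effective sample size in the scaling is an assumption.

```python
import numpy as np
from scipy.stats import chi2

def lagged_regressors(u, p, cols):
    """Design matrix of lags 1..p of the selected columns of u (T x n array)."""
    T = u.shape[0]
    return np.hstack([u[p - k:T - k][:, cols] for k in range(1, p + 1)])

def residual_cov(Y, X):
    """Equation-wise OLS of Y on X; maximum-likelihood residual covariance estimate."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ B
    return (E.T @ E) / Y.shape[0]

def gc_onestep(u, x, y, p):
    """Sample 1-step conditional GC F = log|Sigma_red_xx| - log|Sigma_full_xx| (eq. 6),
    with an asymptotic chi-squared p-value under the null (8). x and y are lists of
    column indices; the conditioning set z is everything else."""
    T, n = u.shape
    target = u[p:][:, x]                                      # x_t for t = p..T-1
    red = [i for i in range(n) if i not in y]                 # universe with y omitted
    S_full = residual_cov(target, lagged_regressors(u, p, list(range(n))))
    S_red = residual_cov(target, lagged_regressors(u, p, red))
    F = np.log(np.linalg.det(S_red)) - np.log(np.linalg.det(S_full))
    d = p * len(x) * len(y)                                   # degrees of freedom, eq. (8)
    return F, chi2.sf((T - p) * F, d)                         # scaled by effective sample size

# Toy check (illustrative): variable 1 drives variable 0 at lag 2 in a 3-variable system
rng = np.random.default_rng(0)
T = 2000
u = rng.standard_normal((T, 3))
for t in range(2, T):
    u[t, 0] += 0.4 * u[t - 2, 1]
print(gc_onestep(u, x=[0], y=[1], p=4))
```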
The above finite-sample analysis assumes that the AR models (3, 5) are independently estimated. It is, however, known [36] that this may be problematic, in particular for spectral (frequency-domain) Granger-Geweke causality [24], [25] (which we do not address here). In fact, from the Spectral Factorisation Theorem [30] it follows that the reduced model (5) may be deduced from the full model (3), leading to more powerful and less biased GC estimators. There are several approaches to effecting this computationally, in the frequency domain [37], [38], [39] and in the time domain [40], [41]. More recently, [42], [14] show how this may be efficiently accomplished using state-space methods [43]4; there are, furthermore, other compelling reasons to estimate Granger-Geweke causality via state-space rather than AR modelling [42], [14].

4 Sample statistics derived from single full-model estimation with spectral factorisation, however, fail to satisfy the requirements for the large-sample theory; in lieu of known distributions for these estimators, independent estimates of the full and reduced models, or standard subsampling/surrogate methods, may be considered preferable for statistical inference.

IV. MULTI-STEP GRANGER CAUSALITY

In the traditional approach, Granger causality is usually considered only in terms of one-step-ahead prediction [cf. (1, 4)]; but see, e.g., [19], [20].
However, as noted in Section I, the duration of a single time step will vary according to the sampling rate, and the magnitude of reported Granger-Geweke causality will depend crucially on the relationship between sampling frequency and underlying time scales of neural signal transmission [5]. This suggests we examine more closely Granger causality based on an arbitrary future prediction horizon. A notion of (non-)causality consonant with the measure we consider was introduced in [20]; here, for the first time (as far as we are aware), we quantify this notion with a Granger-Geweke statistic.

We require an expression for E[u_{t+h} | u_{−∞:t}], h = 1, 2, ...; that is, optimal linear prediction at an arbitrary future prediction horizon h (but note that the historical predictor set u_{−∞:t} remains the same as for conventional 1-step GC). In this case the h-step optimal prediction is more simply expressed in terms of the MA, rather than AR, representation [cf. (4)]. We have [31]

    E[u_{t+h} | u_{−∞:t}] = \sum_{k=h}^{∞} B_k ε_{t+h−k},    (9)

with residual errors

    ε^{(h)}_t = \sum_{k=0}^{h−1} B_k ε_{t+h−k}    (10)

[henceforth we use the round-bracket '(h)' to indicate a prediction horizon h steps into the future]. Note that in general ε^{(h)} will not be a white noise process. The residuals covariance matrix is given by

    Σ^{(h)} = E[ε^{(h)}_t ε^{(h)T}_t] = \sum_{k=0}^{h−1} B_k Σ B_k^T.    (11)

Setting B^{(h)}(z) = \sum_{k=0}^{h−1} B_k z^k and A^{(h)}(z) = B^{(h)}(z) A(z) = I − \sum_{k=h}^{∞} A^{(h)}_k z^k, we may derive the h-lagged AR form [cf. (3)]

    u_t = \sum_{k=h}^{∞} A^{(h)}_k u_{t−k} + ε^{(h)}_t,   or   A^{(h)}(z) u_t = ε^{(h)}_t,    (12)

and the AR expression for the optimal h-step linear prediction [cf. (4)]

    E[u_{t+h} | u_{−∞:t}] = \sum_{k=h}^{∞} A^{(h)}_k u_{t+h−k}.    (13)

The A^{(h)}_k satisfy the recursion relations [20]

    A^{(h+1)}_{h+k} = A^{(h)}_{h+k} + A^{(h)}_h A_k,   h, k = 1, 2, ...,    (14)

with A^{(1)}_k = A_k.

We now define h-step Granger-Geweke causality by analogy with (6) as [5]

    F^{(h)}_{y→x|z} = log( |Σ^{[y](h)}_{xx}| / |Σ^{(h)}_{xx}| ),    (15)

where Σ^{[y](h)} = E[ε^{[y](h)}_t ε^{[y](h)T}_t] = \sum_{k=0}^{h−1} B^{[y]}_k Σ^{[y]} B^{[y]T}_k, and we have [cf. (7)]

    F^{(h)}_{y→x|z} = 0 ⟺ A^{(h)}_{xy}(z) ≡ 0.    (16)

In contrast to the 1-step case (7), this condition will generally be nonlinear—specifically, a series of matrix polynomial identities of order h—in the AR coefficients A_k. In the unconditional case z = ∅, it may be shown [44], [45] that

    F_{y→x} = 0 ⟺ F^{(h)}_{y→x} = 0 ∀h > 0;    (17)

however, in the conditional case, neither implication holds in general [19], [20]. We note also from (2, 10, 11) that as h → ∞, both Σ^{(h)}_{xx} and Σ^{[y](h)}_{xx} → E[x_t x_t^T], the covariance matrix of x itself, implying [5]

    lim_{h→∞} F^{(h)}_{y→x|z} = 0.    (18)

Related analysis in a continuous-time scenario [5] suggests that convergence in (18) is exponential.

From (12) it follows that we again have nested (h-step AR) models, so that in finite sample with truncation at p ≥ h, the null hypothesis of vanishing h-step Granger causality is [cf. (8)]

    H^{(h)}_0 : A_{k,xy} = 0,   k = h, ..., p,    (19)

and again the scaled maximum-likelihood sample estimator F̂^{(h)}_{y→x|z} will be asymptotically χ²(d) under the null hypothesis (19), now with d = (p − h + 1) n_x n_y. Computationally, multi-step GC may be estimated from AR or state-space models, using (11, 15). For AR modelling, the MA coefficients may be calculated recursively using

    B_k = A_k + \sum_{ℓ=1}^{k−1} B_ℓ A_{k−ℓ},   k = 2, 3, ...,    (20)

with B_1 = A_1. For state-space modelling, calculation of the B_k is even more straightforward (see [42], eq. 4).
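As an illustrative sketch (not the authors' implementation), the computation route via (20), (11) and (15) can be written in a few lines, under the stated assumption that both the full-model parameters (A_k, Σ) and the reduced-model parameters (A^{[y]}_k, Σ^{[y]}) are available, e.g., from separate OLS fits or via the state-space route of Section III. Function names and the convention that the x-components come first in each model's variable ordering are assumptions of this sketch.

```python
import numpy as np

def ma_from_ar(A, h):
    """MA coefficients B_0..B_{h-1} from AR coefficients A_1..A_p via the recursion (20),
    with B_0 = I and B_1 = A_1.  A has shape (p, n, n)."""
    p, n, _ = A.shape
    B = [np.eye(n)]
    for k in range(1, h):
        Bk = A[k - 1].copy() if k <= p else np.zeros((n, n))
        for l in range(1, k):
            if k - l <= p:
                Bk += B[l] @ A[k - l - 1]
        B.append(Bk)
    return B

def multistep_residual_cov(A, Sigma, h):
    """h-step residual covariance Sigma^(h) = sum_{k=0}^{h-1} B_k Sigma B_k^T  (eq. 11)."""
    return sum(Bk @ Sigma @ Bk.T for Bk in ma_from_ar(A, h))

def multistep_gc(A_full, Sigma_full, A_red, Sigma_red, nx, h):
    """Multi-step GC F^(h)_{y->x|z} (eq. 15).  Assumes the x-components occupy the first
    nx coordinates of both the full ordering [x y z] and the reduced ordering [x z]."""
    S_full = multistep_residual_cov(A_full, Sigma_full, h)[:nx, :nx]
    S_red = multistep_residual_cov(A_red, Sigma_red, h)[:nx, :nx]
    return np.log(np.linalg.det(S_red)) - np.log(np.linalg.det(S_full))
```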

V. FULL-FUTURE GRANGER CAUSALITY

Historically, the main emphasis of Granger causality analysis, especially in the econometrics literature, has been on statistical inference of (non-)causality. However, in light of the more recent interpretation of GC as a measure of information flow [3], [4], the Granger-Geweke statistic stands as an effect size which quantifies this information flow. This perspective seems to us particularly appropriate and intuitive with regard to functional analysis of neural systems. The conventional 1-step prediction GC statistic, however, may be considered potentially misleading as a comparative effect size, insofar as it fails to take into account neural time scales and their interplay with sampling rate. It would thus be useful to have (in addition to the multi-step GC of Section IV) a summary GC measure of the total information flow between variables; i.e., from infinite past to infinite future. This motivates our introduction of a "full-future" GC measure, based on past-conditional prediction of the infinite future; that is, E[u_{t+1:∞} | u_{−∞:t}].

We may calculate that the residuals covariance matrix of the prediction E[x_{t+1:t+h} | u_{−∞:t}] of the future of x up to horizon t + h from the full process history u_{−∞:t} is given by the (h × h)-block matrix

    Σ^{{h}}_{xx} = [Σ_{p,q}]_{xx},   p, q = 0, ..., h − 1    (21)

[note: we use curly braces '{h}' to distinguish the full-future prediction horizon from the multi-step horizon '(h)'], where

    Σ_{p,q} = \sum_{k=0}^{h−1} \sum_{ℓ=0}^{h−1} δ_{p−k,q−ℓ} B_k Σ B_ℓ^T.    (22)

This may be written

    Σ^{{h}}_{xx} = B^{{h}}_x Σ^{⊗h} B^{{h}T}_x,    (23)

with

    B^{{h}}_x = \begin{bmatrix} B_{0,xu} & 0 & 0 & \cdots & 0 \\ B_{1,xu} & B_{0,xu} & 0 & \cdots & 0 \\ B_{2,xu} & B_{1,xu} & B_{0,xu} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ B_{h−1,xu} & B_{h−2,xu} & B_{h−3,xu} & \cdots & B_{0,xu} \end{bmatrix}    (24)

where the index u denotes all components {x, y, z}, and

    Σ^{⊗h} = \begin{bmatrix} Σ & 0 & \cdots & 0 \\ 0 & Σ & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Σ \end{bmatrix}    (25)

is (h × h)-block-diagonal.

For the reduced prediction E[x_{t+1:t+h} | u^{[y]}_{−∞:t}], we obtain Σ^{[y]{h}}_{xx} the same way, replacing Σ with Σ^{[y]} and B_k with B^{[y]}_k, and we define the full-future Granger causality as

    F^{{∞}}_{y→x|z} = lim_{h→∞} F^{{h}}_{y→x|z},    (26)

where

    F^{{h}}_{y→x|z} = log( |Σ^{[y]{h}}_{xx}| / |Σ^{{h}}_{xx}| ).    (27)

We conjecture that under appropriate conditions (cf. Section II) the limit in (27) exists; this has been verified by extensive simulation (cf. [5]).

In sample with finite AR model order, the null hypothesis for vanishing F^{{∞}}_{y→x|z} is identical to the null (8) for 1-step Granger causality (6). This follows from the recursion relations (14) and expanding out the prediction E[x_{t+1:t+h} | u_{−∞:t}]. Thus the statistic is not useful in its own right for statistical inference, and should rather be considered an informative quantitative measure of total past → future information flow between two variables.

We have not found a closed formula for the determinants in (27), but they may be approximated numerically; extensive simulations suggest that, although the size of the matrices B^{{h}}_x scales quadratically in h, convergence to the limit in (27) is again exponential (cf. Fig. 2). We remark that in general, F^{{∞}}_{y→x|z} will not be equal to the sum \sum_{h=1}^{∞} F^{(h)}_{y→x|z} of multi-step Granger causalities, since the residuals ε^{(h)}_t of the latter (10) for different h will in general be correlated, so that |Σ^{{h}}| ≠ \prod_{k=1}^{h} |Σ^{(k)}|.
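The numerical approximation of (27) is a direct transcription of (23)-(25). The sketch below assumes that the MA coefficients B_0, ..., B_{h−1} of the full and reduced models (obtained, for instance, from the recursion (20)) and the corresponding residual covariances are given, and that the x-components occupy the leading coordinates of each model's variable ordering; helper names are illustrative only.

```python
import numpy as np
from scipy.linalg import block_diag

def fullfuture_cov_xx(B, Sigma, nx, h):
    """Sigma^{h}_xx = B^{h}_x Sigma^{(x)h} B^{h}_x^T per eqs. (23)-(25).  B is a list of MA
    coefficient matrices B_0..B_{h-1} (B_0 = I); the x-components are assumed to occupy
    the first nx coordinates of the variable ordering."""
    n = Sigma.shape[0]
    Bx = np.zeros((h * nx, h * n))                      # block lower-triangular Toeplitz, eq. (24)
    for r in range(h):
        for c in range(r + 1):
            Bx[r * nx:(r + 1) * nx, c * n:(c + 1) * n] = B[r - c][:nx, :]
    return Bx @ block_diag(*([Sigma] * h)) @ Bx.T       # block-diagonal factor is eq. (25)

def fullfuture_gc(B_full, Sigma_full, B_red, Sigma_red, nx, h):
    """Full-future GC F^{h}_{y->x|z} (eq. 27); F^{inf} (26) is approximated by increasing
    h until the value plateaus (convergence is empirically exponential)."""
    _, ld_full = np.linalg.slogdet(fullfuture_cov_xx(B_full, Sigma_full, nx, h))
    _, ld_red = np.linalg.slogdet(fullfuture_cov_xx(B_red, Sigma_red, nx, h))
    return ld_red - ld_full
```

Using the log-determinant (slogdet) rather than the raw determinant is a practical choice here, since the block matrices grow quickly with h.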
Fig. 1. Causal structure of the simple 5-variable AR model (Section V). Numbers in red denote AR lags; see TABLE I and text for details. [Figure not reproduced: directed edges 2→1 (lag 11), 1→2 (lag 5), 1→3 (lag 8), 5→3 (lag 4) and 3→4 (lag 20), per TABLE I.]

TABLE I: Simple 5-variable AR model parameters (Section V).

    target (x)   source (y)   AR lag   AR coefficient
    1            2            11        0.221
    2            1             5        0.306
    3            1             8       −0.403
    4            3            20       −0.215
    3            5             4        0.352

We demonstrate multi-step and full-future GC with a simple AR model with n = 5 variables (Fig. 1) and model order p = 20. All AR coefficients were zero except for the lag-1 self-regression terms A_{1,ii} and the lagged coefficients set out in TABLE I. The only non-zero multi-step and full-future GCs are plotted in Fig. 2, calculated according to (15) and (27) respectively for prediction horizons h = 1, ..., 32, with, for each directed pair of variables x, y, full conditioning on all remaining variables z. Note that both the multi-step and full-future GCs coincide with the conventional 1-step GC (6) at prediction horizon h = 1. Vertical grey lines indicate the AR lag of the causal interaction (TABLE I). We see that the F^{(h)}_{y→x|z} decay rapidly to zero beyond the causal horizon, while the F^{{h}}_{y→x|z} rise and then quickly plateau beyond the causal horizon to the limiting value F^{{∞}}_{y→x|z} (26).

Fig. 2. Multi-step GC F^{(h)}_{y→x|z} and full-future GC F^{{h}}_{y→x|z}, plotted against prediction horizon h (log scale) for the Granger-causal pairs of variables in the simple AR model with lagged causal feedback (see TABLE I and text for details). [Figure not reproduced: one panel per causal pair (2→1, 1→2, 1→3, 5→3, 3→4), h = 1, ..., 32.]

VI. SINGLE-LAG GRANGER CAUSALITY

A more fine-grained analysis of directed functional connectivity may be obtained by asking: if a variable y Granger-causes the variable x, at which specific time scale(s) is the causal feedback concentrated? Note that multi-step GC (Section IV) does not directly address this question, since there all lags of a predictive (source) variable are considered together. Rather, for a specific lag τ > 0, we consider a Granger statistic based on the null hypothesis

    H_0 : A_{τ,xy} = 0,    (28)

where the A_k are the AR coefficient matrices in (3).
We thus compare the optimum prediction E[x_t | u_{−∞:t−1}] with the optimum prediction E[x_t | u^{<y;τ>}_{−∞:t−1}], where the superscript '<y; τ>' indicates that the single lag y_{t−τ} of y is omitted from the historical predictor set u_{−∞:t−1}. To make this clearer, consider the x-component of the AR representation (3) of u_t:

    x_t = A_{1,xx} x_{t−1} + A_{2,xx} x_{t−2} + ...
        + A_{1,xy} y_{t−1} + A_{2,xy} y_{t−2} + ... + A_{τ,xy} y_{t−τ} + ...
        + A_{1,xz} z_{t−1} + A_{2,xz} z_{t−2} + ... + ε_{x,t}.    (29)

The reduced AR representation then omits the lag-τ y regressor A_{τ,xy} y_{t−τ}, and we define the single-lag Granger causality as

    F^{<τ>}_{y→x|z} = log( |Σ^{<y;τ>}_{xx}| / |Σ_{xx}| ),    (30)

where Σ^{<y;τ>}_{xx} is the residuals covariance matrix for the reduced AR model. We note that F^{<τ>}_{y→x|z} = 0 ∀τ > 0 ⟺ F_{y→x|z} = 0.

The regression (29) with the null condition (28) represents a nested linear model, so that the large-sample theory applies, and the scaled sample estimator F̂^{<τ>}_{y→x|z} will thus be asymptotically χ²(d) with d = n_x n_y. We remark that interpretation of F^{<τ>}_{y→x|z} as an effect size for a putative "information flow" is somewhat moot; we may prefer to consider F^{<τ>}_{y→x|z} simply as a test statistic for inference of (the absence of) a causal feedback from source to target variable at the given lag.
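In sample, the simplest route is two separate OLS fits, one of the full regression (29) and one of its reduction under (28). The sketch below follows that route; it is an illustrative transcription rather than the authors' code, and the T − p effective-sample scaling is again an assumption.

```python
import numpy as np
from scipy.stats import chi2

def design(u, p, drop=None):
    """Lags 1..p of all variables as regressors; optionally omit (lag tau, columns y)."""
    T = u.shape[0]
    blocks = []
    for k in range(1, p + 1):
        block = u[p - k:T - k]
        if drop is not None and k == drop[0]:
            block = np.delete(block, drop[1], axis=1)          # drop y_{t-tau} only
        blocks.append(block)
    return np.hstack(blocks)

def single_lag_gc(u, x, y, tau, p):
    """Single-lag GC F^<tau>_{y->x|z} (eq. 30): separate OLS fits of the full regression
    (29) and of its reduction under the null (28), which omits the lag-tau y regressor."""
    target = u[p:][:, x]
    def rescov(X):
        B, *_ = np.linalg.lstsq(X, target, rcond=None)
        E = target - X @ B
        return (E.T @ E) / target.shape[0]
    F = (np.log(np.linalg.det(rescov(design(u, p, drop=(tau, y)))))
         - np.log(np.linalg.det(rescov(design(u, p)))))
    return F, chi2.sf((u.shape[0] - p) * F, len(x) * len(y))   # d = n_x n_y
```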
Unlike the previous GC measures, we do not have (given full-model parameters) a construction for a state-space model which represents the reduced model (28). The reduced model parameters may, however, still be solved computationally from the Yule-Walker equations [19]. The full-model Yule-Walker equations up to lag q yield

    Σ = Γ_0 − Γ_q Λ_q^{−1} Γ_q^T,    (31)

with Γ_k = E[u_t u_{t−k}^T], k = ..., −2, −1, 0, 1, 2, ... the autocovariance sequence—which may itself be derived from the (estimated) full-model AR coefficients [41]—and

    Γ_q = [ Γ_1  ···  Γ_q ]    (32)

    Λ_q = \begin{bmatrix} Γ_0 & \cdots & Γ_{q−1} \\ \vdots & \ddots & \vdots \\ Γ_{q−1}^T & \cdots & Γ_0 \end{bmatrix}.    (33)

The reduced Yule-Walker solution for Σ^{<y;τ>} is then obtained as per (31), after deleting the y-columns of the τ-th block-column in Γ_q, and the y-rows/columns of the (τ − 1)-th block-row/column in Λ_q. Even though Λ_q may be quite large5, it is positive-definite Toeplitz and thus may be Cholesky-decomposed and efficiently inverted.

5 For reasonable numerical precision we need sufficient lags q that Γ_k ≈ 0 for k > q, which will in turn depend on the spectral radius of A(z) [46]; see e.g., [41].
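The reduced Yule-Walker solution is mechanical once the autocovariance sequence is in hand. A minimal sketch, assuming Γ_0, ..., Γ_q are supplied (they may be computed from fitted AR coefficients, as noted above) and using an illustrative function name:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def yule_walker_sigma(Gamma, q, drop=None):
    """Residual covariance via (31): Sigma = Gamma_0 - Gamma_q Lambda_q^{-1} Gamma_q^T.
    Gamma[k] = E[u_t u_{t-k}^T] for k = 0..q.  With drop = (tau, y_cols), the y-columns
    of the lag-tau block are removed from the regressor set, giving the reduced solution
    Sigma^<y;tau> corresponding to the single-lag null (28)."""
    n = Gamma[0].shape[0]
    Gq = np.hstack([Gamma[k] for k in range(1, q + 1)])                    # eq. (32)
    Lq = np.block([[Gamma[j - i] if j >= i else Gamma[i - j].T
                    for j in range(q)] for i in range(q)])                 # eq. (33)
    if drop is not None:
        tau, y = drop
        cols = [(tau - 1) * n + c for c in y]                              # y_{t-tau} regressors
        Gq = np.delete(Gq, cols, axis=1)
        Lq = np.delete(np.delete(Lq, cols, axis=0), cols, axis=1)
    chol = cho_factor(Lq)                                                  # positive-definite
    return Gamma[0] - Gq @ cho_solve(chol, Gq.T)
```

Calling the function without the drop argument recovers the full-model Σ of (31), so the same routine serves both sides of the ratio (30).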
We envisage estimating F^{<τ>}_{y→x|z} from the data for τ = 1, ..., p in turn (where the maximum lag p—the model order for the full AR model (29)—is selected via a standard scheme), in order to ascertain the time scale(s) at which y influences x. See Fig. 3, where the F^{<τ>}_{y→x|z}, x, y = 1, ..., n, x ≠ y, are estimated in sample for a data sequence of length 1000 generated from the AR model of Section V. Here z denotes all other variables except the given x, y, so that every directed pairwise GC is conditioned on all remaining variables, yielding the "Granger-causal graph" [41] at all lags up to p = 20. Likelihood-ratio single-lag GC statistics (blue boxes) were calculated for separate OLS estimates of the full and (for each y, τ) reduced models (29) using the (known) model order p = 20, while analytic GCs for the model (black horizontal bars) were calculated from the actual model parameters (TABLE I) using the Yule-Walker procedure described above with q = 175 autocovariance lags, which was sufficient to ensure that the Γ_k decay to near-machine precision. The red horizontal lines mark the critical GC level for rejection of the null hypotheses (28) of zero single-lag GC at significance α = 0.05 according to the χ²(1) estimator distribution, assuming a Bonferroni correction for all pn(n − 1) hypotheses. We see that statistical inference of the F^{<τ>}_{y→x|z} correctly identifies the causal lags as well as directed functional connectivity in the model (Fig. 1).
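For concreteness, the following sketch reconstructs the TABLE I example and the critical level just described. The lag-1 self-regression coefficients A_{1,ii} are not specified in the text, so the value 0.5 used below is purely an illustrative assumption, as is the T − p scaling of the χ²(1) critical value.

```python
import numpy as np
from scipy.stats import chi2

# --- Example model of Section V / TABLE I -------------------------------------
n, p = 5, 20
A = np.zeros((p, n, n))                       # A[k-1] is the lag-k coefficient matrix
np.fill_diagonal(A[0], 0.5)                   # A_{1,ii}: value not given in the text (assumed)
for x, y, lag, c in [(1, 2, 11, 0.221), (2, 1, 5, 0.306), (3, 1, 8, -0.403),
                     (4, 3, 20, -0.215), (3, 5, 4, 0.352)]:     # TABLE I cross-terms
    A[lag - 1, x - 1, y - 1] = c

# --- Simulate a length-1000 realisation with unit-variance residuals ----------
T, rng = 1000, np.random.default_rng(0)
u = np.zeros((T, n))
for t in range(T):
    for k in range(1, min(t, p) + 1):
        u[t] += A[k - 1] @ u[t - k]
    u[t] += rng.standard_normal(n)

# --- Bonferroni-corrected critical GC level (red lines in Fig. 3) -------------
alpha, n_tests = 0.05, p * n * (n - 1)        # all single-lag hypotheses (28)
F_crit = chi2.ppf(1 - alpha / n_tests, df=1) / (T - p)
print("critical single-lag GC level:", F_crit)
```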

Fig. 3. Single-lag GC inference ("Granger-causal graph") for time-series data generated from the simple 5-variable AR model with varying causal lags of Section V (Fig. 1 and TABLE I). Blue boxes represent estimates of the single-lag GCs F^{<τ>}_{y→x|z} (30), while bold black horizontal bars denote actual values computed analytically. Red horizontal lines mark the critical GC level; see text (Section VI) for details. [Figure not reproduced: a 5 × 4 grid of panels, one per directed pair y→x, plotting single-lag GC against lag τ = 0, ..., 20.]

VII. CONCLUSIONS

In this article we address the issue of how, in an empirical scenario, we may go beyond Granger-causal inference restricted to the time scale prescribed by the data sampling rate, to obtain a more detailed picture of Granger-causal interactions at multiple time scales underlying the measured neurophysiological process. Thus our multi-step measure F^{(h)}_{y→x|z} reflects information flow between variables at a specific future time horizon h, while the single-lag measure F^{<τ>}_{y→x|z} identifies the precise time lag(s) at which a specific Granger-causal interaction operates. Via a simple didactic model, we demonstrate, respectively, how the underlying time scales are reflected in the multi-step statistic (Fig. 2), and how they may be explicitly inferred from the data (Fig. 3). In addition, in our full-future GC measure F^{{∞}}_{y→x|z}, we present a useful summary measure of effect size for the total past → future information flow between variables. All measures are fully conditioned on (accessible) exogenous variables, so that only direct functional relationships are reported. We describe how our measures may be estimated computationally from time-series data, and (where appropriate) their asymptotic sampling distributions. We propose these measures as useful additions to the directed functional analysis toolbox, insofar as they stand to elucidate time-dependent causal interactions in neurophysiological processes of interest to neuroscientific research. We encourage future research into the behaviour of our measures for more realistic data, where we should expect causal interactions at multiple distributed lags [5].

ACKNOWLEDGMENT

The authors would like to thank Stefan Haufe for useful discussions about this work.

REFERENCES
[1] C. W. J. Granger, "Economic processes involving feedback", Inform. Control, vol. 6, no. 1, pp. 28–48, 1963.
[2] C. W. J. Granger, "Investigating causal relations by econometric models and cross-spectral methods", Econometrica, vol. 37, pp. 424–438, 1969.
[3] L. Barnett, A. B. Barrett, and A. K. Seth, "Granger causality and transfer entropy are equivalent for Gaussian variables", Phys. Rev. Lett., vol. 103, no. 23, pp. 0238701, 2009.
[4] L. Barnett and T. Bossomaier, "Transfer entropy as a log-likelihood ratio", Phys. Rev. Lett., vol. 109, no. 13, pp. 0138105, 2013.
[5] L. Barnett and A. K. Seth, "Detectability of Granger causality for subsampled continuous-time neurophysiological processes", J. Neurosci. Methods, vol. 275, pp. 93–121, 2017.
[6] R. Miller, "What is the contribution of axonal conduction delay to temporal structure in brain dynamics?", in Oscillatory Event-Related Brain Dynamics, C. Pantev, T. Elbert, and B. Lütkenhöner, Eds., pp. 53–57. Springer Science+Business Media, New York, 1994.
[7] J. M. L. Budd and Z. F. Kisvárday, "Communication and wiring in the cortical connectome", Front. Neuroanat., vol. 6, no. 42, 2012.
[8] R. Caminiti, F. Carducci, C. Piervincenzi, A. Battaglia-Mayer, G. Confalone, F. Visco-Comandini, P. Pantano, and G. M. Innocenti, "Diameter, length, speed, and conduction delay of callosal axons in Macaque monkeys and humans: Comparing data from histology and magnetic resonance imaging diffusion tractography", J. Neurosci., vol. 33, no. 36, pp. 14501–14511, 2013.
[9] F. Comte and E. Renault, "Noncausality in continuous time models", Econ. Theory, vol. 12, no. 2, pp. 215–256, 1996.
[10] E. Renault, K. Sekkat, and A. Szafarz, "Testing for spurious causality in exchange rates", J. Empir. Financ., vol. 5, no. 1, pp. 47–66, 1998.
[11] J. Breitung and N. R. Swanson, "Temporal aggregation and spurious instantaneous causality in multiple time series models", J. Time Ser. Anal., vol. 23, no. 6, pp. 651–665, 2002.
[12] J. R. McCrorie and M. J. Chambers, "Granger causality and the sampling of economic processes", J. Econometrics, vol. 132, pp. 311–336, 2006.
[13] V. Solo, "On causality I: Sampling and noise", in Proceedings of the 46th IEEE Conference on Decision and Control, New Orleans, LA, USA, Dec. 2007, IEEE, pp. 3634–3639.
[14] V. Solo, "State-space analysis of Granger-Geweke causality measures with application to fMRI", Neural Comput., vol. 28, no. 5, pp. 914–949, 2016.
[15] L. Barnett and A. K. Seth, "Behaviour of Granger causality under filtering: Theoretical invariance and practical application", J. Neurosci. Methods, vol. 201, no. 2, pp. 404–419, 2011.
[16] A. K. Seth, P. Chorley, and L. Barnett, "Granger causality analysis of fMRI BOLD signals is invariant to hemodynamic convolution but not downsampling", NeuroImage, vol. 65, pp. 540–555, 2013.
[17] D. Zhou, Y. Zhang, Y. Xiao, and D. Cai, "Analysis of sampling artifacts on the Granger causality analysis for topology extraction of neuronal dynamics", Front. Comput. Neurosci., vol. 8, no. 75, 2014.
[18] C. Hsiao, "Autoregressive modeling and causal ordering of economic variables", J. Econ. Dyn. Contr., vol. 4, pp. 243–259, 1982.
[19] H. Lütkepohl, "Testing for causation between two variables in higher dimensional VAR models", in Studies in Applied Econometrics, H. Schneeweiß and K. Zimmerman, Eds., pp. 75–91. Physica-Verlag HD, Heidelberg, 1993.
[20] J.-M. Dufour and E. Renault, "Short run and long run causality in time series: Theory", Econometrica, vol. 66, no. 5, pp. 1099–1125, 1998.
[21] L. Faes, G. Nollo, S. Stramaglia, and D. Marinazzo, "Multiscale Granger causality", Phys. Rev. E, vol. 96, pp. 042150, 2017.
[22] L. Faes, D. Marinazzo, and S. Stramaglia, "Multiscale information decomposition: Exact computation for multivariate Gaussian processes", Entropy, vol. 19, pp. 408, 2017.
[23] N. Wiener, "The theory of prediction", in Modern Mathematics for Engineers, E. F. Beckenbach, Ed., pp. 165–190. McGraw Hill, New York, 1956.
[24] J. Geweke, "Measurement of linear dependence and feedback between multiple time series", J. Am. Stat. Assoc., vol. 77, no. 378, pp. 304–313, 1982.
[25] J. Geweke, "Measures of conditional linear dependence and feedback between time series", J. Am. Stat. Assoc., vol. 79, no. 388, pp. 907–915, 1984.
[26] T. Schreiber, "Measuring information transfer", Phys. Rev. Lett., vol. 85, no. 2, pp. 461–464, 2000.
[27] M. Paluš, V. Komárek, Z. Hrnčíř, and K. Štěrbová, "Synchronization as adjustment of information rates: Detection from bivariate time series", Phys. Rev. E, vol. 63, no. 4, pp. 046211, 2001.
[28] Yu. A. Rozanov, Stationary Random Processes, Holden-Day, San Francisco, 1967.
[29] H. Lütkepohl, New Introduction to Multiple Time Series Analysis, Springer-Verlag, Berlin, 2005.
[30] P. Masani, "Recent trends in multivariate prediction theory", in Multivariate Analysis, P. R. Krishnaiah, Ed., pp. 351–382. Academic Press, New York, 1966.
[31] J. D. Hamilton, Time Series Analysis, Princeton University Press, Princeton, NJ, 1994.
[32] S. S. Wilks, "Certain generalizations in the analysis of variance", Biometrika, vol. 24, pp. 471–494, 1932.
[33] A. B. Barrett, L. Barnett, and A. K. Seth, "Multivariate Granger causality and generalized variance", Phys. Rev. E, vol. 81, no. 4, pp. 041907, 2010.
[34] S. S. Wilks, "The large-sample distribution of the likelihood ratio for testing composite hypotheses", Ann. Math. Stat., vol. 6, no. 1, pp. 60–62, 1938.
[35] A. Wald, "Tests of statistical hypotheses concerning several parameters when the number of observations is large", Trans. Am. Math. Soc., vol. 54, no. 3, pp. 426–482, 1943.
[36] Y. Chen, S. L. Bressler, and M. Ding, "Frequency decomposition of conditional Granger causality and application to multivariate neural field potential data", J. Neurosci. Methods, vol. 150, pp. 228–237, 2006.

[37] G. T. Wilson, "The factorization of matricial spectral densities", SIAM J. Appl. Math., vol. 23, no. 4, pp. 420–426, 1972.
[38] M. Dhamala, G. Rangarajan, and M. Ding, "Analyzing information flow in brain networks with nonparametric Granger causality", NeuroImage, vol. 41, no. 2, pp. 354–362, 2008.
[39] M. Dhamala, G. Rangarajan, and M. Ding, "Estimating Granger causality from Fourier and wavelet transforms of time series data", Phys. Rev. Lett., vol. 100, pp. 018701, 2008.
[40] P. Whittle, "On the fitting of multivariate autoregressions, and the approximate canonical factorization of a spectral density matrix", Biometrika, vol. 50, no. 1–2, pp. 129–134, 1963.
[41] L. Barnett and A. K. Seth, "The MVGC Multivariate Granger Causality Matlab Toolbox", http://users.sussex.ac.uk/~lionelb/MVGC, 2012.
[42] L. Barnett and A. K. Seth, "Granger causality for state-space models", Phys. Rev. E (Rapid Communications), vol. 91, no. 4, pp. 040101(R), 2015.
[43] E. J. Hannan and M. Deistler, The Statistical Theory of Linear Systems, SIAM, Philadelphia, PA, USA, 2012.
[44] C. A. Sims, "Money, income and causality", Am. Econ. Rev., vol. 62, no. 4, pp. 540–552, 1972.
[45] P. E. Caines, "Weak and strong feedback free processes", IEEE Trans. Autom. Contr., vol. 21, no. 5, pp. 737–739, 1976.
[46] P. D. Lax, Linear Algebra and Its Applications, John Wiley & Sons, Inc., Hoboken, NJ, USA, 2007.

