You are on page 1of 14

This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2686014, IEEE
Transactions on Image Processing
IEEE TRANSACTIONS ON IMAGE PROCESSING 1

Sequential Dictionary Learning From Correlated


Data: Application to fMRI Data Analysis
Abd-Krim Seghouane, Senior Member, IEEE and Asif Iqbal

Abstract—Sequential dictionary learning via the K-SVD algo-


rithm has been revealed as a successful alternative to conventional
data driven methods such as independent component analysis
(ICA) for functional magnetic resonance imaging (fMRI) data
analysis. fMRI datasets are however structured data matrices
with notions of spatio-temporal correlation and temporal smooth-
ness. This prior information has not been included in the K-SVD
algorithm when applied to fMRI data analysis. In this paper
we propose three variants of the K-SVD algorithm dedicated to
fMRI data analysis by accounting for this prior information. The Fig. 1. Pictorial description of the fMRI data matrix.
proposed algorithms differ from the K-SVD in their sparse coding
and dictionary update stages. The first two algorithms account paradigm becomes hard to model, e.g., when studying resting
for the known correlation structure in the fMRI data by using state or naturalistic paradigms such as continuous listening or
the squared Q, R-norm instead of the Frobenius norm for matrix watching a movie.
approximation. The third and last algorithm account for both the To overcome these drawbacks, data-driven methods were
known correlation structure in the fMRI data and the temporal
smoothness. The temporal smoothness is incorporated in the successfully suggested and applied to fMRI data analysis.
dictionary update stage via regularization of the dictionary atoms These methods consider the fMRI time series measured at
obtained with penalization. The performance of the proposed each voxel as a mixture of signals localized in a small set
dictionary learning algorithms are illustrated through simulations of regions and other simultaneous time-varying effects. They
and applications on real fMRI data. isolate the spatial brain activity by estimating a mixing matrix
Keywords: functional magnetic resonance imaging (fMRI), and the sources that define the spatially localized neural
dictionary learning, sparsity, sequential update, correlation, dynamics. A number of data driven fMRI analysis methods
regularization. use a data matrix Y formed by vectorizing each time series
observed in every voxel creating a matrix n × N where n
is the number of time points and N the number of voxels
I. I NTRODUCTION (≈ 10, 000 − 900, 000) [12]. These methods consider Y as the
mixture and factorize it into two matrices Y = DX, where
Approaches for brain activity analysis using fMRI data can
D is the mixing matrix and X the source matrix. Data-driven
be broadly divided as being either model-based or data-driven.
methods are suitable for the analysis of fMRI data as they
Model-based methods through the general linear model (GLM)
minimize the assumptions on the underlying structure of the
and random field theory [1][2][3] have widely been used.
problem by decomposing the observed data based on a factor
These methods use the a priori knowledge about the properties
model and a specific constraint. Different constraints have led
of the data; i.e.; the hemodynamic response function (HRF)
to different data-driven methods. For example, the maximum
[4][5][6], and the experimental paradigm; i.e.; the stimulus
variance constraint has led to principal component analysis
function, to investigate the goodness-of-fit of the model and
(PCA) [13][14][15], the independence constraint has led to
make inferences about regional brain activities. The use of
spatial ICA (sICA, for the format of the data described above)
this approach can however be limited for two reasons. First,
and temporal ICA (tICA) [16][17][18][19] and the sparsity
the performance can be affected by the a priori assumed
constraint has led to dictionary learning [20][21][22][23][24].
known HRF as it may not be adapted to the individual and
Currently, ICA has become a widespread data-driven method
experiment under study as it does not take into consideration
for fMRI analysis. In spite of the positive outcome, first,
individual and experimental variance [7][8] added to that its
ICA treats the observations samples which correspond to
commonly used forms are still questionable within the neu-
the voxels in case sICA as i.i.d ignoring the nature of the
roscience community [9]. Furthermore, the estimation of the
fMRI signals. Second, the independence constraint used in
HRF may engender extra uncertainty that will also inevitably
ICA raises questions when it comes to fMRI analysis as
affect the performance of these methods [10][11]. Second, the
most hemodynamic effects in the brain are unlikely to be
model-based methods can not be used when the experimental
independent due to complicated structure and strong neural
A-K. Seghouane and Asif Iqbal are with the Department of connectivity as well as the preprocessing steps applied to
Electrical and Electronic Engineering, The University of Melbourne. the fMRI data such as spatial smoothing [25]. In contrast,
Melbourne, Australia. e-mails: Abd-krim.seghouane@unimelb.edu.au and
aiqbal1@student.unimelb.edu.au. This work was supported by the Australian biological findings of sparse coding in the brain support the
Research Council through Grant FT. 130101394. effectiveness of the sparsity constraint [26] compared to the

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2686014, IEEE
Transactions on Image Processing
2 IEEE TRANSACTIONS ON IMAGE PROCESSING

independency constraint. Furthermore, the effectiveness of matrix approximation. With fMRI data sets in the form of
ICA methods is attributed to their ability to handle sparse Y, regularizing the dictionary elements or atoms to encourage
component rather than independence [27]. Although this claim smoothness of the data set in the columns direction may be of
was proven to be incorrect in [28], dictionary learning al- interest. Indeed, while we expect to have only a limited number
gorithms have gained widespread acceptance in fMRI data of voxel active at each time point, it is also expected to have
analysis [29][30][31][32][33][34]. With dictionary learning continuous activity along the time. It is often conventional in
methods, the fMRI time series measured at a specific voxel fMRI analysis to assume that the hemodynamic response is
is approximated by a sparse linear combination of dynamic relatively a smooth temporal function. Therefore, the signal at
components, where each component has different time-series a fixed voxel over time is believed to be smooth and of low
signal patterns. frequency. We therefore further develop a dictionary learning
Given a data set Y ∈ Rn×N , overcomplete dictionary learning algorithm that accounts for such a priori information by
methods find a dictionary matrix D ∈ Rn×K , N > K > n, enforcing smoothness of the dictionary atoms. This is obtained
with unit column norms and a sparse coefficient matrix also by regularizing the dictionary atoms in the dictionary update
known as the sparse codes X ∈ RK×N such that they solve stage where regularization is obtained through penalized rank
one matrix approximation [47][48].
min ||Y − DX||2F s.t. k xi k0 ≤ s, ∀ 1 ≤ i ≤ N. In this paper, we propose to develop dictionary learning algo-
D,X
rithms that are suitable for fMRI data analysis by accounting
where the xi ’s are the column vectors of X, k . k0 is the for the known correlation structure in the fMRI data and
l0 quasi-norm, the sparsity measure that counts the number that incorporates regularization of the dictionary atoms. Our
of nonzero coefficients. Because of the none joint convexity specific contributions include a) dictionary learning algorithms
problem; for most, they consist of two stages: a sparse coding that account for the known correlation structure in the fMRI
stage and a dictionary update stage. In the first stage the data, b) a dictionary learning algorithm that accounts for the
dictionary is kept constant and the sparsity constraint is used to known correlation structure in the fMRI data and incorporates
produce a sparse linear approximation of the observed data. regularization for the dictionary atoms, c) computationally
In the second stage, based on the current sparse codes, the efficient algorithms to compute a) and b). While we have
dictionary is updated to minimize a cost function to achieve focused on fMRI data analysis, the proposed methods are
a certain objective. The dictionary learning methods iterate flexible and general with a wide range of applicability to a
between the sparse coding stage and a dictionary update variety of correlated data sets common in image processing.
stage until convergence. The performance of overcomplete The rest of the paper is organized as follows. In the next
dictionary learning methods strongly depends on the dictionary section the dictionary learning method proposed in [35] is
update stage since most of these methods share a similar sparse reviewed and the problem formulated. Dictionary learning
coding stage. Besides the difference in the approach used to algorithms that account for the known correlation structure
update the dictionary, the dictionary update can be made se- in the fMRI datasets are derived in section III. In section IV
quential where each dictionary atom (column di , i = 1, ..., K, we present the dictionary update stage used to incorporate
of D) is updated separately, as for example in [35][36][37] or a smoothness constraint on the dictionary atoms through
in parallel where the dictionary atoms are updated all at once penalized rank one matrix approximation and present the asso-
as in [38][39][40][41][42]. Most proposed dictionary learning ciated new dictionary learning algorithm. Section V contains
algorithms have kept the two stages optimization procedure, experimental results illustrating the performance of proposed
the difference appearing mainly in the dictionary update stage dictionary learning algorithms on synthetic and real fMRI
with some exceptions having a difference in the sparse coding datasets. Concluding remarks are given in section VI.
stage.
Datasets arising from fMRI experiments exhibit a compli- II. BACKGROUND
cated temporal and spatial correlation noise structure with
a relatively weak signal [25]. Dictionary learning algorithms Given a set of signals Y = [y1 , y2 , ..., yN ], a learned
proposed so far ignore this noise structure. Therefore, they dictionary is a collection of vectors or atoms dk , k = 1, ..K
can fail to capture relevant aspects of structural dependencies that can be used for optimal sparse linear representation.
in fMRI data sets. The noise in fMRI is characterized by Usually this is obtained by minimizing the following objective
spatial correlation (between neighboring voxels) often ampli- n o
fied by spatial smoothing and temporal correlation in the time D̂, X̂ = arg min k Y − DX k2F
D,X (1)
series measured at each voxel. Random fields models have subject to k xi k0 ≤ s, ∀ 1 ≤ i ≤ N.
been widely used to model the spatial correlation whereas
autoregressive models have been used to model the tempo- where s  K and D = [d1 , d2 , ..., dK ]. To prevent D from
ral correlation [25][43]. Given these well understood noise being arbitrarily large and therefore have arbitrarily small
correlations, we first propose modified versions of [35] and values of xi , it is common to constrain its columns to have unit
the associated algorithms that account for these two forms of norm. The generally used strategy to solve (1), not necessarily
correlation directly in the dictionary learning algorithm. The leading to a global minimum consists in splitting the problem
proposed algorithms are obtained by using the square of the into two stages which are alternately solved within an iterative
Q, R-norm [44][45][46] instead of the Frobenius norm for loop. These two stages are, first, the sparse coding stage, where

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2686014, IEEE
Transactions on Image Processing
SEGHOUANE et al.: SEQUENTIAL DICTIONARY LEARNING FROM CORRELATED DATA: APPLICATION TO FMRI DATA ANALYSIS 3

D is fixed and the sparse coefficient vectors are found by alternating least square method, sequentially estimates dk with
solving xrow
k fixed and vice versa by solving the least square problem
x̂i = arg min k yi − Dxi k22 ; R
kEk − dk xrow 2
= tr (ER row R row >

xi k kF k − dk xk )(Ek − dk xk )
(2) > R row >
subject to k xi k0 ≤ s i = 1, ..., N = kER 2
k kF − 2dk Ek xk +
2 row 2
kdk k2 .kxk k2 (5)
Although sparse coding as stated in (2) has a combinatorial
complexity, it can be approximately solved by either convex- where tr stands for trace and then rescaling the estimates to
ifying (2) or using greedy pursuit algorithms [49]. After the give
row>
sparse coding stage, the dictionary update stage is performed ERk xk
d̂k = row> k
(6)
by fixing X and deriving D by solving k ER
k xk 2

x̂row = d> R
k Ek . (7)
D̂ = arg min k Y − DX k2F (3) k
D
A dictionary update stage can be obtained by iterating (6)
followed by a normalization of its columns. and (7) until convergence or by applying only few iterations
This is where sequential and parallel update methods differs. of these equations instead of the computationally expensive
In parallel update methods all dictionary atoms are updated in SVD of ER k.
parallel using least squares [38][42] or maximum likelihood To understand why the above approach can fail in the context
[39][40][41] whereas sequential update methods [35][36][37] of correlated data and fMRI data analysis in particular, lets
breaks the global minimization (3) into K sequential mini- examine briefly the dictionary update stage for example. For
mization problems. In the method proposed in [35], which the matrix ER k ∈R
n×|wk |
, the rank one matrix approximation
has become a benchmark in dictionary learning, each column obtained through the SVD considers a remaining rank one
dk of D and its corresponding row of coefficients xrow are approximation error matrix with entries assumed independent
k
updated based on a rank-1 matrix approximation of the error and identically distributed; i.i.d, [51][52]. This assumption is
for all the signals when dk xrow is removed unrealistic for fMRI data which are characterized by strong
k
correlation structures. The Frobenius norm cost function
{d̂k , x̂k } = arg min k Y − DX k2F Xn |w
X k|  2
R row 2
dk ,xrow
k kEk − dk xk kF = EkRij − dki xrow
kj
  2
K i=1 j=1
X
row row

= arg min Y− di xi  − dk xk corresponds to the sums of squares loss function weighting

dk ,xrow
k i=1,i6=k errors associated with each matrix entry equally ignoring
F
= arg min kE k − dk xrow 2
k k F . (4) cross-product errors between EkRij and EkRi0 j0 for example. The
dk ,xrow
k Frobenius norm loss is proportional to the spherical Gaussian
log-likelihood assuming that ER k is from a multivariate normal
The singular value decomposition (SVD) of Ek = U∆V> density ER row
k ∼ N (dk xk , I) with rank one mean. Therefore
is used to find the closest rank-1 matrix approximation of it is clear that the Frobenius norm and thus the dictionary
Ek [50]. In this case, dk could be updated by taking the update stage based on (5) is not fully adapted for fMRI data
first column of U and xrow k by taking the first column of V characterized by correlations among the matrix entries.
multiplied by the first diagonal element of ∆. This form of
update corresponds to a dictionary update stage that ignores III. D ICTIONARY LEARNING FOR CORRELATED DATA
the sparsity pattern information derived in the sparse coding.
A dictionary update stage that uses the sparsity pattern in- A. Dictionary update stage
formation with (4) can be obtained by avoiding the loss of To justify the proposed approach for dictionary learning,
sparsity in xrow
k that will be created by the direct application we continue below to focus on the dictionary update stage
of the SVD on Ek . This solution was adopted in [35] where discussed above. Instead of assuming a remaining rank one
it was proposed to modify only the nonzero entries of xrow k approximation error matrix with entries from a multivariate
by taking into account only the signals yi that use the atom normal density as assumed in the SVD used above in (5) for
dk in (3) or by taking the SVD of ER k = Ek Iwk , where the dictionary update stage, we assume that the error matrix
wk = {i|1 ≤ i ≤ N ; xrow k (i) 6
= 0} and Iw k
is the N × |w k | is structured arising from a separable covariance matrix
submatrix of the N × N identity matrix obtained by retaining with row covariance Λ ∈ R|wk |×|wk | and column covariance
only those columns whose index numbers are in wk , instead Ω ∈ Rn×n [53] to better account for the known correlation
of the SVD of Ek . structure in fMRI datasets. In this case, the vectorization of
Among the motivations of the proposed dictionary learning the error matrix is characterized by a covariance given by the
approaches is the observation that the rank-1 approximation Kronecker product between the row and column covariances
obtained using the SVD and written as dk xrow k can also be N (0, Λ ⊗ Ω) [53] and ER k is from a multivariate Gaussian
approximated by applying few iterations of the power method density ER k ∼ N (d k x row
k , Λ ⊗ Ω). Then the log-likelihood is
for computing the SVD [50]. Recall that the power method, or proportional to

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2686014, IEEE
Transactions on Image Processing
4 IEEE TRANSACTIONS ON IMAGE PROCESSING

TABLE I
l(ER
k | Ω, Λ)
S TEPWISE DESCRIPTION OF THE PROPOSED SEQUENTIAL DICTIONARY
LEARNING ALGORITHM ACCOUNTING FOR KNOWN CORRELATION
 > 
∝ tr Ω−1 ER row
 −1 R STRUCTURES IN THE DATA
k − dk xk Λ Ek − dk xrow
k

= kER row 2
k − dk xk kΩ−1 ,Λ−1 . (8) Algorithm A1
Given: Training data Y ∈ Rn×N , initial dictionary Dini , signal
The correlation structure is taken into consideration in the sparsity s and the number of iterations j.
dictionary update stage by changing the cost function from Set D = Dini
the Frobenius norm (5) r to the square of the Q, R-norm (8) For i=1 to j
  1: Sparse Coding Stage:
R>
defined as k Ek kQ,R = tr QER
R
k REk where Q = Ω−1 , Compute the column covariance Q of Y and generate Q1/2
1/2
>
Compute the row covariance Rc of Y and generate Rc
R = Λ−1 . Note that k ER R
k kQ =k Ek kQ,I and k Ek
R
kR =k Find sparse coefficients X, by solving
R> ˆ
x̃ = arg minx̃ k Zỹ − Z(IN ⊗ D)x̃ k22 subject to k x̃ k0 ≤ sN
Ek kR,I define the Q-norm and R-norm respectively. This Compute E = Y − DX
cost function permits unequal weighting of the remaining rank Compute Q as the inverse of the covariance of E
one approximation error matrix entries based on Ω−1 and Compute LQ = Q1/2 = UQ ∆Q U>
1/2
Q
Λ−1 and then accounts for the known correlation structure 2: Dictionary Update Stage:
in fMRI datasets. By finding the rank one approximation of For each column k = 1, 2, ..., K in D,
2.a: Compute the error matrix using
ERk with respect to the square of the Q, R-norm, we develop Ek = E + dk xrowk
a dictionary update stage that directly accounts for the known 2.b: Using wk the set of indices in X that uses dk generate
correlation structure in fMRI datasets ER k = Ek Iwk
>
2.c: Compute R as the inverse of the covariance of ER k
{d̂k , x̂row
k } = arg min
row
kER row 2
k − dk xk kΩ−1 ,Λ−1 . (9) 2.d: Compute LR = R 1/2 = UR ∆R UR
1/2 >
dk ,xk
2.e: Compute the SVD of L> R
Q Ek LR = U∆V
>

By noting that both Q and R are both positive definite as 2.f: Update the dictionary atom dk using
 −1
being the inverses of covariance matrices, decompositions of d̂k = L> Q u where u is the first column of U
the form Q = LQ L> >
Q and R = LR LR exist. They can be 2.g: Update the wk non zero entries of the row xrow k using
obtained from the Cholesky decompositions or the eigenvalue x̂row
k = v> L−1
R where v is the first column of V multiplied
decompositions Q = UQ ∆Q U> > the first element of ∆.
Q and R = UR ∆R UR as LQ = 2.h: Update E = ER row
1/2 1/2 k − dk xk
Q1/2 = UQ ∆Q U> Q and LR = R
1/2
= UR ∆R U> R . The end.
> R > Output: D,X
SVD of LQ Ek LR = U∆V is used to find the closest rank-
1 matrix approximation of ER k with respect to the square of
the Q, R-norm. In this case, dk could be updated by right PN
multiplying the inverse of L> Q by the first column of U and
where k X k0 = i=1 k xi k0 , and Q and Rc are the positive
the xrow
k could be updated by left multiplying the inverse of definite inverse column and row covariance matrices of Y. This
LR by the transpose of the first column of V multiplied by the inevitably also leads to a modification of the sparse coding
first diagonal element of ∆. Details of this derivation are given stage in which X is estimated by fixing D and solving
in appendix A. The proposed dictionary update stage uses an X̂ = arg min k Y − DX k2Q,Rc
SVD of the sphered matrix ER k since left and right multiplying X (11)
by L> Q and L R yields data with identity matrix. Therefore the subject to k xi k0 ≤ s, ∀ 1 ≤ i ≤ N.
proposed dictionary update stage decorrelates the entries of
Let ỹ = (y> > >
1 , ..., yN ) ∈ R
nN
and x̃ = (x> > >
1 , ..., xN ) ∈ R
KN
ERk so that the SVD equally weights the remaining rank one
be the vectors obtained by stacking the columns of Y and X
approximation error matrix entries. Since the nonzero entries
respectively, then (11) can be solved by minimizing
that are modified in xrowk are associated to the signals yi that
use the atom dk [35], R can be taken as I> wk Rc Iwk , where Rc
ˆx̃ = arg min k Zỹ − Z(IN ⊗ D)x̃ k2
2
x̃ (12)
is the row covariance of Y.
Since the Frobenius norm is a special case of the Q, R-norm subject to k x̃ k0 ≤ sN.
taking Q = I and R = I, the proposed dictionary update stage  >
is in fact a generalization of (5) that uses the SVD. where Z = QT /2 ⊗ RTc /2 and QT /2 = Q1/2 . The
derivation of (12) is given in appendix B. The final resulting
B. Sparse coding stage dictionary learning algorithm is depicted in table 1.
This dictionary update stage results also in a modification
or a variation of the objective (1) used to derive the dictionary C. Approximate dictionary update stage
learning procedure defined by (2) and (3). With the above The closest rank-1 matrix approximation of ERk with respect
modification of the dictionary update stage the objective to the square of the Q, R-norm and therefore the dictionary
adapted for fMRI data analysis is update stage steps can also be computed by applying few
n o iterations of a variant of the power method for computing the
D̂, X̂ = arg min k Y − DX k2Q,Rc
D,X (10) SVD. This has the advantage of avoiding the computationally
subject to k X k0 ≤ sN, ∀ 1 ≤ i ≤ N, extensive multiple SVDs required in the dictionary update

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2686014, IEEE
Transactions on Image Processing
SEGHOUANE et al.: SEQUENTIAL DICTIONARY LEARNING FROM CORRELATED DATA: APPLICATION TO FMRI DATA ANALYSIS 5

stage of A1. TABLE II


This approach, sequentially estimates dk with xrow
k fixed S TEPWISE DESCRIPTION OF THE PROPOSEDITERATIVE SEQUENTIAL
DICTIONARY LEARNING ALGORITHM ACCOUNTING FOR KNOWN
and vice versa by alternating least square minimization of CORRELATION STRUCTURES IN THE DATA
k ER row 2
k − dk xk kQ,R =
row >
 
Algorithm A2
tr ER row
R ER
 
k − dk xk k − dk xk Q (13) Given: Training data Y ∈ Rn×N , initial dictionary Dini , signal
sparsity s and the number of iterations j.
and then rescaling the estimates to give Set D = Dini
> For i=1 to j
ER row
k Rxk 1: Sparse Coding Stage:
d̂k = R > (14)
k Ek Rxrow
k kQ Compute the column covariance Q of Y and generate Q1/2
1/2
Compute the row covariance Rc of Y and generate Rc
xˆk row = d> R
k QEk . (15) Find sparse coefficients X, by solving
ˆ
x̃ = arg minx̃ k Zỹ − Z(IN ⊗ D)x̃ k22 subject to k x̃ k0 ≤ sN
The derivations of (14) and (15) are described in appendix B. Compute E = Y − DX
These equations define a variation of the power algorithm for Compute Q as the inverse of the covariance of E
computing the Q, R-norm SVD, which, if initialized randomly, 2: Dictionary Update Stage:
For each column k = 1, 2, ..., K in D,
converges almost surely to the Q, R weighted rank-1 matrix 2.a: Compute the error matrix using
approximation of ER k with respect to the square of the Q, Ek = E + dk xrow k
R-norm. Using this observation a dictionary update stage 2.b: Using wk the set of indices in X that uses dk generate
can be obtained by iterating (14) and (15) until convergence ER k = Ek Iwk
>
instead of the computationally expensive procedure used in 2.c: Compute R as the inverse of the covariance of ER k
Update the dictionary atom dk and the wk non zero
algorithm A1 to compute the rank-1 matrix approximation entries of the row xrow by repeating 2.d to 2.f until
k
of ER k with respect to the square of the Q, R-norm. The convergence:
convergence in this case is measured by the relative change of 2.d: Update the dictionary atom dk using
dk = ER row>
the dictionary atom dk with respect to the Q−norm defined k Rxk
||dkj+1 −dkj ||Q 2.e: Normalize the dictionary atom dk using
by ||dkj ||Q < . A simplified dictionary update stage d̂k = kddkk
k Q
can be obtained by applying a single iteration of (14) and 2.f: Update the wk non zero entries of the row xrow
k using
row
(15) instead of alternating until convergence. Within this x̂k = d> R
k QEk .
2.g: Update E = ER row
k − dk xk
stage dk is estimated by solving the least square problem end.
k ERk − dk xk
row 2
kR and rescaling whereas xrow k is estimated Output: D,X
by solving the least square problem k ER k − d row
k xk k2Q .
In comparison to algorithm A1, the solutions are generated
without computed L> > R
Q , LR , their inverses, the matrix LQ Ek LR where Φ here is a non-negative definite roughness penalty
and its SVD. It is thus computationally more attractive. The
matrix used to penalize the second differences [47][48]; i.e.;
resulting dictionary learning algorithm is depicted in table 2.
a larger value of d>k Φdk is associated with a greater penalty
IV. D ICTIONARY UPDATE FOR CORRELATED DATA WITH on differences between adjacent values. Using the previous
REGULARIZED DICTIONARY ATOMS descriptions, the updates of dk and xrow
k that generates smooth
dictionary atoms in the dictionary update stage can be obtained
In the case of reshaped fMRI dataset Y ∈ Rn×N where n is
by alternating minimization of
the number of time points and N the number of voxels, while
we expect to have only a limited number of voxel active at each {dˆk , x̂row
k } = arg min
row
k ER
k − dk xk kQ,R +αd>
row 2
k Φdk
dk ,xk
time point, it is also expected to have continuous activity along
the time. Therefore we are interested in obtaining smooth subject to kdk kQ = 1
dictionary atoms to encourage smoothness in the column (16)
direction of the dataset Y. which gives
−1 >
d̂k = k xrow
k k2R Q + αΦ QERk Rxk
row
A. Regularization by penalization
dk (17)
The fMRI brain signals are known to be slow varying with dk =
k dk kQ
or smooth temporal signals (smoothed by the convolution
with the hemodynamic response function)[54]. Therefore, they where the scaling corresponds to the constraint kdk kQ = 1
don’t contain abrupt or sharp variations. Among the options for and
regularized penalties that can be included in the cost function x̂row
k = d> R
k QEk . (18)
used in the dictionary update stage to encourage smoothness of
the dictionary atoms forming the basing blocks for the sparse The derivations of (17) and (18) are described in appendix C.
linear approximation of the fMRI signal, we adopt here the A dictionary update stage can be obtained by iterating (17)
widely used l2 penalty defined by and (18) until convergence or by applying only few iterations
n−1
of these equations. Within this stage xrowk is estimated by
X solving the least square problem k ER − d row
k xk k2Q . The
d> 2 2
k Φdk = dk (1)+dk (n)+ (dk (i+1)−2dk (i)+dk (i−1))2 k

i=2
resulting dictionary learning algorithm is depicted in table 3.

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2686014, IEEE
Transactions on Image Processing
6 IEEE TRANSACTIONS ON IMAGE PROCESSING

TABLE III 1) Dictionary Recovery: The performance comparison of


S TEPWISE DESCRIPTION OF THE PROPOSEDITERATIVE SEQUENTIAL the dictionary learning algorithms in terms of their ability to
DICTIONARY LEARNING ALGORITHM ACCOUNTING FOR KNOWN
CORRELATION STRUCTURES IN THE DATA WITH REGULARIZED recover the underlying original dictionary Dg (the generating
DICTIONARY ATOMS dictionary), which has been used to generate the test signals
matrix Y is provided in this section. We started by initializing
Algorithm A3 the Dg as a dictionary matrix of size 20 × 50 with ran-
Given: Training data Y ∈ Rn×N , initial dictionary Dini , signal dom entries taken from the normal distribution N (0, 1). The
sparsity s and the number of iterations j.
Set D = Dini
columns of Dg were then normalized to have a unit euclidean
For i=1 to j norm ||dj ||2 = 1 with dj representing the columns of Dg .
1: Sparse Coding Stage: 1500 test signals Y of dimensions 20 denoted by {yi }1500 i=1
Compute the column covariance Q of Y and generate Q1/2 were generated by the linear combination of s dictionary
1/2
Compute the row covariance Rc of Y and generate Rc columns (atoms) taken from random locations with uniformly
Find sparse coefficients X, by solving
ˆ
x̃ = arg minx̃ k Zỹ − Z(IN ⊗ D)x̃ k22 subject to k x̃ k0 ≤ sN distributed i.i.d coefficients.
Compute E = Y − DX We generated the noise matrix using a multivariate Gaus-
Compute Q as the inverse of the covariance of E sian density v N (0, Λ ⊗ Ω) where Λ ∈ R1500×1500 and
2: Dictionary Update Stage:
For each column k = 1, 2, ..., K in D, Ω ∈ R20×20 are the row and column covariance matrices
2.a: Compute the error matrix using respectively. The diagonal entries of these covariance matrices
Ek = E + dk xrow k were generated as σ × U[0.1, 0.7], where U[a, b] corresponds
2.b: Using wk the set of indices in X that uses dk generate
R
Ek = Ek Iwk
to entries from a uniform distribution on the interval [a, b].
2.c: Compute R as the inverse of the covariance of ER
> This noise matrix was then added to the test signals and the
k
Update the dictionary atom dk and the wk non zero resulting noisy signals were used by the dictionary learning
entries of the row xrow
k by repeating 2.d to 2.f until algorithms to recover the underlying generating dictionary.
convergence:
2.d: Update the dictionary atom dk using
The initial dictionary was generated using 50 columns from
dk = k xrow k2R Q + αΦ
−1
QER row> random locations from the noisy signal matrix Y. Orthogonal
k k Rxk
2.e: Normalize the dictionary atom dk using Matching Pursuit (OMP) [58] was used in sparse coding stage
d̂k = kddkk with sparsity constraint s giving best s-term approximation of
k Q
2.f: Update the wk non zero entries of the row xrow
k using the test signals Y. The stopping criterion for the dictionary
x̂row
k = d> R
k QEk . learning algorithms was chosen to be at max 11s2 iterations
2.g: Update E = ER row
k − dk xk
end. or when the difference between dictionaries learned at each
Output: D,X iteration became smaller than  1
All the algorithms were repeated 60 times for sparsity
levels s ∈ {2, 3, 4} and different noise levels corresponding
to σ ∈ {0.35, 0.40, 0.45, 0.50}. The generating and learned
This algorithm is similar to the one depicted in table 2 where
dictionaries were then compared as described in [35] with the
the steps 2.d and 2.f associated with the updates of dk and
atom deemed recovered if |dTj di | > 0.99, where dj and di are
xrow
k are replaced by (17) and (18).
atoms from recovered and generating dictionaries respectively.
Table IV contains the mean percentage of recovered atoms
V. E XPERIMENTAL R ESULTS by all the algorithms corresponding to the different sparsity s
and noise levels σ. The hyper parameters for S1 and A3 was
In the following section, we evaluate performance of the determined empirically from the set (0.1, 0.9) with step size
proposed algorithms using two experiments i.e. dictionary re- of 0.05 and selected to be 0.3 and 0.6 respectively. The ODL
covery and sparse GLM (generalized linear model) analysis on 2
algorithm was run with sparsity controlling parameter λ set
simulated data. After establishing their superior performance, to 0.4 while keeping default values for rest of the parameters.
we move on to validate these algorithms on real experimental
The average recovery results presented in table IV shows
fMRI datasets acquired from Human Connectome Project Q1
that as we increase the noise levels, the atom recovery rate
release [55]; the block design motor task [56] and the resting
of KSVD, S1 , and ODL got affected severely, especially at
state datasets [55].
lower signal sparsity s, whereas, the proposed algorithms, by
The details of these experiments are given below:
using the covariance information present in the data, were still
able to recover atoms with higher recovery rate. Moreover, at
A. Simulations higher sparsity levels s, the recovery rates of ODL were similar
to the proposed algorithms. To visualize the convergence rate
In this section, we have presented two simulated experi- of the algorithms, the average percentage of recovered atoms
ments to evaluate the performance of the proposed algorithms vs iteration number are presented in Fig. 2, and Fig. 3 for s = 2
A1 , A2 , and A3 with respect to KSVD [35], S1 [36], Online and s = 3 respectively for σ = 0.45 where the difference
dictionary learning algorithm (ODL) [57], and the method
of optimal directions (MOD) [38]. These experiments are 1 ||D − D
i i−1 ||F /||Di−1 ||F < , where  = 0.001 and i is the iteration
dictionary recovery and sparse GLM analysis on simulated number.
fMRI data. The details are given below: 2 https://github.com/tiepvupsu/DICTOL/tree/master/ODL

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2686014, IEEE
Transactions on Image Processing
SEGHOUANE et al.: SEQUENTIAL DICTIONARY LEARNING FROM CORRELATED DATA: APPLICATION TO FMRI DATA ANALYSIS 7

TABLE IV Atom recovery results for 60 Trials with s = 3 and σ = 0.45


80
M EAN ATOM RECOVERY PERCENTAGE FOR DIFFERENT SPARSITY LEVELS KSVD
S AND NOISE LEVELS σ FOR 60 TRIALS WITH THE BEST RESULTS IN BOLD . S1
70
A
2
A3
(s) σ KSVD A1 A2 A3 S1 ODL MOD

Atom Recovery Percentage


60 A1
MOD
0.35 75.13 87.00 87.44 86.33 81.53 72.38 75.40
50
0.40 72.65 82.67 83.47 83.29 80.23 63.50 59.90
2
0.45 44.87 71.33 72.57 70.66 58.03 62.00 40.44
0.50 19.53 54.44 53.11 53.89 25.30 39.45 22.03 40

0.35 84.55 89.65 87.35 87.61 87.67 82.67 77.13 30


0.40 65.76 81.18 80.57 79.55 72.77 76.67 61.60
3
0.45 37.60 72.54 71.28 73.86 52.47 63.00 37.20 20
0.50 19.47 53.22 56.89 54.00 47.23 51.67 17.63

0.35 82.33 86.79 85.66 83.48 84.23 83.33 70.57 10


0.40 52.55 71.63 70.67 70.89 68.13 69.67 48.53
4
0.45 23.80 51.78 52.22 55.89 47.57 51.35 21.47 0
0.50 13.57 43.65 44.55 41.67 37.90 42.45 10.23 0 10 20 30 40 50 60 70 80 90 100
Iterations

Fig. 3. Average percentage of atoms recovered after each iteration for different
in recovery rate observed from proposed algorithms is quite sparsity levels s = 3 with σ = 0.45
distinct. The learning rate of ODL is not included in these
figures because the online code used to generate these results
are shown in Fig. 4. The test signals matrix Y ∈ R120×100
did not return the learning rate of the algorithm.
was generated by mixing these signals together [20].
Atom recovery results for 60 Trials with s = 2 and σ= 0.45 The test signals were corrupted by correlated noise N v
80
KSVD
N (0, Λ ⊗ Ω) with row covariance Λ and column covariance
70
S1 Ω. The row covariance matrix is generated using the model
A2
A3
Λ = θ|i−j| where i and j are indexes of a two dimensional
grid with the correlation controlling parameter θ = 0.8. The
Atom Recovery Percentage

60 A1
MOD
column covariance matrix is generated using an AR(1) process
50
as given in [59] with parameters α = 0.5 and variance
40 σ ∈ {0.2, 0.4}. The resulting noisy signals were then given to
the dictionary learning algorithms to recover the underlying
30
generating dictionary Dg .
20
For a fair comparison, the same size dictionary D ∈ R120×5
was trained using different dictionary learning algorithms
10 with 30 iterations. The number of iterations was chosen by
observing that all the algorithms were converging in terms of
0
0 5 10 15 20 25 30 35 40 45 the ratio ||Di − Di−1 ||F /||Di−1 ||F < 0.001 for no more than
Iterations
15 iterations. The smoothness parameter α for A3 and sparsity
Fig. 2. Average percentage of atoms recovered after each iteration for different parameter for sparse coding stage (performed using orthogonal
sparsity levels s = 2 with σ = 0.45 matching pursuit (OMP) [58]) were chosen empirically and
were set to α = 0.1 and s = 2 respectively. Table V and table
2) Sparse GLM analysis on simulated fMRI data: In this
section, we present a comparative study to investigate po- T1 T1

tential of the proposed algorithms in separating the signal


A

sources from a synthetically generated fMRI mixture dataset


with temporal dependence [20] and corrupted by correlated Case (a)
0 20 40 60
T2
80 100 120
Case (b)
0 20 40 60
T3
80 100 120

noise. Two activation cases were generated, i.e. a spatially


independent case (case a) and a spatial overlap case (case b)
C
B

as shown in Fig. 4. Other than the proposed algorithms, we 0 20 40 60 80 100 120 0 20 40 60 80 100 120

also included KSVD for this study. Three temporal sources,


with 120 sec duration, were constructed to represent the
Fig. 4. Simulated activation patterns for case a) spatially independent events
brain hemodynamics, i.e. block design activation (T1 ), and and case b) spatially overlapping events.
two sinusoids (T2 and T3 ) with frequencies ∈ {1.5, 9.5} Hz
respectively and box signals were used as brain activation VI contains the correlation coefficients of recovered time series
patterns [20]. Three distinct visual patterns of size 10 × 10 with the original ones and recovered spatial maps with the
voxels were created with amplitudes of 1 at voxel indexes original ones respectively averaged over 100 trials. The best
{2, ..., 6} × {2, ..., 6} for pattern A, {8, 9} × {8, 9} for pattern results are highlighted bold in the tables. It can be seen that
B, and {5, ..., 9} × {5, ..., 9} for pattern C, and 0 elsewhere. the proposed algorithms outperform KSVD in both scenarios,
The activation patterns with their corresponding time courses especially the spatial overlap case, even in the presence of high

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2686014, IEEE
Transactions on Image Processing
8 IEEE TRANSACTIONS ON IMAGE PROCESSING

KSVD A1 A2 A3
T1 T1 T1 T1

A
0 20 40 60 80 100 120 0 20 40 60 80 100 120 0 20 40 60 80 100 120 0 20 40 60 80 100 120

Case a) B T2 T2 T2 T2

B
0 20 40 60 80 100 120 0 20 40 60 80 100 120 0 20 40 60 80 100 120 0 20 40 60 80 100 120

T1 T1 T1 T1
A

A
0 20 40 60 80 100 120 0 20 40 60 80 100 120 0 20 40 60 80 100 120 0 20 40 60 80 100 120

Case b) T3 T3 T3 T3
C

C
0 20 40 60 80 100 120 0 20 40 60 80 100 120 0 20 40 60 80 100 120 0 20 40 60 80 100 120

Fig. 5. Simulated activation patterns for case a) spatially independent events and case b) spatially overlapping events.

amount of noise. According to table V, we can observe that = 33.1ms, flip angle = 52o , BW = 2290 Hz/Px, in-plane FOV
A3 algorithm’s temporal correlations are highest which can = 208 × 180 mm with 2.0 mm isotropic voxels. The obtained
be attributed to the inclusion of smoothness constraint in the data was already preprocessed with the preprocessing pipeline
dictionary learning step. The most well recovered activation consisting of motion correction, temporal pre-whitening, slice
patterns and time series extracted by all algorithms for the time correction, global drift removal, and the scans were
simulated cases given in Fig. 4 are also presented in Fig. spatially normalized to a standard MNI152 template and re-
5 for σ = 0.2. In Fig. 5 it can be seen that the KSVD sampled to 2mm x 2mm x 2mm voxels. The reader is referred
failed to separate the spatial overlap case whereas the proposed to [60] and [55] for more details regarding data acquisition
algorithms were able to successfully separate both activation and preprocessing. Further details of the analysis are given in
maps. their respective sections below.
1) Block Design based fMRI Analysis: This task is based on
TABLE V the task developed in [56] in which participants were presented
AVERAGE CORRELATION COEFFICIENT OF RECOVERED TIME SERIES WITH
ORIGINALS FOR THE TWO CASES OVER 100 TRIALS .
with a visual cue, asking them to tap their left or right fingers,
squeeze their left or right toes, or move their tongue to map
KSVD A1 A2 A3 the motor areas of the brain. Subjects were presented with a
σ 0.2 0.4 0.2 0.4 0.2 0.4 0.2 0.4
T1 0.98 0.94 0.96 0.93 0.97 0.93 0.99 0.94 3 seconds visual cue followed by the cue for a specific task,
Case a
T2 0.97 0.93 0.95 0.87 0.93 0.87 0.97 0.94 with movement block length of 12 seconds (10 movements). A
Case b T1 0.97 0.90 0.98 0.90 0.96 0.89 0.98 0.94 total of 13 blocks with 4 foot movements (2 for each foot), 4
T3 0.98 0.97 0.94 0.93 0.96 0.94 0.97 0.98
hand movements (2 for each hand), 2 tongue movements, and
3 15-second fixation blocks were carried out by the subject.
TABLE VI We used dataset corresponding to subject id 100307 for the
AVERAGE CORRELATION COEFFICIENT OF RECOVERED SPATIAL MAPS analysis.
WITH ORIGINALS FOR THE TWO CASES OVER 100 TRIALS .
The block design motor task fMRI run duration was 3:34
KSVD A1 A2 A3 (min:sec) with a total of 284 scans. We discarded the first 5
σ 0.2 0.4 0.2 0.4 0.2 0.4 0.2 0.4 and used the remaining 279 scans for the sparse GLM analysis.
A 0.87 0.69 0.90 0.82 0.91 0.81 0.91 0.81
Case a
B 0.88 0.85 0.96 0.96 0.98 0.94 0.97 0.94
The scans were spatially smoothed using a 6mm x 6mm x
A 0.79 0.74 0.87 0.78 0.85 0.82 0.80 0.79 6mm FWHM Gaussian kernel. Data outside the brain was
Case b
C 0.87 0.73 0.97 0.93 0.91 0.93 0.93 0.88 masked and the resulting images were vectorized and placed
as rows of the matrix Y ∈ Rn×N , where n = 279 being the
number of time points and N being the number of voxels in an
B. Sparse GLM Analysis image. The DCT basis set with a cut off frequency of 1/170
In this section, we have presented a study of different Hz was used to get rid of the low frequency drifts and the
dictionary learning algorithms on real fMRI datasets and have high frequency noise was removed by temporally smoothing
analyzed their abilities in recovering the underlying neural the BOLD time-series using a 2.0s FWHM Gaussian kernel.
dynamics present in the fMRI data. Datasets used in the The data matrix Y was then down sampled by a factor of
analysis were acquired from the Human Connectome Project 8 along the spatial direction in order to reduce computation
Q1 release [55]. The acquisition parameters for both datasets time of the dictionary learning stage. Starting with a dictionary
are: 90 × 104 matrix, 220mm FOV, 72 slices, TR = 0.72s, TE initialized with data vectors yi taken from random locations

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2686014, IEEE
Transactions on Image Processing
SEGHOUANE et al.: SEQUENTIAL DICTIONARY LEARNING FROM CORRELATED DATA: APPLICATION TO FMRI DATA ANALYSIS 9

from data matrix Y, we used this data matrix to learn a Correlation = 0.530

dictionary of size Dl ∈ Rn×40 with the sparse coding stage


performed using correlation based thresholding [20] with the
sparsity level of s = 3 for all algorithms , resulting in a
sparse coefficient matrix X ∈ R40×N with Y = Dl X. The KSVD
first atom of the dictionary was set as a DC component in
order to capture the remaining drift of the signals and was
never updated in the subsequent dictionary update stages. The 20 40 60 80 100 120 140 160 180 200

hyper parameter for A3 algorithm was set as 0.03. We used Correlation = 0.778

different values for sparsity (2−6) and hyper-parameter (0−1)


and selected the ones giving best results for the particular
algorithm. The sparse coding and dictionary update stages
were iterated 20 times upon observing that almost all the A1
dictionary learning algorithms were converging in 12 to 15
iterations.
After decomposition of datasets into dictionary atoms, they 20 40 60 80 100 120 140 160 180 200
were processed in a sparse GLM framework [20] that allowed Correlation = 0.810

the calculation of F-statistics [61] for activation detection. In


order to classify voxels for presence of the activation signal,
the most correlated dictionary atom with the modeled hemo-
dynamic response (MHR), along with other non-zero s − 1 A2
atoms, were chosen as regressors for the sparse GLM analysis.
The MHR was constructed by convolving the canonical HRF
and its derivatives with the 5 stimulus functions and their 20 40 60 80 100 120 140 160 180 200
corresponding correlation coefficients w.r.t the most correlated Correlation = 0.775
dictionary atom are given in table VII. The activation maps
results based on F-test at a p-value threshold of 0.0001 for left
hand and right hand finger tapping task are presented in Fig.
7, and Fig. 8 respectively. The atoms and the corresponding A3
MHR used to generate activation maps shown in Fig. 7 are
also shown in Fig. 6.
Due to the non-availability of the ground truth, we had to 20 40 60 80 100 120 140 160 180 200
rely on the recovered temporal dynamics in order to compare Fig. 6. The most correlated atoms (in red) with MHR corresponding to
the activation results among different algorithms, therefore, Left Hand (LH) movement task (in blue) as recovered by dictionary learning
in correlation terms the proposed algorithms can be seen to algorithms.
have produced superior results compared to other methods.
Moreover, for all the algorithms, the neural activations are
identified in the motor area and the results are more or other tend to have a similar low-frequency temporal fluctuation
less similar, however, the activations maps obtained by A2 pattern. So, if we know that a particular brain region is a part
corresponding to the left finger tapping and A1 for the right of a functionally connected networks (FCN) [62], then we can
finger tapping being more sensitive. select the temporal signal associated with that region as a seed
and use it to check which other regions of the brain have a
TABLE VII similar temporal signal associated with them. Now, in order
C ORRELATION COEFFICIENTS OF HIGHEST CORRELATED ATOM WITH for this analysis technique to work, the chosen seed-voxel has
MHR CORRESPONDING TO L EFT F OOT (LF), R IGHT F OOT (RF), L EFT
H AND (LH), R IGHT H AND (RH), AND T ONGUE MOVEMENT TASKS AS to belong to a set of correlated voxels corresponding to any
RECOVERED BY DIFFERENT DICTIONARY LEARNING ALGORITHMS . FCN [62]. These seeds may correspond to salience network
(SN), dorsal attention network (DAN), default mode network
LF RF LH RH Tongue Mean (DMN) or any other FCN. To be on the safe side, we have
KSVD 0.405 0.527 0.530 0.895 0.846 0.641 chosen to use the seed-voxels that have already been attributed
A1 0.692 0.553 0.778 0.868 0.881 0.754
A2 0.536 0.624 0.810 0.879 0.800 0.730
as belonging to different DMNs in [64] and [65]. The standard
A3 0.526 0.446 0.776 0.879 0.751 0.675 MNI coordinated of these seed-voxels are presented in table
VIII.
2) Seed-Voxel based rsfMRI Analysis: The analysis of The preprocessed [60] resting state fMRI dataset analyzed
resting state fMRI data can be carried out in a few different in this section contained 1200 scans with a scan run duration
ways [62] [63] out of which we have selected to use the of 14:33 (min:sec). We selected the first 420 scans (302.4
seed-voxel-based correlation analysis technique [63]. This sec), discarding the first 15 and used the remaining 405
technique is based on the assumption that in resting state, scans for analysis. The scans were spatially smoothed using a
the brain regions that are functionally connected with each 6mm x 6mm x 6mm FWHM Gaussian kernel and masked to

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2686014, IEEE
Transactions on Image Processing
10 IEEE TRANSACTIONS ON IMAGE PROCESSING

remove any artifacts present outside the brain. The resulting


scanned images were vectorized and placed as rows of the
matrix Y ∈ Rn×N where n = 405 being the number of
time points and N being the number of voxels in an image.
a) The DCT basis set with a cut off frequency of 1/150 Hz
was used to get rid of the low frequency drifts and the
high frequency noise was removed by temporally smoothing
the BOLD time-series using a 1.5s FWHM Gaussian kernel.
The data matrix Y was then downsampled along the spatial
b) dimension by a factor of 4 to reduce the computation time of
the dictionary learning algorithms. All algorithms were used to
learn dictionaries D ∈ Rn×80 where n = 405. The dictionaries
were initialized with data elements from the experimental
dataset and correlation based thresholding was used in sparse
c) coding stage [20]. The sparsity level was selected as s = 2
and algorithms were iterated 20 times. The hyper-parameter α
used for algorithm A3 was set to 0.03.

To analyze the dataset, we extracted the mean time-series


d) corresponding to cubes of 3 × 3 × 3 voxels centered at each of
the seed locations as given in table VIII. As the ground truth
is not available, we had to rely on the recovered temporal
dynamics (atoms) in order to compare the performance among
Fig. 7. F-statistics activation maps for Left Hand finger tapping task for different algorithms. To accomplish this, we correlated the
the coronal slices from −19 to −16 mm threshold at random field correction
p < 0.0001 obtained by using the design matrices constructed with dictionary mean time-series from different seed regions with all learned
atoms recovered by algorithms a) KSVD, b) A1 , c) A2 , and d) A3 dictionaries to find the most correlated atoms. The correspond-
ing correlation coefficients are presented in the table VIII. It is
evident from the table that dictionaries learned by the proposed
algorithms were able to deliver the best overall results with
A2 ’s results being best. We used the most correlated atoms
with seed-voxels corresponding to ventral posterior cingulate
(VPC) and ventral medial prefrontal cortex (VMPC) in order
to look for the activated brain regions having similar temporal
dynamics. These activations have been presented in Fig. 9
a) and Fig. 11 respectively showing different regions of the
brain activating at the same time. The activations recovered
by all algorithms are quite similar, however, A2 was able
to recover more activated regions and A3 ’s activation maps
have recovered very distinct peaks as seen in Fig. 9 c) and d)
b) respectively. For sake of completion, the corresponding atoms
used to recover the activations shown in Fig. 9 are also shown
in Fig. 10. The networks shown in Fig. 9 and Fig. 11 are very
similar to the DMNs reported in Fig. 5 of [65] named IC53
and IC25 respectively.
c) In both fMRI experiments, the column covariance Q is
estimated from the data whereas the row covariance R is taken
to be an exponential smoothing matrix Rij = exp(−|i−j|2 /σ)
where σ = 11.46 to model the correlation associated to the
convolution of the functional images with a Gaussian kernel
d) with FWHM = 27. The FWHM is taken (3 times the voxel
size) to satisfy the estimate required to meet the assumptions
of random fields theory [25][66]. A complete analysis of
Fig. 8. F-statistics activation maps for Right Hand finger tapping task for the convergence properties of the proposed algorithms is not
the coronal slices from −30 to −27 mm threshold at random field correction provided in this paper. However, we use the block design
p < 0.0001 obtained by using the design matrices constructed with dictionary motor task dataset to provide a numerical analysis of the
atoms recovered by algorithms a) KSVD, b) A1 , c) A2 , and d) A3
convergence rate for each of the proposed algorithms. In order
to illustrate the convergence, at every iteration, we inspect the
relative change of the dictionary D w.r.t the Frobenius norm,

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2686014, IEEE
Transactions on Image Processing
SEGHOUANE et al.: SEQUENTIAL DICTIONARY LEARNING FROM CORRELATED DATA: APPLICATION TO FMRI DATA ANALYSIS 11

(a) (b) (c) (d)


Fig. 9. DMN recovered by the atoms most correlated with the seed voxel corresponding to the Ventral Posterior Cingulate [64][65], thresholded at random field correction p < 0.005, from dictionaries learned by the a) KSVD, b) A1, c) A2, and d) A3 algorithms.

Fig. 10. Most correlated atoms (in red) vs the mean time series t (in blue) extracted from the 6 mm3 cube centered at the seed voxel corresponding to the Ventral Posterior Cingulate, recovered by the a) KSVD, b) A1, c) A2, and d) A3 algorithms (per-panel correlations: a) 0.752, b) 0.840, c) 0.771, d) 0.807).

Fig. 11. DMN recovered by the atoms most correlated with the seed voxel corresponding to the Ventral Medial Prefrontal Cortex [64][65], thresholded at random field correction p < 0.005, from dictionaries learned by the a) KSVD, b) A1, c) A2, and d) A3 algorithms.

Fig. 12. Relative change in the dictionary D w.r.t. the Frobenius norm as a function of the iteration number (curves for A1, A2, and A3; y-axis: $\|D_i - D_{i-1}\|_F / \|D_{i-1}\|_F$, x-axis: number of iterations, 0 to 20).
A complete analysis of the convergence properties of the proposed algorithms is not provided in this paper. However, we use the block-design motor task dataset to provide a numerical analysis of the convergence rate of each of the proposed algorithms. To illustrate the convergence, at every iteration we inspect the relative change of the dictionary D with respect to the Frobenius norm, defined as
$$\frac{\| D_i - D_{i-1} \|_F}{\| D_{i-1} \|_F},$$
where $D_i$ denotes the dictionary at iteration $i$. One can observe from Fig. 12 that the relative change of the dictionaries does go to zero as the number of iterations increases, and it is already very small after the 12th to 15th iteration, hence our choice of 20 iterations to learn the dictionaries.
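This stopping diagnostic is straightforward to compute. The sketch below is our illustration; dictionary_update is a hypothetical stand-in for one sparse coding plus dictionary update pass of any of the proposed algorithms:

import numpy as np

def relative_change(D_new, D_old):
    # ||D_i - D_{i-1}||_F / ||D_{i-1}||_F, the quantity plotted in Fig. 12
    return np.linalg.norm(D_new - D_old, 'fro') / np.linalg.norm(D_old, 'fro')

# Hypothetical driver loop:
# for i in range(20):                      # 20 iterations, as used in the text
#     D_new = dictionary_update(Y, D, X)   # one full learning pass (assumed available)
#     if relative_change(D_new, D) < 1e-3: # small threshold, reached around iteration 12-15
#         break
#     D = D_new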
VI. CONCLUSION

Big data sets arising from spatio-temporal measurements in fMRI studies are structured data sets. The reshaped datasets are structured matrices with both notions of spatio-temporal correlation and temporal smoothness, and classical dictionary learning algorithms ignoring these structures in the data matrices will inevitably result in lower performance. Using the Q,R-norm instead of the widely used Frobenius norm for matrix approximation, three variants of the K-SVD algorithm dedicated to fMRI data analysis were proposed in this paper. The resulting algorithms differ from the K-SVD in both their sparse coding and dictionary update stages. The first two proposed algorithms account for the known correlation structure in the fMRI data only, whereas the third algorithm accounts for both the known correlation structure in the fMRI data and the temporal smoothness. While we used regularization via penalization in the dictionary update stage to obtain smooth dictionary atoms in this last algorithm, other regularization approaches could be used, among them basis and sparse basis expansions, to adapt to other forms of prior information. The obtained procedure for the dictionary update stage in the second and third proposed algorithms can be seen as a variant of the power method or alternating least squares method for computing the SVD [50].

TABLE VIII
MNI coordinates of the selected seed voxels (in mm) and the correlation coefficient of the most correlated atom with the mean time series obtained from the cube centered at the given MNI coordinates.

Region                              x     y    z    KSVD    A1      A2      A3
Ventral Posterior Cingulate         2    -46   28   0.752   0.840   0.770   0.807
Dorsal Posterior Cingulate          0    -24   38   0.750   0.730   0.700   0.743
Left Inferior Parietal Lobe       -56    -66   24   0.760   0.848   0.869   0.827
Right Inferior Parietal Lobe       50    -62   30   0.830   0.862   0.836   0.874
Ventral Medial Prefrontal Cortex    6     70   14   0.743   0.750   0.827   0.736
Cingulate Gyrus                     5     45   10   0.697   0.753   0.747   0.731
Precuneus Cortex                    9    -70   43   0.718   0.726   0.746   0.767
Precuneus Cortex                   -7    -60   22   0.743   0.687   0.750   0.745
Middle Frontal Gyrus              -27     30   45   0.804   0.809   0.806   0.795
Mean                                                0.755   0.778   0.783   0.781

The presented results obtained using simulated data showed that the proposed algorithms outperform state-of-the-art algorithms when the data are correlated or smooth. A validation on datasets obtained from real fMRI experiments was also included. While we have focused on fMRI data sets, the proposed algorithms can be applied to other problems where known correlation is present in the data sets.
APPENDIX A

$$
\begin{aligned}
\{\hat{d}_k, \hat{x}^{row}_k\}
&= \arg\min_{d_k, x^{row}_k} \| E^R_k - d_k x^{row}_k \|^2_{\Omega^{-1},\Lambda^{-1}} \\
&= \arg\min_{d_k, x^{row}_k} \operatorname{tr}\!\left( \Omega^{-1} \left(E^R_k - d_k x^{row}_k\right) \Lambda^{-1} \left(E^R_k - d_k x^{row}_k\right)^\top \right) \\
&= \arg\min_{d_k, x^{row}_k} \operatorname{tr}\!\left( Q \left(E^R_k - d_k x^{row}_k\right) R \left(E^R_k - d_k x^{row}_k\right)^\top \right) \\
&= \arg\min_{d_k, x^{row}_k} \operatorname{tr}\!\left( \left(E^R_k - d_k x^{row}_k\right) R \left(E^R_k - d_k x^{row}_k\right)^\top Q \right) \\
&= \arg\min_{d_k, x^{row}_k} \operatorname{tr}\!\left( L_Q^\top \left(E^R_k - d_k x^{row}_k\right) L_R L_R^\top \left(E^R_k - d_k x^{row}_k\right)^\top L_Q \right) \\
&= \arg\min_{d_k, x^{row}_k} \operatorname{tr}\!\left( \left( L_Q^\top E^R_k L_R - L_Q^\top d_k x^{row}_k L_R \right)\left( L_Q^\top E^R_k L_R - L_Q^\top d_k x^{row}_k L_R \right)^\top \right) \\
&= \arg\min_{u, v} \| \tilde{Y} - u v^\top \|_F^2 \qquad (19)
\end{aligned}
$$

where $Q = \Omega^{-1}$ and $R = \Lambda^{-1}$, with factorizations $Q = L_Q L_Q^\top$ and $R = L_R L_R^\top$. By taking the SVD of $\tilde{Y} = L_Q^\top E^R_k L_R$ we obtain the first left singular vector (first column of $U$), $u = L_Q^\top d_k$, and the first right singular vector (first column of $V$) multiplied by the largest singular value, $v^\top = x^{row}_k L_R$, from which we deduce $\hat{d}_k = (L_Q^\top)^{-1} u$ and $\hat{x}^{row}_k = v^\top L_R^{-1}$.
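For concreteness, the update implied by (19) can be written as the following NumPy sketch. This is our illustration, not code from the paper; it assumes Cholesky factors for Q and R:

import numpy as np

def weighted_rank1_svd(E, Q, R):
    # Best rank-one fit of the residual E under the weighted norm of Appendix A,
    # computed via an SVD in the whitened domain.
    LQ = np.linalg.cholesky(Q)              # Q = L_Q L_Q^T (assumed factorization)
    LR = np.linalg.cholesky(R)              # R = L_R L_R^T
    Y_tilde = LQ.T @ E @ LR                 # whitened residual
    U, s, Vt = np.linalg.svd(Y_tilde)
    u = U[:, 0]                             # first left singular vector
    v = s[0] * Vt[0, :]                     # first right singular vector times s_1
    d = np.linalg.solve(LQ.T, u)            # d_k = (L_Q^T)^{-1} u; note d^T Q d = 1
    x_row = np.linalg.solve(LR.T, v)        # x_k^row = v^T L_R^{-1}, as a 1-D array
    return d, x_row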
APPENDIX B
Derivation of equation (12).

$$
\begin{aligned}
\| Y - DX \|^2_{Q,R_c}
&= \operatorname{tr}\!\left( Q (Y - DX) R_c (Y - DX)^\top \right) \\
&= \operatorname{tr}\!\left( (Y - DX)^\top Q (Y - DX) R_c \right) \\
&= \operatorname{tr}\!\left( (Y - DX)^\top \left[ Q (Y - DX) R_c \right] \right) \\
&= \operatorname{vec}(Y - DX)^\top \operatorname{vec}\!\left( Q (Y - DX) R_c \right) \\
&= \operatorname{vec}(Y - DX)^\top (R_c \otimes Q)\, \operatorname{vec}(Y - DX) \\
&= \| Z \tilde{y} - Z (I_N \otimes D) \tilde{x} \|_2^2 \qquad (20)
\end{aligned}
$$

where $Z = (R_c \otimes Q)^{T/2} = R_c^{T/2} \otimes Q^{T/2}$, $Z^\top = (R_c \otimes Q)^{1/2} = R_c^{1/2} \otimes Q^{1/2}$, and $\operatorname{vec}(Y - DX) = \tilde{y} - (I_N \otimes D)\tilde{x}$, with $\tilde{y}$ and $\tilde{x}$ the vectors obtained by stacking the columns of $Y$ and $X$ respectively.
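The identity (20) can be sanity-checked numerically. The following self-contained NumPy sketch is ours, with arbitrary SPD matrices standing in for Q and R_c; it verifies that both sides agree:

import numpy as np

rng = np.random.default_rng(0)
n, N, K = 6, 5, 4
Y = rng.standard_normal((n, N))
D = rng.standard_normal((n, K))
X = rng.standard_normal((K, N))
Q = np.cov(rng.standard_normal((3 * n, n)), rowvar=False) + np.eye(n)    # SPD test matrix
Rc = np.cov(rng.standard_normal((3 * N, N)), rowvar=False) + np.eye(N)   # SPD test matrix

E = Y - D @ X
lhs = np.trace(Q @ E @ Rc @ E.T)                     # ||Y - DX||^2_{Q,Rc}

Lq = np.linalg.cholesky(Q)                           # Q  = Lq Lq^T
Lc = np.linalg.cholesky(Rc)                          # Rc = Lc Lc^T
Z = np.kron(Lc.T, Lq.T)                              # Z^T Z = Rc (x) Q
y_t = Y.flatten(order='F')                           # vec(Y): column stacking
x_t = X.flatten(order='F')                           # vec(X)
rhs = np.sum((Z @ (y_t - np.kron(np.eye(N), D) @ x_t)) ** 2)

assert np.isclose(lhs, rhs)                          # both sides of (20) agree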

APPENDIX C
Derivations of equations (14) and (15).

$$
\begin{aligned}
\{\hat{d}_k, \hat{x}^{row}_k\}
&= \arg\min_{d_k, x^{row}_k} \| E^R_k - d_k x^{row}_k \|^2_{Q,R} \\
&= \arg\min_{d_k, x^{row}_k} \operatorname{tr}\!\left( \left(E^R_k - d_k x^{row}_k\right) R \left(E^R_k - d_k x^{row}_k\right)^\top Q \right) \\
&= \arg\min_{d_k, x^{row}_k} \operatorname{tr}\!\left( E^R_k R E^{R\top}_k Q \right)
+ \operatorname{tr}\!\left( d_k x^{row}_k R x^{row\top}_k d_k^\top Q \right)
- 2 \operatorname{tr}\!\left( E^R_k R x^{row\top}_k d_k^\top Q \right) \\
&= \arg\min_{d_k, x^{row}_k} \operatorname{tr}\!\left( E^R_k R E^{R\top}_k Q \right)
+ \operatorname{tr}\!\left( d_k^\top Q d_k\, x^{row}_k R x^{row\top}_k \right)
- 2 \operatorname{tr}\!\left( d_k^\top Q E^R_k R x^{row\top}_k \right) \qquad (21)
\end{aligned}
$$

For fixed $d_k$ with $\|d_k\|^2_Q = 1$, the $x^{row}_k$ that minimizes (21) is derived from
$$\hat{x}^{row}_k = \arg\min_{x^{row}_k} \operatorname{tr}\!\left( x^{row}_k R x^{row\top}_k \right) - 2 \operatorname{tr}\!\left( d_k^\top Q E^R_k R x^{row\top}_k \right),$$
which gives $\hat{x}^{row}_k = d_k^\top Q E^R_k$, whereas for fixed $x^{row}_k$, the $d_k$ that minimizes (21) is derived from
$$\hat{d}_k = \arg\min_{d_k} \operatorname{tr}\!\left( d_k^\top Q d_k\, x^{row}_k R x^{row\top}_k \right) - 2 \operatorname{tr}\!\left( d_k^\top Q E^R_k R x^{row\top}_k \right),$$
which with the constraint $\|d_k\|^2_Q = 1$ gives
$$\hat{d}_k = \frac{E^R_k R\, x^{row\top}_k}{\left\| E^R_k R\, x^{row\top}_k \right\|_Q}.$$
APPENDIX D
Derivations of equations (17) and (18).

$$
\begin{aligned}
\{\hat{d}_k, \hat{x}^{row}_k\}
&= \arg\min_{d_k, x^{row}_k} \| E^R_k - d_k x^{row}_k \|^2_{Q,R} + \alpha\, d_k^\top \Phi d_k
\quad \text{subject to } \|d_k\|_Q = 1 \\
&= \arg\min_{d_k, x^{row}_k} \operatorname{tr}\!\left( E^R_k R E^{R\top}_k Q \right)
+ \operatorname{tr}\!\left( d_k^\top Q d_k\, x^{row}_k R x^{row\top}_k \right)
- 2 \operatorname{tr}\!\left( d_k^\top Q E^R_k R x^{row\top}_k \right)
+ \alpha\, d_k^\top \Phi d_k \\
&\qquad \text{subject to } \|d_k\|_Q = 1 \qquad (22)
\end{aligned}
$$

For fixed $x^{row}_k$, the $d_k$ that minimizes (22) is derived from
$$\hat{d}_k = \arg\min_{d_k} \operatorname{tr}\!\left( d_k^\top Q d_k\, x^{row}_k R x^{row\top}_k \right) - 2 \operatorname{tr}\!\left( d_k^\top Q E^R_k R x^{row\top}_k \right) + \alpha\, d_k^\top \Phi d_k,$$
which gives
$$\hat{d}_k = \left( \| x^{row}_k \|^2_R\, Q + \alpha \Phi \right)^{-1} Q E^R_k R\, x^{row\top}_k, \qquad \hat{d}_k \leftarrow \frac{\hat{d}_k}{\| \hat{d}_k \|_Q},$$
where the scaling corresponds to the constraint $\|d_k\|_Q = 1$.
For fixed $d_k$ with $\|d_k\|_Q = 1$, the $x^{row}_k$ that minimizes (22) is derived from
$$\hat{x}^{row}_k = \arg\min_{x^{row}_k} \operatorname{tr}\!\left( x^{row}_k R x^{row\top}_k \right) - 2 \operatorname{tr}\!\left( d_k^\top Q E^R_k R x^{row\top}_k \right),$$
which gives $\hat{x}^{row}_k = d_k^\top Q E^R_k$.
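A sketch of the regularized atom update is given below. It is our illustration: the roughness penalty Phi shown as squared second differences is one plausible smoothness choice for illustration, not necessarily the paper's:

import numpy as np

def smooth_atom_update(E, Q, R, x_row, alpha, Phi):
    # Regularized atom update derived in Appendix D:
    # d = (||x_row||_R^2 Q + alpha * Phi)^{-1} Q E R x_row^T, then rescaled
    # so that the constraint ||d||_Q = 1 holds.
    xRx = x_row @ R @ x_row                    # ||x_row||_R^2
    d = np.linalg.solve(xRx * Q + alpha * Phi, Q @ E @ R @ x_row)
    return d / np.sqrt(d @ Q @ d)

def second_difference_penalty(n):
    # Phi = Delta^T Delta, with Delta the second-order difference operator
    # (an assumed penalty, for illustration only).
    Delta = np.diff(np.eye(n), n=2, axis=0)
    return Delta.T @ Delta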
REFERENCES

[1] K. J. Worsley and K. Friston, "Analysis of fMRI time-series revisited again," NeuroImage, vol. 2, pp. 173–181, 1995.
[2] K. J. Worsley, C. H. Liao, J. Aston, V. Petre, G. H. Duncan, F. Morales, and A. C. Evans, "A general statistical analysis for fMRI," NeuroImage, vol. 15, pp. 1–15, 2002.
[3] K. Friston, J. Ashburner, S. Kiebel, T. Nichols, and W. E. Penny, Statistical Parametric Mapping: The Analysis of Functional Brain Images, New York: Academic, 2006.
[4] M. S. Cohen, "Parametric analysis of fMRI data using linear systems methods," NeuroImage, vol. 6, pp. 93–103, 1997.
[5] R. L. Buckner, "Event-related fMRI and the hemodynamic response," Human Brain Mapping, vol. 6, pp. 373–377, 1998.
[6] G. H. Glover, "Deconvolution of impulse response in event-related BOLD fMRI," NeuroImage, vol. 9, pp. 416–429, 1999.
[7] D. A. Handwerker, J. M. Ollinger, and M. D'Esposito, "Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses," NeuroImage, vol. 21, pp. 1639–1651, 2004.
[8] M. A. Lindquist and T. D. Wager, "Validity and power in hemodynamic response modeling: A comparison study and a new approach," Human Brain Mapping, vol. 28, pp. 764–784, 2007.
[9] G. Strangman, J. Culver, J. Thompson, and D. Boas, "A quantitative comparison of simultaneous BOLD fMRI and NIRS recordings during functional brain activation," NeuroImage, vol. 17, pp. 719–731, 2002.
[10] A. K. Seghouane and A. Shah, "HRF estimation in fMRI data with unknown drift matrix by iterative minimization of the Kullback-Leibler divergence," IEEE Transactions on Medical Imaging, pp. 192–206, 2012.
[11] A. Shah and A. K. Seghouane, "An integrated framework for joint HRF and drift estimation and HbO/HbR signal improvement in fNIRS data," IEEE Transactions on Medical Imaging, vol. 33, pp. 2086–2097, 2014.
[12] A. K. Seghouane and Y. Saad, "Prewhitening high dimensional fMRI data sets without eigendecomposition," Neural Computation, vol. 26, pp. 907–919, 2014.
[13] K. J. Friston, C. D. Frith, P. F. Liddle, and R. S. J. Frackowiak, "Functional connectivity: the principal component analysis of large PET data sets," Journal of Cerebral Blood Flow and Metabolism, vol. 13, pp. 5–14, 1993.
[14] A. Andersen, D. Gash, and M. Avison, "Principal component analysis of the dynamic response measured by fMRI: a generalized linear system framework," Magnetic Resonance Imaging, vol. 17, pp. 795–815, 1999.
[15] B. Thirion and O. Faugeras, "Dynamical components analysis of fMRI data through kernel PCA," NeuroImage, vol. 20, pp. 145–155, 2003.
[16] M. McKeown and T. Sejnowski, "Independent component analysis of fMRI data: examining the assumptions," Human Brain Mapping, vol. 6, pp. 368–372, 1998.
[17] B. Biswal and J. Ulmer, "Blind source separation of multiple signal sources of fMRI data sets using independent component analysis," Journal of Computer Assisted Tomography, vol. 23, pp. 265–271, 1999.
[18] V. D. Calhoun, T. Adali, G. D. Pearlson, and J. J. Pekar, "Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms," Human Brain Mapping, vol. 13, pp. 43–53, 2001.
[19] T. Adali and V. D. Calhoun, "Unmixing fMRI with independent component analysis," IEEE Engineering in Medicine and Biology Magazine, vol. 25, pp. 79–90, 2006.
[20] K. Lee, S. K. Tak, and J. C. Ye, "A data driven sparse GLM for fMRI analysis using sparse dictionary learning and MDL criterion," IEEE Transactions on Medical Imaging, vol. 30, pp. 1076–1089, 2011.
[21] V. Abolghasemi, S. Ferdowsi, and S. Sanei, "Fast and incoherent dictionary learning algorithms with application to fMRI," Signal, Image and Video Processing, vol. 9, pp. 147–158, 2013.
[22] M. U. Khalid and A. K. Seghouane, "A single SVD sparse dictionary learning algorithm for fMRI data analysis," in Proceedings of the IEEE International Workshop on Statistical Signal Processing, pp. 65–68, 2014.
[23] M. U. Khalid and A. K. Seghouane, "Improving functional connectivity detection in fMRI by combining sparse dictionary learning and canonical correlation analysis," in Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 286–289, 2013.
[24] M. U. Khalid and A. K. Seghouane, "Constrained maximum likelihood based efficient dictionary learning for fMRI analysis," in Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 45–48, 2014.
[25] M. A. Lindquist, "The statistical analysis of fMRI data," Statistical Science, vol. 23, pp. 439–464, 2008.
[26] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?," Vision Research, vol. 37, pp. 3311–3325, 1997.
[27] I. Daubechies, E. Roussos, S. Takerkart, M. Benharrosh, C. Golden, K. D'Ardenne, W. Richter, J. D. Cohen, and J. Haxby, "Independent component analysis for brain fMRI does not select for independence," Proceedings of the National Academy of Sciences, vol. 106, pp. 10415–10422, 2009.
[28] V. D. Calhoun, V. K. Potluru, R. Phlypo, R. F. Silva, B. A. Pearlmutter, A. Caprihan, S. M. Plis, and T. Adali, "Independent component analysis for brain fMRI does indeed select for maximal independence," PLoS ONE, vol. 8, p. e73309, 2013.
[29] J. Lv, X. Li, D. Zhu, X. Jiang, X. Zhang, X. Hu, T. Zhang, L. Guo, and T. Liu, "Sparse representation of group-wise fMRI signals," in Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 608–616, 2013.
[30] Z. Jiang, Z. Lin, and L. S. Davis, "Label consistent K-SVD: Learning a discriminative dictionary for recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, pp. 2651–2664, 2013.
[31] S. Zhang, X. Li, J. Lv, X. Jiang, D. Zhu, H. Chen, T. Zhang, L. Guo, and T. Liu, "Sparse representation of higher order functional interaction patterns in task based fMRI data," in Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 626–634, 2013.
[32] S. Zhao et al., "Supervised dictionary learning for inferring concurrent brain networks," IEEE Transactions on Medical Imaging, vol. 34, pp. 2036–2045, 2015.

[33] X. Hu et al., "Sparsity constrained fMRI decoding of visual saliency in naturalistic video streams," IEEE Transactions on Autonomous Mental Development, vol. 7, pp. 65–75, 2015.
[34] G. Varoquaux, A. Gramfort, F. Pedregosa, V. Michel, and B. Thirion, "Multisubject dictionary learning to segment an atlas of brain spontaneous activity," Information Processing in Medical Imaging, vol. 6801, pp. 562–573, 2011.
[35] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, pp. 4311–4322, 2006.
[36] A. K. Seghouane and M. Hanif, "A sequential dictionary learning algorithm with enforced sparsity," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3876–3880, 2015.
[37] S. K. Sahoo and A. Makur, "Dictionary training for sparse representation as generalization of K-means clustering," IEEE Signal Processing Letters, vol. 20, pp. 587–590, 2013.
[38] K. Engan, S. O. Aase, and J. Hakon Husoy, "Method of optimal directions for frame design," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 2443–2446, 1999.
[39] M. Hanif and A. K. Seghouane, "Maximum likelihood orthogonal dictionary learning," in Proceedings of the IEEE Workshop on Statistical Signal Processing (SSP), pp. 141–144, 2014.
[40] K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T. W. Lee, and T. J. Sejnowski, "Dictionary learning algorithms for sparse representation," Neural Computation, vol. 15, pp. 349–396, 2003.
[41] M. S. Lewicki and T. J. Sejnowski, "Learning overcomplete representations," Neural Computation, vol. 12, pp. 337–365, 2000.
[42] K. Skretting and K. Engan, "Recursive least squares dictionary learning algorithm," IEEE Transactions on Signal Processing, vol. 58, pp. 2121–2130, 2010.
[43] P. L. Purdon, V. Solo, R. M. Weisskoff, and E. Brown, "Locally regularized spatiotemporal modeling and model comparison for functional MRI," NeuroImage, vol. 14, pp. 912–923, 2001.
[44] Y. Takane, Constrained Principal Component Analysis and Related Techniques, Chapman and Hall, 2014.
[45] G. I. Allen, L. Grosenick, and J. Taylor, "A generalized least-square matrix decomposition," Journal of the American Statistical Association, vol. 109, pp. 145–159, 2014.
[46] A. K. Seghouane and M. U. Khalid, "Learning dictionaries from correlated data: Application to fMRI data analysis," in Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 2340–2344, 2016.
[47] J. Z. Huang, H. Shen, and A. Buja, "Functional principal components analysis via rank one approximation," Electronic Journal of Statistics, vol. 2, pp. 678–695, 2008.
[48] J. O. Ramsay and B. W. Silverman, Functional Data Analysis, Springer-Verlag, 2005.
[49] J. Tropp and S. J. Wright, "Computational methods for sparse solution of linear inverse problems," Proceedings of the IEEE, vol. 98, pp. 948–958, 2010.
[50] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 1996.
[51] T. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, 2003.
[52] K. V. Mardia, J. M. Bibby, and J. T. Kent, Multivariate Analysis, Academic Press, 1979.
[53] A. K. Gupta and D. K. Nagar, Matrix Variate Distributions, Monographs and Surveys in Pure and Applied Mathematics, Chapman and Hall/CRC Press, 1999.
[54] P. Ciuciu, J. B. Poline, G. Marrelec, J. Idier, C. Pallier, and H. Benali, "Unsupervised robust nonparametric estimation of the hemodynamic response function for any fMRI experiment," IEEE Transactions on Medical Imaging, vol. 22, pp. 1235–1251, 2003.
[55] D. M. Barch, G. C. Burgess, M. P. Harms, S. E. Petersen, B. L. Schlaggar, M. Corbetta, M. F. Glasser, S. Curtiss, S. Dixit, C. Feldt, D. Nolan, E. Bryant, T. Hartley, O. Footer, J. M. Bjork, R. Poldrack, S. Smith, H. Johansen-Berg, A. Z. Snyder, and D. C. Van Essen, "Function in the human connectome: Task-fMRI and individual differences in behavior," NeuroImage, vol. 80, pp. 169–189, 2013.
[56] R. L. Buckner, F. M. Krienen, A. Castellanos, J. C. Diaz, and B. T. T. Yeo, "The organization of the human cerebellum estimated by intrinsic functional connectivity," Journal of Neurophysiology, vol. 106, no. 5, pp. 2322–2345, 2011.
[57] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, "Online dictionary learning for sparse coding," in Proceedings of the International Conference on Machine Learning (ICML), 2009.
[58] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, pp. 4655–4666, 2007.
[59] J. Wise, "The autocorrelation function and the spectral density function," Biometrika, vol. 42, no. 1/2, pp. 151–159, 1955.
[60] M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, D. C. Van Essen, and M. Jenkinson, "The minimal preprocessing pipelines for the Human Connectome Project," NeuroImage, vol. 80, pp. 105–124, 2013.
[61] B. A. Ardekani, J. Kershaw, K. Kashikura, and I. Kanno, "Activation detection in functional MRI using subspace modeling and maximum likelihood estimation," IEEE Transactions on Medical Imaging, vol. 18, no. 2, pp. 101–114, 1999.
[62] C. F. Beckmann, M. DeLuca, J. T. Devlin, and S. M. Smith, "Investigations into resting-state connectivity using independent component analysis," Philosophical Transactions of the Royal Society of London B: Biological Sciences, vol. 360, no. 1457, pp. 1001–1013, 2005.
[63] J. R. Hale, M. J. Brookes, E. L. Hall, J. M. Zumer, C. M. Stevenson, S. T. Francis, and P. G. Morris, "Comparison of functional connectivity in default mode and sensorimotor networks at 3 and 7 T," Magnetic Resonance Materials in Physics, Biology and Medicine, vol. 23, no. 5, pp. 339–349, 2010.
[64] R. Leech, S. Kamourieh, C. F. Beckmann, and D. J. Sharp, "Fractionating the default mode network: distinct contributions of the ventral and dorsal posterior cingulate cortex to cognitive control," The Journal of Neuroscience, vol. 31, no. 9, pp. 3217–3224, 2011.
[65] V. D. Calhoun and T. Adali, "Multisubject independent component analysis of fMRI: A decade of intrinsic networks, default mode, and neurodiagnostic discovery," IEEE Reviews in Biomedical Engineering, vol. 5, pp. 60–73, 2012.
[66] N. Lazar, The Statistical Analysis of Functional MRI Data, New York: Springer, 2008.
