Professional Documents
Culture Documents
)
Progress in Brain Research, Vol. 159
ISSN 0079-6123
Copyright r 2006 Elsevier B.V. All rights reserved
CHAPTER 6
Laboratory of Brain– Computer Interfaces (BCI-Lab), Institute for Knowledge Discovery, Graz University of
Technology,Graz, Austria
Abstract: In this chapter we review the traditional approach for ERD/ERS quantification and a more
recent approach based on wavelet transform. In particular, we address the visualization of these phenom-
ena and the validation of the results through statistical significance testing. Furthermore, we report on
preprocessing using independent component analysis (ICA) and introduce a novel ERD/ERS maximization
method.
DOI: 10.1016/S0079-6123(06)59006-5 79
80
resolution. Consequently, the frequency bands used while for MEG and EcoG, a broader range would
for ERD/ERS analysis should be narrower for be more appropriate because of the superior SNR
lower frequencies and wider for higher frequencies. of these recording methods.
As an alternative to the previously described Figure 2 shows four ERD/ERS maps calculated
standard method of band-pass filtering, a contin- from ECoG data that was recorded with a sam-
uous wavelet approach can be used which is very pling rate of 200 Hz during a self-paced index
suitable to account for the trade-off between time finger movement experiment. As with the ERD/
and frequency resolution. By using the following ERS map shown in Fig. 1, the vertical line indi-
complex Morlet wavelet, cates movement onset. The reference interval,
pffiffiffi 2 2 however, was calculated over the entire trial
wðt; f 0 Þ ¼ ðst pÞ1=2 et =2st ei2pf 0 t (4) period. This was necessary because the trial length
the time-varying power values Pðt; f 0 Þ of the signal chosen of 6 s resulted in partially overlapping trials
at a specific frequency band with mid-frequency f 0 making it difficult to select one particular short
and bandwidth defined by st is the squared norm time period for calculating the baseline activity.
of the convolution of the complex wavelet with the The frequency range displayed in Fig. 2 is from
signal: 7 to 95 Hz, which provides for the fact that in
2 ECoG data not only alpha and beta frequencies
Pðt; f 0 Þ ¼ wðt; f 0 ÞnsðtÞ (5) are of interest but also gamma activity. Map A
With st ¼ 1=ð2psf 0 Þ and sf 0 a function of the was calculated by the standard method with 2-Hz
frequency to analyze determined by bands and a Hilbert transform to capture the
envelope of the filtered signal. The continuous
sf 0 ¼ f 0 =c, (6)
wavelet approach was used to produce maps B and
the compromise between time and frequency C. In the former, the trade-off factor between
localization is defined by the constant c, which is frequency and temporal resolution c was set to
usually set to 7 (Tallon-Baudry and Bertrand, 1999). 1 (cf. Eq. (6)), while in the latter, it was set to 7.
Higher values increase the frequency resolution A small value of c results in very good temporal
while lower values increase the time resolution. localization at the expense of poor frequency lo-
Both the classical method and the method calization. Setting the value to 7 gives reasonable
based on wavelets can be used to construct temporal and frequency resolution over the entire
time–frequency representations for multiple fre- frequency range investigated. By contrast, map A
quencies. In the former case, partially overlapping shows that using a fixed narrow frequency band
frequency bands are used while in the latter for the whole frequency range obscures the gamma
case the frequency to be analyzed, f 0 ; is varied activity that is clearly visible in the other maps.
over a specific frequency range. The result is a Broader frequency bands would be necessary to
time–frequency map — the so-called ERD/ERS capture the higher frequency components. A com-
map — representing ERD and ERS patterns cov- parison of the alpha and beta frequency ranges
ering the entire frequency spectrum of interest. from maps A and C shows that for this range both
Figure 1 illustrates the construction of such an methods produce equivalent results. Thus, the
ERD/ERS map. decision concerning which method should be cho-
ERD values are colored red, while ERS values sen for the frequency range typically used for in-
are shown in blue. The trigger time point (event vestigating EEG data is not critical. Map D is
onset) is marked by a dashed-dotted vertical line. equivalent to map C, but depicts only statistically
The reference period is indicated by the two dotted significant ERD/ERS values. This results in an
vertical lines in the ERD/ERS map. ERD/ERS map that is much clearer and easier to
As mentioned previously, the choice of the interpret. Without this enhancement, some of the
frequency range and the frequency bands depends features visible in the maps are not significant
on the actual application. For EEG analysis, a despite their appearance. It is therefore crucial to
frequency range of 7–40 Hz may be sufficient, suppress such information since it can lead to the
82
Fig. 1. Assembly of event-related desynchronization and event-related synchronization (ERD/ERS) maps. The time course of the
bandpower for each frequency band of interest is calculated and displayed as colors representing the instantaneous proportional
change relative to the baseline average. This information (examples on the left) is assembled to form the map on the right. Here the
center frequency of each band is denoted on the scale on the left of the map, while the time in seconds appears under the map. As
shown in the color legend below the map, values from 100% to +150% are scaled appropriately as various colors. The time courses
on the left also show the corresponding confidence intervals that can be used to display only statistically significant values (see Section
‘‘Statistical significance of ERD/ERS’’).
Fig. 2. ERD/ERS maps produced in different ways. All of the above maps represent ERD/ERS recorded from the same electrode.
Map A was calculated by the standard method. Maps B and C were calculated using the continuous wavelet approach with a ‘‘c’’
constant of 1 and 7, respectively. The former shows higher temporal resolution while the latter shows better frequency resolution. Map
C shows most clearly the gamma activity. Map D is the same as map C, but with only the statistically significant features shown. There,
it is much clearer where activity of interest exist.
Fig. 3. Normalized power distribution in the alpha and beta bands. The samples are skewed toward zero power for both bands.
84
estimate of the corresponding population param- since there is at least a weak dependency between
eter. That is, more robust statistical estimators consecutive samples. The block bootstrap (Chernik,
such as the median or the 95th percentile of 1999) may be used to overcome this problem. In
the bootstrap distribution can be considered as the block bootstrap, blocks of consecutive values
well (Chernik, 1999). The following procedure of the given data are considered and aligned into
shows how a bootstrap procedure, specifically a bootstrap sample. Blocks of samples are then
the t-percentile bootstrap (Graimann et al., resampled to form a new bootstrapped time series.
2002), may be applied to calculate confidence This bootstrap is more difficult to implement, and
intervals for the ERD/ERS estimates. more computationally intensive than the suggested
Let N be the number of trials, and B be the t-percentile bootstrap. To overcome or at least re-
number of boostrap resamples. E j denotes the set duce the independence problem of the t-percentile,
of all eij ¼ ðyij RÞ=R of the jth sample and of all the idea of building blocks can be applied by di-
trials quantified according to Eqs. (1) and (2) and viding the trials into small overlapping blocks and
calculated with the standard ERD/ERS method, calculating the mean values of the blocks which
the continuous wavelet, or any other suitable are then resampled by the t-percentile bootstrap.
method. The sample mean and the sample vari- This takes the weak dependency of consecutive
ance of E j are denoted by ēj and s2j ; respectively. samples into account.
For each sample j:
Draw N values from E j ; where each value is
Spatial Filters
selected at random, with replacement. The N
drawn values are the bootstrap resample E^ j :
ECoG recordings do not usually require any
Calculate the mean m^ E^ j and the standard devi-
pre-processing for quantifying ERD/ERS because
ation s^ E^ j of all N values in E^ j :
of their good spatial resolution and high SNR
Calculate m^ j ¼ m^ E^ j ȳj =s^ E^ j Repeat these steps
(Huggins et al., 1999; Graimann et al., 2002).
to obtain B boostrap estimates m^ j1 . . . m^ jB : B
By contrast, EEG signals show relatively poor
should be larger than 500.
spatial resolution due to the characteristic spatial
To approximate the distribution of m^ j ; sort all
blurring of its signal, caused by the large distance
estimates so that m^ jð1Þ m^ jð2Þ . . . m^ jðBÞ :
between signal sources and detectors and volume
The 100(1a)% confidence interval is deter-
conduction through scalp, skull, and other layers
mined by ½ēj sj m^ jðk2 Þ ; ēj sj m^ jðk1 Þ ; where k1 ¼
(Babiloni et al., 2001). In fact, conventional mono-
B a=2 and k2 ¼ B k1 þ 1:
polar EEG recordings, which are typically used if
Once the confidence intervals are calculated, the the number of electrodes is large, have insufficient
assessment of whether or not a value is significant spatial resolution to reveal localized ERD/ERS
is straightforward. An ERD/ERS value may be without the application of spatial filtering tech-
considered as significant with 100(1a)% confi- niques such as CAR and Laplacian derivations.
dence when both confidence values of this sample In the CAR, the average value of the entire
show the same sign. That is, an ERS value is sig- electrode montage (the common average) is sub-
nificant with, for example, 95% confidence when tracted from that of the channel of interest.
both 95% confidence limits of this value are pos- Because CAR emphasizes signal patterns that are
itive. And likewise, an ERD value is significant present in a large proportion of the electrode pop-
when both confidence limits are negative. ulation, it reduces such patterns and thereby func-
There is one difficulty that arises when attempt- tions as a high-pass spatial filter. That is, CAR
ing to apply the t-percentile bootstrap on a time accentuates components with highly focal distri-
series since it is designed to capture the statistics of butions (Nunez et al., 1994). For each electrode
data that are random samples from a distribution. location, the Laplacian method calculates the sec-
Unfortunately, in a time series the samples are not ond derivative of the instantaneous spatial voltage
mutually independent and therefore not random distribution, and thereby enhances signal patterns
85
originating in radial sources immediately below solely to the statistical relationships between the
each electrode (Nunez et al., 1994). Thus, this probability distributions of the signals involved. It
method is also a high-pass spatial filter that is the basic assumption upon which ICA is based.
accentuates localized patterns and moderates more However, this conjecture does not necessarily
diffuse activity. The value of the Laplacian at each imply that the generating neural structures are
electrode location is calculated by combining the independent as well. The last assumption, station-
value at that location with the values of a set of arity, is difficult to meet in general because it is
surrounding electrodes. The small Laplacian, well known that EEG is not stationary. In con-
which is an approximation of the surface Lapla- trast, ICA requires stationary stochastic processes
cian and is calculated from the four nearest neigh- to guarantee a meaningful decomposition of the
bors (Hjorth, 1975), can be seen as the standard linearly mixed sources. The easiest way to deal
pre-processing method in ERD/ERS analysis with this problem is to simply ignore the station-
(Pfurtscheller and Lopes da Silva, 1999). Recently, arity condition and hope that despite this, ICA will
it was shown that this simple approximation of the be able to yield reasonable results. Another way is
surface Laplacian yields equivalent results to to set up an experimental situation in which the
spherical spline interpolation (Tandonnet et al., recorded EEG signal can be viewed as stationary.
2005). However, modern signal processing meth- This is in fact the case in the experimental para-
ods like ICA offer an alternative to the conven- digm of ERD/ERS analysis. The controlled con-
tional spatial filters. In the following, we review ditions of the experimental paradigm in which
and discuss ICA in sufficient detail for ERD/ERS a task or event (e.g., motor activity) is repeated
analysis, and we also derive a new spatial filter several times implies that quasi-stationary EEG
technique based on CSPs which can be seen as segments are recorded some seconds prior to and
optimal for ERD/ERS analysis. some seconds after the task or event. Therefore,
it can be concluded that all three conditions
underlying the ICA model are quite reasonable
Independent component analysis for ERD/ERS analysis.
Owing to the weak assumptions that are made
ICA is a statistical signal processing technique for the ICA model, the order and the scale
that decomposes a multivariate input signal into (including the sign) of the independent compo-
statistically independent components (Hyvarinen nents are undetermined. The scale is unimportant
and Oja, 2000). The application of ICA to EEG or for ERD/ERS analysis because it is quantified as a
other bioelectrical recordings assumes that several relative measure. However, a completely undeter-
conditions are at least approximately fulfilled mined order of the components requires analyzing
(Vigario et al., 2000). The most important condi- all components because no a priori information
tions which are assumed to be met are: about the importance of each component is avail-
An EEG signal is a linear mixture of the able. If the number of channels is large, this might
activities of cerebral and other sources. become a problem, which could be alleviated
The source signals are statistically inde- by using principal component analysis (PCA) to
pendent. reduce the dimensionality of the problem before
The mixing of the sources and the sources ICA is applied (Jung et al., 2000; Vigario et al.,
themselves are stationary. 2000).
In addition to the temporal information of the
Since the propagation delays from the sources to independent components, ICA also provides
the electrodes are negligible and the summation of information about the topography of the compo-
brain potentials can be considered linear (Jung nents. The independent components are calculated
et al., 2000), the assumption that the mixing is by Y ¼ WX, where X denotes the EEG and W
instantaneous is met. The second assumption, is the unmixing matrix (the output of the ICA
statistical independence of the sources, refers algorithm). The equivalent expression X ¼ AY,
86
with A the mixing matrix (inverse of W), deter- words, to extract ERD we want to find a linear
mines the projection of the independent compo- transformation of the data that has maximal
nents onto the EEG channels. More specifically, power in the baseline period and minimal power
the projection of the ith independent component in the activity period. For ERS, we want minimal
onto the EEG channels is given by the outer prod- power in the baseline period and maximal power
uct of the ith row of Y with the ith column of in the activity period. Mathematically, this opti-
the mixing matrix A. That is, the columns of A are mization criterion can be expressed as
the projection strengths (or weights) of the inde- P 2
yi
pendent components onto the original signals. p1
This topographic information can be employed to w ¼ arg max ¼ P (7)
y2i
verify the physiological origins of the components. p2
It has been shown that ICA can be successfully
where w is the weighting vector (the first row in the
applied to reduce artifacts in EEG (Jung et al.,
spatial filter matrix), and pk ; k 2 ½1; 2 denotes
2000; Vigario et al., 2000). Since artifacts are quite
the baseline or reference period and the activity
independent from the rest of the signal, ICA is able
period, respectively. The general solution for this
to separate a wide variety of artifacts from EEG
problem is the principal component of the jointly
data by linear decomposition. Ocular artifacts, for
pre-whitened samples of the two periods under
example, can be easily eliminated by ICA. Muscle
consideration (Fukunaga, 1990), which is also
activity, however, is more difficult because, in
known as the method of CSPs (Koles et al., 1995;
general, the statistical properties of muscle arti-
Muller-Gerking et al., 1999). A derivation of the
facts do not correspond to the basic assumptions
solution in the context of ERD/ERS analysis is
of the ICA model. Nonetheless, this capability of
given in the appendix.
ICA has an important implication for the analysis
The result of this method is again a spatial filter
of ERD/ERS. Since the quantification of ERD/
matrix W that decomposes the multivariate EEG
ERS requires artifact-free trials, the EEG data has
signal into components that are optimal (accord-
to be visually inspected and trials with artifacts
ing to the optimization criterion formulated in
have to be precluded from further analysis. This
Eq. (7)) for ERD/ERS analysis. Since the optimi-
can result in a considerable loss of information.
zation criterion is directly formulated to optimize
Since ICA can separate most of the artifacts, a
ERD or ERS, we call this method ERDmax. As
selection of artifact-free trials is not necessary and
with ICA and all the other spatial filter methods
the full amount of information available (all trials)
discussed so far, the ERDmax approach does not
can be used to quantify ERD/ERS, which can
need reference-free EEG data. It can be applied to
increase the statistical significance of the results.
monopolar derived EEG or other bioelectrical
signals such as ECoG or MEG.
Maximization of ERD/ERS
Application of ERD/ERS analysis to movement-
Independent component analysis is a so-called related data
unsupervised method. That is, ICA does not use
explicit timing information such as which samples To demonstrate how well the Laplacian, ICA, and
of the recording belong to the baseline periods and ERDmax spatial filters perform in the quantifica-
which belong to the activity periods. In ERD/ERS tion of ERD/ERS, we present an analysis of a
analysis, however, this information is easily avail- brisk finger movement task. For the sake of clarity
able from the experimental paradigm used. In fact, of the comparison, we focus here only on the abil-
this information is necessary for the quantification ity of the three different methods to reveal ERD.
of ERD/ERS. Thus one might want to find a spa- We will also include in our comparison the result
tial filter that maximizes or minimizes the power of attempting to produce ERD/ERS maps directly
ratio in the baseline and activity periods. In other from raw monopolar data to emphasize the
87
with the topographic information of the scalp selected to be from 3.5 to 2 s before trigger onset.
electrodes were used to calculate topographic The t-percentile bootstrap with 1000 bootstrap
maps by means of cubic spline interpolation. repetitions were performed, to determine ERD/
These maps were individually normalized so ERS values with a significance of p ¼ 0.01.
that the largest absolute value was 1. This
was done not only to increase the color con-
trast, but also to obtain a single color legend Results
that could be used for all topographic maps.
3. ERDmax: To achieve ERD/ERS maps with ERD/ERS analyses for all 12 subjects using mono-
maximal ERD, ERDmax was applied to the polar data and data filtered by the three different
data according to the algorithm defined in the spatial filters were performed. To conserve space,
appendix. The reference period was selected we show topographically arranged ERD/ERS
to be from 3.5 to 2 s before trigger offset, maps for only one particular subject. This dataset
which is the same reference period used for was chosen because not only it showed the char-
calculating the ERD/ERS maps (see below). acteristic movement-related patterns but also the
The optimization time frame for the ERD data were contaminated by artifacts. This provides
samples was selected to be 0.5 s prior and 0.5 s the opportunity to observe how the algorithms
after trigger offset. Reference and ERD sam- work in the presence of artifacts. Figure 5 shows
ples were band-pass filtered between 7 and the ERD/ERS maps that are topographically
13 Hz.Hence, the optimization process was arranged for monopolar (Fig. 5A) and Laplacian
focused on this limited frequency band and data (Fig. 5B). The spatial resolution has been
the 1-s time period around the trigger. Similar considerably increased by the application of the
to ICA, the columns of the inverse of the Laplacian spatial filter. Pronounced ERD activity
ERDmax unmixing matrix were used to cal- that is almost nonexistent in the monopolar data is
culate topographic maps. Again, the topo- clearly visible in the maps that are derived from
graphic maps were individually normalized to electrodes that are contralateral to the movement
an absolute value of 1 to increase the color side (electrodes C3 and neighboring electrodes),
contrast. and — although not so prominent — ipsilateral
(electrode C4). Laplacian filtering also reveals
All channels pre-processed by the three different artifacts in electrodes 1, 7, and the bordering
spatial filters were subjected to ERD/ERS analysis right-hand electrodes.
according to the standard procedure outlined in Figure 6 depicts the ERD/ERS maps for the
Section ‘‘Quantification of ERD/ERS’’. EEG seg- same subject after ICA spatial filtering (Fig. 6A)
ments (trials) of a length of 8 s and starting at 4 s and ERDmax spatial filtering (Fig. 6B). An impor-
prior to movement offset were extracted. ERD/ tant difference here from the maps shown in Fig. 5
ERS maps comprised frequency bands of 2 Hz is the fact that they are not topographically
with 1-Hz overlap in a frequency range of 7–34 Hz arranged anymore. Since these two methods do
were calculated. Filtering was done in the fre- not employ a simple spatial relationship of the
quency domain using Hamming windows. Filter- EEG channels like the Laplacian derivation, a
ing in the frequency domain had the advantage priori information about the topography of the
that band-pass filtering and calculation of the components is not available. Hence, the number in
Hilbert transform of the filtered signal (by setting the lower left corner of the maps does not indicate
the corresponding Hermitian part of the signal to the channel number, but the component of the
zero) could be combined. By back-transforming corresponding spatial filter method. Consequently,
the signal in the time domain and calculating the the order in which relevant ERD components can
absolute value, the envelope of the band-pass fil- occur is widely arbitrary for ICA, while for
tered signal was obtained which was then squared ERDmax, relevant ERD components should be
to obtain power values. The reference interval was contained in the first few components, because the
89
Fig. 5. ERD/ERS maps derived from monopolar data (A) and Laplacian spatially filtered data (B). The maps are topographically
arranged. In the cases where the electrode locations are compatible with the international 10/20-placement system, the appropriate
electrode names (C3, Cz, and C4) are shown. The vertical dashed-dotted line denotes the movement onset. The dotted lines indicate the
reference period. ERD is colored in red; ERS is colored in blue.
ERDmax components are sorted according to the ICA component 18 can be easily identified as the
value of the eigenvalues that appear in the most prominent contralateral activity next to com-
ERDmax algorithm. In fact, for all 12 datasets in- ponents 3 and 21, which have a similar origin but
vestigated, the desynchronization pattern contra- show less pronounced activity. ICA component 16
lateral to the movement side, that is the pattern is the ipsilateral activity, and components 6, 22, 32,
that should be most prominent in a movement- and 34 are artifactual components. Similar results
related task like this, was almost always found in can be found displayed in Fig. 6B for the data
the first and in few cases (due to artifacts) in the spatially filtered according to ERDmax. The two
second ERDmax component. most prominent activities (both are contralateral)
Since the maps generated by ICA and ERDmax are found (as expected) in the two first maps.
do not contain information about the topography ERDmax component 7 shows ipsilateral activity,
of the sources of the components, it becomes nec- and components 3, 24, and 34 show artifactual
essary to produce this missing information in the activity. To permit a better comparison between
form of topographic maps. Eight such maps the ERD/ERS maps derived from the three differ-
depicting interesting topographic distributions or ent spatial filters, enlarged ERD/ERS maps that
being associated with components showing prom- are associated with contralateral activity for the
inent ERD/ERS are presented immediately below same subject are shown in Fig. 7A, and for a
the respective ERD/ERS maps. They can be used different exemplary dataset in Fig. 7B. The num-
to identify the topographic origin of the corre- bers in the lower left corner indicate the channel
sponding components. In Fig. 6A, for instance, numbers and the component numbers. In addition
90
Fig. 6. ERD/ERS maps derived from independent component analysis (ICA, A) and ERDmax (B). Since the components are no longer
associated directly with the physical locations of the electrodes, normalized topographic maps are provided for a selected number of
maps to help establish the source locations of these components. The number on the left-hand side of each topographic map indicates
the number of the corresponding ERD/ERS map. Color legends for the ERD/ERS maps and for the topographic maps (small color
bar) are shown.
91
Fig. 7. Enlarged ERD/ERS maps for all three spatial filters for subject S1 (A), subject S2 (B). These maps depict the most prominent
contralateral ERD for the corresponding method and subject. The channel numbers for the Laplacian maps and the component
numbers for the other spatial filter maps are shown in the lower left corners. ERD/ERS time courses for the 9–11-Hz frequency band
are shown on the far right. The color coding is done in the same way as in the previous figures.
to the maps, ERD/ERS time courses for the For the dataset used to produce the maps in
9–11 Hz frequency band are displayed. This spe- Fig. 7B, the situation was different in that both
cific band was chosen because it was in the fre- ERDmax and ICA decomposed the contralateral
quency range used for optimizing the ERDmax activity into only two components. Consequently,
spatial filter, and it showed pronounced ERD and the maps for ICA and ERDmax look very similar
ERS activity for most of the datasets analyzed. (and also similar to Laplacian). The time courses
All three maps in Fig. 7A show basically the now show that the ICA curve captures more ERD
same activity patterns over the whole frequency than the Laplacian. The ERDmax curve, however,
range. ICA produced components are very pro- still captures the greatest amount of ERD activity.
nounced in the beta range, but produced only To test if there is a significant difference for the
short-lasting ERD components in the alpha range. quantification of ERD between the three spatial
The ERD/ERS curves show that the ERD indi- filters, the grand average of the most prominent
cated for each filtering method is maximal for contralateral ERD/ERS curves for all 12 datasets
ERDmax. Unexpectedly, Laplacian filtering shows in the frequency band from 9 to 11 Hz was calcu-
more prominent ERD than ICA filtering does; lated, and is shown in Fig. 8. For each curve, the
however, this is an anomaly compared with the ERD value in the range of 0.5 s before to 0.5 s after
rest of the data analyzed. This might be explained the trigger was integrated. The resulting 12 values
in the following way. In the vast majority of the were used to test the following three null hypoth-
other datasets, ICA decomposed the contralateral eses: (1) The integrated ICA ERD value is less
activity into only two components, but in this pronounced or equal to the integrated Laplacian
particular case, three components were produced. value. (2) The integrated ERDmax value is less
Since the energy present in the contralateral pronounced or equal to the integrated ICA value.
activity was divided into three components rather (3) The integrated ERDmax value is less pro-
than only two, each component contained less nounced or equal to the integrated Laplacian
energy offering a possible explanation for the value. Using a t-test, all three hypotheses could be
reduced ERD. rejected. That is, for the 12 datasets analyzed, the
92
Fig. 8. The grand average over all 12 subjects was combined to Fig. 9. The grand average over all 12 subjects for the most
produce time courses corresponding to the 9–11-Hz frequency prominent contralateral components for the spatial filters
band for all three spatial filters from the most prominent cont- ICA** and ERDmax**. ICA** denotes an ICA spatial filter
ralateral components. constructed from pre-filtered data in the narrow frequency
band of 7–13 Hz.ERDmax** denotes an ERDmax spatial filter
constructed from pre-filtered data in the 5–40 Hz band.
contralateral ICA components showed more pro-
nounced ERD in this specific frequency range and achieved by ICA**. In fact, the same statistical
time period than the signal filtered with the small analysis of the integrated ERD values as done before
Laplacian method (p ¼ 0.032). The ERDmax com- did not reveal any statistical difference between these
ponent showed more pronounced ERD than the two spatial filters. Consequently, to maximize the
ICA component (p ¼ 0.029), and the ERDmax performance of ERDmax filters, it is important to
component showed more pronounced ERD than filter the data in the frequency band of interest.
the corresponding Laplacian signal (p ¼ 0.003).
Since the construction of the ERDmax spatial
filter involves narrow pre-filtering in the frequency Discussion
band of interest, it is interesting to know how sig-
nificant this step alone is in yielding the good The ERD/ERS maps derived from all spatial fil-
results produced by the algorithm. To test that ters investigated shows the characteristic contra-
question, ICA spatial filters denoted as ICA** lateral patterns of mu and beta ERD prior to and
were constructed from data that was filtered in the during movement followed by postmovement
frequency band of 7–13 Hz (the same band used ERS. Compared with the Laplacian, ICA, and
for ERDmax). Additionally, ERDmax** filters were ERDmax generally produced maps with more
calculated from data filtered between 5 and 40 Hz localized ERD/ERS. It is especially interesting
(the same band used for ICA). The grand averages that both ERDmax and ICA seem to decompose
of the 9–11 ERD/ERS time courses produced the ERD/ERS patterns that are contralateral to
by the application of these spatial filters for all the movement side and focused at C3 (channel 11)
12 datasets are depicted in Fig. 9. The grand in the Laplacian maps into two independent com-
average of the ICA** filtered time courses is very ponents, whereas one of the two components is
similar to the grand average of the ICA filtered localized slightly anterior to C3 and the other
time courses in Fig. 8, demonstrating that narrow more posterior to C3. The more posterior compo-
band-pass filtering is not required for ICA. In con- nent principally represents the mu ERD, whereas
trast to the result shown in Fig. 8, the grand average the anterior component pre-dominantly contains
achieved with ERDmax** is now similar to the result beta ERS. The separation of mu ERD and beta
93
procedure, while ERDmax is directly focused on To simplify the derivation, only the maximizat-
optimizing ERD/ERS. ion of ERD and the projection onto a single vector
From the practical point of view, ERD/ERS w is considered at first. That is, the task is to find a
analysis with ERDmax has the following advantages. vector w that linearly combines the input signals so
that the resulting component y, where
1. The transformation yielded by ERDmax can
be seen as optimal in terms of maximizing y ¼ wT X, (A2)
ERD or ERS. contains most of the ERD activity.
2. The order of the components is fixed. The For maximizing ERD, the minimum of the fol-
first component contains most of the ERD or lowing equation has to be found (see also Eq. (3)
ERS. in Section ‘‘Quantification of ERD/ERS’’).
3. ERDmax can be performed with all channels
AR
or a small subset of channels (e.g., the nearest E ERD ðwÞ ¼ (A3)
R
neighbors) independent from the channel
topography. with
4. The algorithm is very fast. A ¼ EfðwT Xa Þ2 g and xaij 2 S a (A4)
ICA has already been applied to the analysis of
R ¼ EfðwT Xr Þ2 g and xrij 2 S r (A5)
event-related potentials in EEG. Makeig et al.
(1997, 1999) have been shown that ICA can effec- where E fg denotes the expectation value, Sr
tively decompose multiple overlapping compo- denotes the set of all samples in the reference
nents from selected sets of related ERP averages. period, and S a is the set of all samples in the time
Little previous work has been done to investigate frame where ERD can be expected (e.g., half a
the usefulness of ICA in the analysis of oscillatory second prior and after the onset of movement
activity. A complete functional interpretation of tasks).
the different independent components was not A further simplification of the problem can be
within the scope of this study; however, it seems achieved if Xa and Xr are jointly whitened
that both ERDmax and ICA could reveal new in- (sphered). Whitening means the input matrix is
sights into the functional interpretation of brain transformed so that the rows are uncorrelated and
oscillations. The preliminary results of this study have unit variances. There are many ways to
suggest that further research on this topic should whiten the input data (Hyvarinen and Oja, 2000).
be undertaken. A possible linear whitening transform is given by
V ¼ D1=2 ET (A6)
Appendix where D ¼ diagðd 1 . . . d n Þ is the diagonal matrix of
the eigenvalues, and E the matrix whose columns
In this section, the new algorithm for finding the are the unit-norm eigenvectors of the covariance
optimal linear transformation to maximize ERD/ matrix EfXTa;r Xa;r g with xa;r
ij 2 fS a ; S r g:
ERS is derived. The goal is to find a matrix W that After this transformation, the channels are
transforms the multivariate input X into compo- uncorrelated and have unit variance. Thus, the
nents that contain maximal ERD/ERS. In matrix optimization problem can be reduced to minimizing
notation:
A ¼ EfðwT Xa Þ2 g and xaij 2 S a (A7)
Y ¼ WX (A1)
or equivalently to maximizing
Following the terminology of ICA, W is the
R ¼ EfðwT Xr Þ2 g and xrij 2 S r (A8)
unmixing matrix and Y is the transformed input
signal. The rows of Y are the components that Since only the direction of w is interesting, the con-
contain maximal ERD/ERS. straint that the norm of w is unity has to be made.
95
Then, the constraint optimization problem can be whitening transform and the largest eigenvector
formulated by the technique of the Lagrange of the covariance matrix of Xr (reference sam-
method (Luenberger, 1968): ples). The first approach gives a transformation
with minimal variance in the ERD time frame;
Lðw; lÞ ¼ EfðwT Xa Þ2 g lðjjwjj 1Þ (A9)
the latter gives a transformation with maximal
The equation consists of the part that is to be variance in the reference period. As all samples
optimized and the constraint equation gðwÞ ¼ were jointly whitened before PCA is applied, both
jjwjj 1 ¼ wT w 1 multiplied by the Lagrange approaches are equivalent. More than one com-
multiplier l. The constraint optimization prob- ponent can be easily derived by using the other
lem can be solved by setting the gradient of (A9) to eigenvectors. So, for example, the component
zero. containing second most ERD activity can be
@Lðw; lÞ derived by using the eigenvector corresponding to
¼ 2EfwT XTa Xa g 2lwT ¼ 0 (A10) the second smallest eigenvalue of the covariance
@w
matrix Xa :
or equivalently The same derivation can be made for the max-
@Lðw; lÞ imization of ERS. The only difference is that
¼ 2EfXTa Xa wg 2lw ¼ 0 (A11)
@w samples of a time frame where ERS is expected
With Ca ¼ EfXTa Xa g being the covariance matrix of have to be used (e.g., samples in a time frame of
the ERD samples (samples in the time frame where 1–2 s after movement onset). To increase the SNR,
ERD can occur), Eq. (A11) can be written as: the signal can be band-pass filtered before the
optimization procedure is applied. This would
Ca w lw ¼ ðCa lIÞw ¼ 0 (A12) result in optimal ERD or ERS time courses for a
Thus, the problem can be solved by calculating specific frequency band. Although the derivation
the eigenvalues and eigenvectors of the covariance presented here considers only one reactive time
matrix Ca : The eigenvector ws corresponding to the frame (either ERD or ERS) against the baseline
smallest eigenvalue is the solution of the problem. In activity in the reference period, a combination of
fact, the new transformation is given by combining ERD and ERS is also possible. For example, a
the whitening transform V and the eigenvector ws : combined optimization of ERD and ERS in differ-
ent frequency bands may be useful. Applying the
wTs V (A13) Lagrange method again, the problem could be
The component that contains maximal ERD can formulated as
then be calculated by
Lðw; l1 ; l2 Þ ¼ EfðwT Xb Þ2 g l1 EfðwT Xc Þ2 g
y ¼ ðwTs VÞ X (A14)
l2 ðjjwjj 1Þ ðA15Þ
In fact, the optimization problem described by
the Eqs. (A7)–(A12) is equivalent to PCA. That is, where Xb and Xc denote the samples of different
the new method can be seen as a two-step proce- time frames and/or frequency bands; l1 and l2 are
dure. First, the samples of the reference period the Lagrange multipliers. The solution of this
and the ERD samples are jointly whitened by a equation, however, does not result in a simple
whitening transform (any transform that yields eigenvalue problem as before. More sophisticated
uncorrelated data and unity variance can be used). methods like gradient descent, Newton iteration,
Second, PCA is applied to the ERD samples. The or genetic algorithms have to be applied to find a
resulting linear transformation that gives a compo- solution for this optimization problem.
nent with maximal ERD is obtained by combining The algorithm calculates two matrices: W, the
the whitening transform and the eigenvector corre- transformation matrix or unmixing matrix, and A,
sponding to the smallest eigenvalue. Since reference the inverse of W. Similar to the mixing matrix of
and ERD samples are jointly whitened, ERD max- ICA, the columns of A can be used to display the
imization can also be achieved by combining the topographic distribution of the new components.
96
Tandonnet, C., Burle, B., Hasbroucq, T. and Vidal, F. (2005) of EEG and MEG recordings. IEEE Trans. Biomed. Eng.,
Spatial enhancement of EEG traces by surface Laplacian es- 47: 589–593.
timation: comparison between local and global methods. Zygierewicz, J., Durka, P.J., Klekowicz, H., Franaszczuk, P.J.
Clin. Neurophysiol., 116: 18–24. and Crone, N.E. (2005) Computationally efficient approaches
Vigario, R., Sarela, J., Jousmaki, V., Hamalainen, M. and Oja, to calculating significant ERD/ERS changes in the time-
E. (2000) Independent component approach to the analysis frequency plane. J. Neurosci. Methods, 145: 267–276.