Clinical Neurophysiology 110 (1999) 787–798

Designing optimal spatial filters for single-trial EEG classification in a movement task
Johannes Müller-Gerking a,*, Gert Pfurtscheller b,1, Henrik Flyvbjerg c,d
a HLRZ, Forschungszentrum Jülich, D-52425 Jülich, Germany
b Department of Medical Informatics, Graz University of Technology, Brockmanngasse 41, A-8010 Graz, Austria
c Optics and Fluid Dynamics Department, Risø Nat'l Lab, DK-4000 Roskilde, Denmark
d The Niels Bohr Institute, Blegdamsvej 17, DK-2100 Copenhagen Ø, Denmark
Accepted 8 September 1998

Abstract
We devised spatial filters for multi-channel EEG that lead to signals which discriminate optimally between two conditions. We demonstrate the effectiveness of this method by classifying single-trial EEGs, recorded during preparation for movements of the left or right index finger or the right foot. The classification rates for 3 subjects were 94, 90 and 84%, respectively. The filters are estimated from a set of multi-channel EEG data by the method of Common Spatial Patterns, and reflect the selective activation of cortical areas. By construction, we obtain an automatic weighting of electrodes according to their importance for the classification task. Computationally, this method is parallel by nature, and demands only the evaluation of scalar products. Therefore, it is well suited for on-line data processing. The recognition rates obtained with this relatively simple method are as good as, or higher than those obtained previously with other methods. The high recognition rates and the method's procedural and computational simplicity make it a particularly promising method for an EEG-based brain–computer interface. © 1999 Elsevier Science Ireland Ltd. All rights reserved.
Keywords: EEG classification; Mu rhythm; Voluntary movement; Sensorimotor cortex; Human; Event-related desynchronization

* Corresponding author. Tel.: +49-2461-612-318; fax: +49-2461-612-430. E-mail address: j.mueller-gerking@fz-juelich.de (J. Müller-Gerking)
1 Supported in part by the Austrian 'Fonds zur Förderung der wissenschaftlichen Forschung', project P11208MED.
1388-2457/99/$ - see front matter © 1999 Elsevier Science Ireland Ltd. All rights reserved. CLINPH 98577
PII: S1388-2457(98)00038-8

1. Introduction

The study of surface EEG as a possible new communication channel for severely disabled persons has a long history (Nirenberg et al., 1971), and has received increased attention recently (e.g. Farwell and Donchin, 1988; Sutter and Tran, 1990; Wolpaw and McFarland, 1994; Kalcher et al., 1996; Pfurtscheller et al., 1997). The EEG allows the observation of gross electrical fields of the brain, and reflects changes in neural mass activity associated with various mental processes. Since some mental processes result in distinguishable EEGs, a person who can produce such mental processes at will has the potential to use them for communication. The feasibility of this communication depends on the extent to which the EEGs associated with these mental processes can be reliably recognized automatically. The electro-physiological phenomena investigated most in the quest for an automatic discrimination of mental states are event-related potentials (EP) (Farwell and Donchin, 1988; Sutter and Tran, 1990), and localized changes in spectral power of spontaneous EEG related to sensorimotor processes (see e.g. Wolpaw and McFarland, 1994; Kalcher et al., 1996; Pfurtscheller et al., 1997).
It is well known that planning and execution of movement leads to a short-lasting and circumscribed attenuation known as event-related desynchronization (ERD, Pfurtscheller and Aranibar, 1979) of rhythmic EEG components in the alpha and beta band (Gastaut, 1952; Chatrian et al., 1959; Kuhlman, 1978; Pfurtscheller et al., 1996). In the case of finger or hand movement, the desynchronization starts in the contralateral sensorimotor cortex during the planning phase and stays asymmetrical over both hemispheres until movement onset. Recordings from subdural electrodes show similar behavior, but the responses are more localized and the changes in spectral power are much enhanced (Toro et al., 1994).
The ERD of mu and central beta rhythms can be seen as a correlate of excited or activated sensorimotor areas, where thalamo-cortical information exchange and processing takes place (Steriade and Llinas, 1988). An interesting observation is that at the same moment in time, different cortical
areas can display focal attenuated (ERD) and focal enhanced mu and beta components. The latter phenomenon is known as event-related synchronization (ERS) and may be seen as a correlate of active inhibited or deactivated cortical areas (Pfurtscheller, 1992). It may be hypothesized that the structure controlling the patterns of simultaneous cortical desynchronization and synchronization, and hence, the gating of the thalamo-cortical information transfer, is the reticular thalamic nucleus (Yingling and Skinner, 1977). In the case of hand movement, ERD can be found over the hand area and ERS over the foot area. Foot or toe movement can result in a foot area ERD and simultaneously in a hand area ERS (Pfurtscheller et al., 1996). The observation of simultaneously attenuated and enhanced EEG rhythms can be used to classify brain states related to the planning or even imagination of different types of limb movements. Recently, imagined movements of the right and left hand were classified correctly in 80% of single trials in 3 trained subjects (Pfurtscheller et al., 1997).
Such imagination-prompted changes of sensorimotor rhythms have been suggested as a possible means to re-establish communication in patients with severe motor disturbances (Wolpaw et al., 1991; Wolpaw and McFarland, 1994). In order to be useful in practical applications, such a system must achieve close to 100% classification accuracy.
In many of these experiments, the EEG is recorded on a multitude of channels placed in a dense grid covering large parts of the brain. Given that the sensorimotor rhythms originate from very localized areas in the cortex, we expect that not all signals recorded from different sites contribute the same amount of information to the classification, and some may only contribute noise. The central electrodes overlying primary sensorimotor areas will be most important for discrimination. Since the skull and the scalp cause a spatial smearing of the cortical signals, electrodes close to sensorimotor areas will also contain relevant information. However, with increasing distance from sensorimotor areas, the recorded signal will be increasingly contaminated by cortical activity unrelated to the movement to be discriminated. Two consequences arise from this situation: first, the signals from different electrodes have to be weighted in some way, in order to reflect their relevance for the classification task. Second, the correlations between signals from neighboring electrodes can be used to suppress the noise in individual channels.
The weighting problem has been attacked mostly with ad hoc procedures, i.e. the features derived from an electrode are weighted or selected not by a criterion determined from the data directly, but a posteriori by their importance for the classifier. This is done, for example, by the modification to the Learning Vector Quantization scheme (LVQ) introduced by Pregenzer et al. (1994), in order to render it distinction sensitive (DSLVQ). Another example is Peters et al. (1997), who trained neural networks on the AR coefficients of each individual channel. Those networks that perform best on a validation set of trials become members of a committee, the size of which is chosen for optimal performance of the committee. Classification is decided by a 'vote' among the committee members. This classifier works well only when some channels have a high signal-to-noise ratio. The channels left out of the committee are those that bring more noise than signal to the decision-making. In a situation of low signal-to-noise ratios in all channels, this method will fail, even if a high signal-to-noise ratio can be achieved by spatial filtering.
The possibility to achieve better signal-to-noise ratios by making use of the inherent correlation between neighboring channels has so far not been exploited. On the contrary, one of the commonly extracted features, spectral properties from the individual time-series, no longer contains this a priori information.
The method we advocate here is based on a decomposition of the raw signals into spatial patterns that are extracted from the data of two populations of EEGs in a manner that maximizes their differences. These spatial patterns provide a weighting of the electrodes, which is derived directly from the data. We will show how these patterns reflect the underlying physiological processes.
The method used for the extraction of the patterns from the data is based on the method of Common Spatial Patterns (CSP) which was introduced in the field of EEG analysis by Koles et al. (1990). They used the method to classify normal versus abnormal EEGs (Koles et al., 1994), to extract abnormal components from EEGs (Koles, 1991), and to localize sources (Koles et al., 1995; Soong and Koles, 1995).
In brief, this method takes as input two sets of spatial patterns representing two classes into which other sets of spatial patterns are later to be classified. Below, an element in such a set, a spatial pattern, will be the amplitudes of an N-channel EEG at a given instant in time. A set of patterns may consist of the T spatial patterns that make up the EEG of a single trial, recorded at T consecutive points in time. Or a set may consist of the union of recordings from several trials. The latter is the case for the sets used to calibrate the method, that is, the sets of spatial patterns from which the method extracts the spatial features later used for classification. The former is the case when the EEG of a single trial is to be classified. In either case, we note that time is no more than an index distinguishing different patterns recorded at different times. Due to the temporal high-pass filtering, the signals have zero mean. Therefore, average patterns are unsuited to distinguish the classes and covariances have to be used instead.
As output, calibration of the method gives an ordered list of characteristic spatial patterns. These characteristic patterns define directions in pattern space that are optimally suited for distinguishing between the two classes. A time-series of patterns that belongs to either one or the other class, will, after an appropriate transformation, scatter maximally along the first direction and minimally along the last, if it belongs to the first class, and vice versa if it belongs to
the other class. The second and the second-to-last directions in the list are the second best directions for this discrimination, and so on for the other directions in the list. Given an EEG, one then treats it as a set of T spatial patterns, projects these onto the most discriminatory characteristic patterns, and calculates the variance of the T values resulting from each projection. These variances are the features on which classification of the EEG is based, using a simple linear classifier. Typically only a few directions in pattern space, i.e. a few characteristic patterns, are sufficient for the discrimination task, so these patterns may be thought of as spatial filters that select the most relevant spatial aspects for the discrimination task. We thus obtain a drastic reduction in the dimensionality of the problem while making use of the information contained in all channels.
The goal of the work presented in this paper was to pioneer the application of optimal spatial filters to the task of single trial EEG classification. The data used were recorded during the planning phase of 3 different types of movement; left and right index finger movement and right foot movement. Some of the results presented here were reported in preliminary form in Müller-Gerking et al. (1997).

2. Methods and materials

2.1. Experiment and data

The experimental protocol followed a classical memorized delay task. Three subjects were asked to perform one of 4 movements (pressing a micro-switch with the left or right index finger, flexing the toes of the right foot, or moving the tongue to the upper gum) after a series of stimuli. Each trial started with a short warning tone (warning stimulus, WS). One second after a WS, a visual cue (CUE) appeared on a computer screen in front of the subject, indicating which movement was to be done. The movements asked for were chosen at random among the 4 movements studied. The actual movement was to be performed only after a third, acoustic stimulus (reaction stimulus, RS) which occurred 2 s after the cue. Reaction times between RS and the actual movement were broadly distributed between 0.75 and 1.75 s. A total of 150 trials of each class of movement were recorded in blocks of 50 trials, at inter-trial intervals of about 12 s. Here, we only use a subset of these data, comprising the movements of left index finger (in the sequel denoted by 'L'), right index finger ('R'), and toes ('F'). The recordings were checked for artifacts, leaving a total of 223, 173, and 124 trials for the 3 subjects (Table 1).

Table 1
The number of artifact-free trials analyzed below, for each of 3 subjects, A4, B6, and B8, and 3 movements, L, R, F, denoting respectively left index finger, right index finger, and right foot

Subject   L    R    F    Total
A4        73   73   77   223
B6        55   54   64   173
B8        37   48   39   124

The EEG was recorded monopolarly from 56 Ag/AgCl scalp electrodes against a reference electrode fixed on the tip of the nose. The electrodes were arranged in a grid of 2.5 cm spacing, covering the pre- and post central areas. The EEG was filtered between 0.15 and 60 Hz, and sampled at 128 Hz. In addition, horizontal and vertical electro-oculograms (EOGs) were recorded to detect eye-movements. For further details about the experiment and recordings, see Pregenzer et al. (1994), Pfurtscheller et al. (1994a,b).

2.2. Data analysis

2.2.1. The method of Common Spatial Patterns
A classification task is usually broken up into two main parts. The first part is the extraction of relevant features that capture the class-invariant characteristics from some trial. The second part is the classification proper, performed on the extracted features. The core of the classification method that we propose here is that the feature extraction is realized by projections of the high-dimensional, spatial-temporal raw signals onto very few specifically designed spatial filters. These filters are designed in such a way that the variances of the resulting signals carry the most discriminative information. The adjunct of the filters are called Common Spatial Patterns, and they are obtained from a set of calibration data by the method of CSP.
The features for the classification proper are vectors whose elements are the variances of the projected signals. These feature vectors are finally classified by simple linear or quadratic Bayes classifiers, whose parameters are obtained again from the same set of calibration data, after projection onto the CSPs obtained in the first step.
An intrinsic limitation of the method of CSPs is that the design of the most discriminative spatial filters is possible only for the distinction between two conditions. So, we will first consider only the design of optimal filters for this case, and defer the extension to the classification of 3 and more conditions to the end of the next subsection.
We begin with two groups A and B of recordings of relevant EEG data, represented as two groups of matrices. The rows of each matrix contain the signals on one recording electrode, while the columns contain the recordings of all electrodes at some particular point in time. The dimension of each matrix is, therefore, N × T, with N the number of channels and T the number of samples in time. In our case, we take the recording on 56 electrodes during the 500 ms interval immediately preceding movement onset, or 64 samples. Let $V^i$ denote such a matrix containing the raw data of trial i. The method of CSPs now finds a decomposition of the two groups of recordings into modes that are common to both groups, but maximally suited to distinguish between the groups. Mathematically, the method relies on the simultaneous diagonalization of two matrices closely related to
the covariance matrices (Fukunaga, 1990). To summarize briefly, for each of the two groups the columns of all matrices are pooled to form two origin-centered point clouds in an N-dimensional space. It is always possible to find a linear transformation that renders a point cloud isotropic. This transformation is called the 'whitening transformation'. It is composed of a change of coordinate frame to the frame defined by the point cloud's principal axes, followed by stretching/compression of the cloud along the directions of these axes by amounts changing all of its principal moments to unit value. The essence of the CSP method can now be stated as follows: if the transformation that whitens the union of the two point clouds is applied to the individual clouds, the two resulting clouds will have the same principal axes, and their corresponding eigenvalues will add up to the value one. This is easily seen: since the whitening transformation is a linear operation, it commutes with summation of the two point clouds. So since the whitening transformation turns their union into an isotropic distribution, the union of the individually transformed point clouds is also isotropic. This is only possible if the two individually transformed point clouds complement each other as just described.
In consequence, the directions with the largest and the smallest eigenvalues of one of the point clouds will be those with the smallest and largest eigenvalues for the other cloud. Similarly, the directions with the m largest eigenvalues of one of the point clouds will be those with the m smallest eigenvalues for the other cloud, and vice versa. So, in these directions we see the largest differences between the properties of the two clouds, i.e. these directions are optimally suited to project data on, when the task is to discriminate whether the data has the statistics of one cloud or the other. So an additional rotation onto these directions yields the final transformation sought. For a detailed mathematical discussion see Appendix A. The decomposition of a trial $V^i$ can be written as

$$V^i = \left[P^{\dagger}\right]^{-1} Z^i \qquad (1)$$

that is, each raw EEG $V^i$ is represented as a linear combination of time-invariant modes, the CSP in the N columns of matrix $[P^{\dagger}]^{-1}$, with the new time series $Z^i$ as expansion coefficients. $[P^{\dagger}]^{-1}$ is a square N × N matrix, and $Z^i$ has the same dimensions as $V^i$, namely, N × T. As described above, the method of CSPs determines the modes in such a way that the variance of the first row of $Z^i$ is maximal for the trials of group A and at the same time minimal for the trials of group B. On the other hand, the variance of the last row of $Z^i$ is minimal for the trials of group A, and at the same time maximal for the trials of group B. Similar, but weaker properties characterize the variance of the first and last few rows of $Z^i$. This property of the decomposition into CSPs ensures that the variances of the first and last rows of $Z^i$ contain the most relevant information for discriminating between the two conditions A and B.

Fig. 1. The most important (upper row) and second most important (lower row) common spatial patterns for the distinction of left finger movement from right finger movement. Data set B6, filtered to 8–30 Hz. Left finger movement leads, when compared with right finger movement, to enhanced EEG amplitudes over C3, where the right hand representation is located (left column, top). Similarly, right finger movement leads to increased EEG amplitudes over the left hand representation (right column). See Section 4.1 for further explanation.

Fig. 2. The most important common spatial patterns for the distinction right finger movements (left column) versus right foot movements (right column). Same notations as in Fig. 1. Right finger movements show comparatively high EEG amplitudes over mid-central areas (left column, bottom), while movements of the right foot are characterized by enhanced amplitudes over C3.

Note that the variance of a row of $Z^i$ can be high only if the spatial distribution of the raw signals' amplitudes is similar to the corresponding pattern, or mode. Since the CSPs are obtained from movement-related EEG, these patterns themselves provide information about the characteristic EEG amplitude distribution relative to a comparison of two types of movement. Examples of the most discriminative CSPs for the pairwise comparison of finger and foot movements are shown in Figs. 1–3. See Appendix A and the flow-diagram in Fig. 4 for a detailed mathematical discussion.
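As an illustration of the whitening and rotation steps described in Section 2.2.1, the following minimal sketch computes spatial filters from two groups of trials. It is written in Python with NumPy, whereas the original routines were written in XLISP-STAT. The function name csp_filters, the per-sample normalization of the pooled covariances, and all variable names are assumptions made for this example; Appendix A describes the construction actually used. The rows of the returned filter matrix play the role of the projections onto the modes, and the columns of its inverse play the role of the common spatial patterns.

```python
import numpy as np


def csp_filters(trials_a, trials_b):
    """Sketch of CSP calibration for two lists of N x T arrays (hypothetical helper).

    Returns (filters, patterns, shares_a), where Z = filters @ V and
    V = patterns @ Z for a trial V.
    """
    def pooled_covariance(trials):
        # Pool the columns of all trials into one origin-centred point cloud.
        cloud = np.hstack(trials)
        return cloud @ cloud.T / cloud.shape[1]

    cov_a = pooled_covariance(trials_a)
    cov_b = pooled_covariance(trials_b)

    # Whitening transformation of the combined cloud: rotate to its principal
    # axes, then rescale every principal moment to one.
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whitening = np.diag(evals ** -0.5) @ evecs.T

    # After whitening, both clouds share principal axes, and the eigenvalue of
    # cloud A plus that of cloud B equals one along each axis.
    shares_a, rotation = np.linalg.eigh(whitening @ cov_a @ whitening.T)
    order = np.argsort(shares_a)[::-1]      # largest group-A variance first
    shares_a, rotation = shares_a[order], rotation[:, order]

    filters = rotation.T @ whitening        # rows are spatial filters
    patterns = np.linalg.inv(filters)       # columns are spatial patterns
    return filters, patterns, shares_a
```

With filters obtained this way, `filters @ trial` gives expansion coefficients whose first row tends to have high variance for group A and low variance for group B, and whose last row behaves in the opposite way.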
Fig. 3. Same as Fig. 2, but for the distinction left finger movements versus foot movements. In this comparison, right foot movements lead to higher EEG amplitudes over C4, while movements of the left finger show higher amplitudes over mid-central areas, as did right finger movement in Fig. 2, where that was the finger movement to be distinguished from foot movement. See Section 4.1 for further explanation.

2.2.2. Classification using the method of CSP
The features we use for classification are obtained from the variances of the first and last m rows of the expansion coefficients $Z^i$, since, by construction, they are the best suited to distinguish between the two conditions. From a set of reference or calibration data for two conditions we calculate the projection matrix $P^{\dagger}$ as described above. From this matrix we retain the first and last m rows. They are our spatial filters onto which all reference data are projected. Let $\mathrm{var}_p^i$ be the variance of the p-th row of $Z^i$, i.e. the variance of the expansion to mode p. The feature vector for trial i is composed of the 2m variances $\mathrm{var}_p^i$ for p running from 1…m and from N − m + 1…N, normalized by the total variance of the projections retained, and log-transformed,

$$f_p^i = \log\left(\frac{\mathrm{var}_p^i}{\sum_{p=1}^{2m} \mathrm{var}_p^i}\right) \qquad (2)$$

The transformation to logarithmic values is done in order to make the distribution of the elements in $f^i$ normal. The feature vectors from the calibration data are used to estimate the parameters of a linear (or quadratic) Bayesian classifier (Fukunaga, 1990). To classify new data, we again obtain the feature vector Eq. (2) for the new data using the spatial filters obtained from the calibration data, and feed this feature vector into the classifier. See Appendix B and Fig. 5 for more details on the classification procedure.
The method of CSPs only works for the distinction of two groups of data. For 3 and more conditions, we therefore perform independent pairwise classifications between all conditions in the way described above. In our case, we classify L versus R, L versus F, and R versus F. A trial is recognized as belonging to one of the classes only if that class obtained the majority in all pairwise classifications. Otherwise, the trial is classified as indecisive. So, in our case a trial is counted as 'L' only if both the L-R and the L-F classifications return 'L'.

2.2.3. Cross-validation statistics
In order to test the method optimally on the data available to us, we performed the usual cross-validation procedure, i.e. we repeatedly separated the whole data set into calibration- and test-sets, and classified the data from the test set with the classifier obtained from the calibration set. A set of calibration data was selected at random from the whole data set, using about 80% of the data for this purpose, and the projection matrix $P^{\dagger}$ was calculated from this subset. More precisely, we randomly selected 45 trials in the case of subjects A4 and B6 (15 trials from each movement class), and 30 trials in the case of subject B8 for testing. From the remaining trials we randomly selected the maximal number

Fig. 4. Illustration of how the classifiers are obtained from the calibration data in two steps. The first step obtains the most discriminating spatial filters from two sets of calibration data by the method of CSPs (a). These spatial filters are subsequently used to transform the same calibration data into feature space. The two sets of feature vectors obtained in this way are used to build a linear or quadratic classifier (b). Mathematical details are explained in Appendix A. The complete classifier consists of the spatial filters (*) and the classifiers for the feature vectors (**). Both are obtained from the same set of calibration data.
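The feature vector of Eq. (2) and the pairwise voting rule of Section 2.2.2 can be sketched as follows. This is a minimal Python/NumPy illustration, not the original XLISP-STAT implementation. The function names log_variance_features and vote, the argument filters (an N × N array whose rows are spatial filters ordered from most to least group-A variance), and the dictionary layout for the pairwise decisions are hypothetical. The Bayes classifier that produces each pairwise decision is omitted.

```python
from itertools import combinations

import numpy as np


def log_variance_features(trial, filters, m):
    """Eq. (2) for one N x T trial, using the first and last m filters (m >= 1)."""
    selected = np.vstack([filters[:m], filters[-m:]])   # 2m spatial filters
    coefficients = selected @ trial                      # 2m x T time series
    variances = coefficients.var(axis=1)
    # Normalize by the total variance of the retained projections, then log.
    return np.log(variances / variances.sum())


def vote(pairwise_winner, classes=("L", "R", "F")):
    """Combine pairwise decisions into one label, or 'indecisive'.

    pairwise_winner maps each pair, e.g. ("L", "R"), to the class that the
    corresponding two-class classifier returned for the trial.
    """
    for candidate in classes:
        own_pairs = [pair for pair in combinations(classes, 2) if candidate in pair]
        # A class is accepted only if it wins every comparison it takes part in.
        if all(pairwise_winner[pair] == candidate for pair in own_pairs):
            return candidate
    return "indecisive"
```

For example, a trial for which the L-R classifier returns 'L' and the L-F classifier returns 'L' is labeled 'L' whatever the R-F classifier returns, while a cyclic outcome (L beats R, R beats F, F beats L) is labeled indecisive.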
of trials for calibration in such a way that the calibration sets were of equal size for all classes. The surplus trials were left out of this run of the cross-validation. All data were then transformed into feature space, a classifier was built on calibration data in feature space, and the test data were classified. The classification rates reported here were averaged over 50 repetitions of this procedure.

Fig. 5. The process of classification of some EEG recording. After temporal filtering of all individual channels, the spatial-temporal recording is filtered spatially by scalar multiplication with the 2m most discriminative spatial patterns (*), obtained in step (a) of the classifier design procedure (see Fig. 4). We obtain the feature vector by calculating the log of the normalized variances of the thus obtained expansion coefficients. This 2m dimensional feature vector is classified using the linear or quadratic classifier (**) from step (b) of the classifier design procedure. For the discrimination of 3 classes, use the same scheme for each pair of conditions.

2.2.4. Reference electrode
An important point in EEG analysis is the question of the reference electrode. Pfurtscheller reports that the desynchronization of sensorimotor rhythms is enhanced and best localized when using the surface Laplacian derivation (Pfurtscheller, 1988). Kuhlman already reported that μ activity was rarely seen in referential recordings and that closely spaced bipolar derivations should be used instead (Kuhlman, 1978). Most recently, McFarland et al. (1997) compared the performance of different electrode references in EEG classification, and found either common-average-reference or large Laplacian derivation to yield the best results.
The common-average-reference is obtained by re-referencing each electrode to the mean of all electrodes. The Laplacian derivation is calculated by Hjorth's method (Hjorth, 1975), i.e. each electrode is re-referenced to the average over its 4 nearest neighbors, assuming the electrodes are arranged in the pattern of a square lattice, as in the figures below. In contrast to this small Laplacian, the large Laplacian is obtained by re-referencing to the 4 equidistant next-nearest neighbors. Such electrodes in the monopolar montage that do not have 4 equidistant neighbors (boundary electrodes) are omitted from further processing. Therefore, in the case of the small Laplacian derivation, we obtain 30 re-referenced electrodes from the original 56 ones. In the case of the large Laplacian, ten channels remain.
Projection to spatial patterns obtained from all monopolarly recorded channels amounts to a common average reference, since the patterns are reference-free up to a constant. Alternatively we also applied the CSP method to data that have first been transformed to a Laplacian derivation.

2.2.5. Temporal filtering
The common spatial patterns are sensitive only to the spatial distribution of the variances of the EEGs from the two populations; all time structure has been lost by the averaging. The only information contained in those patterns is where the variance of the EEG varies most when comparing two conditions. Given the organization of the cortical motor areas, this is the most relevant information when we try to distinguish the types of movement in our experiment: we expect the finger movements to mainly activate the respective contralateral hand area, while foot movement should mainly activate the medial foot area. However, as already stated, the effects of these activations on the measured EEG are not reflected to the same degree in all frequency bands. In order to make use of this knowledge, we worked with data which had been filtered to various frequency bands. With filtered data, the common spatial patterns provide insight into how the typical spatial distribution of EEG variance looks for specific frequency bands. Classification accuracy was estimated for data filtered in 6 different frequency bands: α (8–12 Hz); lower α (Lα, 8–10 Hz); upper α (Uα, 10–12 Hz); β (19–26 Hz); γ (38–42 Hz); and a broad band of 8–30 Hz.
The filters we use are 25-point wide ideal-band-pass filters, which are convolved with the signals (Percival and Walden, 1993). These finite impulse response filters are causal, i.e. they use only data prior in time. Times given below are relative to the unfiltered recordings.

2.2.6. Implementation
All analysis and classification routines were implemented in LISP using Tierney's freely available XLISP-STAT program (Tierney, 1990). For filtering, we used slightly modified routines from Percival and Walden's SAPA package (Percival and Walden, 1993).

3. Results

In this paper, we concentrate on the time segment 0.5–1 s after RS. For most trials, this segment immediately precedes the actual movement; in some trials, movement actually happens within this time window. Classification rates as functions of experimental time will be presented else-
where (Müller-Gerking, Pfurtscheller, and Flyvbjerg, in preparation).

Table 2
The recognition rates (%, rounded) and sample standard deviations (in parentheses) for 3 types of movement, based on features from the 4 most important common spatial patterns. Data with common reference filtered into different frequency bands. Bayes linear classifier

Subject   α        Lα       Uα       β        γ        8–30 Hz
A4        89 (4)   84 (6)   90 (4)   79 (5)   64 (7)   91 (4)
B6        82 (5)   81 (5)   81 (5)   83 (4)   73 (7)   90 (4)
B8        69 (7)   55 (9)   73 (8)   60 (8)   35 (9)   82 (6)

3.1. Common Spatial Patterns

Figs. 1–3 show the most important common spatial patterns for the pairwise distinction of all types of movement. Fig. 1 presents the four most distinctive patterns for the discrimination of left versus right finger movements, Fig. 2 shows those for the discrimination of right finger versus foot movement, and Fig. 3 indicates those for the distinction left finger versus foot movements. These patterns were obtained with all data from subject B6, time-segment 0.5–1 s after RS, and filtered in the wide range 8 to 30 Hz. The area shown is covered by the 56 recording electrodes, which are arranged in a rectangular grid, 11 in each row. The third row of electrodes (from bottom) comprises the central electrodes; the missing frontal electrodes are left blank. Each pattern has electrode positions marked with small squares, except for 4 electrode positions marked with larger black dots. The latter 4 positions correspond to C3, Cz, C4 and Fz in the international 10-20 system. The contour maps have been obtained by interpolation. Within each pattern, the zero line is rarely crossed, and then only by very small amounts. Due to this, and the fact that the sign of the pattern is irrelevant, the textures representing the isova-

show a flat region of EEG signal over the left cortical areas, and strongly enhanced activity on the right hemisphere, in the one pattern frontal to C4 (right column, top), in the other more parietal (right column, bottom). The second most important pattern for condition L is less conclusive. In brief, when comparing the EEG amplitudes for the conditions L versus R, we find the characteristic patterns to show enhanced amplitudes over the respective ipsilateral somatosensory representations.
In the distinction of condition R to condition F, shown in Fig. 2, the most important pattern for R shows a diffusely increased amplitude over the right hemisphere shifted towards parietal areas (left column, top), while the second pattern shows strongly enhanced EEG amplitude over mid-central sites, with a maximum at Cz (left column, bottom). The corresponding first pattern for condition F shows highly enhanced amplitude over the left somatosensory area with a maximum at electrode C3 (right column, top). The second pattern of condition F is less conclusive (right column, bottom). So, in the comparison R versus F, movement of the right index finger is characterized by enhanced EEG amplitude over central or right post-central areas, while movement of the toes is characterized by increased amplitudes over the left hand representation in sensorimotor areas.
Fig. 3 concludes this series with the comparison L–F. The most important patterns in the top row show the mirrored behavior to the R–F patterns in Fig. 2: movement of the left finger results in highly enhanced EEG amplitude over the central area, while movement of the toes is characterized by greater EEG amplitudes near and over C4. The second most important patterns show only little modulation. Note that since the quantity of interest is the variance of the raw signals projected onto the respective pattern, the sign of the modulation does not play any role; the only relevant feature is the magnitude and location of the modulation.
The patterns obtained from data of subjects A4 and B8 are
lues were chosen symmetrical around zero. This gave qualitatively the same.
clearer pictures. The top row of each ®gure shows the Patterns obtained from the data ®ltered to the lower a ,
most important pattern for the conditions to be discrimi- upper a , and b bands all show the same general behavior,
nated (corresponding to ®rst and last column of matrix justifying our use of the wide frequency range 8 to 30 Hz.
[P ²] 21 in Eqs. (1) and (14). The second row shows the
second most important patterns (second and second to last 3.2. Classi®cation results
columns of matrix [P ²] 21) The left column in each ®gure
shows the two patterns that are most important for the char- 3.2.1. In¯uence of the frequency band
acterization of the ®rst condition, the right column shows Table 2 shows the mean recognition rates (in percent,
the patterns most important for the characterization of the rounded) obtained for discrimination of 3 types of move-
second condition. ment in different frequency bands. Values in parentheses are
In Fig. 1, we see that the most important pattern for sample standard deviations over 50 repetitions of the cross-
distinguishing condition L from condition R ± i.e. the validation procedure (see Section 2.2.3). Features are from
upper left pattern ± shows a ¯at region over the right hemi- projections of the ear-referenced data to the 4 most impor-
sphere, where the sensorimotor representation of the left tant spatial patterns (m ˆ 2 in Eq. 2). In most cases, this led
hand is located and a strong spot of increased EEG activity to best performance. The best results (91 ^ 4, 90 ^ 4 and
right over C3, where the sensorimotor representation of the 82 ^ 6% for subjects A4, B6 and B8, respectively) were
right hand is approximately located. The two most impor- obtained with a broad band-pass of 8±30 Hz. In the case of
tant patterns for distinguishing condition R from condition L subject A4, however, the information contained in the upper

alpha band (10–12 Hz) leads to classification results almost as good as those obtained in the 8–30 Hz band (90 ± 4 vs. 91 ± 4%). In the case of the two other subjects, taking the broad band clearly improves the result.

3.2.2. Influence of the electrode reference

Table 3 shows a comparison of five classification schemes for the data filtered in the wide band 8–30 Hz. The first column (CSP-CR) repeats the last column of Table 2 for ease of comparison. Column CSP-SL lists the recognition rates obtained with the CSP method for data after application of the small Laplacian filter, while column CSP-LL reports these rates for the case of data filtered by the large Laplacian. In order to compare our method with a simple alternative, columns VAR-SL and VAR-LL list the classification rates obtained with the logarithm of the variances in the band 8–30 Hz of channels C3, Cz, and C4 as features in a linear classifier, after application of the small and large Laplacian, respectively. This corresponds approximately to the approach taken by McFarland et al. (1997). Overall, the CSP method on data pre-filtered by the small Laplacian performs best, followed by the CSP method applied to the common referenced data. When we take the variances of the channels overlying sensorimotor areas as features, we see that indeed the large Laplacian reference leads to appreciable improvements over the small Laplacian, as has been found by McFarland et al. (1997). Taking the CSPs as additional spatial filters improves the classification rates by 5–7% over the simple alternative.

Table 3
The recognition rates (%, rounded) and sample standard deviations (in parentheses) for 3 types of movement, based on features from the 4 most important common spatial patterns. CSP method on data with common reference (column CSP-CR), small Laplacian derivation (CSP-SL) and large Laplacian derivation (CSP-LL). Columns VAR-SL and VAR-LL list recognition rates with variances on channels C3, Cz and C4 as features, referenced to small and large Laplacian, respectively. Bayes linear classifier

Subject   CSP-CR   CSP-SL   CSP-LL   VAR-SL   VAR-LL
A4        91 (4)   94 (4)   94 (3)   87 (4)   88 (4)
B6        90 (4)   90 (4)   89 (4)   74 (6)   85 (5)
B8        82 (6)   84 (7)   79 (7)   66 (8)   77 (7)

Similarly, Table 4 shows the recognition rates for the same classification methods as in Table 3, but for movements L and R only. Here, columns VAR-SL and VAR-LL are based on the variances of channels C3 and C4 only. The improvement compared with the simple method is appreciable for subjects A4 and B8.

Table 4
Same notations as in Table 3, but for two-class (L–R) discriminations. VAR-SL and VAR-LL are based on channels C3 and C4 only

Subject   CSP-CR   CSP-SL   CSP-LL   VAR-SL    VAR-LL
A4        92 (5)   94 (5)   93 (4)   86 (6)    87 (6)
B6        94 (4)   93 (4)   92 (5)   84 (6)    90 (4)
B8        86 (7)   88 (7)   82 (9)   72 (10)   81 (7)

3.2.3. Influence of the numbers of patterns

The main free parameter of the classification scheme is the number of projections to common spatial patterns used to build the feature vector. Table 5 lists the mean scores for all subjects and two frequency ranges, using different numbers of common spatial patterns: from 2 × 1 to 2 × 5 (m = 1 to m = 5). These results show that it barely matters how many patterns are used when the number falls in the range shown. But m = 2 and m = 3 seem slightly better choices than the other values shown.

3.2.4. Linear versus quadratic Bayes classifier

We also tested a Bayes quadratic classifier against the linear one discussed so far. Classification successes turned out to be almost identical between the two, with the linear classifier giving the better result with few exceptions. Therefore, we give no results for the Bayes quadratic classifier here.

4. Discussion

4.1. Spatial patterns

The most discriminative spatial patterns shown in Figs. 1–3 are easily related to the ERD patterns found with movement preparation and execution (e.g. Pfurtscheller et al., 1996). For example, Fig. 1 shows that movements of the left index finger as compared with the movements of the right index finger are characterized by increased EEG activity on the electrodes overlying the ipsilateral hand representations of sensorimotor cortex, and vice versa for right finger movements.

Stated equivalently, movements of an index finger lead to comparatively little EEG amplitude over the contralateral hand representation within sensorimotor cortex. Comparing finger movements to movements of the toes, we see that finger movement is now characterized by comparatively high EEG amplitudes over mid-central areas, while foot movements lead to high EEG amplitudes over the right (left) sensorimotor cortex, when compared with movements of the left (right) index finger. Turned around, we see that foot movement leads to comparatively little EEG activity over mid-central areas, and finger movements are characterized by decreased EEG amplitudes over the respective contralateral areas. It is also seen that finger movements lead to more localized patterns than movements of the foot. This is due to the fact that hand representations in sensorimotor areas are found on the cortical surface, while foot representations are deep in the medial fissure, leading to a wider spreading over the scalp of these cortical signals.

These patterns are the basis for our feature extraction. Used as spatial filters, these patterns lead to new signals

that are linear combinations or weighted averages of the recordings from all electrodes in such a way that the amplitudes of the new signals vary maximally between pairs of conditions. The weighting coefficients are obtained automatically from the calibration data with respect to the importance of each electrode in discriminating two conditions. Compared with other methods of selection and weighting of electrodes, the method proposed here does not need the extra classification step that they use for finding the relevant electrodes. The results, however, are compatible. Comparing, for example, our Fig. 1 with Fig. 3 in Pregenzer et al. (1994), we see that the latter is similar to a merge of the most important CSPs for the discrimination of the two finger movements. While finding the same areas important for classification, our method also finds the classification-relevant features of the signal from these areas.

Table 5
The recognition rates as function of the number of common spatial patterns (first column) for two frequency ranges. Data referenced to small Laplacian.

#       A4 Uα    A4 8–30 Hz   B6 Uα    B6 8–30 Hz   B8 Uα    B8 8–30 Hz
2 × 1   91 (4)   93 (4)       78 (5)   89 (4)       73 (8)   82 (8)
2 × 2   92 (4)   94 (4)       81 (5)   90 (4)       75 (8)   84 (7)
2 × 3   93 (4)   94 (4)       82 (5)   90 (4)       74 (6)   82 (6)
2 × 4   93 (3)   94 (4)       81 (5)   88 (4)       73 (8)   81 (7)
2 × 5   93 (3)   94 (4)       80 (5)   87 (4)       72 (8)   81 (6)

4.2. Recognition rates

The recognition rates that we obtained are better than those obtained previously with other methods. In the case of discrimination of 3 brain states (preparation of left hand, right hand and foot movement) we found a classification rate of 84–94% for the 3 subjects (randomness level for the discrimination of 3 states is 33.3%). In a similar paradigm with feedback presentation, Kalcher et al. (1996) reported a classification accuracy of 60% for 3 class discriminators and 3 bipolar EEG channels. Differentiation between two brain states (preparation of left vs. right hand movement) increased the classification rate to 78–88% (Pregenzer et al., 1994). This comparison between our results obtained with spatial prefiltering and linear discrimination and the results obtained with single-trial band power estimates and the DSLVQ algorithm (Pregenzer et al., 1994) for a two-class problem shows that higher classification rates can be achieved with the spatial filtering technique.

We have to acknowledge, however, that there is a high variability of the classification rates between subjects, and even within the same dataset, e.g. when varying the time segments' length or the segments' position in time. A fair comparison of methods is, therefore, difficult in general, and a comparison of recognition rates alone will not provide full insight into some methods' capabilities. With this reservation, we believe that the comparison of methods presented here gives a valid account of the methods' abilities.

Initially, we were surprised to find that the time evolution of the EEG is of almost no importance as long as we restrict the signals to the broad frequency band of 8–30 Hz. Inspection of the EEG data, however, demonstrated that in all 3 subjects, the desynchronization was present in both the alpha and beta bands. Therefore, a good classification result for the broad band could be expected. This broad band desynchronization is also the reason why almost as good recognition rates were observed with the simple alternative method and only two or 3 channels. It must be kept in mind that many subjects do not display a significant ERD in the alpha or beta band. The fact that the quadratic classifier performed slightly worse than the linear one suggests that the data are very well separable by a linear separator. More sophisticated classification schemes, such as neural networks, will most probably not only introduce unnecessary complications and computational demands, but will often also perform worse, because of the bias-variance dilemma. Comparison of a linear classifier with a neural network based classifier, both applied to single-trial EEG data recorded during right and left hand movement, revealed similar results (Lugger et al., 1998).

The method we propose here is more complicated than the simple alternatives considered in Tables 3 and 4. However, there is first an appreciable gain in recognition accuracy. Second, the increased effort does not induce longer processing times for classification. Once the characteristic patterns are calculated, the only additional processing required consists of a few scalar products, which any signal processing hardware can perform in real time. A software solution would induce at most a minimal delay. In particular for two-class discriminations, e.g. right versus left finger movements, the additional effort is negligible. Thus, this method seems quite a promising ingredient for a system for online classification. Furthermore, the method is highly robust to changes in its parameters. Therefore, there is no need for fine-tuning to a particular data set, a fact which suggests the use of this method as a standard tool.

The only problem that we encountered using our method is connected with the estimation of the covariance matrices. Evidently, these matrices are crucial for the derivation of the spatial filters. So far, we only used the sample covariance as estimator. This estimator is non-robust with unbounded influence function, which basically means that a single misbehaving trial or even a recording channel in the calibration set can make its results meaningless. Obviously, the chance of some recording artifact increases linearly with the

number of channels. Temporal filtering, in particular the lower cut off, alleviates some common sources of EEG artifacts, such as sudden shifts in DC level. Nevertheless, the use of a robust estimator would be highly advisable. A second difficulty arises from the high dimension of these covariance matrices. The number of parameters to be estimated goes with the square of the number of recording channels. In consequence, recording from a larger number of electrodes does not necessarily lead to better recognition results, simply because the number of parameters that have to be estimated rises much faster than the information gained by additional electrodes. The solution to this difficulty would be a structured estimator that explicitly uses a priori information about the correlations of neighboring electrodes. Currently such a robust structured estimator of the spatial covariance, especially tailored for surface EEG recordings, does not exist. Its development would be highly desirable.

To conclude, we have demonstrated a method to devise spatial filters that extract features relevant for the classification of movement-related multi-channel EEG recordings. These filters are optimal, in the sense that data projected onto these filters differ maximally (in the least squares sense) between pairwise conditions. While already providing excellent recognition rates, the method presented here probably still has margins of improvement, in particular in the use of robust statistics and the use of a tailored estimator for the spatial covariance matrix.

Acknowledgements

J.M.G. gratefully acknowledges fruitful discussions with P. Grassberger, J. Martinerie, C. Neuper, B. Peters, B. Renault and F. Varela. H.F. thanks W. Bialek for useful discussions. The research was partially supported by the 'Fonds zur Förderung der wissenschaftlichen Forschung' in Austria, project P11208MED.

Appendix A. The method of Common Spatial Patterns

We describe here the mathematical part of the method of Common Spatial Patterns as used in the present article. See Fig. 4 for a flow-diagram of the procedure of classifier design, and Fig. 5 for a flow-diagram of the process of classification.

Let $V_a^i$ denote the raw data of trial i, condition a, represented as an N × T matrix with N the number of channels and T the number of samples in time. Thus, the recording at a given point in time can be represented as a point in N-dimensional Euclidean space, and one EEG can be seen as a distribution of T such points. Since the constant part of the EEGs has been removed by frequency filtering, the mean of this distribution is zero. So the first place to look for characteristic information is in its second moments, or covariance matrix. After normalizing with its total variance, it reads

$$R_a^i = \frac{V_a^i V_a^{i\dagger}}{\operatorname{trace}\left(V_a^i V_a^{i\dagger}\right)} \tag{3}$$

The matrix products in this expression amount to averaging over time. Let $R_b^i$ denote the corresponding normalized covariance matrices for the trials of condition b. The normalization is done in order to eliminate trial-to-trial variations in the absolute values of the moments. Next, the normalized covariance matrices are averaged over trials,

$$R_a = \langle R_a^i \rangle_{\mathrm{trials}} \tag{4}$$

and $R_b$ equivalently for condition b.

We then build the composite covariance matrix

$$R_c = R_a + R_b \tag{5}$$

which can be factored into its eigenvectors by

$$R_c = B_c \lambda B_c^{\dagger} \tag{6}$$

where $B_c$ is an N × N matrix of normalized eigenvectors, $B_c B_c^{\dagger} = 1_{N \times N}$, and $\lambda$ is the corresponding diagonal N × N matrix of eigenvalues.

The whitening transformation

$$W = \lambda^{-1/2} B_c^{\dagger} \tag{7}$$

equalizes the variances in the space spanned by the eigenvectors in $B_c$. Now (Fukunaga, 1990), if the individual covariance matrices $R_a$ and $R_b$ are transformed by

$$S_a = W R_a W^{\dagger} \quad \text{and} \quad S_b = W R_b W^{\dagger} \tag{8}$$

then $S_a$ and $S_b$ share the same eigenvectors, since $S_a + S_b = W R_c W^{\dagger} = 1_{N \times N}$. That is, if the eigendecomposition of $S_a$ reads

$$S_a = U \psi_a U^{\dagger} \tag{9}$$

with orthonormal U, then

$$S_b = U \psi_b U^{\dagger} \tag{10}$$

and the corresponding eigenvalues for the two matrices sum up to 1

$$\psi_a + \psi_b = I \tag{11}$$

In consequence, the projection of whitened EEG epochs on U will give us feature vectors that are optimal in the least squares sense for discriminating between the two populations. With the projection matrix

$$P^{\dagger} = U^{\dagger} W \tag{12}$$

the decomposition of each trial reads

$$Z^i = P^{\dagger} V^i \tag{13}$$

Inverting this equation gives

$$V^i = \left[P^{\dagger}\right]^{-1} Z^i \tag{14}$$
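The steps of Eqs. (3)–(14) can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the implementation used for the results reported here: the function names are hypothetical, `numpy.linalg.eigh` is one possible choice of eigensolver, the composite covariance matrix is assumed to have full rank, and sorting the eigenvalues of $S_a$ in descending order is a convention chosen only for this example.

```python
import numpy as np


def normalized_covariance(trial):
    """Eq. (3): spatial covariance of one N x T trial, scaled to unit trace."""
    cov = trial @ trial.T
    return cov / np.trace(cov)


def csp_projection(trials_a, trials_b):
    """Return P^dagger of Eq. (12); each row is one spatial filter.

    Assumes the composite covariance R_c has full rank, so that the
    whitening step of Eq. (7) is well defined.
    """
    r_a = np.mean([normalized_covariance(v) for v in trials_a], axis=0)  # Eq. (4)
    r_b = np.mean([normalized_covariance(v) for v in trials_b], axis=0)
    lam, b_c = np.linalg.eigh(r_a + r_b)   # Eqs. (5) and (6)
    w = np.diag(lam ** -0.5) @ b_c.T       # Eq. (7), whitening
    s_a = w @ r_a @ w.T                    # Eq. (8)
    psi_a, u = np.linalg.eigh(s_a)         # Eq. (9)
    order = np.argsort(psi_a)[::-1]        # assumption: largest psi_a first
    return u[:, order].T @ w               # Eq. (12)


def csp_patterns(p_dagger):
    """Eq. (14): the columns of [P^dagger]^{-1} are the common spatial patterns."""
    return np.linalg.inv(p_dagger)
```

With this ordering, `p_dagger @ trial` gives the time series of Eq. (13), and the first and last rows of `p_dagger` are the filters whose output variance differs most between the two conditions.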

From Eq. (14), we see that each raw EEG is decomposed into the set of CSPs (columns of $[P^{\dagger}]^{-1}$) with the new time series $Z^i$ as expansion coefficients. The CSPs can, therefore, be seen as a source distribution matrix with $Z^i$ the corresponding source wave form matrix.

Appendix B. Classifier design

We recall briefly the equations of the quadratic and linear Bayes classifier for Gaussian distributed data that we used to obtain the present results. Let $\vec{M}_i$ be the center of the calibration vectors for class i, $\Sigma_i$ their sample covariance matrix, and $P_i$ the a priori probability of this class. In our particular case, we perform only two-class discriminations on this level of processing, and the a priori probabilities are equal. But the equations hold for any number of classes.

The classification rule for the quadratic Bayes classifier can be written in distance notation. That is, we define a distance between a sample point $\vec{x}$ that is to be classified, and each of the classes i:

$$d_i^Q = \frac{1}{2}\left(\vec{x} - \vec{M}_i\right)^{\dagger} \Sigma_i^{-1} \left(\vec{x} - \vec{M}_i\right) + \frac{1}{2}\ln\left|\Sigma_i\right| - \ln P_i \tag{15}$$

where † means the transpose, and |·| denotes the determinant. $\vec{x}$ is classified into that class i for which $d_i^Q$ is minimal over all classes.

Eq. (15) can be linearized by setting $\Sigma_i = \Sigma$ for all classes, i.e. estimating $\Sigma$ as the covariance matrix of all calibration vectors from all classes. A simple calculation reduces Eq. (15) to

$$d_i^L = \vec{M}_i^{\dagger} \Sigma^{-1} \vec{x} - \frac{1}{2} \vec{M}_i^{\dagger} \Sigma^{-1} \vec{M}_i + \ln P_i \tag{16}$$

In the linearized case, some sample point $\vec{x}$ is classified into that class i for which $d_i^L$ is maximized (due to a change in sign). For more information about classifier design, see Fukunaga (1990).

References

Chatrian GE, Petersen MC, Lazarete JA. The blocking of the rolandic wicket rhythm and some central changes related to movement. Electroenceph clin Neurophysiol 1959;11:497–510.
Farwell LA, Donchin E. Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroenceph clin Neurophysiol 1988;70:510–523.
Fukunaga K. Introduction to statistical pattern recognition. second ed. Boston: Academic Press, 1990.
Gastaut H. Étude électrocorticographique de la réactivité des rythmes rolandiques. Rev Neurol (Paris) 1952;87:176–182.
Hjorth B. An on-line transformation of EEG scalp potentials into orthogonal source derivations. Electroenceph clin Neurophysiol 1975;39:526–530.
Kalcher J, Flotzinger D, Neuper C, Gölly S, Pfurtscheller G. Graz brain–computer interface II: towards communication between humans and computers based on online classification of three different EEG patterns. Med Biol Eng Comput 1996;34:1–7.
Koles ZJ. The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG. Electroenceph clin Neurophysiol 1991;79:440–447.
Koles ZJ, Lazar MS, Zhou SZ. Spatial patterns underlying population differences in the background EEG. Brain Topogr 1990;2(4):275–284.
Koles ZJ, Lind JC, Flor-Henry P. Spatial patterns in the background EEG underlying mental disease in man. Electroenceph clin Neurophysiol 1994;91:319–328.
Koles ZJ, Lind JC, Soong ACK. Spatio-temporal decomposition of the EEG: a general approach to the isolation and localization of sources. Electroenceph clin Neurophysiol 1995;95:219–230.
Kuhlman WN. Functional topography of the human mu rhythm. Electroenceph clin Neurophysiol 1978;44:83–93.
Lugger K, Flotzinger D, Schlögl A, Pregenzer M, Pfurtscheller G. Feature extraction for on-line EEG classification using principal components and linear discriminants. Med Biol Eng Comput 1998;36:309–314.
McFarland DJ, McCane LM, David SV, Wolpaw JR. Spatial filter selection for EEG-based communication. Electroenceph clin Neurophysiol 1997;103:386–394.
Müller-Gerking J, Pfurtscheller G, Flyvbjerg H. Prompt classification of EEG signals based on their spatial distribution prior to movement. Soc Neurosci Abstr 1997;23(2):1278.
Nirenberg LM, Hanley J, Stear EB. A new approach to prosthetic control: EEG motor signal tracking with an adaptively designed phase-locked loop. IEEE Trans Biomed Eng 1971;18(6):389–398.
Percival DB, Walden AT. Spectral analysis for physical applications: multitaper and conventional univariate techniques. Cambridge: Cambridge University Press. Package available from http://lib.stat.cmu.edu/sapaclisp/.
Peters BO, Pfurtscheller G, Flyvbjerg H. Prompt recognition of brain states by their EEG signals. Theory Biosci 1997;116:290–301.
Pfurtscheller G. Mapping of event-related desynchronization and type of derivation. Electroenceph clin Neurophysiol 1988;70:190–193.
Pfurtscheller G. Event-related synchronization (ERS): an electrophysiological correlate of cortical areas at rest. Electroenceph clin Neurophysiol 1992;83:62–69.
Pfurtscheller G, Aranibar A. Evaluation of event-related desynchronization (ERD) preceding and following self-paced movement. Electroenceph clin Neurophysiol 1979;46:138–146.
Pfurtscheller G, Flotzinger D, Neuper C. Differentiation between finger, toe and tongue movement in man based on 40 Hz EEG. Electroenceph clin Neurophysiol 1994a;90:456–460.
Pfurtscheller G, Pregenzer M, Neuper C. Visualization of sensorimotor areas involved in preparation for hand movement based on classification of μ and central β rhythms in single EEG trials in man. Neurosci Lett 1994b;181:43–46.
Pfurtscheller G, Stancák Jr A, Neuper C. Event-related synchronization (ERS) in the alpha band – an electrophysiological correlate of cortical idling: a review. Int J Psychophysiol 1996;24(1/2):39–46.
Pfurtscheller G, Neuper C, Flotzinger D, Pregenzer M. EEG-based discrimination between imagination of right and left hand movement. Electroenceph clin Neurophysiol 1997;103(5):1–10.
Pregenzer M, Pfurtscheller G, Flotzinger D. Selection of electrode positions for an EEG-based Brain Computer Interface (BCI). Biomed Technik 1994;39(10):264–269.
Soong ACK, Koles ZJ. Principal-component localization of the sources of the background EEG. IEEE Trans Biomed Eng 1995;42(1):59–67.
Steriade M, Llinas RR. The functional states of the thalamus and the associated neuronal interplay. Physiol Rev 1988;68(3):649–742.
Sutter EE, Tran D. Communication through visually induced electrical brain responses. In: Proceedings of the 2nd International Conference: Computers for Handicapped Persons, vol. 55. Schriftenreihe der Österreichischen Computer Gesellschaft, 1990;279–288.
Tierney L. LISP-STAT an object-oriented environment for statistical computing and dynamic graphics. New York: John Wiley. Program available from http://stat.umn.edu/~luke/xls/xlsinfo/xlsinfo.html.

Toro C, Deuschl G, Thatcher R, Sato S, Kufta C, Hallett M. Event-related desynchronization and movement-related cortical potentials on the ECoG and EEG. Electroenceph clin Neurophysiol 1994;93:380–389.
Wolpaw JR, McFarland DJ. Multichannel EEG-based brain-computer communication. Electroenceph clin Neurophysiol 1994;90:444–449.
Wolpaw JR, McFarland DJ, Neat GW, Forneris CA. An EEG-based brain computer interface for cursor control. Electroenceph clin Neurophysiol 1991;78:252–259.
Yingling CD, Skinner JE. Gating of thalamic input to cerebral cortex by nucleus reticularis thalami. In: Desmedt JE, editor. Attention, voluntary contractions and ERPs, Karger: Basel, 1977. pp. 70–96.
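As a supplementary illustration of the linearized rule in Appendix B (Eq. 16), the following NumPy sketch estimates the class centers and a shared covariance matrix from calibration vectors and assigns a sample to the class with the largest $d_i^L$. It is a sketch under stated assumptions rather than the classifier code used for the reported results: the function names are hypothetical, the feature vectors are assumed to have at least two components, and equal a priori probabilities are used unless others are passed in.

```python
import numpy as np


def fit_linear_bayes(features, labels):
    """Estimate class centers M_i and the shared covariance Sigma.

    features: (n_samples, n_features) array; labels: (n_samples,) array.
    """
    classes = np.unique(labels)
    centers = np.array([features[labels == c].mean(axis=0) for c in classes])
    # Sigma is estimated from all calibration vectors of all classes.
    sigma_inv = np.linalg.inv(np.cov(features, rowvar=False))
    return classes, centers, sigma_inv


def classify_linear_bayes(x, classes, centers, sigma_inv, priors=None):
    """Return the class label that maximizes d_i^L of Eq. (16)."""
    if priors is None:  # assumption: equal a priori probabilities
        priors = np.full(len(classes), 1.0 / len(classes))
    linear = centers @ sigma_inv @ x
    offset = 0.5 * np.einsum("ij,jk,ik->i", centers, sigma_inv, centers)
    return classes[np.argmax(linear - offset + np.log(priors))]
```

Because the covariance matrix is shared, each class contributes only one scalar product with the sample plus a constant, which is what keeps the classification step cheap.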
