This action might not be possible to undo. Are you sure you want to continue?

Welcome to Scribd! Start your free trial and access books, documents and more.Find out more

**Spatio-Spectral Filters for Improving the Classiﬁcation of
**

Single Trial EEG

Steven Lemm, Benjamin Blankertz, Gabriel Curio and Klaus-Robert Müller

Abstract—Data recorded in EEG based Brain-Computer Interface

experiments is generally very noisy, non-stationary and contaminated

with artifacts, that can deteriorate discrimination/classiﬁcation methods.

In this work we extend the Common Spatial Pattern (CSP) algorithm

with the aim to alleviate these adverse effects. In particular we suggest

an extension of CSP to the state space, which utilizes the method of time

delay embedding. As we will show, this allows for individually tuned

frequency ﬁlters at each electrode position and thus yields an improved

and more robust machine learning procedure. The advantages of the

proposed method over the original CSP method are veriﬁed in terms

of an improved information transfer rate (bits per trial) on a set of

EEG-recordings from experiments of imagined limb movements.

Index Terms—feature extraction, CSP, classiﬁcation, BCI

I. INTRODUCTION

T

He development of a Brain-Computer Interface (BCI) aims to

provide a communication channel from a human to a com-

puter that directly translates brain activity into sequences of control

commands. Such a device may give disabled people direct control

over a neuro-prosthesis or over computer applications as tools for

communicating solely by their intentions that are reﬂected in their

brain signals (e.g. [1]–[9]). We record brain activity by means of

multi-electrode electroencephalogram (EEG) which is non-invasive as

opposed to invasive work by e.g. [10]–[13]. An ideal BCI should only

need short adaptation and preparation times and should yield high

information transfer rates. In practice the user is behaving according

to a well-deﬁned paradigm (such as movement imagination) that

allows for an effective discrimination between different brain states

(see e.g. [3], [14]). From the signal processing perspective this

requires the deﬁnition of appropriate features that can be effectively

translated into a control signal, either by simple threshold criteria

(cf. [3]), or by means of machine learning, where given some

training examples for each brain state, the task is to infer a possibly

nonlinear discriminating function to distinguish between different

states (e.g. [3], [6]–[9], [15], [16]). Non-invasive data acquisition,

makes automated feature extraction challenging, since the signals of

interest are ’hidden’ in a highly ’noisy’ environment as EEG signals

consist of a superposition of a large number of simultaneously active

brain sources that are typically distorted by artifacts and even subject

to non-stationarity. However, outliers and artifacts can strongly distort

the classiﬁer performance [15], yielding bad generalization, i.e. the

performance on previously unseen data, can become arbitrarily poor.

This work was supported in part by the DFG SFB 618/B4, the Bundesmin-

isterium für Forschung (BMBF) Grant FKZ 01IBB02A,B, and the PASCAL

Network of Excellence (EU #506778).

S. Lemm is with the Department of Intelligent Data Analysis, FIRST Fraun-

hofer Institute, 12489 Berlin, Germany and also with the Neurophysics Group,

Department of Neurology, Campus Benjamin Franklin, Charité, University

Medicine 12200 Berlin, Germany (e-mail: steven.lemm@ﬁ rst.fhg.de)

B. Blankertz is with the Department of Intelligent Data Analysis, FIRST

Fraunhofer Institute, 12489 Berlin, Germany

G. Curio is with the Neurophysics Group, Department of Neurology,

Campus Benjamin Franklin, Charité, University Medicine Berlin, 12200

Germany

K.-R. Müller is with the Department of Intelligent Data Analysis, FIRST

Fraunhofer Institute, 12489 Berlin, Germany and also with the Computer

Science Department, University of Potsdam, 14482 Potsdam, Germany

So it is important to strive for robust machine learning and signal

processing methods that are as invariant as possible against such

distortions (e.g. [16], [17]). This paper contributes an extension to

CSP based BCI approaches [18], [19] in order to improve both: the

accuracy and the generalization ability of a brain state classiﬁer.

The paper is organized as follow: Section II introduces the underly-

ing neurophysiological principles and elaborates on the mathematical

background of CSP. Based on the latter the subsequent section

introduces the Common Spatio-Spectral Pattern algorithm, as an

extension of CSP to the state space and provides insights into the

implications of the extended model. The performance of the two

methods is then compared on a broad set of experiments in section

IV and a concluding discussion follows.

II. NEUROPHYSIOLOGICAL AND MATHEMATICAL BACKGROUND

A. Neurophysiology

According to the concept known as homunculus, for each part of

the human body there exists a respective region in the motor and

somatosensory area of the neocortex. The ’mapping’ from the body

to the respective brain areas preserves topography, i.e., neighboring

parts of the body are represented in neighboring parts of the cortex.

While the region of the feet is at the center of the vertex, the

left hand is represented lateralized on the right hemisphere and the

right hand on the left hemisphere. One possible feature of brain

activity, that can be exploited for brain-computer interfaces relies

on the neurophysiological observation, that large populations of

neurons in the respective cortex are ﬁring in rhythmical synchrony

when a subject is not engaged with one of his limbs (movements,

tactile senses, or just mental introspection). These are so-called idle

rhythms that are attenuated when engagement with the respective

limb takes place and that can be measured at the scalp in the

EEG as a brain rhythm around 10 Hz (µ-) or 20 Hz (β-rhythm).

As the attenuation effect is due to loss of synchrony in the neural

populations, it is termed event-related desynchronization (ERD), see

[20]. In opposite, the dual effect of an enhanced rhythmic activity

is called event-related synchronization (ERS). Such modulations of

the µ- and β-rhythm have been reported for different physiological

manipulations, e.g., by motor activity, both actual and imagined [21]–

[23], as well as by somatosensory stimulation [24]. In order to setup

a BCI we will utilize these physiological phenomena, in particular

caused by imaginary movements or sensations at different limbs.

The discrimination between different limbs, e.g. left hand vs. right

hand vs. foot will exploit the dissimilarities in the spatio-spectral

topography of the attenuation of the µ and/or β rhythm.

The strength of the sensorimotor idle rhythms as measured by scalp

EEG is known to vary strongly between subjects. This introduces a

high inter-subject variability on the accuracy with which an ERD-

based BCI system works. That is reﬂected in a high varying classiﬁ-

cation accuracy for different subjects. Hence [19], [25] suggested

to combine the oscillation feature of ERD with another feature

reﬂecting imagined or intended movements, the movement related

potentials (MRP), denoting a negative DC shift of the EEG signals

in the respective cortical regions. In this paper we are focusing on

2

the improvement of the classiﬁcation based only on the oscillatory

feature (ERD/S), nevertheless the suggested algorithm can be straight

forwardly integrated into the combination framework.

B. Common Spatial Pattern

The common spatial pattern (CSP) algorithm [26] is highly suc-

cessful in calculating spatial ﬁlters for detecting ERD/ERS effects

[27] and for ERD-based BCIs, see [18] and has been extended to

multi-class problems in [19]. Given two distributions in a high-

dimensional space, the CSP algorithm ﬁnds directions (i.e., spatial

ﬁlters) that maximize variance for one class and that at the same

time minimize variance for the other class. After having bandpass

ﬁltered the EEG signals in the frequency domain of interest, high

or low signal variance reﬂect a strong respective a weak (attenuated)

rhythmic activity. Let us take the example of discriminating left hand

vs. right hand imaginary movement. According to Section II-A, if

the EEG is ﬁrst preprocessed in order to focus on the µ and β

band, i.e. bandpass ﬁltered in the frequency range 7–30 Hz, then a

signal projected by a spatial ﬁlter focusing on the left hand area

is characterized by a strong motor rhythm during the imagination of

right hand movements (left hand is in idle state), and by an attenuated

motor rhythm if movement of the left hand is imagined. This can be

seen as a simpliﬁed exemplary solution of the optimization criterion

of the CSP algorithm: maximizing variance for the class of right hand

trials and at the same time minimizing variance for left hand trials.

Furthermore the CSP algorithms calculates the dual ﬁlter that will

focus on the area of the right hand (and it will even calculate several

ﬁlters for both optimizations by considering orthogonal subspaces).

To be more precise, let X

k

=

X

k

c,t

, c = 1, . . . ,C, t = t

0

, . . . , T

denote the (potentially bandpass ﬁltered) EEG recording of the k-th

trial, where C is the number of electrodes. Correspondingly Y

k

∈

{1; 2} represents the class-label of the k-th trial. Using this notation

then the two class-covariance matrices are given as,

Σ

1

= X

k

X

k

{k:Y

k

=1}

and Σ

2

= X

k

X

k

{k:Y

k

=2}

(1)

WΣ

1

W

= D and WΣ

2

W

= I −D. (2)

This can be accomplished in the following way: First whiten the

matrix Σ

1

+Σ

2

, i.e., determine a matrix P such that

P(Σ

1

+Σ

2

)P

= I. (3)

This decomposition can always be found due to positive deﬁniteness

of Σ

1

+Σ

2

. Second deﬁne S

1

= PΣ

1

P

and S

2

=PΣ

2

P

respectively

and calculate an orthogonal matrix R and a diagonal matrix D by

spectral theory such that

S

1

= RDR

. (4)

From S

1

+S

2

= I it follows that S

2

= R(I −D)R

**. Note that the
**

projection given by the p-th row of matrix R has a relative variance

of d

p

(p-th element of D) for trials of class 1 and relative variance

1−d

p

for trials of class 2. If d

p

is close to 1 the ﬁlter given by the

p-th row of R maximizes variance for class 1, and since 1 −d

p

is

close to 0, minimizes variance for class 2. The ﬁnal decomposition,

that satisﬁes Eq.(2) can be obtained from,

W := R

P. (5)

Using this decomposition matrix W the EEG recordings X

k

are

projected onto

Z

k

=WX

k

. (6)

The interpretation of W is two-fold, the rows of W are the stationary

spatial ﬁlters, whereas the columns of W

−1

can be seen as the

common spatial patterns or the time-invariant EEG source distribution

vectors.

III. COMMON SPATIO-SPECTRAL PATTERN (CSSP)

A. Feature Extraction

In this section we will extend the CSP algorithm to the state

space. Therefore we ﬁrst introduce an extension to the state space for

Eq.(6), subsequently we discuss its consequences for the optimization

problem and give a mathematical interpretation.

The concept of deterministic low-dimensional chaos has proven to

be fruitful in the understanding of many complex phenomena despite

the fact that very few natural systems have actually been found to be

low-dimensional deterministic in the sense of theory. Also a number

of attempts have been made to analyze various aspects of EEG time

series in the context of nonlinear deterministic dynamical systems.

Determinism in a strict mathematical sense means that there exist

an autonomous dynamical system, deﬁned typically by a ﬁrst order

differential equation ˙ y = f (y) in a state space Γ ⊂ R

D

, which

is assumed to be observed through a single measurable quantity

s = h(y). The system thus possesses D natural variables, but the

measurement is usually a nonlinear projection onto a scalar value.

In order to recover the deterministic properties of such a system,

we have to reconstruct an equivalent of the state space Γ. Therefore

the time delay embedding method is one way to do so. From the

sequence of scalar observations s

1

, s

2

, . . . , s

N

overlapping vectors

s

n

= (s

n

, s

n−τ

, . . . , s

n−(m−1)τ

) are formed, with τ as the delay time.

Then according to Takens Theorem [28] under certain conditions,

i.e. for mathematically perfect, noise free observations s

n

and m

sufﬁcient large, there exist a one-to-one relation between s

n

and the

unobserved vectors y

n

. Thus the attractor of any non-linear dynamic

can be reconstructed in the state space using an appropriate delay

coordinate function.

Since it is not our aim to reconstruct the entire dynamics of the

EEG-signal, but rather to extract robust (invariant) features, we extend

Eq. (6) just by one delayed coordinate, i.e.,

Z

k

=W

(0)

X

k

+W

(τ)

δ

τ

X

k

. (7)

Where, for notational convenience, δ

τ

denotes the delay operator, i.e,

δ

τ

(X

·,t

) = X

·,t−τ

. (8)

Once again, the optimization criterion is to ﬁnd projections W

(0)

and W

(τ)

such that signal variance of different Z

p

discriminates two

given classes best, i.e. maximizing the variance for one class while

minimizing it for the opposite class. In order to use the identical

mathematical concepts, introduced in section II-B, we append the

delayed vectors δ

τ

X

k

as additional channels to X

k

, i.e.

ˆ

X

k

=

X

k

δ

τ

X

k

. (9)

Then the optimization criterion can be formulated equivalent to

Eq. (2) using the class covariance matrices

ˆ

Σ

l

, l ∈ {1, 2} obtained

from

ˆ

X

k

. Following the steps of Eq. (3)–(5), this yields a solution

to this modiﬁed optimization problem, especially a decomposition

matrix

ˆ

W, whose columns divide in two submatrices:

ˆ

W

(0)

that

applies to X

k

and

ˆ

W

(τ)

that applies to the delayed channels δ

τ

X

k

,

i.e.,

ˆ

W

ˆ

X

k

=

ˆ

W

(0)

ˆ

W

(τ)

ˆ

X

k

.

Based on this, we will now further explore the implications of

this decomposition. Especially we will derive an interpretation into

a spatial and a spectral ﬁlter. Therefore let w denote the p-th row

3

of the decomposition matrix

ˆ

W, then the projected signal

ˆ

Z

k

p

= w

ˆ

X

k

can be expressed as

ˆ

Z

k

p

= w

(0)

X

k

+w

(τ)

δ

τ

X

k

=

C

∑

c=1

w

(0)

c

X

k

c,·

+w

(τ)

c

δ

τ

X

k

c,·

=

C

∑

c=1

γ

c

w

(0)

c

γ

c

X

k

c,·

+

w

(τ)

c

γ

c

δ

τ

X

k

c,·

, (10)

where (γ

c

)

c=1,...,C

is a pure spatial ﬁlter and (

w

(0)

c

γ

c

,

τ−1

. .. .

0, . . . , 0,

w

(τ)

c

γ

c

)

deﬁnes a Finite Impulse Response (FIR) ﬁlter for each electrode c.

This decomposition into a spatial and a FIR ﬁlter is not unique, but

there exists a very intuitive partitioning, that we will use throughout

this paper, i.e.

γ

c

:=

w

(0)

c

2

+w

(τ)

c

2

s

¯

ign

w

(0)

c

, (11)

where

s

¯

ign(w) =

−1, w < 0

+1, w ≥ 0

.

(12)

The use of the signed norm γ of the coefﬁcients vector as spatial

ﬁlter allows us to examine the origin of the projected source signals

since each column of the inverse of the entire spatial projection

matrix Γ = (γ)

p,c

corresponds to the coupling strength of one source

with the electrodes. Note that γ

c

maps the non-zero coefﬁcients of

corresponding FIR ﬁlter on to one half of the two dimensional unit-

sphere. Consequently we can easily parameterize the FIR ﬁlters by

the angle φ

(τ)

c

, deﬁned as

φ

(τ)

c

:= atan

w

(0)

c

w

(τ)

c

∈

−

π

2

,

π

2

. (13)

Fig. 1 and 2 illustrate the effect of these FIR ﬁlters by means of

the resulting ﬁlter responses curves for various values of τ and at

different angles φ

(τ)

. Note that at each electrode the FIR ﬁlter is

additional to the global bandpass ﬁlter, that focuses on the frequency

band of interest.

This additional property of the decomposition matrix allows for a

ﬁne tuning of the overall frequency ﬁlters, e.g. an adaptation to the

spectral EEG peaks.

B. Classiﬁcation

Finally, the features used for the classiﬁcation are obtained by

decomposing the EEG according to Eq.(6) respectively (7). Typically

one would retain only a small number 2m of projections that contain

most of the discriminative information between the two classes, i.e.

the signal variance. These projections are given by the rows of

ˆ

W that

correspond to the m largest and m smallest eigenvalues d

p

i

. Based on

the projected single trials Z

k

p

i

,t

, i = 1, . . . , 2m, a classiﬁer is estimated

on the log-transformed signal variances, i.e.

f

k

i

= log

var

Z

k

p

i

,·

. (14)

Speciﬁcally we applied a linear discriminant analysis (LDA) as

classiﬁcation model. Since the introduced delay τ appears as an

underlying hyper-parameter in the overall optimization scheme, it

has to be subject of a model selection procedure in order to ﬁnd the

optimal τ for the speciﬁc classiﬁcation task. Note that using τ = 0

in the set of feasible hyper-parameters, will incorporate the original

CSP algorithm into the model selection procedure.

Fig. 1: Magnitude responses of the FIR ﬁ lters at different values of φ

(τ)

for a single ﬁ xed delay τ =50 ms. Increasing φ

(τ)

from −

π

2

to

π

2

keeps the

position of the extreme values in the frequency domain ﬁ xed, but turns

maxima into minima. Since a minimum corresponds to a suppression

of the spectral information at this frequency, the FIR ﬁ lter for φ

(τ)

=

−

2

5

π focuses mainly on {10, 30, 50} Hz. In opposite the ﬁ lter given by

φ

(τ)

=

1

5

π can be associated with the contrary effect, i.e. it cancels these

frequencies. The shaded region denotes the frequency range (7–30 Hz) of

interest, that deﬁ nes the bandpass ﬁ lter used for preprocessing the data

in the experimental session.

C. Online Applicability

A major concern in online applications is to implement an algo-

rithm as efﬁcient as possible. In case of a CSP based classiﬁcation

procedure the involved operations such as the bandpass ﬁltering and

the spatial projections deﬁne the bottleneck for the processing speed.

Especially the ﬁltering of each EEG channel (64-128) is quite time

consuming and dramatically effects the overall processing speed. But

fortunately the bandpass ﬁltering, if realized by convolution with

h, an Inﬁnite Impulse Response (IIR) ﬁlter (h X), and the spatial

ﬁltering (WX) are strictly linear, and it follows immediately that these

operations can be executed in an arbitrary order, i.e.

W (hX) = h(WX). (15)

This implies, that for an online application one does not need to

bandpass ﬁlter all the channels before projecting onto the few spatial

features. Instead we can ﬁrst apply the spatial projection and ﬁlter

only the few resulting signals with the desired bandpass, what makes

the resulting algorithm applicable online.

In case of CSSP, due to the embedding operation δ

τ

this does not

hold in general. But we can easily work around this and are allowed

to exchange the operations in the following manner:

ˆ

W

hX

δ

τ

(hX)

=

I

C

I

C

h

ˆ

W

(0)

X

δ

τ

(h(

ˆ

W

(τ)

X))

. (16)

Where I

C

denotes the C-dimensional identity-matrix. Again one can

arbitrarily exchange the order of ﬁrst decomposing the EEG using

W

(0)

and W

(τ)

and then bandpass ﬁltering the projected channels,

as long as the delay operator and the ﬁnal summation is applied

afterwards. From this it can be directly seen, that the computational

demands are just doubled compared to the original CSP. Hence the

proposed extended model is applicable online as well.

4

Fig. 2: Magnitude responses of the FIR ﬁ lters at different values of τ

for a single ﬁ xed angle φ = −

2

5

π. Increasing τ changes the position

and increases the number of the extreme values in the frequency domain.

Since a minimum corresponds to a suppression of the spectral information

at this frequency, the FIR ﬁ lters for different τ focuses on different

sub-bands of the frequency spectrum. The shaded region denotes the

frequency range (7–30 Hz) of the additional bandpass ﬁ lter used for

preprocessing the data in the experimental session.

IV. APPLICATION

A. Experimental Design

In this section we apply the CSP and the proposed CSSP algorithm

to data from 22 EEG experiments of imaginary limb movements

performed by 8 different subjects and compare the resulting classi-

ﬁcation performances. The investigated mental tasks were imagined

movements of the left hand (l), the right hand (r), and the right

foot (f ). Two experiments were carried out with only 2 classes l and

r. In this study we investigate all resulting two-class classiﬁcation

problems, i.e. all possible combination of two classes (l-r,l-f and

r-f ), yielding 62 different classiﬁcation tasks.

All experiments start with training sessions in which the subjects

performed mental motor imagery tasks in response to a visual cue.

In such a way examples of brain activity can be obtained during

the different mental tasks. In the original experiment, these recorded

single trials were then used to train a classiﬁer which was in a second

sessions applied online to produce a feedback signal for (unlabeled)

continuous brain activity (results will be reported elsewhere).

In this off-line study we will only take data from the ﬁrst (training

session) into account to evaluate the performance of the algorithms

under study. This reﬂects, that if any feedback is provided to the

subject, he/she will adapt to the feedback (output of the classiﬁer)

itself. Hence the data obtained from a feedback session are biased

towards the speciﬁc classiﬁer used to produce the particular feedback

and for that reason we decided to exclude the data of the feedback

session for the evaluation process.

During the experiment the subjects were sitting in a comfortable

chair with arms lying relaxed on the armrests. In the training period

every 4.5–6 seconds one of 3 different visual stimuli indicated for 3–

3.5 seconds which mental task the subject should accomplish during

that period. The brain activity was recorded from the scalp at a

sampling rate of 100 Hz with multi-channel EEG ampliﬁers using 32,

64 resp. 128 channels. Additional to the EEG channels, we recorded

the electromyogram (EMG) from both forearms and the right leg

as well as horizontal and vertical electrooculogram (EOG) from the

eyes. The EMG and EOG channels were exclusively used to ensure

that the subjects performed no real limb or eye movements correlated

with the mental tasks that could directly (artifacts) or indirectly

(afferent signals from muscles and joint receptors) be reﬂected in the

EEG channels and thus be detected by the classiﬁer, which should be

constrained to operate on the CNS (central nervous system) activity

only. For each involved mental task we obtained between 120 and

200 single trials of recorded brain activity.

B. Training

After choosing all channels except the EOG and EMG and a

few outermost channels of the cap that are known to have non-

stationary signal quality due to changing conductive properties, we

applied a causal band-pass ﬁlter from 7–30 Hz to the data, which

encompasses both the µ- and the β-rhythm. The single trials were

extracted from the temporal frame 750–3500 ms after the presentation

of the visual stimulus, since during this period discriminative brain

patterns are present in most subjects. On these preprocessed single

trials we perform the feature extraction by the CSP and the proposed

CSSP method separately. For each method we project the data to

the three most informative directions of each class, yielding a 6-

dimensional subspace. For these six dimensions we calculate the

logarithms of the variances as feature vectors. Finally we apply a

linear discriminant analysis (LDA) to the feature vectors of all trials,

to ﬁnd a separating hyperplane. Note that in contrast to [6], [14] we

omitted the regularization (cf. [29]), since the dimensionality of the

features is rather small compared the number of training examples.

In order to compare the results of the two methods (CSP vs. CSSP

with selected τ) we split the data set in two. On the (chronological)

ﬁrst half we perform the training of the classiﬁer, i.e. the feature

extraction, model selection and the LDA. The performance of the

estimated models were then evaluated on the second half of the data,

to which both algorithms had have no access before. In the following

we will refer to these halves of the data set as “training data” and

“test data”.

To select the best CSSP model for each binary classiﬁcation

problem, i.e. ﬁnd the optimal τ, we estimate the performance of the

algorithms by means of a leave one (trial) out (LOO) cross-validation

Fig. 3: The left panel compares the LOO classiﬁ cation error on the

training data of the CSP based model and the selected CSSP based

model for 62 binary classiﬁ cation problems. In either case the error

decreases. For 6 out of 62 datasets the model selection suggest to keep

the CSP-based model, indicated by dots on the diagonal line, in any

other case the CSSP-based classiﬁ cation improves the LOO-training error.

The right panel shows the identical information but instead the error the

corresponding information transfer rate measured in bits per trial is given.

5

Fig. 4: The left panel compares the classiﬁ cation error of the CSP based

model and the selected CSSP based model on the test data taken from

the second half of the experiment. Except for a very few cases the CSSP

based classiﬁ cation model improves the test error. Again the right panel

provides the same information in terms of the corresponding information

transfer rate per trial.

procedure on the corresponding training data. Especially we run a

model selection procedure over τ = 0, ..15. Since the CSP resp. the

CSSP algorithm make explicit use of the label information, these

techniques have to be repeatedly applied to each LOO training set

within the cross-validation procedure. Otherwise the cross-validation

error could underestimate the generalization error. Fig. 3 compares

the LOO error respective the information transfer rate of the CSP

model and the best CSSP on the training data for each experiment.

Note that in this particular case CSSP always performs the same or

better than the CSP on the training data, since the CSP model is part

of the model selection procedure (τ = 0).

C. Results

After the model selection the best CSSP and the CSP based models

for each classiﬁcation problem were then ﬁnally trained on the entire

training data and afterwards applied to the corresponding test data,

obtained from the second unseen half of the training session of

each experiment. A comparison of the resulting test errors/bitrate

for all datasets is summarized in Fig. 4. The results on that database

Fig. 5: The left panel gives the resulting LOO information transfer rate

in bits per trial on the training set for one exemplarily chosen dataset

at all hyper-parameter τ = 0, . . . , 15 ( at a sampling rate of 100 Hz this

corresponds to 0, . . . 150ms). The right panel shows the corresponding

information transfer rate on the test data. For that particular classiﬁ cation

problem almost any delay parameter would improve the standard CSP-

based classiﬁ cation, shown in the lowest row of each panel. The best

model that has been selected by the model selection procedure based on

the informations provided in the left panel corresponds to τ = 7.

strongly suggest that the proposed algorithm outperforms the CSP-

based approach, in terms of an improved classiﬁcation accuracy and

an increased generalization ability.

For further illustrations of the properties of the proposed method,

we will now pick one speciﬁc dataset, especially we will focus on

a classiﬁcation task of imaginary foot and right hand movement

that serve this purpose best. For this selected dataset we will relate

the spatial ﬁlters found by the CSSP method to those of the CSP-

method and will discuss the impact of the additional spectral ﬁlters

(cf Eq. (13)).

To show the general performance of the CSSP-based algorithm, we

trained CSSP models for all values of τ ∈ {0, . . . , 15} on the training

data and applied them to the test data. Fig.5 shows the information

transfer rate on the training and on the test data for all models (τ =

0, . . . , 15) for the selected dataset, where τ = 0 corresponds to the

original CSP algorithm. In that particular case almost any additional

delay improves the classiﬁcation both on the training data as well as

on the unseen test data. According to the model selection procedure,

described in section IV-B, the model with the highest LOO bitrate

(lowest LOO error) on the training data (τ = 7) would be chosen for

the ﬁnal application. The spatial and spectral ﬁlters of the selected

CSP CSSP

spatial spectral spatial

Fig. 6: The scalp maps show the three spatial and spectral ﬁ lters for

class feet in descending order of the eigenvalues for both the CSP and the

CSSP method. The ﬁ lters were calculated for a classiﬁ cation of foot and

right hand movements. The spectral ﬁ lters are gray-scale-coded according

Eq.(13) in the range [−

π

2

,

π

2

]. Note that the ﬁ rst spatial ﬁ lters are almost

identical for CSP and CSSP, but already those for the second largest

eigenvalue diverges. While the spatial ﬁ lters found by CSP exhibit no

clear structure, the second spatial CSSP ﬁ lter resembles the ﬁ rst one.

The main difference in the projection occurs only in the spectral ﬁ lter,

where these ﬁ lters have opposite sign in the central region, indicating

that at the same location different spectral information is exploited for

classiﬁ cation.

model for each class are visualized in Fig. 6 and 7. These ﬁgures

also provide the ﬁlters found by CSP method. The ﬁrst two spectral

6

CSP CSSP

spatial spectral spatial

Fig. 7: The scalp maps correspond to the spatial and spectral ﬁ lters of the

three leading eigenvalues for class right hand of the CSP and the CSSP

method in descending order. The ﬁ lters were calculated for a classiﬁ cation

of foot and right hand movements. Again the ﬁ rst spatial ﬁ lters are almost

identical for CSP and CSSP (except for the sign). For the second largest

eigenvalue the CSSP ﬁ lters work still at the same location, whereas the

CSP ﬁ lter exhibit no clear focal point at the area of the left motor cortex.

ﬁlters for class foot supply insight how additional spectral information

is exploited. The corresponding spatial ﬁlters found by the CSSP

method are almost identical and focusing on the central region, while

the spectral ﬁlters have opposite signs in this area. This indicates, that

spectral information of disjoint frequency bands is obtained from

the same location. If we look at the FIR ﬁlter that corresponds to

τ = 7 in Fig. 2, it turns out that its maxima and minima respectively

are roughly at 14,21,28 Hz in the bandpass selected frequency range.

Remember that for different signs of Φ

(τ)

the maxima and the minima

are exchanged. Aggregating these facts, the ﬁrst spectral ﬁlter focuses

on β band, whereas the second spectral ﬁlter has its focus close to the

upper α band (11–13 Hz). So instead of having a spatial projection

onto a broad band (7–30 Hz) signal as a solution given by the CSP,

CSSP can split the information furthermore by projecting onto two

signals of the same local origin, but stemming from different sub-

bands, such that each projection fulﬁlls the optimization criterion of

maximizing the variance for one class, while having minimal variance

for the other class.

In such a way the CSSP algorithm is not only able to automatically

adapt to the spectral EEG characteristics of a subject, but also to

treat different spectral information, originating from closely adjoint

(or identical) focal areas independently. Summarizing, this yields

an improved spatio-spectral resolution of the discriminative signals

and hence can improve the robustness and accuracy of the ﬁnal

classiﬁcation.

V. CONCLUSION

The paper utilized the method of delay embedding in order to

extend the CSP-algorithm to the state space. The advantages of the

proposed method were proved by its application to the classiﬁcation

of imagined limb movements on a broad set of experiments. We found

that the CSSP algorithm introduced here outperforms the current

state-of-the-art CSP algorithm in terms of classiﬁcation accuracy as

well as generalization ability.

It is worth to mention, that in principle it is possible to further

extend the suggested model by incorporating more than one temporal

delay. But this will come at the expense of a quadratically increasing

number of parameters for the estimation of the covariance matrices,

while the number of single trials for training remain the same. Hence

and in consistence with our observations, this approach will tend

to over ﬁt the training data, i.e. the simultaneous diagonalization

of the class covariance matrices ﬁnds directions that explain the

training data best, but might have poor generalization ability. But

this directly raises an interesting question for further studies, i.e. how

to appropriately regularize the existing diagonalization methods to

achieve better generalization.

VI. ACKNOWLEDGEMENTS

The authors would like to thank Guido Dornhege and Matthias

Krauledat for valuable discussions.

REFERENCES

[1] J. del R. Millán, F. Renkens, J. Mouriñ, and W. Gerstner, “Noninvasive

brain-actuated control of a mobile robot by human EEG,” IEEE Trans

Biomed Eng, vol. 51, no. 6, pp. 1026–33, 2004.

[2] D. J. McFarland, W. A. Sarnacki, and J. R. Wolpaw, “Brain-computer

interface (BCI) operation: optimizing information transfer rates,” Biol.

Psychol., vol. 63, pp. 237–251, 2003.

[3] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M.

Vaughan, “Brain-computer interfaces for communication and control,”

Clin. Neurophysiol., vol. 113, pp. 767–791, 2002.

[4] N. Birbaumer, N. Ghanayim, T. Hinterberger, I. Iversen, B. Kotchoubey,

A. Kübler, J. Perelmouter, E. Taub, and H. Flor, “A spelling device for

the paralysed,” Nature, vol. 398, pp. 297–298, 1999.

[5] G. Pfurtscheller, C. Neuper, C. Guger, W. Harkam, R. Ramoser,

A. Schlögl, B. Obermaier, and M. Pregenzer, “Current trends in Graz

brain-computer interface (BCI),” IEEE Trans. Rehab. Eng., vol. 8, no. 2,

pp. 216–219, June 2000.

[6] B. Blankertz, G. Curio, and K.-R. Müller, “Classifying single trial EEG:

Towards brain computer interfacing,” in Advances in Neural Inf. Proc.

Systems (NIPS 01), T. G. Diettrich, S. Becker, and Z. Ghahramani, Eds.,

vol. 14, 2002, pp. 157–164.

[7] L. Trejo, K. Wheeler, C. Jorgensen, R. Rosipal, S. Clanton, B. Matthews,

A. Hibbs, R. Matthews, and M. Krupka, “Multimodal neuroelectric

interface development,” IEEE Trans. Neural Sys. Rehab. Eng., no. 11,

pp. 199–204, Jun 2003.

[8] L. Parra, C. Alvino, A. C. Tang, B. A. Pearlmutter, N. Yeung, A. Osman,

and P. Sajda, “Linear spatial integration for single trial detection in

encephalography,” NeuroImage, vol. 7, no. 1, pp. 223–230, 2002.

[9] W. D. Penny, S. J. Roberts, E. A. Curran, and M. J. Stokes, “EEG-based

communication: A pattern recognition approach,” IEEE Trans. Rehab.

Eng., vol. 8, no. 2, pp. 214–215, June 2000.

[10] M. Laubach, J. Wessberg, and M. Nicolelis, “Cortical ensemble activity

increasingly predicts behaviour outcomes during learning of a motor

task,” Nature, vol. 405, no. 6786, pp. 523–525, 2000.

[11] S. P. Levine, J. E. Huggins, S. L. BeMent, R. K. Kushwaha, L. A.

Schuh, M. M. Rohde, E. A. Passaro, D. A. Ross, K. V. Elsievich, and

B. J. Smith, “A direct brain interface based on event-related potentials,”

IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 180–185, 2000.

[12] D. Taylor, S. Tillery, and A. Schwartz, “Direct cortical control of 3d

neuroprosthetic devices,” Science, vol. 5574, no. 296, pp. 1829–32, Jun

2002.

[13] E. Leuthardt, G. Schalk, J. Wolpaw, J. Ojemann, and D. Moran, “A

brain-computer interface using electrocorticographic signals in human,”

Journal of Nerual Engineering, vol. 1, no. 2, pp. 63–71, Jun 2004.

7

[14] B. Blankertz, G. Dornhege, C. Schäfer, R. Krepki, J. Kohlmorgen, K.-

R. Müller, V. Kunzmann, F. Losch, and G. Curio, “Boosting bit rates

and error detection for the classiﬁ cation of fast-paced motor commands

based on single-trial EEG analysis,” IEEE Trans. Neural Sys. Rehab.

Eng., vol. 11, no. 2, pp. 127–131, 2003.

[15] K.-R. Müller, C. W. Anderson, and G. E. Birch, “Linear and non-linear

methods for brain-computer interfaces,” IEEE Trans. Neural Sys. Rehab.

Eng., vol. 11, no. 2, 2003, 165–169.

[16] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, “An

introduction to kernel-based learning algorithms,” IEEE Transactions on

Neural Networks, vol. 12, no. 2, pp. 181–201, 2001.

[17] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, and K.-R.

Müller, “Invariant feature extraction and classiﬁ cation in kernel spaces,”

in Advances in Neural Information Processing Systems, S. Solla, T. Leen,

and K.-R. Müller, Eds., vol. 12. MIT Press, 2000, pp. 526–532.

[18] H. Ramoser, J. Müller-Gerking, and G. Pfurtscheller, “Optimal spatial

ﬁ ltering of single trial EEG during imagined hand movement,” IEEE

Trans. Rehab. Eng., vol. 8, no. 4, pp. 441–446, 2000.

[19] G. Dornhege, B. Blankertz, G. Curio, and K.-R. Müller, “Boosting

bit rates in non-invasive EEG single-trial classiﬁ cations by feature

combination and multi-class paradigms,” IEEE Trans. Biomed. Eng.,

vol. 51, no. 6, pp. 993–1002, 2004.

[20] G. Pfurtscheller and F. H. L. da Silva, “Event-related EEG/MEG

synchronization and desynchronization: basic principles,” Clin. Neuro-

physiol., vol. 110, no. 11, pp. 1842–1857, Nov 1999.

[21] H. Jasper and W. Penﬁ eld, “Electrocorticograms in man: Effect of

voluntary movement upon the electrical activity of the precentral gyrus,”

Arch. Psychiatrie Zeitschrift Neurol., vol. 183, pp. 163–74, 1949.

[22] G. Pfurtscheller and A. Arabibar, “Evaluation of event-related desyn-

chronization preceding and following voluntary self-paced movement,”

Electroenceph. clin. Neurophysiol., vol. 46, pp. 138–46, 1979.

[23] A. Schnitzler, S. Salenius, R. Salmelin, V. Jousmäki, and R. Hari, “In-

volvement of primary motor cortex in motor imagery: a neuromagnetic

study,” Neuroimage, vol. 6, pp. 201–8, 1997.

[24] V. Nikouline, K. Linkenkaer-Hansen, H. Wikström, M. Kesäniemi,

E. Antonova, R. Ilmoniemi, and J. Huttunen, “Dynamics of mu-rhythm

suppression caused by median nerve stimulation: a magnetoencephalo-

graphic study in human subjects,” Neuroscience Letters, vol. 294, 2000.

[25] G. Dornhege, B. Blankertz, G. Curio, and K.-R. Müller, “Increase

information transfer rates in BCI by CSP extension to multi-class,” in

Advances in Neural Inf. Proc. Systems (NIPS 03), vol. 16, 2004, in press.

[26] K. Fukanaga, Introduction to Statistical Pattern Recognition. Academic

Press, 1972.

[27] Z. J. Koles and A. C. K. Soong, “EEG source localization: implementing

the spatio-temporal decomposition approach,” Electroencephalogr. Clin.

Neurophysiol., vol. 107, pp. 343–352, 1998.

[28] F. Takens, “Detecting strange attractors in ﬂuid turbulence,” in Dynam-

ical Systems and Turbulence, D. Rand and L. Young, Eds. Berlin:

Springer-Verlag, 1981, pp. 366–381.

[29] J. H. Friedman, “Regularized discriminant analysis,” J. Amer. Statist.

Assoc., vol. 84, no. 405, pp. 165–175, 1989.

s2 .e. . we have to reconstruct an equivalent of the state space Γ. i. see [18] and has been extended to multi-class problems in [19]. Second deﬁne S1 = PΣ1 P and S2 = PΣ2 P respectively and calculate an orthogonal matrix R and a diagonal matrix D by spectral theory such that S1 = RDR . and by an attenuated motor rhythm if movement of the left hand is imagined. Σ1 = X k X k W Σ1 W {k:Y k =1} common spatial patterns or the time-invariant EEG source distribution vectors. where C is the number of electrodes. spatial ﬁlters) that maximize variance for one class and that at the same time minimize variance for the other class.(6). k To be more precise. . (8) From S1 + S2 = I it follows that S2 = R(I − D)R . introduced in section II-B. let X k = Xc.e. According to Section II-A. . If d p is close to 1 the ﬁlter given by the p-th row of R maximizes variance for class 1. ˆk X = Xk δτXk . whereas the columns of W −1 can be seen as the Then the optimization criterion can be formulated equivalent to ˆ Eq. we append the delayed vectors δ τ X k as additional channels to X k ..t ) = X·. . W := R P. Thus the attractor of any non-linear dynamic can be reconstructed in the state space using an appropriate delay coordinate function. Z k = W (0) X k +W (τ ) δ τ X k . Also a number of attempts have been made to analyze various aspects of EEG time series in the context of nonlinear deterministic dynamical systems. Especially we will derive an interpretation into a spatial and a spectral ﬁlter. Using this decomposition matrix W the EEG recordings projected onto Zk = W X k . . This can be seen as a simpliﬁed exemplary solution of the optimization criterion of the CSP algorithm: maximizing variance for the class of right hand trials and at the same time minimizing variance for left hand trials. This decomposition can always be found due to positive deﬁniteness of Σ1 + Σ2 . Therefore we ﬁrst introduce an extension to the state space for Eq. sn−τ .(2) can be obtained from. III. but the measurement is usually a nonlinear projection onto a scalar value. T denote the (potentially bandpass ﬁltered) EEG recording of the k-th trial. δ τ denotes the delay operator. . Common Spatial Pattern The common spatial pattern (CSP) algorithm [26] is highly successful in calculating spatial ﬁlters for detecting ERD/ERS effects [27] and for ERD-based BCIs.. we will now further explore the implications of this decomposition. B.e. Following the steps of Eq. In order to use the identical mathematical concepts. . After having bandpass ﬁltered the EEG signals in the frequency domain of interest. that satisﬁes Eq. this yields a solution from X to this modiﬁed optimization problem. minimizes variance for class 2. C OMMON S PATIO -S PECTRAL PATTERN (CSSP) A. t = t0 .e.t−τ . bandpass ﬁltered in the frequency range 7–30 Hz. sN overlapping vectors sn = (sn . This can be accomplished in the following way: First whiten the matrix Σ1 + Σ2 . (9) The interpretation of W is two-fold. Then according to Takens Theorem [28] under certain conditions. determine a matrix P such that P(Σ1 + Σ2 )P = I. The system thus possesses D natural variables. Since it is not our aim to reconstruct the entire dynamics of the EEG-signal.e. (6) just by one delayed coordinate. noise free observations s n and m sufﬁcient large. whose columns divide in two submatrices: W (0) that ˆ applies to X k and W (τ ) that applies to the delayed channels δ τ X k . (4) δ τ (X·. which is assumed to be observed through a single measurable quantity s = h(y). (2) using the class covariance matrices Σl . nevertheless the suggested algorithm can be straight forwardly integrated into the combination framework. deﬁned typically by a ﬁrst order ˙ differential equation y = f (y) in a state space Γ ⊂ RD . and since 1 − d p is close to 0. Based on this. 2} represents the class-label of the k-th trial.e. then a signal projected by a spatial ﬁlter focusing on the left hand area is characterized by a strong motor rhythm during the imagination of right hand movements (left hand is in idle state). Given two distributions in a highdimensional space. i. . Correspondingly Y k ∈ {1. the optimization criterion is to ﬁnd projections W (0) and W (τ ) such that signal variance of different Z p discriminates two given classes best. .e. . i. if the EEG is ﬁrst preprocessed in order to focus on the µ and β band. (7) and and Σ2 = X k X k W Σ2 W {k:Y k =2} (1) (2) =D = I − D. . W X = W (0) W (τ ) X . Determinism in a strict mathematical sense means that there exist an autonomous dynamical system. Note that the projection given by the p-th row of matrix R has a relative variance of d p (p-th element of D) for trials of class 1 and relative variance 1 − d p for trials of class 2. right hand imaginary movement. maximizing the variance for one class while minimizing it for the opposite class.C. 2} obtained ˆ k . for notational convenience. i. In order to recover the deterministic properties of such a system. the rows of W are the stationary spatial ﬁlters. the CSP algorithm ﬁnds directions (i. .. but rather to extract robust (invariant) features. . (3)–(5). Using this notation then the two class-covariance matrices are given as.e. high or low signal variance reﬂect a strong respective a weak (attenuated) rhythmic activity. The ﬁnal decomposition. we extend Eq.2 the improvement of the classiﬁcation based only on the oscillatory feature (ERD/S). i. ˆ ˆk ˆ ˆ ˆk i. l ∈ {1. especially a decomposition ˆ ˆ matrix W . Xk (5) are (6) Once again. Furthermore the CSP algorithms calculates the dual ﬁlter that will focus on the area of the right hand (and it will even calculate several ﬁlters for both optimizations by considering orthogonal subspaces). . Therefore the time delay embedding method is one way to do so.e. i. Therefore let w denote the p-th row . From the sequence of scalar observations s1 . (3) Where. i. . c = 1. for mathematically perfect. . Feature Extraction In this section we will extend the CSP algorithm to the state space. sn−(m−1)τ ) are formed. The concept of deterministic low-dimensional chaos has proven to be fruitful in the understanding of many complex phenomena despite the fact that very few natural systems have actually been found to be low-dimensional deterministic in the sense of theory. there exist a one-to-one relation between s n and the unobserved vectors yn . with τ as the delay time. Let us take the example of discriminating left hand vs.t . subsequently we discuss its consequences for the optimization problem and give a mathematical interpretation..

(6) respectively (7). Note that γc maps the non-zero coefﬁcients of corresponding FIR ﬁlter on to one half of the two dimensional unitsphere. . Classiﬁcation Finally. . Since the introduced delay τ appears as an underlying hyper-parameter in the overall optimization scheme. it cancels these 5 frequencies. ˆ the signal variance. then the projected signal Z p = w X can be expressed as ˆk Zp = = = w(0) X k + w(τ ) δ τ X k c=1 C ∑ wc ∑ γc C (0) k (τ ) X c. deﬁned as Fig. 2 2 (13) C. 2m. i. In case of a CSP based classiﬁcation procedure the involved operations such as the bandpass ﬁltering and the spatial projections deﬁne the bottleneck for the processing speed. This decomposition into a spatial and a FIR ﬁlter is not unique. Again one can arbitrarily exchange the order of ﬁrst decomposing the EEG using W (0) and W (τ ) and then bandpass ﬁltering the projected channels.c corresponds to the coupling strength of one source with the electrodes. 30. +1. In opposite the ﬁlter given by φ (τ ) = 1 π can be associated with the contrary effect. Instead we can ﬁrst apply the spatial projection and ﬁlter only the few resulting signals with the desired bandpass. if realized by convolution with h. will incorporate the original CSP algorithm into the model selection procedure. In case of CSSP. . Increasing φ(τ ) from − π to π keeps the 2 2 position of the extreme values in the frequency domain ﬁxed. that focuses on the frequency band of interest.e. B. Since a minimum corresponds to a suppression of the spectral information at this frequency. i. But we can easily work around this and are allowed to exchange the operations in the following manner: ˆ W h X δ τ (h X) = IC IC ˆ h W (0) X ˆ δ τ (h (W (τ ) X)) . . This additional property of the decomposition matrix allows for a ﬁne tuning of the overall frequency ﬁlters. W (h X) = h (W X) .e. 1 and 2 illustrate the effect of these FIR ﬁlters by means of the resulting ﬁlter responses curves for various values of τ and at different angles φ (τ ) . an adaptation to the spectral EEG peaks. that we will use throughout this paper. the FIR ﬁlter for φ(τ ) = 2 − 5 π focuses mainly on {10.· . Hence the proposed extended model is applicable online as well. 0. Especially the ﬁltering of each EEG channel (64-128) is quite time consuming and dramatically effects the overall processing speed.. but turns maxima into minima. i = 1. But fortunately the bandpass ﬁltering.e. . due to the embedding operation δ τ this does not hold in general. that for an online application one does not need to bandpass ﬁlter all the channels before projecting onto the few spatial features.. it has to be subject of a model selection procedure in order to ﬁnd the optimal τ for the speciﬁc classiﬁcation task. i. φc (τ ) := atan wc (0) (τ ) wc π π .. a classiﬁer is estimated p on the log-transformed signal variances. The use of the signed norm γ of the coefﬁcients vector as spatial ﬁlter allows us to examine the origin of the projected source signals since each column of the inverse of the entire spatial projection matrix Γ = (γ ) p. i. Typically one would retain only a small number 2m of projections that contain most of the discriminative information between the two classes. w<0 w≥0 (12) .g. Where IC denotes the C-dimensional identity-matrix. γc c. ( wγcc (0) (τ ) γc := where sign(w) = wc (0) 2 + wc (0) (τ ) 2 . i. (14) This implies.e. From this it can be directly seen. that the computational demands are just doubled compared to the original CSP. what makes the resulting algorithm applicable online. wγcc ) where (γc )c=1. fik = log var Z k i .· c=1 wc τ k wc Xk + δ X c. an Inﬁnite Impulse Response (IIR) ﬁlter (h X). (16) Speciﬁcally we applied a linear discriminant analysis (LDA) as classiﬁcation model. and it follows immediately that these operations can be executed in an arbitrary order. The shaded region denotes the frequency range (7–30 Hz) of interest. . . Note that at each electrode the FIR ﬁlter is additional to the global bandpass ﬁlter. ∈ − . Online Applicability A major concern in online applications is to implement an algorithm as efﬁcient as possible. 1: Magnitude responses of the FIR ﬁlters at different values of φ(τ ) for a single ﬁxed delay τ =50 ms. Consequently we can easily parameterize the FIR ﬁlters by (τ ) the angle φc .C is a pure spatial ﬁlter and deﬁnes a Finite Impulse Response (FIR) ﬁlter for each electrode c. (11) sign wc −1.· p . These projections are given by the rows of W that correspond to the m largest and m smallest eigenvalues d pi . the features used for the classiﬁcation are obtained by decomposing the EEG according to Eq. .t . (15) Fig.e. but there exists a very intuitive partitioning. 50} Hz.· + wc δ τ X k c. and the spatial ﬁltering (W X) are strictly linear.3 ˆ ˆk ˆk of the decomposition matrix W . Based on the projected single trials Z k i . Note that using τ = 0 in the set of feasible hyper-parameters. 0. e. that deﬁnes the bandpass ﬁlter used for preprocessing the data in the experimental session. as long as the delay operator and the ﬁnal summation is applied afterwards. .· γc τ −1 (0) (τ ) (10) ..

Note that in contrast to [6].e. the feature extraction. The investigated mental tasks were imagined movements of the left hand (l). we estimate the performance of the algorithms by means of a leave one (trial) out (LOO) cross-validation Fig. which encompasses both the µ . On these preprocessed single trials we perform the feature extraction by the CSP and the proposed CSSP method separately. A PPLICATION A. model selection and the LDA. On the (chronological) ﬁrst half we perform the training of the classiﬁer. i. All experiments start with training sessions in which the subjects performed mental motor imagery tasks in response to a visual cue. yielding a 6dimensional subspace. to which both algorithms had have no access before. [14] we omitted the regularization (cf. The single trials were extracted from the temporal frame 750–3500 ms after the presentation of the visual stimulus. To select the best CSSP model for each binary classiﬁcation problem. The shaded region denotes the frequency range (7–30 Hz) of the additional bandpass ﬁlter used for preprocessing the data in the experimental session. In such a way examples of brain activity can be obtained during the different mental tasks. that if any feedback is provided to the subject. In this study we investigate all resulting two-class classiﬁcation problems. Hence the data obtained from a feedback session are biased towards the speciﬁc classiﬁer used to produce the particular feedback and for that reason we decided to exclude the data of the feedback session for the evaluation process. since during this period discriminative brain patterns are present in most subjects. In this off-line study we will only take data from the ﬁrst (training session) into account to evaluate the performance of the algorithms under study. This reﬂects. For each method we project the data to the three most informative directions of each class.e. Finally we apply a linear discriminant analysis (LDA) to the feature vectors of all trials.4 sampling rate of 100 Hz with multi-channel EEG ampliﬁers using 32. The brain activity was recorded from the scalp at a Fig. we applied a causal band-pass ﬁlter from 7–30 Hz to the data. and the right foot (f ). In order to compare the results of the two methods (CSP vs. B.l-f and r-f ). i. i. The performance of the estimated models were then evaluated on the second half of the data. 128 channels. For each involved mental task we obtained between 120 and 200 single trials of recorded brain activity. In the original experiment. 64 resp. In either case the error decreases. For 6 out of 62 datasets the model selection suggest to keep the CSP-based model. since the dimensionality of the features is rather small compared the number of training examples. [29]). these recorded single trials were then used to train a classiﬁer which was in a second sessions applied online to produce a feedback signal for (unlabeled) continuous brain activity (results will be reported elsewhere). Increasing τ changes the position and increases the number of the extreme values in the frequency domain.and the β -rhythm.e. yielding 62 different classiﬁcation tasks. Training After choosing all channels except the EOG and EMG and a few outermost channels of the cap that are known to have nonstationary signal quality due to changing conductive properties. Two experiments were carried out with only 2 classes l and r. CSSP with selected τ ) we split the data set in two. In the following we will refer to these halves of the data set as “training data” and “test data”. During the experiment the subjects were sitting in a comfortable chair with arms lying relaxed on the armrests. The EMG and EOG channels were exclusively used to ensure that the subjects performed no real limb or eye movements correlated with the mental tasks that could directly (artifacts) or indirectly (afferent signals from muscles and joint receptors) be reﬂected in the EEG channels and thus be detected by the classiﬁer. . in any other case the CSSP-based classiﬁcation improves the LOO-training error.5–6 seconds one of 3 different visual stimuli indicated for 3– 3. Experimental Design In this section we apply the CSP and the proposed CSSP algorithm to data from 22 EEG experiments of imaginary limb movements performed by 8 different subjects and compare the resulting classiﬁcation performances. which should be constrained to operate on the CNS (central nervous system) activity only. Additional to the EEG channels. IV. the right hand (r). Since a minimum corresponds to a suppression of the spectral information at this frequency. all possible combination of two classes (l-r. 2: Magnitude responses of the FIR ﬁlters at different values of τ 2 for a single ﬁxed angle φ = −5 π .5 seconds which mental task the subject should accomplish during that period. ﬁnd the optimal τ . 3: The left panel compares the LOO classiﬁcation error on the training data of the CSP based model and the selected CSSP based model for 62 binary classiﬁcation problems. In the training period every 4. the FIR ﬁlters for different τ focuses on different sub-bands of the frequency spectrum. The right panel shows the identical information but instead the error the corresponding information transfer rate measured in bits per trial is given. indicated by dots on the diagonal line. to ﬁnd a separating hyperplane. we recorded the electromyogram (EMG) from both forearms and the right leg as well as horizontal and vertical electrooculogram (EOG) from the eyes. For these six dimensions we calculate the logarithms of the variances as feature vectors. he/she will adapt to the feedback (output of the classiﬁer) itself.

In that particular case almost any additional delay improves the classiﬁcation both on the training data as well as on the unseen test data. 4: The left panel compares the classiﬁcation error of the CSP based model and the selected CSSP based model on the test data taken from the second half of the experiment. . the second spatial CSSP ﬁlter resembles the ﬁrst one. . A comparison of the resulting test errors/bitrate for all datasets is summarized in Fig. To show the general performance of the CSSP-based algorithm. The main difference in the projection occurs only in the spectral ﬁlter. 5: The left panel gives the resulting LOO information transfer rate in bits per trial on the training set for one exemplarily chosen dataset at all hyper-parameter τ = 0. C. . Fig. These ﬁgures also provide the ﬁlters found by CSP method. 3 compares the LOO error respective the information transfer rate of the CSP model and the best CSSP on the training data for each experiment. since the CSP model is part of the model selection procedure (τ = 0). obtained from the second unseen half of the training session of each experiment. The results on that database strongly suggest that the proposed algorithm outperforms the CSPbased approach. The ﬁlters were calculated for a classiﬁcation of foot and right hand movements. Especially we run a model selection procedure over τ = 0. the CSSP algorithm make explicit use of the label information. . 4. . where these ﬁlters have opposite sign in the central region. The spatial and spectral ﬁlters of the selected CSP spatial spatial CSSP spectral Fig. Since the CSP resp. For further illustrations of the properties of the proposed method.. 6: The scalp maps show the three spatial and spectral ﬁlters for class feet in descending order of the eigenvalues for both the CSP and the CSSP method. model for each class are visualized in Fig. Again the right panel provides the same information in terms of the corresponding information transfer rate per trial. . Fig. . indicating that at the same location different spectral information is exploited for classiﬁcation. 15) for the selected dataset. procedure on the corresponding training data. The ﬁrst two spectral . 6 and 7. .5 shows the information transfer rate on the training and on the test data for all models (τ = 0. Note that the ﬁrst spatial ﬁlters are almost 2 2 identical for CSP and CSSP. 15} on the training data and applied them to the test data. . . π ]. .15. For this selected dataset we will relate the spatial ﬁlters found by the CSSP method to those of the CSPmethod and will discuss the impact of the additional spectral ﬁlters (cf Eq. we will now pick one speciﬁc dataset. (13)). in terms of an improved classiﬁcation accuracy and an increased generalization ability. Note that in this particular case CSSP always performs the same or better than the CSP on the training data. especially we will focus on a classiﬁcation task of imaginary foot and right hand movement that serve this purpose best. described in section IV-B. Results After the model selection the best CSSP and the CSP based models for each classiﬁcation problem were then ﬁnally trained on the entire training data and afterwards applied to the corresponding test data. . . According to the model selection procedure. Except for a very few cases the CSSP based classiﬁcation model improves the test error. the model with the highest LOO bitrate (lowest LOO error) on the training data (τ = 7) would be chosen for the ﬁnal application. .5 Fig. 150 ms).(13) in the range [− π . . The spectral ﬁlters are gray-scale-coded according Eq. . The best model that has been selected by the model selection procedure based on the informations provided in the left panel corresponds to τ = 7. The right panel shows the corresponding information transfer rate on the test data. 15 ( at a sampling rate of 100 Hz this corresponds to 0. For that particular classiﬁcation problem almost any delay parameter would improve the standard CSPbased classiﬁcation. Fig. While the spatial ﬁlters found by CSP exhibit no clear structure. we trained CSSP models for all values of τ ∈ {0. but already those for the second largest eigenvalue diverges. Otherwise the cross-validation error could underestimate the generalization error. shown in the lowest row of each panel. where τ = 0 corresponds to the original CSP algorithm. these techniques have to be repeatedly applied to each LOO training set within the cross-validation procedure.

2002. Kotchoubey. 51. pp. I. 2000. McFarland.. A. D. 767–791.. [10] M. 1026–33. Eng. Hence and in consistence with our observations. Kübler. Proc. no. So instead of having a spatial projection onto a broad band (7–30 Hz) signal as a solution given by the CSP. [6] B. Kushwaha. Parra. i. Diettrich. 398. and W. L. Again the ﬁrst spatial ﬁlters are almost identical for CSP and CSSP (except for the sign). J. Tillery. 296. M. C. 1. the simultaneous diagonalization of the class covariance matrices ﬁnds directions that explain the training data best. pp. and M. [12] D. G. Guger. Rehab. Osman. that spectral information of disjoint frequency bands is obtained from the same location. vol. Huggins. 2. and P. J. pp. S. Gerstner. Psychol. 2. “Cortical ensemble activity increasingly predicts behaviour outcomes during learning of a motor task. Sajda. CSSP can split the information furthermore by projecting onto two signals of the same local origin. “Direct cortical control of 3d neuroprosthetic devices. Curran. and H. Trejo. while having minimal variance for the other class. 8. We found that the CSSP algorithm introduced here outperforms the current state-of-the-art CSP algorithm in terms of classiﬁcation accuracy as well as generalization ability. 2004. [5] G. Millán. J. Leuthardt. but also to treat different spectral information. 2. vol. 237–251. Pregenzer. M. McFarland. 5574. Sarnacki. B. it turns out that its maxima and minima respectively are roughly at 14. N. Harkam. J. K. Rosipal. Matthews. S. vol. pp. E. E.21. J. “A direct brain interface based on event-related potentials.. 405. For the second largest eigenvalue the CSSP ﬁlters work still at the same location. 214–215. “Linear spatial integration for single trial detection in encephalography. 7: The scalp maps correspond to the spatial and spectral ﬁlters of the three leading eigenvalues for class right hand of the CSP and the CSSP method in descending order. Pfurtscheller. The ﬁlters were calculated for a classiﬁcation of foot and right hand movements. Taylor. 2003. Perelmouter. Obermaier. 157–164. and T. B. Yeung. A. Eds. S. and M. Ross. Vaughan. R. Alvino. June 2000. Jun 2004. 11. In such a way the CSSP algorithm is not only able to automatically adapt to the spectral EEG characteristics of a subject. A. 8. L. 1829–32. [7] L. pp. vol.” IEEE Trans. and K. no. G. Pfurtscheller. Neural Sys. “Current trends in Graz brain-computer interface (BCI). Schalk. B. Jorgensen. Renkens. Rohde.” IEEE Trans. “A brain-computer interface using electrocorticographic signals in human. Wolpaw.” IEEE Trans. Neuper. vol. C. 297–298. 180–185. how to appropriately regularize the existing diagonalization methods to achieve better generalization. Smith. BeMent. while the spectral ﬁlters have opposite signs in this area. Aggregating these facts. R. Jun 2002. C. [3] J. Passaro. ACKNOWLEDGEMENTS The authors would like to thank Guido Dornhege and Matthias Krauledat for valuable discussions. pp. “Noninvasive brain-actuated control of a mobile robot by human EEG.” Journal of Nerual Engineering. 523–525. pp. and Z. whereas the CSP ﬁlter exhibit no clear focal point at the area of the left motor cortex. but stemming from different subbands. Matthews. Ghanayim. R. Levine.. J. Wolpaw. vol.. no. “Classifying single trial EEG: Towards brain computer interfacing.” Clin. vol. E. the ﬁrst spectral ﬁlter focuses on β band. Nicolelis. [8] L. “Brain-computer interface (BCI) operation: optimizing information transfer rates. Müller. Wolpaw. Elsievich. Krupka.e. Flor. R. 2000. Penny.e. Birbaumer. The advantages of the proposed method were proved by its application to the classiﬁcation of imagined limb movements on a broad set of experiments. Roberts.” Science. M. whereas the second spectral ﬁlter has its focus close to the upper α band (11–13 Hz). 7. Fig. But this will come at the expense of a quadratically increasing number of parameters for the estimation of the covariance matrices. J. Taub. [13] E. G. E. “Multimodal neuroelectric interface development. N. R EFERENCES [1] J. But this directly raises an interesting question for further studies. VI. Wessberg. This indicates. Pearlmutter. 2. no. and M. Ramoser. vol. A. A. del R. pp. R. A. Mouriñ. no. The corresponding spatial ﬁlters found by the CSSP method are almost identical and focusing on the central region. i. K. while the number of single trials for training remain the same. T. pp. Schlögl. K. A. Jun 2003. J. no. this approach will tend to over ﬁt the training data.. this yields an improved spatio-spectral resolution of the discriminative signals and hence can improve the robustness and accuracy of the ﬁnal classiﬁcation. G. Rehab.” in Advances in Neural Inf. and A. 1999. and B. P. B. V. J. Summarizing. N. ﬁlters for class foot supply insight how additional spectral information is exploited. R. 2. Moran. [9] W. originating from closely adjoint (or identical) focal areas independently. If we look at the FIR ﬁlter that corresponds to τ = 7 in Fig. J. but might have poor generalization ability. vol. W. Stokes. pp. “Brain-computer interfaces for communication and control. Blankertz. 223–230.” IEEE Trans. Birbaumer.. [4] N. no. S. Rehab. “EEG-based communication: A pattern recognition approach. June 2000. Ghahramani. Neurophysiol. C. Becker. Hinterberger. A. Schwartz. such that each projection fulﬁlls the optimization criterion of maximizing the variance for one class. Remember that for different signs of Φ(τ ) the maxima and the minima are exchanged. D. Systems (NIPS 01). Tang. that in principle it is possible to further extend the suggested model by incorporating more than one temporal delay. 6. pp. Schuh. A. 199–204.” Biol. Wheeler. Ojemann. [11] S. CONCLUSION The paper utilized the method of delay embedding in order to extend the CSP-algorithm to the state space. S. C. vol. Iversen.” NeuroImage. 2002. [2] D.” Nature. Eng. and M. Eng. vol. Rehab.6 CSP spatial spatial CSSP spectral V. . 63–71.” Nature. and D. 216–219. A. 113. and J.” IEEE Trans Biomed Eng. 8. W.28 Hz in the bandpass selected frequency range. Eng. Hibbs. Clanton. no. 1. pp. pp. 2002. vol. J. 14. F. no. “A spelling device for the paralysed.-R. D. 6786. A. It is worth to mention. T. 63. Laubach. Curio.

7 [14] B. A. 2. J. Müller. 11. 405. 2004. Friedman. no. Dornhege. [26] K. G. 163–74. 993–1002. [17] S.R. 4. Salmelin. Takens. 343–352. pp. Statist. Jousmäki. no.-R. 107. Huttunen. [15] K. 201–8. no. pp.. “Boosting bit rates and error detection for the classiﬁcation of fast-paced motor commands based on single-trial EEG analysis. and G. pp. [27] Z. Smola. 526–532. 1972. pp. Mika. 2.-R.-R. 1842–1857. K. Introduction to Statistical Pattern Recognition. [21] H. Academic Press. 1989. Ilmoniemi. Nov 1999. 1981. [19] G. Wikström. [20] G. Tsuda. C.” IEEE Trans. Rehab. vol. V. Hari. Ramoser. vol. vol. S. Neurophysiol.-R. S. “Linear and non-linear methods for brain-computer interfaces. Neural Sys. and K. Rätsch. E. Losch. vol. no. Clin. 1949. Müller-Gerking. “Evaluation of event-related desynchronization preceding and following voluntary self-paced movement. 127–131. pp. [16] K.” in Dynamical Systems and Turbulence.-R. 294. 110. Blankertz. vol. 12. 165–175. D. “An introduction to kernel-based learning algorithms.” J. [29] J. Psychiatrie Zeitschrift Neurol. vol. 1997. Mika. Müller. no. 84. C. . Rand and L. Berlin: Springer-Verlag. 6. Kunzmann. Eds. Schölkopf. Arabibar. Amer. vol. Schnitzler. Penﬁeld.” IEEE Trans. [22] G. pp. Pfurtscheller and A. 2004. B. 183.” IEEE Trans. Kesäniemi. 2000. [25] G. Eng. “EEG source localization: implementing the spatio-temporal decomposition approach. Biomed. Anderson. and R. Müller. 165–169.-R. and G. 11. G. Müller. Young.. vol. V. R. Leen. vol. B. Eng. 181–201. Fukanaga. Schölkopf. 12. and B. W. Antonova.. 2003. B. J. 366–381.. Nikouline. da Silva. L. 1979. 2001.” Electroenceph. “Boosting bit rates in non-invasive EEG single-trial classiﬁcations by feature combination and multi-class paradigms. “Involvement of primary motor cortex in motor imagery: a neuromagnetic study. Neurophysiol. MIT Press. 46.” IEEE Trans. Müller. Assoc. 16.” in Advances in Neural Information Processing Systems. and K. 51. 11. no. Koles and A. Blankertz. Müller. “Event-related EEG/MEG synchronization and desynchronization: basic principles. E. Rehab. no. and J. Neural Sys.. G. R. Rehab. pp. T. Systems (NIPS 03). Curio. “Increase information transfer rates in BCI by CSP extension to multi-class. Eng. pp. Kohlmorgen. “Regularized discriminant analysis. Salenius. Dornhege.. 138–46. 441–446. Soong. Pfurtscheller and F.” Neuroimage. J. Eds. vol. Curio. vol.” Electroencephalogr. Eng. J. Weston. 2000. H. Linkenkaer-Hansen. C. H. pp.” Neuroscience Letters. 8.” Clin. in press. Rätsch. K. [18] H. Solla.” in Advances in Neural Inf. K. Birch. vol. Schäfer. Jasper and W. and G.” Arch. [24] V.. pp. Neurophysiol. “Optimal spatial ﬁltering of single trial EEG during imagined hand movement. Krepki. Blankertz. pp. “Detecting strange attractors in ﬂuid turbulence. F. “Dynamics of mu-rhythm suppression caused by median nerve stimulation: a magnetoencephalographic study in human subjects. R. 2000.. pp. [23] A. Dornhege. and K. vol. [28] F. “Invariant feature extraction and classiﬁcation in kernel spaces. G. 6. “Electrocorticograms in man: Effect of voluntary movement upon the electrical activity of the precentral gyrus. and K.” IEEE Transactions on Neural Networks. H.. 1998. Proc. Pfurtscheller. 2003. 2. Curio. K. Müller. vol.. S. clin. M. G.

- Microcalcification Classification in Digital Mammogram using Moment based Statistical Texture Feature Extraction and SVMby Integrated Intelligent Research

- An Efficient Wavelet Based Feature Reduction and Classification Technique for the Diagnosis of Dementia
- H Adaptive Filters for Eye Blink Artifact
- Adaptive Filter1
- Microcalcification Classification in Digital Mammogram using Moment based Statistical Texture Feature Extraction and SVM
- InTech-Towards a Realistic and Self Contained Biomechanical Model of the Hand
- T-F Processing on Non Stationary Signals
- 1_1_2
- Information-theoretic Computer Vision for Autonomous Robots
- A temporally constrained ICA (TCICA) technique for artery-vein separation of cerebral microvasculature
- Journal of Chemical Engineering of Japan, Vol. 40, No. 6
- LATTICE BOLTZMANN SIMULATIONS
- 2 - Freesurfer.intro
- sensors-09-06312
- Publication
- Act a 1990
- dsp
- 14WCEE_05-02-0057 v v imp
- ACJCP2
- The Aplication of Throughflow Optimziation to the Design of Radial and Mixed Flow Turbines
- 10.1.1.136
- IPL-S-12-01115-1
- tomography
- Methods of Analysis For Earthquake- Resistant Structures
- 04586595
- Survey of Wavelet Fault Diagnosis
- Gu 3412511258
- Kalories Burning and Bodyweight.
- 302 Assignment 1 COMPLETE
- Tegeraden SystemsAnalysis.S.284 296
- Younkins STC 200601

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue listening from where you left off, or restart the preview.

scribd