This action might not be possible to undo. Are you sure you want to continue?

Klaus-Robert Müller, Benjamin Blankertz, Gabriel Curio et al.

BBCI team:

Gabriel Curio Florian Losch Volker Kunzmann Frederike Holefeld Vadim Nikulin@Charite Benjamin Blankertz Michael Tangermann Claudia Sannelli Carmen Vidaurre Bastian Venthur Siamac Fazli Martijn Schreuder Matthias Treder Stefan Haufe Thorsten Dickhaus Frank Meinecke Paul von Bünau Marton Danozcy Felix Biessmann Klaus-Robert Müller@TUB

Matthias Krauledat Guido Dornhege Roman Krepki@industry

Andreas Ziehe Florin Popescu Christian Grozea Steven Lemm Motoaki Kawanabe Guido Nolte@FIRST Yakob Badower@Pico Imaging Marton Danozcy

Collaboration with: U Tübingen, Bremen, Albany, TU Graz, EPFL, Daimler, Siemens, MES, MPIs, U Tokyo, TIT, RIKEN, Bernstein Focus Neurotechnology, Bernstein Center for Computational Neuroscience Berlin, picoimaging, Columbia, CUNY Funding by: EU, BMBF and DFG

Noninvasive Brain-Computer Interface

DECODING

‚Brain Pong‘ with BBCI .

Computer Interface Signal Processing EEG Acquisition Application Interface FES Device Grasp-Pattern 3 channel Stimulation [From Birbaumer et al.] BBCI: Leitmotiv: ›let the machines learn‹ .] [From Pfurtscheller et al.Noninvasive BCI: clinical applications Brain.

EEG based noninvasive BCI .

2006] .The cerebral cocktail party problem • use ICA/NGCA projections for artifact and noise removal • feature extraction and selection [cf. Ziehe et al. Blanchard et al. 2000.

instruction: imagine . .kicking a ball.healthy subjects (BCI untrained) perform "imaginary” movements (ERD/ERS) . .BBCI paradigms Leitmotiv: ›let the machines learn‹ .squezzing a ball.feel touch .

Towards imaginations: Modulation of Brain Rhythms Single channel IMAGINATION of left arm .

Averaging Single channel .Variance I: Single-trial vs.

Variance II: Trial to trial variability .

Variance III: inter subject variability [l vs r] .

BCI with machine learning: training .

healthy subjects untrained for BCI A: training 20min: right/left hand imagined movements → infer the respective brain acivities (ML & SP) B: online feedback session .BBCI paradigms Leitmotiv: ›let the machines learn‹ .

Playing with BCI: training session (20 min) .

Machine learning approach to BCI: infer prototypical pattern Inference by CSP Algorithm .

BCI with machine learning: feedback .

Lecture Blankertz here .

2001. 2004. 2007. Blankertz et al. 2006. Dornhege et al. 2007. Müller et al. 2008] . 2007. 2008. 2003.BBCI Set-up Artifact removal [cf. 2005.

Blankertz et al. 2001. 2006] .What can Machine Learning tell us about physiology? 1 1 [cf.

ML for knowledge discovery .

Results: Exploring the limits of untrained users .

BCI: 1st session performance for novices .

Spelling with BBCI: a communication for the disabled I .

Spelling with BBCI: a communication for the disabled II .

2007] . 2005. Sugiyama et al. Sugiyama & Müller 2005.Variance IV: covariate shift: from training to feedback Need for adaptation ! [cf. Shenoy et al.

submitted] . Krauledat et al.Neurophysiological analysis [cf.

Sugiyama & Müller 2005. 2007] .Weighted Linear Regression for covariate shift compensation . Sugiyama et al. choosing yields unbiased estimator even under convariate shift [cf.

control not accurate enough to control applications Screening Study (N=80): Cat I: good calibration (cb). good feedback (fb) Cat II: good cb. no good fb Cat III: no good cb design a predictor ! . i.e.BCI Illiteracy Percentage of ~20% of naïve users: BCI accuracy does not reach level criterion..

k) = k1 + k2 / xλ (estimated noise) (2 peaks) g2 = g2 (x. λ. μ2 . C4 under rest cond.SMR-Predictor Calculate the power spectral density (PSD) in three Laplacian channels C3. Cz. σ2) Proposed predictor: Average height of the larger peak . k) = k3 φ(x . μ1 . σ1) + k4 φ (x . Model each resulting curve by g = g1 + g2. μ. σ. with g1 = g1 (x.

55 Pearson R = 0.51 As much as R2 = 30% (calibration) and R2 = 26% (feedback) of variability in BCI accuracy can be explained by the SMR predictor in our study sample! [cf. Kübler. Müller et al.SMR-Predictor (Results) Pearson R = 0. 2009] . Blankertz.

C4 and Cz. automated laps. • Each trial. unsupervised CSP Runs 1 3 fixed Laplace • Direct feedback -> Unspecific LDA classifier. in preparation] . Vidaurre. Müller et al. • Laplacian channels C3. • Compute CSP and sel. perform adaptation of the cls.Approach to „Cure“ BCI Illiteracy Runs 4 6 supervised CSP + sel.Lap. • Features: log band power (alpha and beta). Laps. Blankertz. Runs 7 8 • Compute CSP from runs 4-6. selection. • Update the bias of the classifier. • Fixed CSP filters. [cf. from runs 1-3. • Perform unsupervised adaptation of pooled mean. • Each trial retrain the classifier.

Results (Grand Averages) .

III Runs 1 and 2 ! Runs 7 and 8 [cf. Blankertz. Vidaurre.Example: one subject of Cat. Müller et al. 2009] .

Müller et al 2009] .Real Man Machine Interaction [Tangermann.

Harvest from BCI-gaming Limits of BCI-reaction time? Research platform: BCI gaming + Machine Learning + Single Trial Analysis (MSM) Mental states JUST BEFORE success or failure? Dynamics: limits of temporal control precision? .

Before-after Future issues: sensors Popescu et al 2007 .

Future Issues: Shifting distributions within experiment .

wireless EEG FOR INFORMATION SEE: www. BCI experiment • 5-8 letters/min mental typewriter on CeBit 06. Brain2Robot@Medica 07. Calibration < 20min. lNdW 09 • Machine Learning and modern data analysis is of central importance for BCI et al • Applications: Rehabilitation: TOBI EU IP Computational Neuroscience: Bernstein Centers Berlin Man Machine Interaction: using BCI as a measuring device: brain@work • BBCI Sensors. nonstationarity. data analysis <<10min.Conclusion • BBCI: non-invasive with high Information transfer rates for the Untrained • BBCI: Untrained. non-cooperative behavior • ‚illiterates‘. software: IDA spinoffs • towards no training.de .bbci.

.

Bremen. Siemens. Bernstein Focus Neurotechnology. CUNY Funding by: EU. Albany. Columbia. MES. EPFL.BBCI team: Gabriel Curio Florian Losch Volker Kunzmann Frederike Holefeld Vadim Nikulin@Charite Benjamin Blankertz Michael Tangermann Claudia Sannelli Carmen Vidaurre Bastian Venthur Siamac Fazli Martijn Schreuder Matthias Treder Stefan Haufe Thorsten Dickhaus Frank Meinecke Paul von Bünau Marton Danozcy Felix Biessmann Klaus-Robert Müller@TUB Matthias Krauledat Guido Dornhege Roman Krepki@industry Andreas Ziehe Florin Popescu Christian Grozea Steven Lemm Motoaki Kawanabe Guido Nolte@FIRST Yakob Badower@Pico Imaging Marton Danozcy Collaboration with: U Tübingen. Bernstein Center for Computational Neuroscience Berlin. Daimler. RIKEN. U Tokyo. TU Graz. BMBF and DFG . picoimaging. MPIs. TIT.

bbci.BCI Competitions For BCI IV Competition see www.de .

FOR INFORMATION SEE: www.bbci.de

Machine Learning open source software initiative: MLOSS see www.jmlr.org

**Machine Learning and Signal Processing Tools for Brain-Computer Interfacing
**

Benjamin Blankertz1,2 Klaus-Robert Müller1

1 Machine Learning Laboratory, Berlin Institute of Technology 2 Fraunhofer FIRST (IDA)

blanker@cs.tu-berlin.de

08-Jul-2009

Topic of This Section

Method:

classication of

spatio-temporal features;

the estimation bias

shrinkage of the sample covariance matrix to counterbalance

Application:

classication of single-trial ERPs in an attention-based speller

Some Neurophysiological Background An infrequent stimulus in a series of standard stimuli evokes a P300 component at central scalp position if attended: Cz P3 Cz N1 The presentation of a visual stimulus elicits a Visual Evoked Potential (VEP) in visual cortex if focused: P1 Oz N1 Oz .

.Experimental Design Classic Matrix Speller Attention-based Hex-o-Spell See Poster W07 (Treder et al. covert attention and a comparison of those two speller designs.) for a investigation of overt vs.

Experimental Design Classic Matrix Speller Attention-based Hex-o-Spell See Poster W07 (Treder et al. covert attention and a comparison of those two speller designs.) for a investigation of overt vs. .

Single-subject ERPs in Hex-o-Spell Data set for illustration of classication methods: AF3 AF4 F7 Fz FT7 FC3 FC4 F8 FT8 C5 Cz C6 CP3 TP7 Pz CP4 TP8 + 5 µV 200 ms PO7 Oz PO8 Target Nontarget .

Topographies of ERP Components There are several ERP components that can be used to determine the attended symbol: Cz (−) 5 [µV] 0 −5 −100 0 100 200 ms 300 400 500 5 Target Nontarget PO7+PO8 (−−) Target 0 −5 vP1 115−135 ms N1 135−155 ms vN1 155−195 ms P250 205−235 ms vN2 285−325 ms P400 335−395 ms N500 495−535 ms 5 0 −5 Nontarget .

The result is displayed as scalp map: 50 45 40 35 30 25 20 15 error [%] .Classication of Temporal Features As a rst step: classication on raw time courses (115535 ms) in single channels.

6 23.3 23.4 23.6 24.8 24.7 24.2 dev FC3 FCz FC4 C3 Cz C4 CP3 CPz CP4 Pz 27 27.4 FC3 FCz FC4 C3 Cz C4 CP3 CPz CP4 Pz 23.5 23.8 26 [s] 26.9 [s] std FC3 FCz FC4 C3 Cz C4 CP3 CPz CP4 Pz 25.4 std .2 [s] 27.5 24.7 [s] std FC3 FCz FC4 C3 Cz C4 CP3 CPz CP4 Pz 24.2 [s] 22.Extraction of Spatial Features dev FC3 FCz FC4 C3 Cz C4 CP3 CPz CP4 Pz 22 22.

2 [s] 27.6 23.6 24.2 [s] 22.4 23.7 [s] std FC3 FCz FC4 C3 Cz C4 CP3 CPz CP4 Pz 24.3 23.9 [s] std FC3 FCz FC4 C3 Cz C4 CP3 CPz CP4 Pz 25.5 23.5 24.8 26 [s] 26.4 FC3 FCz FC4 C3 Cz C4 CP3 CPz CP4 Pz 23.4 std .2 dev FC3 FCz FC4 C3 Cz C4 CP3 CPz CP4 Pz 27 27.8 24.7 24.Extraction of Spatial Features dev FC3 FCz FC4 C3 Cz C4 CP3 CPz CP4 Pz 22 22.

1 0.The r2 -Matrix of Dierences The temporal and spatial structure of the dierence between ERPs of dierent conditions can be investigated by the signed r2 -matrix: 0.15 0.05 Pz −0.05 0 115 − 135 ms 135 − 155 ms 155 − 195 ms 205 − 235 ms 285 − 325 ms 335 − 395 ms 495 − 535 ms √ r(x) := N1 · N2 mean{xi | yi = 1} − mean{xi | yi = 2} N1 + N2 std{xi } .15 Fz channels FCz Cz CPz −0.1 Oz −100 0 100 200 [ms] 300 400 500 −0.

Spatial Features #01: std #02: std #03: std #04: std #05: dev #06: std #07: std #08: dev #09: std #10: std #11: std #12: dev #13: std #14: std #15: std #16: std #17: dev #18: std #19: std #20: std #21: std #22: std #23: dev #24: dev .

02 −0.Linear Classier as Spatial Filter A linear classier that was trained on spatial features can also be regarded as a spatial lter. Let w be the LDA weight vector and X ∈ R#chans×#time points be continuous EEG signals.04 0.06 0.1 0.02 0 −0.08 −0.08 0.06 −0. 0.04 −0. Then Xf := w X ∈ R1×#time points is the result of spatial ltering: each channel of X is weighted with the corresponding component of w and summed up.1 The weight vector of the classier can be display as scalp map: .

Classication Results for Spatial Features 35 30 error [%] 25 20 15 10 115−135 ms 135−155 ms 155−195 ms 205−235 ms 285−325 ms 335−395 ms 495−535 ms 5 0 −5 vP1 N1 vN1 P250 vN2 P400 N500 .

1 22.2 22.Extraction of Spatio-Temporal Features dev FC3 FCz FC4 C3 Cz C4 CP3 CPz CP4 Pz 22 22.5 22.4 22.6 .3 [s] 22.

Spatio-Temporal Features Spatio-temporal features are typically high-dimensional (here 59 EEG channels × 7 time intervals = 413 dimensional features): 15 Fz space [channels] FCz Cz 0 CPz −5 Pz −10 Oz 125 145 175 220 305 365 mean of time interval [ms] 515 −15 10 5 [µV] .

.Classication Result for Spatio-Temporal Features 35 30 25 20 15 10 115−135 ms 135−155 ms 155−195 ms 205−235 ms 285−325 ms 335−395 ms 495−535 ms 5 0 −5 error [%] Combined vP1 N1 vN1 P250 vN2 P400 N500 Although information was added. classication on the concatenated feature becomes worse: overtting.

. There is a systematical bias: ˆ Large Eigenvalues of Σ are too large But. if the number of samples n is not large relative to the ˆ Small Eigenvalues of Σ are too small This aects..g.Bias in Estimating Covariances Let x1 . e. For classication µ and Σ have to be estimated from the data: 1 µ= n ˆ ˆ Σ= 1 n k=1 xk n k=1 (xk n−1 − µ)(xk − µ) ˆ ˆ dimension d. . the estimation is error-prone. . Σ). . classication with LDA: ˆ Normal vector of LDA: w = Σ−1 (µ1 − µ2 ). . xn ∈ Rd be n vectors drawn from a d-dimensional Gaussian distribution N (µ.

5 1 0.5 −2 −1 0 1 2 0 50 100 150 index of Eigenvector 200 .5 0 −0.Bias in Estimating Covariances (2) 250 true N=500 N=200 N=100 N= 50 200 1.5 −1 true cov empirical cov Eigenvalue 150 100 50 0 −1.

ˆ In LDA the empirical covariance matrix Σ is replaced by ˜ ˆ Σ(γ) = (1 − γ)Σ + γνI for a γ ∈ [0. 1] and ν dened as average Eigenvalue trace(Si )/d.A Remedy for Classication A simple way that can partly x the bias is shrinkage: the empirical covariance matrix is modied to be more spherical. .

A Remedy for Classication A simple way that can partly x the bias is shrinkage: the empirical covariance matrix is modied to be more spherical. 1] and ν dened as average Eigenvalue trace(Si )/d. γ = 1 assumes spherical covariance matrices. γ = 0 yields LDA without shrinkage. ˆ In LDA the empirical covariance matrix Σ is replaced by ˜ ˆ Σ(γ) = (1 − γ)Σ + γνI for a γ ∈ [0. . From ˜ Σ = (1 − γ)VDV + γνI = V ((1 − γ)D + γνI) V we see that ˜ ˆ Σ(γ) and Σ have the same Eigenvectors (columns of V) extreme Eigenvalues (large/small) are shrunk/extended towards the average ν . ˆ Since Σ is positive semi-denite we can have an Eigenvalue ˆ decomposition Σ = VDV with orthonormal V and diagonal D.

Modelselection LDA with shrinkage of the empirical covariance matrix has one free parameter (γ )... An easy (and also time-consuming) way is model-selection based on cross-validation. that needs to be selected. Numerous strategies with dierent properties exist. . . There is no general way to do it. also called hyperparameter.g. e. empirical Bayes shrinkage estimator MDL: Minimum Description Length Model-selection based on cross-validation.

8 regularization parameter γ 1 .2 0.4 0.6 0.6 0. 250 25 25 500 25 2000 cross−validation error [%] cross−validation error [%] 15 15 cross−validation error [%] 0 0. 500.4 0. 2000) for dierent values of the regularization parameter γ (x-axis).2 0.8 regularization parameter γ 1 20 20 20 15 10 10 10 5 0 0.4 0. Features vectors have 250 dimensions.6 0.Regularized LDA at Work Cross-validation results for dierent sizes of training data (250.8 regularization parameter γ 1 5 5 0 0.2 0.

Investigating the Impact of Shrinkage ˆ LDA: w = Σ−1 (µ1 − µ2 ). shrinkage: ˜ ˆ Σ(γ) = (1 − γ)Σ + γνI γ=1 w ∼ µ1 − µ2 .

Investigating the Impact of Shrinkage ˆ LDA: w = Σ−1 (µ1 − µ2 ). γ=0 ˆ w ∼ Σ−1 (µ1 − µ2 ) shrinkage: ˜ ˆ Σ(γ) = (1 − γ)Σ + γνI γ=1 w ∼ µ1 − µ2 .

γ=0 ˆ w ∼ Σ−1 (µ1 − µ2 ) shrinkage: ˜ ˆ Σ(γ) = (1 − γ)Σ + γνI γ=1 w ∼ µ1 − µ2 accounting for spatial structure of the noise .Investigating the Impact of Shrinkage ˆ LDA: w = Σ−1 (µ1 − µ2 ).

ERP and Noise Simple assumption for ERPs: single trial xk (t) is composed of an ERP s(t) and Gaussian `noise' nk (t): xk (t) = s(t) + nk (t) 10 8 for all trials k = 1. . . . K mean: phase−locked − ERP cov: non−phase−locked − noise potential at CPz [µV] 6 4 2 0 −2 −4 −6 −8 −10 −5 0 5 potential at FCz [µV] 10 . .

Spatial Structure of the Noise The two strongest principal components of the noise (covariance matrix) in this data set: Trial-to-trial variation of P3 Visual alpha .

2 0.Understanding Spatial Filters (a) with little disturbance 10 8 potential at CPz [µV] 6 4 2 0 −2 −4 −6 −8 −10 −5 0 5 potential at FCz [µV] 10 (b) FCz 0.4 CPz Oz .

(b): 37% error When disturbing channel Oz is added to the data (3D): 16% error. Here. channel Oz is required for good classication although itself is not discriminative. .Understanding Spatial Filters (a) with little disturbance 10 8 (b) with disturbance from Oz 10 8 potential at CPz [µV] 6 4 2 0 −2 −4 −6 potential at CPz [µV] 6 4 2 0 −2 −4 −6 −8 −10 −5 0 5 potential at FCz [µV] 10 −8 −10 −5 0 5 potential at FCz [µV] 10 Two channel classication of (a): 15% error.

.Impact of Shrinkage on the Spatial Filters With increasing shrinkage.005 γ = 0.5 γ=1 Maps of spatial lters for dierent values of γ . but classication may degrade with too much shrinkage. 40 35 error [%] 30 25 20 15 γ=0 γ = 0.001 γ = 0. the spatial lters (classier) look smoother.05 γ = 0.

A Novel Analytical Method Recently. a method to analytically calculate the optimal shrinkage parameter was published ([1]). . Thanks to Nicole Krämer for pointing the BBCI group to this method.

xn ∈ Rd be n feature vectors and let µ = ˆ be the empirical mean. . .Optimal Selection of Shrinkage Parameter Let x1 . . . . 1 n n k=1 xk Aim: get a better estimate of the true covariance matrix Σ (especially in case n < d) than the sample covariance matrix 1 ˆ ˆ ˆ Σ = n−1 n (xk − µ)(xk − µ) by selecting a γ in k=1 ˜ ˆ Σ(γ) := (1 − γ)Σ + γνI.

. xn ∈ Rd be n feature vectors and let µ = ˆ be the empirical mean. . (ˆ)i the i-th element of the vector xk µ resp. µ.j=1 vark (zij (k)) 2 2 i=j sij + i (sii − ν) . .Optimal Selection of Shrinkage Parameter Let x1 . 1 n n k=1 xk Aim: get a better estimate of the true covariance matrix Σ (especially in case n < d) than the sample covariance matrix 1 ˆ ˆ ˆ Σ = n−1 n (xk − µ)(xk − µ) by selecting a γ in k=1 ˜ ˆ Σ(γ) := (1 − γ)Σ + γνI. We denote by (xk )i resp. We dene zij (k) = ((xk )i − (ˆ)i ) ((xk )j − (ˆ)j ) µ µ Then the optimal shrinkage parameter γ for which ˜ Σ(γ ) = argminS ||S − Σ||2 can be analytically calculated ([2]) as F n γ = (n − 1)2 d i. Furthermore we denote by sij the element in the i-th row ˆ ˆ and j -th column of Σ. .

.Result of Classication with Shrinkage 35 30 LDA error [%] 25 20 15 10 115−135 ms 135−155 ms 155−195 ms 205−235 ms 285−325 ms 335−395 ms 495−535 ms 5 0 −5 with shrinkage 5% Combined vP1 N1 vN1 P250 vN2 P400 N500 Using shrinkage the classication error could be drastically reduced to 4%.

33%.Results for the Classic Matrix Speller 100 bae sab sac sad sae saf sag sah sai saj sak daj sal mean 1 2 3 4 5 6 7 8 intensification [#] 9 10 80 accuracy [%] 60 40 20 0 Accuracy in letter selection. chance level: 3. .

Results for the Classic Matrix Speller 100 bae sab sac sad sae saf sag sah sai saj sak daj sal median 1 2 3 4 5 6 7 8 intensification [#] 9 10 80 accuracy [%] 60 40 20 0 See Poster W07 (Treder et al). . W10 (Höhne et al). and for other applications of this technique A02 (Thurlings et al).

Summary of Spatio-Temporal Classication Linear classication with shrinkage is a powerful method. In contrast to non-linear classiers. In this case the classier is the dierence of the ERPs. The appropriateness of a linear separation depends on the way features are extracted and transformed. the weights of a linear classier are informative. The weights of the trained classier can be visualized as a sequence of scalp topographies: [−250 ms] [−150 ms] [−50 ms] [50 ms] [150 ms] [250 ms] 0. Complete shrinkage (γ = 1) means neglecting the structure of the noise.02 0 −0.02 .

In particular.Topic of This Section Method: Classication of spectral features. 4]). Application: Classication of motor imagery conditions in a BCI paradigm. . Common Spatial Pattern (CSP) analysis to classify dierent conditions that are characterized by a modulation of the amplitude of brain rhythms ([3. namely modulations of the amplitude in specic frequency bands.

Neurophysiology: Sensorimotor Rhythms Spectrum over left hand area right C3 Cz left hand at rest Spectral Power [dB] α C4 hand hand β ~ 1/f γ 10 20 30 40 50 left hemisphere right hemisphere Frequency [Hz] .

7].Neurophysiology: Sensorimotor Rhythms Spectrum over left hand area right C3 Cz left C4 imagined hand movement Spectral Power [dB] α ~ 1/f β γ 10 20 30 40 50 hand hand left hemisphere right hemisphere Frequency [Hz] Imagining a movement of a limb causes a local blocking of the corresponding sensorimotor rhythm (SMR). see [5. 6. .

The grand average over 80 participants is displayed as topographic mapping: 6.5 [dB] 5 4.5 4 3.5 Conclusion Locations C3 and C4 are good candidates to observe SMR modulations. .Average Topography of Idle SMR For each Laplace ltered channel in a relax recording. These cover the sensorimotor areas of the right and the left hand. the strength of the local rhythm was estimated.5 6 5.

3333 0.1667 0 .5 0.Spatial Smearing Raw EEG scalp potentials are known to be associated with a large spatial scale owing to volumne conduction. 1 0.8333 0.6667 0. In a simulation of Nunez et al [8] only half the contribution to one scalp electrode comes from sources within a 3 cm radius.

The Need for Spatial Filtering raw: CP4 bipolar: FC4−CP4 25 20 15 10 5 10 15 20 [Hz] Laplace: CP4 15 10 [dB] [dB] 5 0 0 5 10 15 20 [Hz] 25 30 5 10 15 20 [Hz] 25 30 10 25 30 5 10 15 20 [Hz] CSP left right 0.2 25 30 25 20 [dB] 15 10 5 10 15 20 [Hz] 25 30 CAR: CP4 30 25 20 15 [dB] [dB] r2 .4 0 0.6 5 0.

[5 40] Hz [−15 25] dB + 10 dB 10 Hz C3 lar sgn r 0. .2 2 F3 lar Fz lar F4 lar left right C1 lar Cz lar C2 lar C4 lar CP3 lar CP1 lar CP2 lar CP4 lar P5 lar P3 lar Pz lar P4 lar P6 lar 10 20 30 40 First step: determine a suitable frequency band that shows good discrimination between the conditions. N= 66/63.Analysis of Motor Imagery Conditions: Spectra VPkg_08_08_07/imag_arrowVPkg: left / right.2 0 −0.

2 F3 lar Fz lar F4 lar left right C1 lar Cz lar C2 lar C4 lar CP3 lar CP1 lar CP2 lar CP4 lar P5 lar P3 lar Pz lar P4 lar P6 lar Second step: determine a suitable time interval during which discrimination is most prominent.2 0.1 0 −0.1 −0.ERD Curves of Motor Imagery VPkg_08_08_07/imag_arrowVPkg: left / right. [−500 6000] ms [−2 1] µV + 1 µV 2000 ms C3 lar sgn r2 0. N= 66/63. Remark: Simultaneous selection of frequency band and interval is more appropriate. .

2.. unknown sources min variance for right no class− specific influence on variance min variance for left Goal: observed signals projection discriminative signals csp:R1.. ..Common Spatial Pattern (CSP) Analysis Find spatial lters that optimally capture modulations of brain rhythms Observation: power of a brain rhythm ∼ variance of band-pass ltered signal..2... filter V−1 forward model EEG V backward model csp:L1.

CSP Analysis The goal of CSP: right csp:R left right csp:L 2425 2430 2435 [s] CSP analysis yields spatial lters that can be visualized: CSP filter ’left’ CSP filter ’right’ .

band-pass ltered (here 913 Hz): left right right left data right data C3 XL XR (T×chans) (T×chans) ΣL := XL XL ΣR := XR XR 728 730 732 734 736 738 [s] C4 V ΣL V = D & V (ΣL + ΣR )V = I 1) choose eigenvector vi from V having a large eigenvalue di w. ΣL .CSP More Practical EEG-signals during motor imagery.D]= eig(Sigma1.t. left right right var(XL vi ) = di large var(XR vi ) = 1−di small CSP 728 730 732 734 736 738 [s] In Matlab: [V.r. . Sigma1+Sigma2).

left right right var(XL vi ) = di small var(XR vi ) = 1−di large CSP 728 730 732 734 736 738 [s] In Matlab: [V. Sigma1+Sigma2). ΣL . .CSP More Practical EEG-signals during motor imagery.D]= eig(Sigma1.t.r. band-pass ltered (here 913 Hz): left right right left data right data C3 XL XR (T×chans) (T×chans) ΣL := XL XL ΣR := XR XR 728 730 732 734 736 738 [s] C4 V ΣL V = D & V (ΣL + ΣR )V = I 2) choose eigenvector vi from V having a small eigenvalue di w.

**Training CSP-based Classication
**

To obtain features from the CSP ltered EEG, in each channel and trial, the variance across time is calculated and the logarithm is applied. This is a scatter plot of the resulting CSP features:

1 left right

0 log(var(CSPR))

−1

−2

−3 −3 −2 −1 log(var(CSPL)) 0 1

Here, only two dimensions are shown. Note, that applying the logarithm to the band power features makes the distribution more Gaussian and therefore enhances linear separability.

**Training CSP-based Classication
**

Determine most discriminative frequency band, band-pass lter EEG in that band, extract single trials using an appropriate time interval, calculate and select CSP lters, and apply them to EEG single trials, calculate the log variance within trials. This results in a low dimensional feature vector for each trial (dimensionality equals number of selected CSP lters). Train a linear classier like LDA on the features. (Since these features are low-dimensional, shrinkage is typically not necessary.)

**Summary: Training CSP-based Classication
**

C4 lap

C4 lap

12

0.4 0.3

1 left right

10

left right

0.2 0.1 [µV] 0 −0.1 −0.2 left right

log(var(CSPR)) 0

8 [dB]

−1

6

4

−0.3 −0.4

−2

(1)

2 5 10 15 20 25 [Hz] 30 35 40 45

(2)

csp3 [0.60]

0

1000

2000 [ms]

3000

4000

5000

(4)

−3 −3 −2 −1 log(var(CSPL)) 0 1

csp1 [0.68]

csp2 [0.64]

csp4 [0.71]

csp5 [0.68]

csp6 [0.61]

−28.2

−17.0

39.3

15.5

(3)

Remark: .e.g. last 500 ms). and apply the classier weighting. For more theoretical considerations as well as practical hints see [3]. calculate the variance in short windows (e. One nice feature of CSP is that the length of the classication window can be changed at runtime (i. during feedback).Applying CSP-based Classication Project EEG with spatial CSP lters and apply band-pass lter. take the logarithm.

it happens that they are violated. Unfortunately. Although these principles are quite obvious. The estimation is only proper if the test set was not used in any way to determine parameters of the method.Section: Caveats in Validation When machine learning techniques are used for classication of EEG single-trials. . and if the samples in the test set are independent from the samples in the training set. the expected performance of a method has to be evaluated carefully. and there are several possible pitfalls. even some published journal articles lack a proper validation of the proposed methods. The estimation of generalization performance requires a training and a test set.

.Hall of Pitfalls in Single-Trial EEG Analysis preprocessing methods that use statistics of the whole data set like ICA. including trials that are later in the test set select parameters by cross validation on the whole data set and report the performance for the selected values artifacts/outliers are rejected from the whole data set (resulting in a simplied test set) unsucient validation for paradigms with block design In this presentation we highlight the last issue. or normalization of features (particularly severe for methods that use label information) features are selected on the whole data set.

if the periods for which there is no alternation between conditions are longer than the intended change of states in online operation. if the performance is estimated for such a data set by cross validation. We say that an experiment has a block design.Block Design Assume the task is to discriminate between mental states in dierent conditions. block #1 block #2 block #3 block #4 block #5 block #6 block #7 cond #1 cond #2 cond #1 cond #2 cond #1 cond #2 cond #1 A problem arises. .

For an ordinary cross validation in a block design data set. therefore the single-trials are not independent. the requirement of independence between training and test set is violated. .Slowly Changing Variables Training set Test set In EEG there are many slowly changing variables of background activity.

. Taking an arbitrary EEG data set.A Validation Test To demonstrate impact of block design in cross validation. we perform cross validation in the following setting. we assign fake labels (regardless of what happened during the recording) like this: nBlocksPerClass=1: class #1 class #2 nBlocksPerClass=2: class #1 class #2 class #1 class #2 nBlocksPerClass=3: class #1 class #2 class #1 class #2 class #1 class #2 and so on.

This procedure was performed for 80 EEG data sets.Results of the Validation Test From each block single-trials are extracted of length 1s. Blue boxplots show the results of cross-validation: 70 60 50 test error [%] 40 30 20 10 0 1 2 3 4 number of blocks per class 5 10 .

Results of the Validation Test From each block single-trials are extracted of length 1s. leave-one-block-out validation are . results for shown in green. This procedure was performed for 80 EEG data sets. Blue boxplots show the results of cross-validation: 70 60 50 test error [%] 40 30 20 10 0 1 2 3 4 number of blocks per class 5 10 For comparison.

Further Comments and Summary The severeness of the underestimation of the true error depends on the complexity of the features and the classier. Cross validation in block design data might also give the correct result but alternative evaluation is required. The situation gets worse if trials are extracted from overlapping segments. Leave-one-block-out and leave-one-run-out have larger standard errors than cross validation. The most realistic validation is to train the methods on the rst N − 1 runs and to evaluate on the last run. .

Lopes da Silva. Event-related Dynamics of Brain Oscillations. volume conduction. Brunner. 2005.org/10.4408441. Clin Neurophysiol. B. Mu rhythm (de)synchronization and EEG single-trial classication of dierent motor imagery tasks. for Robust EEG Single-Trial Analysis. Neuper and W. M. R. Schlögl. Müller-Gerking. Schäfer and K. L. A. 2000. R. Laplacians. Event-related EEG/MEG synchronization and desynchronization: basic principles. NeuroImage. Ledoit and M. 103(5): 499515. S. and interpretation at multiple scales.1109/MSP. 2008. A well-conditioned estimator for large-dimensional covariance matrices.-R. Nunez. Tomioka.References O. Blankertz. F. R. and G. L. C. 2006. Wijesinghe. 1997. P. Strimmer. B. R. Tucker. 4(1). 31(1): 153159. Silberstein. M. Pfurtscheller and F. 2006. 25(1): 4156. 110(11): 18421857. and K. Elsevier. EEG coherency I: statistics. Pfurtscheller. Electroencephalogr Clin Neurophysiol. Statistical Applications in Genetics and Molecular Biology. J. Wolf. Pfurtscheller. G. A. 1999. 8(4): 441446. G. Optimal spatial ltering of single trial EEG during imagined hand movement.doi. IEEE Signal Proc Magazine. A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics.2008. Westdorp. 88(2): 365411. . J. Lemm. and P. and F. URL H. Optimizing Spatial Filters http://dx. S. eds. C. Srinivasan. J Multivar Anal. 2004. J. Klimesch. da Silva. Kawanabe. Cadusch. cortical imaging. Ramoser.. IEEE Trans Rehab Eng. reference electrode. D. Müller.

- A Theory of Thalamocortex
- Bayesian Reasoning in Science
- A Tutorial on Spectral Clustering_2006
- Noise and Neuronal Heterogeneity
- Computer Manual in Pattern Classification
- Curvature in the Calculus
- Single-Cell Recording From the Brain of Freely Moving Monkeys
- An Essay on the Human Corticospinal Tract_History, Development, Anatomy, And Connections
- Bayesian Filtering
- Statistical Structure of Spike Trains
- orientation selectivity
- The Mirror-Neuron System
- Brain Computation
- Analysis of Variance (Marden, 2003)
- Bayesian Methods for ML
- Graphical Models, Exponential Families, And Variational Inference
- Modeling Rhythms_from Physiology to Function_SfN09
- Spectral Analysis for Neural Signals
- Mathematics of the Neural Response
- nrn2009
- Statistical Learning Theory- A Primer
- Book_Graphical Models, Exponential Families, And Variational Inference

Sign up to vote on this title

UsefulNot useful- Friedlander Weiss 98
- 40-80-1-SM
- Weighted Linear Ranking
- Machine Learning [Matzinger].pdf
- 12.Estimation - Correlations
- Space-time subspace techniques
- TransitionMatrixCalculations.xls
- facerecog.rtf
- Ieee
- EE453 08fa Mt1 Solns
- Fundamentals of Matrix Computations 3rd Edition
- ch05-linkanalysis1
- Visualizing Data Using Multidimensional Scaling
- Monee a 199812
- 7076 Fnl
- R tips
- Sorensen
- A Blind Source Separation Technique Using Second-Order Statistics
- Normal Mode Analysis
- EigenLab-102
- A Cointegation Demo Fetrics Chp4
- Gradient Newton Galerkin Algorithm
- TAMUDFEQ22CR
- Sam at Rices
- Min-Max Theorem
- MIT14_451F09_pset3
- One-factor Model for Cross-correlation Matrix in the Vietnamese Stock Market
- 253 Supp
- DG CG MG Paper
- If Em
- Tutorial_Machine Learning and Signal Processing Tools for BCI

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue reading from where you left off, or restart the preview.

scribd