Learning optimal EEG features across time, frequency and space.
Jason Farquhar, Jeremy Hill, Bernhard Schölkopf
Max Planck Institute for Biological Cybernetics, Tübingen, Germany

NIPS06 Workshop on Trends in BCI

Outline

Motivation
  Source types in EEG-based BCI
Automatic Feature Selection
  Learning Spatial Features
  Feature selection as Model Selection
  Spectral/Temporal Filtering
Summary


The current approach to learning in BCIs
Feature Extraction → Classification

Current BCIs use learning in two distinct phases:
1. Feature Extraction – where we attempt to extract features which lead to good classifier performance, using:
   prior knowledge, e.g. the 7-30Hz band for ERDs
   maximising r-scores
   maximising 'independence' (ICA)
   maximising the ratio of the class variances (CSP)

2. Classification – usually a simple linear classifier (SVM, LDA, Gaussian), because "Once we have good features the classifier doesn't really matter".

The current approach to learning in BCIs – This seems wrong!
Note: The objectives used in feature extraction are not good predictors of generalisation performance.
Question: Why use an objective for the important feature extraction which is a poor predictor of generalisation performance, when we have provably good predictors (margin, evidence) available in the unimportant classifier?


A Better approach
1. Combine the feature extraction and classifier learning.
2. Choose features which optimise the classifier's objective.
We show how to learn spatio-spectro-temporal feature extractors for classifying ERDs using the max-margin criterion¹.
¹ We have also successfully applied this approach to LR and GP classifiers, and to MRP/P300 temporal signals.


Data-visualisation: the ROC-ogram
Raw data is a d × T time-series for each of N trials.
ROC-ogram: plotting time vs. frequency vs. ROC score for each channel allows us to identify where the discriminative information lies.
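A minimal sketch of how such a ROC-ogram could be computed, assuming trials stored as a (trials, channels, samples) array, a short-time FFT for band-power, and a rank-based AUC per (channel, frequency, time-window) bin; the windowing and binning choices here are illustrative, not the authors' exact recipe.

```python
import numpy as np
from scipy.stats import rankdata

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) statistic."""
    r = rankdata(scores)
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    u = r[labels == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def rocogram(X, y, sample_rate, win=1.0, step=0.5):
    """X: (trials, channels, samples), y: 0/1 labels, sample_rate in Hz.
    Returns an AUC array of shape (channels, freq_bins, time_windows)."""
    n_tr, n_ch, n_samp = X.shape
    w, s = int(win * sample_rate), int(step * sample_rate)
    starts = range(0, n_samp - w + 1, s)
    R = np.zeros((n_ch, w // 2 + 1, len(starts)))
    for ti, t0 in enumerate(starts):
        seg = X[:, :, t0:t0 + w] * np.hanning(w)                     # windowed segment
        P = np.log(np.abs(np.fft.rfft(seg, axis=-1)) ** 2 + 1e-12)   # log band-power
        for c in range(n_ch):
            for f in range(w // 2 + 1):
                R[c, f, ti] = auc(P[:, c, f], y)
    return R
```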

Example raw ROC-ogram: (The good)
Spatial, spectral, and temporal discriminative features are subject specific.

Example raw ROC-ogram: (The bad)

Example raw ROC-ogram: (The ugly)

Spatio-Spectro-Temporal feature selection
We would like to automatically perform feature selection: spatially, spectrally, and temporally.


Learning Feature Extractors
Start by showing how to learn spatial filters with the max-margin criterion.
Then extend to learning spatial + spectral + temporal filters.

Spatial Filtering
Volume conduction – electrodes detect a superposition of signals from all over the brain: X = AS
Spatial filtering undoes this superposition to re-focus on the discriminative signals: y = f_s X
This is a Blind Source Separation (BSS) problem, and many algorithms are available to solve it.
In BCI we commonly use a fast, supervised method called Common Spatial Patterns (CSP) [Koles 1990].
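For reference, a hedged sketch of CSP as it is commonly implemented (and as used later as the spatial-filter seed): jointly diagonalise the two class covariance matrices and keep the filters with the most extreme variance ratios. The array shapes and the choice of two filters are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(X, y, n_filters=2):
    """X: (trials, channels, samples), y: binary labels.
    Returns spatial filters as rows of an (n_filters, channels) array."""
    covs = []
    for cls in np.unique(y):
        trial_covs = [xi @ xi.T / np.trace(xi @ xi.T) for xi in X[y == cls]]
        covs.append(np.mean(trial_covs, axis=0))
    # generalised eigenproblem:  C0 v = lambda (C0 + C1) v
    evals, evecs = eigh(covs[0], covs[0] + covs[1])
    order = np.argsort(evals)                       # ascending variance ratio
    pick = np.r_[order[:n_filters // 2], order[-(n_filters - n_filters // 2):]]
    return evecs[:, pick].T
```

Calling csp_filters(X, y, 2) would give the two spatial filters used as a baseline and seed in the experiments reported later.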


The Max-margin Objective
Related to an upper bound on generalisation performance; the basis for the SVM.
Finds w such that the minimal distance between the classes is maximised.
In the linear case it can be expressed as the primal objective
  \min_{w,b} \; \lambda\, w^\top w + \sum_i \max(0,\, 1 - y_i (x_i^\top w + b))
For non-linear classification we can simply replace x_i with an explicit feature mapping \psi(x_i).
This is how we include the feature extraction into the classification objective.
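As a concrete reference point, a few lines evaluating exactly this primal objective for a fixed feature matrix; the function name and shapes are illustrative only.

```python
import numpy as np

def svm_primal(w, b, X, y, lam):
    """X: (n, d) features, y: labels in {-1, +1}.  Returns the primal objective value."""
    margins = y * (X @ w + b)
    return lam * w @ w + np.maximum(0.0, 1.0 - margins).sum()
```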

Max-margin optimised spatial filters
1. Define the feature-space mapping \psi from the time series X_i to spatially filtered log band-powers, e.g.
   \psi(X_i, F_s) = \ln(\operatorname{diag}(F_s^\top X_i X_i^\top F_s))
   where F_s = [f_{s1}, f_{s2}, \ldots] is the set of spatial filters.
2. Include the dependence on \psi explicitly in the classifier's objective, e.g. the linear max margin
   J_{mm}(X, w, b, F_s) = \lambda\, w^\top w + \sum_i \max(0,\, 1 - y_i (\psi(X_i, F_s)^\top w + b))
3. Optimise this objective, treating \psi's parameters as additional optimisation variables.
Note: Unconstrained optimisation – solve for w, b, F_s directly using CG (a sketch follows below).
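A sketch of steps 1 and 2 under the definitions above: the spatially filtered log band-power features and the joint objective over (w, b, F_s). The helper names are mine; a practical implementation would smooth the hinge so that the conjugate-gradient solve mentioned in the note is applicable.

```python
import numpy as np

def psi_spatial(Xi, Fs):
    """Xi: (channels, samples) trial, Fs: (n_filters, channels) spatial filters (rows).
    Returns ln(diag(Fs Xi Xi' Fs')), the spatially filtered log band-powers."""
    S = Fs @ Xi
    return np.log(np.sum(S * S, axis=1))

def jmm_spatial(w, b, Fs, trials, y, lam):
    """Joint max-margin objective with the feature extraction inside.
    trials: iterable of (channels, samples) arrays, y: labels in {-1, +1}."""
    feats = np.stack([psi_spatial(Xi, Fs) for Xi in trials])
    margins = y * (feats @ w + b)
    return lam * w @ w + np.maximum(0.0, 1.0 - margins).sum()
```

In practice (w, b, F_s) would be flattened into a single parameter vector and handed to a gradient-based optimiser.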

ψ. e. Fs ) = ln(diag(Fs Xi Xi Fs )) where. treating ψ’s parameters as additional optimisation variables kyb-logo Note Unconstrained optimisation. . Fs directly using CG mc-logo MPI Tübingen Learning optimal EEG features across time. frequency and space. w. . Bernhard Schölkopf 3. Jeremy Hill. fs2 .g. Optimise this objective. from time series. Fs ) = λw w+ i max(0. Max Margin Jmm (X . solve for w.Motivation Automatic Feature Selection Summary Max-margin optimised spatial filters 1. . b. Define the feature-space mapping. Fs ) w+b)) Jason Farquhar. Fs = [fs1 . Linear. ψ(Xi . Include the dependence on ψ explicitly into the classifiers objective. Xi to spatially filtered log bandpowers.] is the set of spatial filters 2. b. . 1−yi (ψ(Xi .

treating ψ’s parameters as additional optimisation variables kyb-logo Note Unconstrained optimisation. . ψ(Xi . e. Define the feature-space mapping.g.] is the set of spatial filters 2. . Linear. w. Include the dependence on ψ explicitly into the classifiers objective. Xi to spatially filtered log bandpowers. Fs = [fs1 . . Fs directly using CG mc-logo MPI Tübingen Learning optimal EEG features across time. solve for w. ψ. b. Jeremy Hill. 1−yi (ψ(Xi . b. . Optimise this objective. Fs ) = ln(diag(Fs Xi Xi Fs )) where. fs2 . frequency and space. Fs ) w+b)) Jason Farquhar. Fs ) = λw w+ i max(0. from time series. Bernhard Schölkopf 3. Max Margin Jmm (X .Motivation Automatic Feature Selection Summary Max-margin optimised spatial filters 1.

Adding Spectral/Temporal filters
It is very simple to include spectral/temporal filtering – just modify the feature mapping \psi to include them.
Let f_f be a spectral filter and f_t a temporal filter. Then the spatially + spectrally + temporally filtered band-power is
  \psi(X, f_s, f_f, f_t) = \mathcal{F}^{-1}(\mathcal{F}(f_s X D_t) D_f)\,\big(\mathcal{F}^{-1}(\mathcal{F}(f_s X D_t) D_f)\big)^\top
                         = f_s\, \mathcal{F}(X D_t)\, D_f^2\, \mathcal{F}(X D_t)^{\mathsf H} f_s^\top / T
where \mathcal{F} is the Fourier transform and D_{(\cdot)} = \operatorname{diag}(f_{(\cdot)}).
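A short numpy sketch of this band-power feature, taking the FFT over all T frequency bins so that it matches the formula above term for term; the argument shapes are assumptions.

```python
import numpy as np

def psi_sst(X, Fs, ff, ft):
    """X: (channels, T) trial; Fs: (k, channels) spatial filters (rows);
    ft: (T,) temporal window; ff: (T,) spectral window over the T FFT bins.
    Returns the k filtered band-powers  f_s F(X D_t) D_f^2 F(X D_t)^H f_s' / T."""
    T = X.shape[1]
    Xf = np.fft.fft((Fs @ X) * ft, axis=1)          # F(f_s X D_t)
    Xf = Xf * ff                                    # apply D_f in the frequency domain
    return np.real(np.sum(Xf * np.conj(Xf), axis=1)) / T
```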


Filter regularisation
The filters F_s, F_f, F_t are unconstrained, so they may overfit.
We have prior knowledge about the filters' shape, e.g.
  spatial filters tend to be over the motor regions
  temporal and spectral filters should be smooth
Include this prior knowledge with quadratic regularisation on the filters:
  J_{mm} = \lambda\, w^\top w + \sum_i \max(0,\, 1 - y_i (\ln(\psi(X_i, F_s, F_f, F_t))^\top w + b))
           + \lambda_s \operatorname{Tr}(F_s^\top R_s F_s) + \lambda_f \operatorname{Tr}(F_f^\top R_f F_f) + \lambda_t \operatorname{Tr}(F_t^\top R_t F_t)
where each R_{(\cdot)} is a positive definite matrix encoding the prior knowledge.
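One plausible way to build such penalties, assuming a second-difference (curvature) matrix as the smoothness prior for the spectral and temporal filters; the authors' actual R matrices may differ.

```python
import numpy as np

def smoothness_R(n, ridge=1e-6):
    """R = D'D (+ small ridge) with D the second-difference operator, so that
    Tr(f' R f) penalises curvature, i.e. encourages smooth spectral/temporal filters."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return D.T @ D + ridge * np.eye(n)              # ridge keeps R positive definite

def filter_penalty(F, R, lam):
    """lam * Tr(F' R F) for a filter matrix F with one filter per column."""
    return lam * np.trace(F.T @ R @ F)
```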

Implementation issues
Optimising J_{mm} for all the filters directly results in a "stiff" problem and very slow convergence.
Further, evaluating \psi(X, f_s, f_f, f_t) requires a costly FFT.
Coordinate descent on the filter types solves both these problems (a skeleton is sketched after this slide):
1. Spatial optimisation, where \psi_s(X, f_s) = f_s X_{f,t} X_{f,t}^\top f_s^\top
2. Spectral optimisation, where \psi_f(X, f_f) = X_{s,t} D_f^2 X_{s,t}^\top
3. Temporal optimisation, where \psi_t(X, f_t) = X_{s,f} D_t^2 X_{s,f}^\top
4. Repeat until convergence.
Here X_{f,t}, X_{s,t}, X_{s,f} denote the data pre-filtered with the two currently fixed filters.
Non-convex problem – seed with a good solution found by another method, e.g. CSP or prior knowledge.
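A skeleton of this coordinate-descent loop. The prefilter_* helpers and the max_margin_step solver are hypothetical placeholders standing in for the FFT pre-filtering and the smoothed-hinge/CG inner solve.

```python
import numpy as np

def coordinate_descent(trials, y, fs, ff, ft, prefilter_ft, prefilter_st, prefilter_sf,
                       max_margin_step, n_iter=10, tol=1e-4):
    """trials: list of (channels, T) arrays; seeds: fs from CSP, ff/ft flat.
    Each prefilter_* applies the two currently fixed filters; max_margin_step solves the
    max-margin problem in the remaining filter (plus w, b) and reports the error."""
    prev_err = np.inf
    for _ in range(n_iter):
        fs, w, b, err = max_margin_step('spatial',  prefilter_ft(trials, ff, ft), y, fs)
        ff, w, b, err = max_margin_step('spectral', prefilter_st(trials, fs, ft), y, ff)
        ft, w, b, err = max_margin_step('temporal', prefilter_sf(trials, fs, ff), y, ft)
        if prev_err - err < tol:                    # step 4: repeat until convergence
            break
        prev_err = err
    return fs, ff, ft, w, b
```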

Example – Optimisation trajectory for Comp III, IVc
Noisy CSP seed.
Finds: motor region, 15Hz band, >0.5s temporal band; finds the foot region.
Iteration 0: 45% error
Iteration 1: 16% error
Iteration 2: 3.5% error
Iteration 3: 3.5% error
Iteration 4: 2.6% error
Iteration 5: 2.6% error
Iteration 6: 2.6% error

Experimental analysis
We show binary classification error for 15 imagined-movement subjects: 9 from the BCI competitions (Comp 2: IIa; Comp 3: IVa, IVc) and 6 from an internal MPI dataset, pre-processed by band-pass filtering to 0.5–45Hz.
Baseline performance is from CSP with 2 filters, computed on the signal filtered to 7-27Hz.
The CSP solution is used as the spatial filter seed; flat seeds are used for the spectral and temporal filters.

Results – Spatial Optimization
100 Training Points / 200 Training Points
Spatial: general improvement in performance, particularly for low numbers of data-points (when overfitting is an issue).
Huge improvement in a few cases.

Results – Spatial + Spectral Optimization
100 Training Points / 200 Training Points
Spatial + Spectral: large benefit in a few cases; further improvement for the subjects helped before.

Results – Spatial + Spectral + Temporal Optimization
100 Training Points / 200 Training Points
Spatial + Spectral + Temporal: further improvements for some subjects, a slight decrease for others.

Summary
EEG BCI performance depends mainly on learning subject-specific feature extractors.
These can be learnt by direct optimisation of the classification objective (max-margin).
Results show a significant improvement over independent feature-extractor/classifier learning (better in 12/15 cases).
Future work:
  Alternative objective functions – SVM, LR and Gaussian Process objectives implemented already.
  Better priors – particularly for the spatial filters; perhaps found by cross-subject learning?
  Other feature/signal types – wavelets, MRPs, P300, etc.
  On-line feature learning.

Results – Temporal Signal Extraction
100 Training Points / 200 Training Points
Learn a rank-1, i.e. 1-spatial + 1-temporal, approximation to the full SVM weight vector.
This regularisation significantly improves classification performance and produces readily interpretable results.
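A hedged sketch of the rank-1 idea: here the 1-spatial × 1-temporal factors are recovered by truncated SVD of a reshaped weight vector, which is a simplification; the talk learns the rank-1 factors directly inside the classifier objective rather than post hoc.

```python
import numpy as np

def rank1_weights(w_full, n_channels, n_samples):
    """w_full: flat SVM weight vector of length n_channels * n_samples.
    Returns the spatial factor, the temporal factor, and the rank-1 weight vector."""
    W = w_full.reshape(n_channels, n_samples)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    spatial, temporal = U[:, 0], s[0] * Vt[0]       # best rank-1 pair (Eckart-Young)
    return spatial, temporal, np.outer(spatial, temporal).ravel()
```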


Results – Example solutions
Spatially – a differential filter between the left/right motor regions; temporally?
Spatially – a differential filter between the foot and motor regions; temporally?
