Single Channel Hybrid EOG/EEG-based Brain-Computer Interface

Thesis · September 2014


DOI: 10.13140/2.1.3108.0960



THE UNIVERSITY OF HONG KONG
Faculty of Engineering
Department of Electrical and Electronic Engineering

Single Channel Hybrid EOG/EEG-based Brain-Computer Interface

A dissertation submitted by
Ang Man Shun

In fulfillment of the requirements of


ELEC3818 Senior Design Project

towards the degree of


Bachelor of Engineering (Electronics and Communication Engineering)

Submitted : 14th April, 2014


Abstract

The term human-computer interface (HCI) generally refers to the communication and
interaction between man and machine. Conventional HCI devices such as the keyboard and
mouse are largely inaccessible and not user-friendly to locked-in patients who have lost their
neuromuscular capabilities. To help such patients, researchers are developing more and more
assistive HCI technologies. One potential candidate for solving this patient-machine
communication problem is the Brain-Computer Interface (BCI).

In a typical BCI system, eye blinks are treated as noise to be removed. However, several
papers have demonstrated that eye-blink-generated signals can also serve as an information
carrier, conveying the conscious intent of locked-in patients and acting as a basis for
man-computer communication.

The purpose of this project is to investigate the possibility of implementing a robust
electrooculography (EOG)-based HCI system using only one channel to perform simple
computer manipulations. A prototype EOG-based HCI system is designed and built, and
several signal processing algorithms and techniques are studied and compared to assess
the feasibility of the system.

Acknowledgement

Firstly, I would like to thank Prof. Y.S. Hung for giving clear direction and help.

I would also like to thank Dr. Z.G. Zhang for his relentless support and informative advice
throughout this project.

I would like to thank Dr. Joseph Mak for lending the NeuroSky® headset and giving
technical support.

Last but not least, I would like to express my heartfelt gratitude to Prof. Ricky Y.K. Kwok for
being my non-academic advisor and giving me spiritual inspiration throughout all three
years of my undergraduate studies in the Department of Electrical and Electronic Engineering.

This project is sponsored by the HKU 81 Inclusion Fund.

List of symbols
AWGN Additive white Gaussian noise
BCI Brain-computer interface
DC Direct current
DWT Discrete wavelet transform
EEG Electroencephalography
ECG Electrocardiography
EMG Electromyography
EOG Electrooculography
FFT Fast Fourier transform
HCI Human-computer interface
ITR Information transfer rate
MCD Minimum covariance determinant
ROI Region of interest
RSS Residual sum of squares
SNR Signal-to-noise ratio
SVM Support vector machine
WT Wavelet transform
WPT Wavelet packet transform

List of tables
Table 1.1 Comments on the signals after primal analysis.
Table 1.2 The run-time system requirements.
Table 2.1 The value of R² in different situations.
Table 3.1 Some NeuroSky® communication protocol.
Table 4.1 The table of R² for different raw EOG signals.
Table 4.2 The table of R² for different band-pass filtered EOG signals.
Table 4.3 The 30 features.
Table 4.4 Offline SVM performance.
Table 4.5 Online performance of the activity detector and outlier detector.
Table 4.6 Online SVM performance.

List of figures
Figure 1.1 The location of the orbicularis oculi muscle in the head.
Figure 1.2 Model of the retina-cornea voltage source.
Figure 1.3 The EEG measurement with 5 electrodes.
Figure 1.4 The types of EEGs.
Figure 1.5a (Left) Superposition of 60 looking-up EOGs demonstrating some variability.
Figure 1.5b (Right) Superposition of double-blink EOGs demonstrating intrapersonal moment-to-moment variability.
Figure 1.6 The positions of the electrodes in the EOG system.
Figure 1.7 The overview of the final system.
Figure 2.1 Band-pass filtering with different lower cutoff frequencies.
Figure 2.2 Noisy ECG with a trend.
Figure 2.3 Wavelet compression of blink EOG.
Figure 2.4 Signal de-noising via Donoho's Visu thresholding using a 5-level db4 wavelet.
Figure 2.5 The templates.
Figure 2.6 The plots of templates of blink EOG and looking-down EOG.
Figure 2.7 The decomposition structure of the wavelet transform and wavelet packet transform in a 3-level setup.
Figure 2.8 Two linear separating hyperplanes.
Figure 2.9 Two SVMs: the left is the hard-margin SVM, the right is the soft-margin SVM.
Figure 2.10 Under-fitting and fit.
Figure 2.11 The covariance ellipsoids.
Figure 3.1 Block diagram of the overall system.
Figure 3.2 The data logging process.
Figure 3.3 The flow chart of the data logger.
Figure 3.4 The flow of the offline system for template analysis.
Figure 3.5 Offline data segmentation.
Figure 3.6 The whole online system.
Figure 3.7 The buffer xB at different t.
Figure 3.8 The concept of outlier detection.
Figure 4.1 The two noisy raw EOG signals with the lowest R² values, with their de-trended versions.
Figure 4.2a The blink EOG: superposition, coherent average and empirical noises.
Figure 4.2b The looking-down EOG: superposition, coherent average and empirical noises.
Figure 4.2c The looking-up EOG: superposition, coherent average and empirical noises.
Figure 4.3 The filtered double-blink EOG.
Figure 4.4 The EOG of double blinks.
Figure 4.5 The SNR values of 3 types of EOGs.
Figure 4.6 The superposition of 3 types of normalized EOG signals.
Figure 4.7 Power spectral density estimates of the 3 types of EOGs using Welch's method.
Figure 4.8 The skewness of three types of EOGs.
Figure 4.9 The energies (square of the L2-norm) for the three types of EOG.
Figure 4.10a The RSS values of all blink signals to all the looking-down signals.
Figure 4.10b The RSS values of all blink signals to all the looking-up signals.
Figure 4.11 The plots of three types of EOGs.
Figure 4.12 The Mahalanobis distance between signals.
Figure 4.13 The "updated" analysis of the time domain.
Figure 4.14 The overlap of peak values.
Figure 4.15 The looking-up EOG signal number 74.
Figure 4.16 The peak-energy feature space of the signals.
Figure 4.17a,b The normalizations using min-max and standardization.
Figure 4.17c The normalization using order of magnitude.
Figure 4.18 Single blink signal number 68 with saccade component.
Figure 4.19 The true labels and test labels generated using K-means.
Figure 4.20 The computational time of the two methods.
Figure 4.21 The computational time of the two methods.
Figure 4.22 The RSSM of looking down and looking up.

Contents
Abstract............................................................................................................2
Acknowledgement.........................................................................................2
List of symbols...............................................................................................3
List of tables...................................................................................................3
List of figures.................................................................................................4
Contents..........................................................................................................6
1.Background.................................................................................................8
1.1. The head : the eye and the brain......................................................8
1.2. The Eye blink and EOG.....................................................................8
1.3. Brain and EEG.................................................................................... 9
1.4. Feasibility study.................................................................................10
1.4.1. System feasibility............................................................................10
1.4.2. System usability..............................................................................11
1.4.3. Previous work...................................................................12
1.4.4 The work in this project................................................12
1.5. System specification...........................................................................12
1.5.1 Goal and specification....................................................................12
1.5.2 Limitations and requirements......................................................13
1.6 Report Organization..........................................................................13

2.Theories on EOG processing................................................................14


2.1 Introduction.........................................................................................14
2.2 Band pass filtering and de-trending..............................................14
2.3 Coherent Averaging............................................................................15
2.4 Wavelet Transform.............................................................................16
2.5. Shrinkage De-noising.........................................................................21
2.6. Wavelet Packet Transform..............................................................24
2.7. Signal Detection..................................................................................26
2.8 Support Vector Machine...................................................................26
2.9 Outlier detection by convex hull.....................................................29

3. Implementation Details..........................................................................30
3.1 Introduction.......................................................................................... 30
3.2 Hardware: NeuroSky® sensors......................................................30
3.3 Wireless connection and data logging...........................................30
3.4 Obtaining EOG template................................................................. 32
3.5 Performing Template Analysis........................................................34
3.6 Constructing the online system : an overview ...........................34
3.7 Implementing the adaptive activity detector...............................35
3.8 Implementing the adaptive outlier detector................................36
3.9 The adaptive database and user adaptivity................................38
3.10 Obtaining EEG data........................................................................38
3.11 The application layer......................................................................38

4. Analysis on data, features and results..............................................39
4.1 Introduction...........................................................................................39
4.2 Analysis of goodness of fit of de-trending.......................................39
4.3 Analysis of the templates and detectability....................................40
4.4 Analysis of the SNR..............................................................................42
Discriminability analysis
4.5 On time domain and frequency domain.........................................43
4.6 On features.............................................................................................45
4.7 On blink and directional EOGs using residual sum of squares 46
4.8 On blink and directional EOGs using Mahalanobis distances...48
4.9 Offline SVM performance...................................................................53
4.10 Online SVM performance................................................................. 54
4.11 Offline discriminability test using K-means in wavelet domain 55
4.12 Discriminability on wavelet domain with maximum sparsity. 56
4.13 Analysis on computational speed....................................................56
4.14 Possible extensions and unsolved problems..................................57
4.15 Conclusion.............................................................................................59

References.......................................................................................................60
Appendix . The plots...................................................................................62

1. Background
This section presents the general background of the project. It begins with background
information on the relevant signals, followed by preliminary feasibility studies and a literature
review. After that, the major aims of the system, the system specification and the run-time
system requirements are stated. The section ends with the report organization, which briefly
introduces the other sections of the report.

1.1. The head : the eye and the brain


This section presents the background for the signals of interest in this project, including the
sources and types of signals.

The eyes and the brain, two organs of paramount significance in the nervous system, are
located inside the head. The brain is the source of the EEG (electroencephalography), while
the eyes, together with the orbicularis oculi muscle, are the major source of the EOG
(electrooculography). Figure 1.1 illustrates the location of the orbicularis oculi muscle in
the head.

Figure 1.1 The location of orbicularis oculi muscle in the head.


(Image source : Encyclopedia Britannica)

1.2. The Eye blink and EOG


EOG is created in two ways: the first is the movement of the cornea-retina electric dipole
as the eyes look in different directions, and the second is eye blinks.

In the first mechanism, the eye can be modeled as a fixed dipole with the positive pole at the
cornea and the negative pole at the retina [1] (chapter 28). Figure 1.2 illustrates such a model.

Figure 1.2 Model of retina-cornea voltage source. Adapted from [23].

The second mechanism concerns the EMG generated by eye blinks. An eye blink is the closing
and opening of the eyelid, and it is the orbicularis oculi muscle in the face that closes the eyelid.
Eye blinks can be classified into two major types: voluntary and involuntary. Voluntary eye
blinks are consciously controlled by the brain, while involuntary eye blinks are controlled by
two organs: the first is the globus pallidus inside the brain [2], which is responsible for the
normal automatic eye blinks; the second is the spinal cord, which is responsible for reflex
blinking.

The inverse of the time interval between successive blinks, that is, the frequency of blinking, is
called the blink rate. The average adult, at rest and relaxed, has an approximate blink rate of
0.2 blinks per second (about 12.5 to 17 blinks per minute). The average blink duration is about
100-400 milliseconds [3,4]. Blink rate and blink duration are strongly affected by cognitive
processes and age, and are also affected by emotion, fatigue, disease and drugs [3,5]. Because
these numbers vary with so many factors, they are of limited importance in this study, but they
do provide basic information for system construction, such as how to determine the signal
interval. Section 3 discusses the details.

1.3. Brain and EEG


EEG can be classified into two major types: invasive and non-invasive. Non-invasive EEG
records electrical activity using electrodes attached to the surface of the scalp. Because of the
complexity of the brain, the measured EEG is actually a mixture of many signals originating at
different locations in the brain. Source localization and source separation are therefore very
active topics in EEG research. In this study, however, since the system has only one electrode,
source localization and separation are theoretically impossible. Figure 1.3 illustrates a typical
non-invasive EEG measurement with multiple sensors.

Figure 1.3 The EEG measurement with 5 electrodes.

EEGs are classified by the frequency band of the signal. Figure 1.4 shows 4 types of EEG
rhythms.

Figure 1.4 The types of EEGs. Image adapted from [1] (chapter 13).

Although EEG is not the major concern of this study, the eye-related alpha wave will be
used. As mentioned in the last part of section 3, instead of computing the EEG on the
computer, the EEG will be obtained from the chip installed in the sensor.

1.4 Feasibility study


This section discusses the preliminary feasibility of building a system based on a
signal-collection-and-classification paradigm for the EOG, from the system performance and
usability points of view. A literature review of some existing systems follows.

1.4.1. System feasibility


The following table summarizes the differences between EEG and EOG.

                                              EOG               EEG
Detectability (by human observation)          High              Almost impossible
Detectability                                 Relatively high   Relatively very low
Signal-to-noise ratio (SNR)                   Relatively high   Relatively very low
Intrapersonal moment-to-moment variability    Relatively low    Can be very high
Interpersonal variability                     Can be high       Can be very high

Table 1.1 Comments on the signals after primal analysis

Although EOG signals have relatively low variability over time, this does not mean they have
no variability. In the early stage of the study, template analysis was carried out to obtain this
basic information; its details are discussed in section 4. Figures 1.5a and 1.5b are two plots of
the superpositions of multiple signals of two kinds of EOG, 'double blinks' and 'looking up'.

Figure 1.5a (Left) Superposition of 60 looking-up EOGs demonstrating some variability.

Figure 1.5b (Right) Superposition of double-blink EOGs demonstrating intrapersonal
moment-to-moment variability.

The template analysis tells us, for example, that the moment-to-moment variability of the
double-blink EOG makes template matching methods such as matched filtering inappropriate
for double-blink detection: the template obtained by coherent averaging assumes the peaks are
aligned, i.e., that the signals are time-synchronized. This assumption already fails for the
double-blink EOG because of the variability of the peak time of the second blink. The template
obtained by averaging multiple double-blink EOGs is therefore unrepresentative, which
degrades the matched filter and makes its output less accurate.

1.4.2. System usability


From the human-computer interaction point of view, system usability is determined by ease of
use. This is a critical issue since the system has to be used for prolonged periods, so the system
should be easy to use, especially for locked-in patients. Hence, this system is developed using
NeuroSky® hardware with a single dry electrode. The merit of such a system is that the setup
is very easy and user-friendly, whereas setting up wet electrodes or multiple electrodes can be
very troublesome. A drawback of a single-electrode system, however, is that a single channel
collects less information than its multi-channel counterparts. This project therefore strives
to investigate the possibilities of extending the power of a single-channel system.

Another usability issue is that some training cannot be avoided before the user masters how to
'control' the system. Usability increases once the user is familiar with the control method, so
the system is 'user-dependent'. Since the system is user-dependent, training is a must for
handling adaptability to different individuals: interpersonal variability is unavoidable, but the
system should adapt to it through training. Furthermore, certain adaptive methods are
implemented in the system to increase the degree of user-adaptability. The details of these
issues are discussed in section 3.

The final usability issue concerns user condition, such as fatigue. This project assumes that eye
movements and thoughts will not impose too much burden on a normal user (i.e., a user with a
normal eye-blink condition), so fatigue is not studied in this project. For a user with eye-muscle
disease, fatigue is unavoidable and the system will not work optimally.

1.4.3. Previous work


Some EEG/EOG-based systems have already been implemented ([6],[7],[16],[17]). Most of these
EOG-based systems use 4 to 5 electrodes (plus 1 reference electrode). Figure 1.6 illustrates
the positions of the electrodes in these systems; the reference electrode is not shown.

Figure 1.6 The positions of the electrodes in the EOG system

1.4.4. The work in this project


A single-channel system is user-friendly but less powerful, while a multi-electrode system is
powerful but less user-friendly. In this trade-off, usability is ranked as the first priority at the
expense of system power. Thus this project aims to establish a hybrid EEG/EOG BCI system
using only a single electrode, to investigate the possibility of retaining the abilities of
multi-channel systems, and, if possible, to construct a prototype of such a system. As discussed
in the final part of section 4, the answer to this problem is positive: it is possible to construct a
single-channel system that can sense multiple types of EOG signals and classify them online.
Because of time limits, however, such a system has not been fully realized; instead, a less
powerful system was built.

1.5. System specification


1.5.1 Goal and specification
The major aim of the system is to develop a single-channel EOG system with acceptable speed
and high accuracy. The second aim is to retain the abilities of the multi-channel counterpart as
far as possible, such as having more types of recognizable input patterns.

The communication method relies mainly on the detection of eye-related signals: the EOG
(blinks, looking in different directions) and the EEG alpha wave (eyes open and eyes closed).
If the detected signal is classified correctly, the user can use it to perform a task.

The following figure illustrates the proposed system.

Figure 1.7 The overview of the final system

Thus, the system is an asynchronous, user-dependent BCI, and no cue is used.

The system is developed on a PC with Bluetooth support, running in a Windows environment
with MATLAB 2013b, together with the NeuroSky® headset.

1.5.2 Limitations and requirements


The system is intended to provide an alternative pathway for locked-in patients and healthy
people to communicate with a machine. It is not intended to fully replace the mouse and
keyboard for healthy people. It cannot be used when the conditions listed below are not satisfied.

Equipment requirements                      End user's condition requirements
A PC with Bluetooth, running Windows        Mentally healthy
The NeuroSky® headset, with batteries       Proper eye condition, no eye disease
The EEG/EOG system software                 Emotionally relaxed

Table 1.2 The run-time system requirements

1.6 Report Organization


This report has 4(+1) sections. Section 1 is the introduction, which presents the general
background of the whole system.
Section 2 covers the theoretical background of the algorithms and methods used or
investigated in the system, with explanations of why certain methods are better than others.
Section 3 discusses the implementation details of the system, including data collection,
wireless connection and system implementation. For simplicity, only certain key components
of the system are mentioned.
Section 4 covers the data analysis, discussion and future extensions, followed by a brief
discussion of some unsolved problems worth mentioning, and finally the conclusion.
The report has a few appendices, including references, MATLAB code for the system software
and proofs of certain important theorems.

2. Theories on EOG processing
2.1 Introduction
This section presents the main theories on EOG processing used in this study. The section
starts with band-pass filtering and coherent averaging, followed by wavelets, detection and the
support vector machine, and finally the convex hull.

2.2 Band pass filtering and de-trending


In signal processing and machine learning, the first thing to do is to determine the region of
interest (ROI). For a one-dimensional (1D) waveform, the ROI is a frequency range; signal
content outside the ROI is regarded as noise. Intuitively, then, the first step is filtering in the
frequency domain. Traditional frequency-filtering techniques such as band-pass filtering
remove components regarded as noise. Such filters work well when there is little or no spectral
overlap between signal and noise, but this conventional approach is not sufficient on its own
for EEG/EOG de-noising, for the following reasons:
1. The ROI is broadband and overlaps significantly with the noise.
2. Low-pass filtering removes high-frequency components of the EOG which can be useful.

Band-pass filtering is nevertheless used as the first step in template formation for template
analysis. Experiments showed the ROI to be 0.5 to 30 Hz: the alpha-wave EEG is about
6-13 Hz, while the EOG lies below 100 Hz. The low-pass filter at 30 Hz removes
high-frequency noise, such as the AWGN generated by the environment and the devices, while
the high-pass filter at 0.5 Hz removes DC components and performs de-trending, which is very
important, as shown in Figure 2.1.

Figure 2.1. Band-pass filtering with different lower cutoff frequencies. The 0.001 Hz
high-pass filtered signal (in green) has a significant DC component, while the 0.5 Hz
high-pass filtered signal (in blue) has none.

The Butterworth filter is selected. It has a more nearly linear phase response than the
Chebyshev and elliptic filters, and this advantage of lower phase distortion outweighs its
weakness of slow roll-off. To achieve a steeper roll-off, a more complex 5th-order Butterworth
filter is used; computational complexity is less important in the offline template-formation stage.
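As a rough sketch of this filtering stage (not the thesis implementation), the snippet below
builds a 5th-order Butterworth band-pass filter with the 0.5-30 Hz ROI and applies it with
zero-phase (forward-backward) filtering, so the Butterworth's phase response cannot distort
the waveform at all. The 512 Hz sampling rate, the synthetic signal, and the use of SciPy are
assumptions for illustration only.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 512.0  # assumed sampling rate (Hz), for illustration only

# 5th-order Butterworth band-pass over the 0.5-30 Hz ROI,
# as second-order sections for numerical stability
sos = butter(5, [0.5, 30.0], btype="bandpass", fs=FS, output="sos")

# Synthetic stand-in signal: an in-band 5 Hz component riding on a large DC offset
t = np.arange(0.0, 10.0, 1.0 / FS)
raw = 100.0 + np.sin(2 * np.pi * 5 * t)

# Forward-backward filtering gives zero net phase distortion
filtered = sosfiltfilt(sos, raw)
```

After filtering, the DC offset is gone (the mean of `filtered` is near zero away from the edges)
while the in-band 5 Hz component passes through essentially unchanged.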

Another reason why filtering is necessary is de-trending. De-trending prevents information
from being blurred by a background trend, which is a critical issue in ECG. The following
figure illustrates the difficulty of finding the QRS complex in a noisy ECG.

Figure 2.2. Noisy ECG with a trend.

Unlike in ECG, in EOG the trend may contain important information. To quantify the
importance of the trend, the coefficient of determination R² can be used:

$$R^2 = 1 - \frac{\sum_i (x_i - \hat{x}_i)^2}{\sum_i (x_i - \bar{x})^2}$$

where $\hat{x} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n)$ is the de-trended estimate of the
original signal $x = (x_1, x_2, \ldots, x_n)$ and $\bar{x}$ is the mean of $x$.

The goodness-of-fit value R² ranges from 0 to 1:

                                     Trend is not important    Trend is important
Distance between $x_i$ and $\hat{x}_i$    small                     large
R²                                   close to 1                close to 0

Table 2.1 The value of R² in different situations

Experimental analysis shows that the trend is not important in blink EOGs, but can be
important in directional EOGs. Details are discussed in section 4.
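The R² criterion above can be sketched numerically. This toy example (NumPy, synthetic
signals, not the thesis data) de-trends by removing a least-squares linear slope while keeping
the mean level, then scores the result with the formula above: a trend-free signal scores near 1
(trend unimportant), while a strongly trending one scores near 0, matching Table 2.1.

```python
import numpy as np

def r_squared(x, x_hat):
    # R^2 = 1 - sum((x - x_hat)^2) / sum((x - mean(x))^2)
    ss_res = np.sum((x - x_hat) ** 2)
    ss_tot = np.sum((x - np.mean(x)) ** 2)
    return 1.0 - ss_res / ss_tot

def detrend(t, x):
    # Remove the fitted linear trend but keep the signal's mean level
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept) + np.mean(x)

t = np.linspace(0.0, 1.0, 500)
wave = np.cos(2 * np.pi * 5 * t)   # stand-in oscillatory signal

x_flat = wave                      # no background trend
x_ramp = wave + 10.0 * t           # strong linear trend

r2_flat = r_squared(x_flat, detrend(t, x_flat))   # close to 1
r2_ramp = r_squared(x_ramp, detrend(t, x_ramp))   # close to 0
```

The particular de-trender (slope removal with the mean preserved) is an assumption for this
sketch; any de-trending method can be scored the same way.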

2.3 Coherent Averaging


After bandpass filtering and de-trending, signals still have noises because of spectrum
overlapping. Coherent averaging, a.k.a. time synchronous averaging can be used to increase
SNR. For a set of aligned (time synchronized) signals xi, the average is

1
xavg N
x i

N
i1

It can be proved that, SNR of average, denoted as SNRavg is improved by a factor of N over
the SNR of the individual signal.

15
SNRavg N SNR

The figure below shows the coherent average (the template) of the looking-up EOG, the
superposition of multiple signals, and the empirical noises, defined as

empirical noise = empirical signal − template

Figure 2.3. Superposition of 111 looking-up EOGs, their coherent average, and the
superposition of the empirical noises.

The same plots for the other signals are given in the appendix, and a detailed analysis of the
templates obtained is given in section 4.
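The SNR improvement can be checked numerically. This sketch (NumPy, with a Gaussian
pulse as a stand-in for a blink template rather than real EOG) averages N aligned noisy trials
and compares the empirical noise power before and after; measured in power, the improvement
is roughly a factor of N, i.e. √N in amplitude.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
template = np.exp(-((t - 0.5) ** 2) / 0.01)   # blink-like pulse (stand-in)

N = 100
sigma = 0.5
# N aligned (time-synchronized) trials: template + independent Gaussian noise
trials = template + rng.normal(0.0, sigma, size=(N, t.size))

avg = trials.mean(axis=0)                     # coherent average

def snr_power(signal, noise):
    # SNR as a ratio of mean powers
    return np.mean(signal ** 2) / np.mean(noise ** 2)

snr_single = snr_power(template, trials[0] - template)
snr_avg = snr_power(template, avg - template)
gain = snr_avg / snr_single                   # roughly N, up to sampling noise
```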

2.4. Wavelet Transform


To further improve the SNR by de-noising, the wavelet shrinkage de-noising technique can be
used. The idea is that a signal can be represented by a linear combination of a set of linearly
independent basis functions:

$$f = \sum_n c_n \phi_n$$

where the $\phi_n$ are the basis functions and the coefficients $c_n$ are obtained by the inner
product

$$c_n = \langle f, \phi_n \rangle$$

After such an expansion, we can zero out the coefficients that are relatively unimportant. The
set of coefficients remaining after shrinkage, denoted $R$, is used to reconstruct an
approximation of the signal:

$$\hat{f} = \sum_{n \in R} c_n \phi_n \approx f$$

In this way, a less noisy signal is obtained.
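The expand-shrink-reconstruct idea can be demonstrated in miniature. The sketch below
(NumPy, synthetic data) uses the orthonormal Fourier basis purely because it is the easiest
orthonormal basis to demonstrate with; as argued next, a wavelet basis suits EOG transients
better. The signal is expanded, all but the 8 largest-magnitude coefficients are zeroed, and the
reconstruction is compared against the clean signal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 512
t = np.arange(n) / n
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
noisy = clean + rng.normal(0.0, 0.4, n)

# Expansion coefficients c_n in an orthonormal basis
c = np.fft.rfft(noisy, norm="ortho")

# Shrinkage: keep only the 8 largest-magnitude coefficients (the set R)
keep = np.abs(c) >= np.sort(np.abs(c))[-8]
c_shrunk = np.where(keep, c, 0.0)

# Reconstruct the signal approximation from the retained coefficients
denoised = np.fft.irfft(c_shrunk, n=n, norm="ortho")

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
```

Because the clean signal concentrates its energy in a few coefficients while the noise spreads
evenly, discarding the small coefficients removes mostly noise.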

The famous Fourier basis is less suitable in this study, since the Fourier operator is a global
operator that cannot capture the local transients that matter here. The wavelet, being able to
identify local features of a signal, works well for transients. In contrast to its counterpart, the
short-time Fourier transform (STFT), which chops the signal into segments, the WT applies a
set of analyzing functions that allow the time and frequency resolutions to vary ([8],[9]). The
continuous wavelet transform of a signal $x(t)$ is defined as

$$W(a,b) = \frac{1}{\sqrt{a}} \int x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt$$

where $*$ denotes the complex conjugate of the wavelet function $\psi(t)$, and $W(a,b)$ are
the wavelet coefficients with dilation factor $a$ and shifting factor $b$ of the wavelet $\psi(t)$.

Different kinds of continuous wavelets exist, but for computation the dyadic wavelet is used,
in which the dilation and shift are sampled as $a = 2^m$ and $b = n\,2^m$:

$$\psi_{m,n}(t) = 2^{-m/2}\, \psi(2^{-m} t - n)$$

The parameters $(m,n)$ are the new dilation and shifting factors respectively; the smaller $m$
(the more negative $m$ is), the finer the scale of the wavelet. Notice that such a wavelet basis,
also called the mother wavelet family, is orthonormal:

$$\int \psi_{i,p}(t)\, \psi_{j,q}(t)\, dt = \delta_{ij}\, \delta_{pq}$$

where $\delta_{ij}$ is the Kronecker delta.

Such orthonormal dyadic discrete wavelets are associated with scaling functions $\phi$, a.k.a.
father wavelets:

$$\phi_{m,n}(t) = 2^{-m/2}\, \phi(2^{-m} t - n)$$

The scaling function is normalized and orthogonal to its own shifts:

$$\int \phi_{i,p}(t)\, \phi_{i,p}(t)\, dt = 1, \qquad \int \phi_{i,p}(t)\, \phi_{i,q}(t)\, dt = 0 \quad (p \neq q)$$

The inner product of the signal with a wavelet produces the detail coefficient $D$ at $(m,n)$:

$$D_{m,n} = \int x(t)\, \psi_{m,n}(t)\, dt$$

The inner product of the signal with a scaling function produces the approximation coefficient
$A$ at $(m,n)$:

$$A_{m,n} = \int x(t)\, \phi_{m,n}(t)\, dt$$

The signal approximation at scale $m_0$ is generated by summing the scaling functions
weighted by the approximation coefficients:

$$x_{m_0}(t) = \sum_n A_{m_0,n}\, \phi_{m_0,n}(t) \approx x(t)$$

As $m_0 \to -\infty$, the scaling functions 'squeeze' into a very small dimension, so the
approximation approaches the true $x(t)$:

$$\lim_{m_0 \to -\infty} x_{m_0}(t) = x(t)$$

On the other hand, the signal detail $d$ at scale $m_0$ is defined as

$$d_{m_0}(t) = \sum_n D_{m_0,n}\, \psi_{m_0,n}(t)$$

In this sense, starting from a macroscopic approximation at some scale $m_0$, we can increase
the accuracy of the approximation by adding more "details" at smaller scales.

Theorem. The signal $x(t)$ can be decomposed as an infinite sum of its approximation at scale
$m_0$ and all the details at scales smaller than $m_0$:

$$x(t) = \underbrace{x_{m_0}(t)}_{\text{approximation at scale } m_0} + \underbrace{\sum_{m=-\infty}^{m_0} d_m(t)}_{\text{all details at scales smaller than } m_0}$$

Or, in expanded form,

$$x(t) = \sum_n A_{m_0,n}\, \phi_{m_0,n}(t) + \sum_{m=-\infty}^{m_0} \sum_n D_{m,n}\, \psi_{m,n}(t)$$

For details of the above theorem, see [10] (pp. 165-166).

This theorem is the theoretical basis of wavelet shrinkage, that is,

$$x(t) = x_{m_0}(t) + \lim_{N \to \infty} \sum_{m=-N}^{m_0} d_m(t)$$

or, written in function-approximation form with a finite set $R$ of retained scales,

$$x(t) \approx x_{m_0}(t) + \sum_{m \in R} d_m(t)$$

That is, an approximation can be constructed using a finite number of atoms. The equation
above is also called the multi-resolution theorem. Consider

x(t) = x_{m0}(t) + Σ_{m=-∞}^{m0} d_m(t)

Expanding the m = m0 term of the sum,

x(t) = x_{m0}(t) + d_{m0}(t) + Σ_{m=-∞}^{m0-1} d_m(t)

Since m0 is a dummy variable, applying the same theorem at scale m0 - 1 and comparing the two expressions shows that

x_{m0-1}(t) = x_{m0}(t) + d_{m0}(t)

That is, adding the details at a certain scale m0 to the approximation at that scale increases the resolution by one scale. This equation is the multi-resolution equation.

Therefore, in summary, the DWT system is:

Wavelet function:            ψ_{m,n}(t) = 2^{-m/2} ψ(2^{-m}t - n)
Scaling function:            φ_{m,n}(t) = 2^{-m/2} φ(2^{-m}t - n)
Detail coefficient:          D_{m,n} = ∫ x(t) ψ_{m,n}(t) dt
Approximation coefficient:   A_{m,n} = ∫ x(t) φ_{m,n}(t) dt
Reconstruction:              x(t) = Σ_n A_{m0,n} φ_{m0,n}(t) + Σ_{m=-∞}^{m0} Σ_n D_{m,n} ψ_{m,n}(t)
Approximation:               x(t) ≈ Σ_{n∈R(n)} A_{m0,n} φ_{m0,n}(t) + Σ_{m∈R(m)} Σ_{n∈R(n)} D_{m,n} ψ_{m,n}(t)

Note that the multi-resolution theorem is the basis of the fast DWT algorithm. Fast DWT has already been implemented by researchers, and thus the DWT is fast.
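One level of the fast DWT can be sketched with the Haar wavelet, the simplest orthonormal case (illustrative Python, not the thesis's Matlab code); the pair of functions below satisfies both perfect reconstruction and Parseval's theorem:

```python
def haar_dwt_step(x):
    """One level of the fast DWT with the Haar wavelet.
    Returns (approximation, detail) coefficient lists."""
    s = 2 ** -0.5
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_idwt_step(approx, detail):
    """Inverse of one Haar DWT level (perfect reconstruction)."""
    s = 2 ** -0.5
    x = []
    for a, d in zip(approx, detail):
        x.append(s * (a + d))
        x.append(s * (a - d))
    return x
```

Because the basis is orthonormal, the sum of squared coefficients equals the signal energy, which is exactly the property exploited in the next section.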

2.5. Shrinkage De-noising
This section discusses coefficient selection in the shrinkage process. Such a process implicitly assumes that most of the energy of the signal is contained in a few coefficients, and that most of the remaining coefficients are noisy and should be removed.

Method 1. Energy thresholding and the sparsity of wavelet coefficients

The problem becomes how to identify which coefficients are noisy. The first method is to use energy: the percentage of energy can be used as an indicator of a coefficient's importance (for both D_{m,n} and A_{m,n}).

Recall that the scaling functions and wavelet functions have unit energy and form an orthonormal basis. This property of the wavelet basis has one key advantage: Parseval's theorem can be applied, so the total energy of the signal x(t) can be determined from the sum of squares of the coefficients:

Energy = ∫ |x(t)|² dt = Σ_n A²_{m0,n} + Σ_{m=-∞}^{m0} Σ_n D²_{m,n}

A parameter p can be defined as the fraction of the total energy contributed by a certain coefficient:

p(C_{m,n}) = C²_{m,n} / ( Σ_n A²_{m0,n} + Σ_{m=-∞}^{m0} Σ_n D²_{m,n} )

where C is a coefficient (either A or D).

A threshold θ can be used as the criterion to zero out coefficients:

if p(C_{m,n}) ≥ θ, then C_{m,n} is retained;
if p(C_{m,n}) < θ, then C_{m,n} is set to zero.

This was the first method applied in the study, before referring to any literature on wavelet de-noising. It was found that this method has very poor de-noising ability. Disregarding the poor performance, the energy ratio p is actually meaningful: it can be used empirically to show that the wavelet coefficients are sparse. In other words, only a small portion of the wavelet coefficients are important. The following diagram illustrates how sparse the coefficients of the blink EOG are. Detailed analysis will be discussed in section 4.
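The energy-ratio rule above can be sketched as follows (illustrative Python; the coefficient list and threshold value are hypothetical):

```python
def energy_threshold(coeffs, theta):
    """Zero out coefficients whose share of the total energy
    (by Parseval: total energy = sum of squared coefficients)
    falls below the threshold theta."""
    total = sum(c * c for c in coeffs)
    return [c if (c * c) / total >= theta else 0.0 for c in coeffs]
```

A large coefficient survives while small ones are zeroed, which is exactly the compression behaviour shown in Figure 2.3.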

Figure 2.3 Wavelet compression of blink EOG. The figures in the left column are the raw blink EOG, the compressed EOG and the residue. The right figure shows that at most 6% of the wavelet coefficients contribute 97.5% of the total energy, showing that the coefficients in the wavelet domain are sparse.

Method 2. Donoho's methods

David L. Donoho, a professor of statistics at Stanford University, has done extensive research on wavelet shrinkage de-noising. Before discussing Donoho's methods, the first thing to do is to distinguish global thresholding from local thresholding.

In the soft linear shrinkage method, instead of zeroing out certain coefficients, all wavelet coefficients are penalized with a smoothness index r and a smoothing factor λ. If both indices (r, λ) are independent of the level of the coefficients (m,n), this is a global thresholding method; if (r, λ) is level-dependent, the method becomes level-dependent thresholding:

C̃_{m,n} = C_{m,n} / (1 + λ 2^{2jr})

where j denotes the decomposition level. Donoho et al. [6-10] proposed nonlinear shrinkage methods using thresholding.

Hard thresholding:
    C̃_{m,n} = 0        if |C_{m,n}| ≤ θ_Donoho
    C̃_{m,n} = C_{m,n}   if |C_{m,n}| > θ_Donoho

Soft thresholding:
    C̃_{m,n} = 0                                    if |C_{m,n}| ≤ θ_Donoho
    C̃_{m,n} = sgn(C_{m,n}) (|C_{m,n}| - θ_Donoho)   if |C_{m,n}| > θ_Donoho

where the threshold θ_Donoho is one of {θ_Visu, θ_SURE, θ_Hybrid}.

The first one is the Visu threshold, which is a global threshold [11]:

θ_Visu = σ̂_Donoho √(2 log L)

where L is the length of the signal and σ̂_Donoho is the estimated noise level in the wavelet domain,

σ̂_Donoho = median_n(|C_{m,n}|) / 0.6745
The second one is the SURE threshold [12]:

θ_SURE = argmin_t [ n - 2·#{(i,j) : |D_{i,j}| ≤ t} + Σ_{i,j} min(|D_{i,j}|/σ̂, t/σ̂)² ]

The SURE (Stein's Unbiased Risk Estimate) threshold is a level-dependent method; it estimates the threshold by minimizing the risk function.

The last one is the Hybrid threshold:

θ_Hybrid = σ̂ √(2 log N)   if (1/N) Σ_{i,j} (D²_{i,j}/σ̂² - 1) ≤ (log₂ N)^{3/2} / √N
θ_Hybrid = θ_SURE          otherwise

The hybrid method integrates the advantages of the Visu and SURE methods, achieving the least risk.

The following figure demonstrates the power of Donoho's thresholding de-noising:

Figure 2.4. Signal de-noising via Donoho Visu thresholding using a 5-level db4 wavelet. The first plot is the original signal, the middle one is the signal after hard thresholding, and the lowest one is the signal after soft thresholding. It can be observed that soft thresholding produces less ripple.
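The Visu threshold and the hard/soft shrinkage rules can be sketched in Python (illustrative only; the thesis uses Matlab's Wavelet Toolbox, and here the detail coefficients are passed as a plain list):

```python
import math

def visu_threshold(detail_coeffs, signal_length):
    """Universal (VisuShrink) threshold: sigma * sqrt(2 * log L),
    with sigma estimated from the median absolute coefficient / 0.6745."""
    med = sorted(abs(c) for c in detail_coeffs)[len(detail_coeffs) // 2]
    sigma = med / 0.6745
    return sigma * math.sqrt(2.0 * math.log(signal_length))

def shrink(coeffs, theta, mode="soft"):
    """Apply hard or soft thresholding to a list of coefficients."""
    out = []
    for c in coeffs:
        if abs(c) <= theta:
            out.append(0.0)
        elif mode == "hard":
            out.append(c)          # keep the coefficient unchanged
        else:
            # soft: shrink the magnitude toward zero by theta
            out.append(math.copysign(abs(c) - theta, c))
    return out
```

Soft thresholding pulls the surviving coefficients toward zero, which is why it produces less ripple than hard thresholding in Figure 2.4.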

2.6. Wavelet Packet Transform

The original WT works very well in describing the macro-structure of a whole transient signal, and that is the reason why the WT is used: to extract the macroscopic structure of the signal. This works well in de-noising but not in classification. The figures below show templates of different EOGs.

Figure 2.5 The templates

Although these templates can be described sparsely in the wavelet domain, this only achieves detection, not classification: the red, blue, green and black curves are "very" similar. The following figure shows the difference between the blink EOG and the looking-down EOG.

Figure 2.6 The plots of the templates of the blink EOG and the looking-down EOG.

In control theory terminology, the tail of the looking-down EOG is "saturated", while the tail of the blink EOG has "overshoot". To extract such features, the wavelet packet transform is implemented. The following describes the difference between the WT and the WPT.

Figure 2.7. The decomposition structure of the Wavelet Transform and the Wavelet Packet Transform in a 3-level set-up. The only difference is that in the WPT, the detail coefficients at the previous level are also decomposed again. Therefore the WPT can be treated as a generalized WT. Image adapted from MathWorks®.

Experiments showed that the major difference in figure 2.6 is in fact insignificant. That will be discussed in section 4.8 on the Mahalanobis distance.

2.7. Signal Detection
In online processing, the data stream is stored in a circular buffer. Processing the content of the buffer at every time step is computationally too expensive. Moreover, most of the time the buffer contains only noise, so processing it would waste computing power. Therefore, a signal detection stage has to be implemented. This section focuses on the theoretical aspect of signal detection; the implementation will be discussed in section 3.

Signal detection is a special case of signal classification. The general problem is to distinguish

x = s + n (signal plus noise, a.k.a. the alternative hypothesis)

x = n (noise only, a.k.a. the null hypothesis)

Detection by simple magnitude thresholding

The simplest way is to threshold the magnitude at a threshold θ:

Thres(x, θ) = 1 if |x| ≥ θ (signal exists)
Thres(x, θ) = 0 if |x| < θ (noise only)

This method has one disadvantage: it is not robust. Although the SNR of the EOG is relatively large, the false alarm rate is not zero, due to the presence of outliers such as spikes generated by the brain or high-amplitude pulses caused by poor electrode contact.

Other methods
There are other methods such as
 detection using magnitude thresholding with adaptive noise floor
 detection using envelope thresholding by Hilbert transform
 detection using variance ratio
 detection using matched filter
After experiments, those methods were rejected and thus the theories will not be discussed here.

2.8 Support Vector Machine

The Support Vector Machine was invented by Vladimir N. Vapnik, a Russian professor in the field of machine learning. The following paragraphs summarize the theory of the SVM.

Generally, the binary classification problem is to find a classification function f. In the perceptron format, the SVM classifier has the following form:

f(x) = sgn(wᵀx + b) ∈ {-1, +1}

where w is the weighting and b is the bias, both of which are determined by an optimization problem:

Primal problem:   min_w (1/2)‖w‖²   s.t. y_i(wᵀx_i + b) - 1 ≥ 0  ∀i

Dual problem (obtained by the Lagrangian method):
max_α Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j ⟨x_i, x_j⟩   s.t. Σ_i α_i y_i = 0, α_i ≥ 0  ∀i

Geometrically, the problem is to find a linear separating hyperplane that separates the two classes of data in the feature space.

Figure 2.8 Two linear separating hyperplanes

After derivation, the solution to the separating hyperplane problem f(x) = sgn(wᵀx + b) is

w* = Σ_i α_i y_i x_i        b* = y_i - wᵀx_i (for any support vector x_i)

and the classifier thus has the form

f(x_new) = sgn(w*ᵀ x_new + b*) = sgn( Σ_i α*_i y_i ⟨x_i^train, x_new⟩ + b* )

Support vectors (SV)

One major reason why the SVM is used is the sparsity of the support vectors. The physical meaning is that only the few data points that affect the hyperplane orientation are important. Therefore, the results can be further simplified as

w* = Σ_i α_i y_i x_i = Σ_{x_i∈SV} α_i y_i x_i

f(x) = sgn( Σ_i α*_i y_i ⟨x_i^train, x⟩ + b* ) = sgn( Σ_{x_i∈SV} α*_i y_i ⟨x_i^train, x⟩ + b* )

The bias b

In practice, for numerical stability, b is computed as an average. Since only the support vectors satisfy y_i(wᵀx_i + b) = 1 exactly, the theoretical value b* = y_i - wᵀx_i (from a single support vector) is replaced by the empirical average over all support vectors:

b* = (1/|SV|) Σ_{x_i∈SV} (y_i - wᵀx_i)

Soft SVM and Regularization

To avoid over-fitting, that is, to prevent building an overly complicated SVM, regularization can be used by allowing a soft margin instead of an absolutely hard margin.

Figure 2.9 Two SVMs. The left one is the absolutely hard SVM; the right one is the soft SVM. Although the hard SVM successfully separates all data into two sides, the margin of its hyperplane is much smaller than that of the soft one.

Feature expansion and kernel

The introduction of slack variables and regularization can only handle data that is "slightly overlapped". For data that is seriously overlapped, even the soft SVM yields poor results. In this case, feature expansion can be used. The idea is the same as increasing an under-fitted polynomial to a higher order.

Figure 2.10 Under-fitting and fit. The left figure is a high-bias under-fitting case using a linear equation, while the right figure uses a 4th-order polynomial to fit the data perfectly.

Feature expansion is usually used with a kernel to keep the complexity low.

Thus the new SVM system is

Classifier:   f(x) = sgn( Σ_i α*_i y_i κ(x_i^train, x) + b* )

To solve:     min_{w,ξ} (1/2)‖w‖² + C Σ_{i=1}^{N} ξ_i
              s.t. y_i(wᵀx_i + b) ≥ 1 - ξ_i,  ξ_i ≥ 0  ∀i

The ξ_i ≥ 0 are the slack variables, which represent the degree of violation by each point: for ξ_i < 1, the outlier lies within the margin; for ξ_i ≥ 1, the outlier is misclassified. Thus the sum of the ξ_i represents the total degree of violation occurring in the data set.

C is the regularization parameter that controls the importance of those violations. A small C means violations are diminished, creating a large-margin SVM classifier, which increases the generalization power of the SVM and thus reduces over-fitting. A large C makes the constraints significant and impossible to ignore; this harder problem shrinks the feasible solution space and thus creates a smaller-margin SVM classifier, decreasing the generalization power of the SVM and reducing over-fitting to a smaller extent. Notice that if C is infinite, then all constraints must be satisfied, and this creates an absolutely hard classifier.

κ is the kernel:

κ(a, b) = ⟨Φ(a), Φ(b)⟩

where Φ is the feature-expansion map.
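To make the soft-margin formulation concrete, the following illustrative Python sketch (not the thesis's Matlab implementation; the toy data, learning rate and epoch count are hypothetical) trains a linear SVM by subgradient descent on the primal objective (1/2)‖w‖² + C Σ max(0, 1 - y_i(wᵀx_i + b)):

```python
def train_linear_svm(xs, ys, C=1.0, lr=0.01, epochs=200):
    """Soft-margin linear SVM via subgradient descent on the
    primal hinge-loss objective. xs: feature vectors; ys: labels in {-1, +1}."""
    dim = len(xs[0])
    n = len(xs)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            for i in range(dim):
                g = w[i] / n          # regularization term, spread over samples
                if margin < 1:        # hinge loss active: point violates the margin
                    g -= C * y * x[i]
                w[i] -= lr * g
            if margin < 1:
                b += lr * C * y
    return w, b

def svm_predict(w, b, x):
    """The classifier f(x) = sgn(w.x + b)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

In practice a dual solver with kernels (as in the equations above) is used; the subgradient sketch only shows how C trades margin width against violations.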

2.9 Outlier detection by convex hull

From a geometric point of view, a covariance matrix can be characterized by its boundary, which is a hyper-ellipsoid.

Figure 2.11 The covariance ellipsoids. The black dots are the outliers.

The presence of outliers increases the size of the original ellipsoid and rotates it. Because outliers lie far away from the original ellipsoid, an outlier can be removed if the data point falls outside a certain "extended boundary", the red convex hull. The algorithm can be implemented by the Fast Minimum Covariance Determinant method, which takes only O(n) time and is much faster than brute-force search.
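The convex-hull boundary computed by Matlab's convhull() can be sketched in plain Python with Andrew's monotone chain algorithm, together with the kind of point-in-hull test used to accept or reject an incoming feature point (illustrative code, not the thesis's implementation):

```python
def convex_hull(points):
    """Andrew's monotone chain: returns hull vertices in CCW order,
    i.e. the outermost points of a 2-D feature set."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, q):
    """True if q lies inside or on the CCW hull (all cross products >= 0)."""
    n = len(hull)
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        if (b[0]-a[0])*(q[1]-a[1]) - (b[1]-a[1])*(q[0]-a[0]) < 0:
            return False
    return True
```

An "extended boundary" as described in section 3.8 would simply scale the hull vertices outward about the centroid before the inside test.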

3. Implementation Details
3.1 Introduction
This section discusses the design and implementation of the system in detail. The following diagram (Fig. 3.1) provides an overview of the whole system.

Figure 3.1 Block diagram of the overall system.

This section describes how the system was constructed, in chronological order, beginning with the wireless connection and data logger, followed by the template analysis system, and finally the whole system. Not all parts are discussed in detail; only the important parts are. The detailed code is listed in the appendix.

3.2 The hardware: NeuroSky® sensors

The hardware used in this project is the NeuroSky® MindWave headset. It provides sampling at a frequency of 512 Hz, and data communication at a baud rate of 57600 through the Bluetooth Serial Port Profile (SPP) [14].

3.3 Wireless connection and data logging

The NeuroSky® MindWave headset provides wireless Bluetooth communication, while the Matlab Instrument Control toolbox provides methods to connect to a Bluetooth object. After the Bluetooth connection is established, the next thing to do is to construct the data logger. The following diagram summarizes the structure of the data logging process.

Figure 3.2 The data logging process.

The raw data is obtained by packet parsing only after the received data packet is verified to be correct.

The following table shows some of the NeuroSky® communication protocol:

Code  Name      Meaning
0x80  RAW       Raw wave value: a single big-endian 16-bit two's complement signed value, ranging from -32768 to 32767
0xAA  SYNC      For synchronization
0x55  EXCODE    For extended code level
--    PLENGTH   Packet length
--    VLENGTH   Value length
--    PAYLOAD   Payload; the content of the data is a numerical value

Table 3.1 Some of the NeuroSky® communication protocol. Adapted from [14].

The sensor provides more than the raw data, such as eSense values and the "poor quality" measure, but in this project only the raw values are used.

The details of the decoding are listed in the appendix; the following figure summarizes the logic flow of the decoding process.

Figure 3.3 The flow chart of the data logger.

The raw data is in the format of a 2-byte signed integer in two's complement, big-endian. For an N-bit array a_{N-1}a_{N-2}...a_0, the corresponding decimal value w is given by the following formula; note that the negative sign is due to the sign convention of two's complement:

w = -a_{N-1} 2^{N-1} + Σ_{i=0}^{N-2} a_i 2^i

One important point is that in Matlab, the conversion of a bit string to a double is usually performed via the built-in function str2double. But str2double is too slow for this conversion. Analysis of the m-file of str2double shows that it is in fact a very complicated "multi-functional" function: it needs to scan the user input format and match it against the default settings, resulting in low efficiency. Thus sscanf is used instead.
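The two's complement decoding can be sketched directly (illustrative Python; the project itself decodes with sscanf in Matlab):

```python
def decode_raw_sample(hi, lo):
    """Decode a big-endian 16-bit two's complement raw sample from its
    high and low bytes: w = -a15*2^15 + sum over the remaining bits."""
    value = (hi << 8) | lo       # assemble the unsigned 16-bit word
    if value >= 1 << 15:         # sign bit set: subtract 2^16
        value -= 1 << 16
    return value
```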

3.4 Obtaining EOG templates

The experiment for collecting the EOG is as follows:

Type of signals: single blink, double blink, looking down, looking up, looking left and looking right.
Experimental time: 40 seconds for blink EOGs; 10 seconds for directional EOGs.
Interval between eye movements: about 4 seconds; a "beep" sound produced by the computer acts as the trigger.
Data collection method: the user first looks at the center of the screen. After the beep sound, the user blinks or looks in one of the directions. For directional EOGs, the user moves his/her eyes to the four edges of the computer screen. After that, the user looks back at the center of the screen and waits for another beep. The head-screen distance is approximately half a meter.

The following flow chart summarizes the logic flow of the offline subsystem that was constructed to perform template analysis.

Figure 3.4 The flow of the offline system for template analysis.

After the raw signal is obtained, the raw signal, held in a very long data buffer, is preprocessed: de-trending and band-pass filtering. For the band-pass filter, a Butterworth filter is chosen, since it has a better phase response than Chebyshev and elliptic filters; this strength outweighs its weakness of a slower roll-off. To achieve a steeper roll-off, a 5th-order Butterworth filter is used. Computation speed is not a major concern here, since the filtering is applied in this offline system.

After filtering, the next task is peak detection and signal segmentation. Initially a peak-detection m-file was created. In a later stage of the project, it was found that Matlab has a built-in peak detector with even more functionality, so the findpeaks() function was used:

[Pks, Loc] = findpeaks(X, 'MINPEAKHEIGHT', MPH, 'MINPEAKDISTANCE', MD, 'THRESHOLD', T)

The call [P, L] = findpeaks(x0, 'MINPEAKHEIGHT', 100, 'MINPEAKDISTANCE', 50, 'THRESHOLD', 3) finds the peaks of the signal x0 with minimum peak height 100, minimum peak-to-peak separation 50, and a minimum difference of 3 between a peak and its neighbours; the peak amplitudes and peak locations are stored in P and L respectively.

The code above finds positive peaks; to detect negative peaks, the same code can be applied by feeding -x as input instead of +x.

After the peaks are detected and the peak locations obtained, segmentation can be performed by chopping the signal into 1-second segments (512 data points), using the peak locations as flags.

Figure 3.5 Offline data segmentation.

As mentioned in the introduction, an average blink signal lasts approximately 100-400 milliseconds, so a 1-second buffer provides enough space for further processing. The reason why the buffer is much longer than the EOG itself is explained in section 3.7.

After those segments are obtained, the data are stored in a database, and the whole process is repeated many times to collect enough samples. After enough data was obtained, all the signals in the database were used to create the time average. The signals were all aligned before averaging. An alignment m-file was created at the beginning for signal alignment; again, it was later found that Matlab has a built-in function alignsignals() for this purpose:

[S1, S2] = alignsignals(SigToAlign, SigRef, D, 'truncate')

The above code aligns SigToAlign with reference to SigRef, padding with zeros or truncating the tail at maximum delay D.

Finally, a simple sum is performed to obtain the template.

3.5 Performing Template Analysis

After the templates were collected, several analyses can be performed, such as statistical analysis and signal analysis. The detailed data analysis will be discussed in section 4. One important thing to mention is that the template can be used to extract noise. The empirical noise is defined as

Empirical noise = Raw signal - Template

Thus, using the template, all the noise signals of the database can be collected, and hence the SNR can be computed.

3.6 Constructing the online system: an overview

The following diagram illustrates the whole online adaptive system.

Figure 3.6 The whole online system

The system consists of the following parts:

 Adaptive activity detector
 Wavelet filter
 Real-time segmentation
 Feature extractor and adaptive outlier detector
 (Real-time adaptive) Support Vector Machine
 Adaptive database storage
 Simple application layer

In the following sections, only the adaptive parts (the activity detector, the outlier detector and the database) are discussed, as these are relatively more important to the practical aspects of the system. Feature extraction and SVM performance are discussed in section 4. For the other parts, the explanations given in the comments in the code should suffice.

3.7 Implementing the adaptive activity detector
In online processing, it is too expensive to process every block of data obtained, so activity detection is necessary. The buffer vector x_B will contain one of the following three:

x_B = n
x_B = s_incom + n
x_B = s_com + n

where n denotes noise, s_incom denotes an incomplete signal and s_com denotes a complete signal. The following figure illustrates the idea.

From the diagram:

x_B(t1) = n
x_B(t2) = s_incom + n
x_B(t3) = s_incom + n
x_B(t4) = s_com + n
x_B(t5) = s_com + n
x_B(t6) = s_incom + n

Figure 3.7 The buffer x_B at different t

From the figure, of the 6 time instances, only the 4th one should be processed. Therefore the activity detector should be able to:
 (A) Reject the buffer at t1, t2, t3, t6
 (B) Process the buffer at t4
 (C) Prevent repeated processing of the buffer at t5

In choosing the detector, several kinds of detectors were created, but in terms of computational speed, the simple magnitude threshold was chosen. The following logic flow illustrates how to implement such a detector:

If there is a sample with magnitude > threshold TH
    Find the peak locations;
    Record the time as T;
    If the peak location is in the middle of the buffer
       and the time now is at least 0.5 second after the previously recorded time T
        Process this buffer;
    end
end

(A) and (B) are handled by the first condition of the inner if-statement, while (C) is handled by the second condition. Since the condition requires the peak location to be in the middle of the buffer, this is the reason why a 1-second buffer is used for 100-400 millisecond waveforms.

The magnitude-thresholding activity detector is not robust, so the threshold TH has to be adaptive. Furthermore, when the buffer contains only noise (that is, the first condition is not fulfilled), it would waste computing resources to simply reject those data. Instead, the noise data can be used to calculate the noise floor and update the value of the threshold. After experimental analysis, magnitude thresholding with noise-floor computation was still computationally the fastest, compared to matched filtering and other methods.

The following illustrates the updated activity detector:

If there is a sample with magnitude > threshold TH
    Find the peak locations;
    Record the time as T;
    If the peak location is in the middle of the buffer
       and the time now is at least 0.5 second after the previously recorded time T
        Process this buffer;
    end
else % there is only noise or an incomplete signal
    Compute the variance;
    TH = 2 * square root of the variance + a constant;
end
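The pseudocode above can be sketched as follows (illustrative Python; the parameter values are hypothetical, and the 0.5-second refractory check is omitted for brevity):

```python
import math

def make_activity_detector(buffer_len=512, init_th=100.0, margin=10.0):
    """Adaptive magnitude-threshold detector: noise-only buffers update
    the threshold to 2*std of the noise plus a constant margin."""
    state = {"th": init_th}

    def detect(buf):
        peak = max(abs(v) for v in buf)
        if peak > state["th"]:
            loc = max(range(len(buf)), key=lambda i: abs(buf[i]))
            # accept only if the peak sits in the middle of the buffer
            return buffer_len // 4 <= loc <= 3 * buffer_len // 4
        # noise-only buffer: update the noise floor, as in the pseudocode
        mean = sum(buf) / len(buf)
        var = sum((v - mean) ** 2 for v in buf) / len(buf)
        state["th"] = 2.0 * math.sqrt(var) + margin
        return False

    return detect
```

Noise-only buffers are thus not wasted: each one refines the threshold used to judge the next buffer.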

3.8 Implementing the adaptive outlier detector using convex hull

There are two reasons why an outlier detector is used.
1. The false alarm rate of the activity detector is not always zero; there is always a chance that some "outlier" appears in the signal. The outlier detector is used to reject these outliers. Detailed analysis will be discussed in the following section.
2. The other, more important, reason is that in this real-time system, a type II error (false negative) of the activity detector has a more serious effect than a type I error (false positive, or false alarm). Thus the false negative rate has to be minimized to zero (in theory) by sacrificing the false alarm rate, given that the activity detector has already reduced the computational cost of repeated processing of the buffer. Therefore, as a reinforcement of the activity detector, the outlier detector can be used to catch those false alarms.

The implementation of the outlier detector is very simple: it serves as an additional classifier before the SVM, comparing the features of the incoming signal to the feature range of all the signals inside the database.

Figure 3.8 The concept of outlier detection

From the figure above, the database generates a feature space whose boundary is the "envelope" of that space. An extended version of this boundary (for example, extension = 1.2 × boundary) is constructed. For each input signal, if its feature values fall within the extended boundary, the signal is accepted; otherwise it is treated as an outlier. In practice, the ellipse is replaced by a polygon, the convex hull. In Matlab, such a boundary can be implemented using the convhull() function, which simply finds the outermost points of the data set.

Figure 3.9 A false-positive outlier wrongly accepted as a blink signal by the activity detector. The outlier misclassified as a useful signal has features very distant from the normal signals in the feature space. The lowest plot is an enlarged version of the second plot; in the last plot, only the boundary of the feature space is drawn, and the extended boundary, set to 1.1 times the real boundary, is not shown. Notice that the outlier is outside the extended boundary.

The extended boundary is not fixed: since the boundary is constructed from the database, it keeps changing. It was a concern that the boundary might keep expanding, since more and more data will fall onto the inner edge of the boundary; this problem can be solved by using a forgetting factor. Because of time limits, currently only a non-automatic outlier-update version was created.

3.9 The adaptive database and user adaptivity

The database stores every signal that is not treated as noise or outlier, but not all stored signals have equal importance. Because of the non-stationary nature of biomedical signals, the most recent signals stored in the database are "closer" to the user, and their data are more significant for referencing, while data collected long ago are "far" from the user and are therefore given a lower weighting. No past signal stored in the database is removed, since all signals provide important information, such as the general shape and the general energy range of the signal. Using the idea discussed above, a small forgetting factor (for example, 0.95) is introduced, so that the database provides up-to-date information for updating the adaptive parts of the system.

Furthermore, because of the noisy nature of biomedical signals, even with the outlier detector implemented, in practice the false alarm rate is not zero: there is always a chance that an outlier is treated as a normal signal and added to the database. While the outlier detector is a reinforcement of the activity detector, the database itself can be used as a reinforcement of the outlier detector, in the sense that the database rejects outliers.

As mentioned in section 2, the database can "remove" outliers by the minimum covariance determinant method. To prevent data loss in the database adjustment process, outlier removal is kept to a minimum by giving outliers a very low weighting, for example, 0.05. The drawback of this process is that its computational complexity increases with the database size; when the database is very large, the adjustment process would place a tremendous computational burden on the real-time system. Thus the database adjustment is only carried out in "pseudo real time": the database updates itself when the user "turns off" the system.

Therefore, because of time limits, currently a non-automatic database-update version was created.

3.10 Obtaining EEG data

Originally the EEG data was processed using a real-time Welch's method. After testing, it proved computationally too expensive to perform EEG processing in real time. Thus the EEG band power is finally obtained directly from the headset, using the built-in functionality of the chip installed inside it.

3.11 The application layer

Currently there is no user interface. Communication between the system and Microsoft Windows is achieved using the Java Abstract Window Toolkit:

robot.mouseMove(locationX, locationY);
robot.mousePress(InputEvent.BUTTON1_MASK);

4. Analysis of data, features and results
4.1 Introduction
This section discusses the performance of the whole project and the data analysis.
4.2 Analysis of the goodness of fit of de-trending
De-trending is performed to remove DC shifts in the signal. This is a typical step in ECG analysis, since the ECG is well studied. But in EOG analysis, the goodness of fit of de-trending has to be determined, to make sure de-trending is an appropriate step. The following table gives the R² values for different raw EOG signals.

                     Single Blink   Double Blink   Looking Down   Looking Up
Highest R²           0.9906         0.9980         0.9604         0.9684
Mean R²              0.9732         0.9891         0.8876         0.7665
Median R²            0.9793         0.9901         0.9105         0.7862
Lowest R²            0.8939         0.9295         0.5539         0.3827
Standard Deviation   0.0155         0.0069         0.0706         0.1241

Table 4.1 R² values for different raw EOG signals

The results show that de-trended signals are good estimates of raw blink EOGs, but may work badly for directional EOGs. The following figures show the two signals with the lowest R² values.

Figure 4.1. Two noisy raw EOG signals with the lowest R² and their de-trended versions. (The left signals are looking-down EOG; the right signals are looking-up EOG.)

By observing the figure, the de-trended signals are in fact good approximations. The small R² values are mainly caused by high-frequency components, so the problem can be solved by filtering.

                     Single Blink   Double Blink   Looking Down   Looking Up
Highest R²           1.0000         1.0000         0.9990         0.9993
Mean R²              0.9974         0.9847         0.9532         0.9086
Median R²            0.9994         0.9937         0.9664         0.9200
Lowest R²            0.9533         0.8813         0.725          0.6762
Standard Deviation   0.0060         0.0210         0.0496         0.0707

Table 4.2 R² values for different band-pass filtered EOG signals.

After band-pass filtering, all R² statistics improved except the standard deviation of the double-blink EOG. Thus the analysis above shows that de-trending can be applied to the raw signal to remove the trend, and the approximation is good enough.

4.3 Analysis of the templates and detectability

After obtaining the templates, by direct inspection the double-blink EOG can easily be distinguished from the others by the number of crossings at a certain threshold, or by energy.

The single-blink, looking-up and looking-down EOGs have very similar waveforms and need further analysis. The following diagrams show the superposition of all signals, the coherent average, and the empirical noise for the blink EOG, looking-up EOG and looking-down EOG.

Figure 4.2a The blink EOG: superposition, coherent average and empirical noises

Figure 4.2b The looking-down EOG: superposition, coherent average and empirical noises

Figure 4.2c The looking-up EOG: superposition, coherent average and empirical noises

By inspection, the superposition plots show that the variability of these signals is not large, and therefore detection is not a problem. Instead, classification is the big problem.

The detectability of double-blink signals is much higher.

Figure 4.3 The filtered double-blink EOG.

Because of this high contrast in magnitude, even though double-blink EOGs suffer a certain amount of variability, the double-blink signal is still considered highly detectable.

Figure 4.4 The EOG of double blinks. The time gap before the second blink can be as large as half a second, showing that double-blink EOGs have a certain amount of moment-to-moment variability.

4.4 Analysis of the SNR

The empirical noise can be obtained using the template. It is defined by

noise = (signal + noise) - template

where the left-hand quantity is obtained in the experiment and the template is obtained via averaging. The empirical SNR can be calculated as

SNR_Emp = 10 log₁₀( Σ_i x̄_i² / Σ_i n_i² ) = 10 log₁₀( Σ_i x̄_i² / Σ_i (x_i - x̄_i)² )  dB

where x̄ denotes the template and n the empirical noise.

The following are the plots of all SNRs of the blink (red), looking-down (blue) and looking-up (green) EOGs.

Figure 4.5 The SNR values of the 3 types of EOG.

By observation, the SNRs of the blink EOG are all very high: most are over 10 dB. For the looking-down EOG, most SNRs are higher than 7 dB, while for the looking-up EOG most are higher than 4 dB. From the SNR analysis, the detectability of these EOG signals is high.

4.5 Discriminability analysis on time domain and frequency domain


Discriminability here refers to the ability to classify different types of signals. The double-
blink EOG has very high discriminability: in both offline and online training and testing, the
classification accuracies are mostly 100%. Unfortunately, the other three types of signals have
a very high degree of similarity.

Figure 4.6 Superposition of the 3 types of normalized EOGs, showing a very high degree of similarity.
The similarity is not limited to the time domain; the 3 EOGs are also very similar in the frequency domain.

Figure 4.7 Power spectral density estimates of the 3 types of EOGs using Welch's
method. The plots also show a high degree of similarity in the frequency domain.

The PSD plots obtained using Welch's method show that for all three types of signals, the major
frequency components lie within 2-15 Hz. The only difference between the plots is the peak
amplitude, meaning that the blink EOG has larger power than the other two. This suggests that
power can be a good feature for classification.
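Welch's method averages windowed periodograms over overlapping segments; a minimal, dependency-free sketch is below (Python/NumPy; in practice `scipy.signal.welch` does this, and the 256 Hz sampling rate and 5 Hz test component here are assumptions made for the example, not the project's values):

```python
import numpy as np

def welch_psd(x, fs, nperseg=256):
    """Minimal Welch PSD: average Hann-windowed periodograms of
    50%-overlapping segments."""
    step = nperseg // 2
    win = np.hanning(nperseg)
    scale = fs * np.sum(win ** 2)
    segs = [x[i:i + nperseg] for i in range(0, len(x) - nperseg + 1, step)]
    pxx = np.mean([np.abs(np.fft.rfft(win * s)) ** 2 for s in segs], axis=0) / scale
    f = np.fft.rfftfreq(nperseg, d=1 / fs)
    return f, pxx

fs = 256.0                              # assumed sampling rate
rng = np.random.default_rng(1)
t = np.arange(0, 8, 1 / fs)
# A slow 5 Hz component (inside the 2-15 Hz EOG band) plus broadband noise.
eog = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(t.size)

f, pxx = welch_psd(eog, fs)
print(f[np.argmax(pxx)])                # the peak sits at the 5 Hz bin
```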

4.6 Discriminability analysis on features


Besides power, about 30 features in total have been used to test the separability of the data.
For all these features, statistical analyses such as an ANOVA test and a distance test have
been performed. The following lists the 30 features used:

Maximum Positive Peak Amplitude                    Mean Frequency
Maximum Negative Peak Amplitude                    Peak Frequency
Peak-to-Peak Amplitude                             Bandwidth
Peak-to-Peak Interval                              Skewness
Number of zero crossings                           Kurtosis
Zero crossing rate                                 Cumulants
Area under the curve / Energy / L^p norm (p=1~5)   Shannon Entropy
Envelope Duration                                  Log Energy Entropy
L^p norm of Envelope                               Norm Entropy
Centroid of kernel spectrum density                Negentropy
Poincaré Map, SD1, SD2                             Fractal Dimension
Table 4.3 The 30 features.

After experimental analysis, most of the features listed above have low separability: the values
of the different groups overlap seriously. It was found that the third-order statistic, the
skewness, had the poorest performance throughout the whole project.

Figure 4.8 The skewness of the three types of EOGs. The y-axis is the skewness value and the x-
axis is the signal index. The red dots are blink EOGs, the blue dots are looking-down EOGs and
the green dots are looking-up EOGs.

Fortunately, there are still some “good” features. The figure below illustrates the EOGs in the power
domain. Red, blue and green dots are the single-blink, looking-down and looking-up EOGs respectively.

Figure 4.9 The energies (square of the L2-norm) of the three types of EOG. This plot shows that
the energy feature can be used to distinguish blink from looking-up.

All the other plots are listed in the appendix.

4.7 Discriminability test on blink and directional EOGs using residual sum of squares
The previous energy analysis shows that there is a large difference between the blink EOG and
the directional EOGs. In this part, this notion is investigated further using the Euclidean
distance in the form of the residual sum of squares RSS:

    RSS(x, y) = Σ_i ( x_i − y_i )²

The RSS vector RSSV is defined as

    RSSV(x_1, Y) = [ RSS(x_1, y_1), RSS(x_1, y_2), ..., RSS(x_1, y_n) ]
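A sketch of the RSS and RSSV computations (Python/NumPy; the two toy signal groups, with a tenfold amplitude difference, are invented to mimic blink versus directional EOGs):

```python
import numpy as np

def rss(x, y):
    """Residual sum of squares between two equal-length signals."""
    return float(np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2))

def rss_matrix(X, Y):
    """RSS between every signal in X (rows) and every signal in Y (rows);
    row i of the result is the vector RSSV(x_i, Y)."""
    return np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)

# Toy groups: "blinks" have roughly ten times the amplitude of "looks".
rng = np.random.default_rng(2)
base = np.sin(np.linspace(0, np.pi, 100))
blinks = 10 * base + 0.1 * rng.standard_normal((5, 100))
looks = 1 * base + 0.1 * rng.standard_normal((5, 100))

M = rss_matrix(blinks, looks)
print(M.shape)   # one RSS value per (blink, look) pair
```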

The following are plots of RSSV(Blink, Looking Down) and RSSV(Blink, Looking Up). The top left-hand
corner is the RSSV of single-blink signal number 1 against all the looking-down EOG signals.

Figure 4.10a The RSS values of all blink signals against all the looking-down signals.

Figure 4.10b The RSS values of all blink signals against all the looking-up signals.

From the figures above, the RSS values between the raw blink signals and the two directional
EOGs are all of the order of 10,000,000, showing that the blink signal differs greatly from the
directional EOGs.

4.8 Discriminability test on blink and directional EOGs using Mahalanobis distances

Figure 4.11 The plots of the three types of EOGs.

By observation, the major differences between these signals in the time domain occur at about
n = 250 and n = 300 to 400. The differences are in terms of magnitude.

To find the distances between these three groups of signals, the Mahalanobis distance is
computed. It is defined through the covariance matrix S as

    D_M(x, y) = sqrt( (x − y)' S⁻¹ (x − y) )

Figure 4.12 The Mahalanobis distances between the different groups of signals. The blue curve is
the blink-down distance, the green curve is the blink-up distance and the red curve is the
down-up distance.
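A sketch of the Mahalanobis distance (Python/NumPy; the 2-D toy groups are invented, and the identity-covariance check shows the measure reducing to the ordinary Euclidean distance):

```python
import numpy as np

def mahalanobis(x, y, S):
    """D_M(x, y) = sqrt( (x - y)^T S^-1 (x - y) ) for a covariance matrix S."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(S, d)))

# Two toy groups elongated along the first axis: the Mahalanobis distance
# discounts separation along high-variance directions.
rng = np.random.default_rng(3)
A = rng.standard_normal((200, 2)) * [3.0, 0.5]
B = rng.standard_normal((200, 2)) * [3.0, 0.5] + [4.0, 0.0]
S = np.cov(np.vstack([A, B]).T)

d_mahal = mahalanobis(A.mean(axis=0), B.mean(axis=0), S)
d_eucl = float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))
print(d_mahal, d_eucl)   # Mahalanobis distance is smaller along the spread axis
```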

From the plot of the Mahalanobis distance, the peak at n = 250 agrees with the time-domain plots.
However, there is no significant difference at n = 375 to 425. In the time-domain plot of figure 4.11,
the red curve has a small overshoot at n = 375 to 425, while the green and blue curves simply
saturate like a sigmoid function. The peaks of the Mahalanobis distance, however, are located at
about n = 300 to 325, corresponding to the negative peaks in the time domain. These results show
that the initial analysis of the templates contained an error: the second difference in the time
domain is actually insignificant, whereas the negative peaks are an important characteristic.
This means that, apart from the energy features, the upper peak amplitude and the negative peak
amplitude can also be used as features, as shown in the following diagram.

Figure 4.13 The “updated” analysis of the time domain (red: single blink,
blue: looking down, green: looking up).

However, as discussed in section 4.6, the peak features have serious overlap.

Figure 4.14 The overlap of peak values (red: single blink, blue: looking down,
green: looking up).

Another problem is that, due to variability, not all looking-up EOG signals have an obvious
negative peak.

Figure 4.15 The looking-up EOG signal number 74. The signal demonstrates that some
looking-up EOGs have no obvious negative peak in the single-electrode setup.

Even when the signals are plotted in the feature space of energy and peak, there is still overlap
between the directional EOGs, and there is a very narrow margin between the blink EOG and the
looking-up EOG.

Figure 4.16 The peak-energy feature space of the signals. There is serious overlap between the
green and blue dots; for the red and blue dots, the margin is very small.

Although the plot above looks suitable for separating the red dots from the blue and green dots
using a soft-margin SVM, the order of magnitude of the y-axis (energy) is 100 times larger than
that of the x-axis. From a machine-learning point of view, such a large difference in order of
magnitude distorts the learning process: all features should contribute roughly equally, and if
one feature has a dominant order of magnitude, that feature will dominate inside the learning
machine. Thus data normalization should be carried out.

The data normalization has 3 types:

    x' = ( x − min(x) ) / ( max(x) − min(x) )      Min-max normalization

    x' = ( x − mean(x) ) / std(x)                  Standardization

    x' = x / 10^Order                              Standardization of order of magnitude
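The three schemes can be sketched as follows (Python/NumPy; the feature values are invented to mimic the energy ~10^6 versus peak ~10^2 scale gap discussed above):

```python
import numpy as np

def minmax(x):
    """Min-max normalization: rescale to the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Standardization: zero mean, unit standard deviation."""
    return (x - x.mean()) / x.std()

def scale_by_order(x, order):
    """Divide by a fixed power of ten so the features share an order of
    magnitude while keeping their original relative spread."""
    return x / 10.0 ** order

# Invented feature values mimicking the scale gap in the thesis:
energy = np.array([1.2e6, 3.4e6, 2.2e6])   # energy feature, order ~10^6
peak = np.array([310.0, 620.0, 450.0])     # peak feature, order ~10^2

e_scaled = scale_by_order(energy, 6)       # now order ~10^0
p_scaled = scale_by_order(peak, 2)         # now order ~10^0
print(e_scaled, p_scaled)
```

Unlike the first two schemes, the order-of-magnitude scaling applies the same factor to every sample of a feature, so within-group spread is preserved exactly.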
The following plots show the distributions after normalization. Only the third normalization method
retains the degree of separability; the first two blur the data and merge them closer together.

Figure 4.17a,b The normalizations using min-max (left) and standardization (right). The
data become much closer and the separation becomes even more difficult.

Figure 4.17c The normalization using order of magnitude. The data retain the
same degree of separability.

4.9 Offline SVM performance


Finally, the following features were selected as the SVM inputs:
 Number of zero-crossing at magnitude 400
 Shannon Entropy
 L2 Norm
 Kurtosis
 Peak Amplitude

The number of zero-crossings is used to separate double blinks from the other EOGs. In the offline
setting, almost all classifiers trained to separate out double blinks achieve 100% accuracy. For
the offline task of distinguishing single blink, looking up and looking down, all the SVMs have
training and testing accuracies of almost 100%.

Type of Signals         Double Blink         Single Blink          Up or Down
Before optimization     Best case 99.6%      Best case 98.85%      Best case 98.95%
                        Worst case 99.2%     Worst case 97%        Worst case 96.85%
After optimization      100%                 100%                  No improvement

Table 4.4 Offline SVM performance

4.10 Online whole performance
There are two accuracy measures in the online system: the detectors and the SVMs.

For the detectors (activity detector and outlier detector)


                        Activity detector only;            After the outlier detector
                        outlier detector not installed     was also installed
False negative rate     0.0125 for 160 test signals        0.065 for 154 test signals
False positive rate     0.0125 for 160 test signals        0.065 for 154 test signals

Table 4.5 Online performances of the activity detector and outlier detector

Furthermore, it was found that the adaptive noise-floor threshold gives no significant improvement
in the false positive rate, because the SNRs of the EOG signals are high, while the online adaptive
noise floor consumes about 0.0125 seconds of computation time. From a complexity point of view,
the adaptive noise-floor algorithm was therefore not applied in the final system.

Then consider the SVMs. To boost the computational speed, even fewer features were selected.
Since the peak amplitude is correlated with the L2 norm (as shown in figure 4.17c, there is a
straight-line trend between the two features), only the energy is used. Furthermore, the kurtosis
is rejected since, among the selected features, it has the largest overlap.
Thus the final features used are:
 Number of zero crossings at 600 (using a self-defined function with a double smoother)
 L2 norm (using sum() rather than norm() because of speed)
 Shannon entropy (using wentropy() in the wavelet packet analysis toolbox)

The zero-crossing count used to classify double-blink EOG signals also performs very well in the
online setting, with almost 100% accuracy. One important point is that double smoothing is performed
before counting the zero-crossings. This step is critically important for extracting an accurate
feature, because all humans naturally exhibit a phenomenon called saccade, i.e. rapid eye movement.
When a human looks at an object, the eyes move around at high speed to locate interesting parts of
the object. Although the magnitude of this rapid eye movement is not large, it causes higher-
frequency noise, which seriously alters the number of zero-crossings at 600.

Figure 4.18 Single-blink signal number 68 with a saccade component.

This problem can be tackled using a smoother; for robustness, a double smoother was implemented.
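A rough Python/NumPy illustration of why the double smoother matters (the pulse shape, ripple frequency and amplitudes are invented; the project used its own MATLAB smoothing function):

```python
import numpy as np

def smooth(x, k=11):
    """Moving-average smoother of (odd) window length k."""
    return np.convolve(x, np.ones(k) / k, mode="same")

def upward_crossings(x, level):
    """Number of upward crossings of the given level."""
    above = x > level
    return int(np.sum(~above[:-1] & above[1:]))

# Synthetic blink pulse with a saccade-like high-frequency ripple on top.
t = np.linspace(0.0, 1.0, 600)
blink = 1000 * np.exp(-((t - 0.5) ** 2) / 0.003)
ripple = 250 * np.sin(2 * np.pi * 60 * t)
noisy = blink + ripple

raw = upward_crossings(noisy, 600)                  # ripple adds spurious crossings
clean = upward_crossings(smooth(smooth(noisy)), 600)  # double smoothing removes them
print(raw, clean)
```

The double moving average strongly attenuates the fast ripple while barely changing the slow blink pulse, so the count at the 600 level returns to the single genuine crossing.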

For the online classification of single blink, looking up and looking down, because of the time
limit of the project there was not enough time to perform detailed studies on classifying the up
and down signals.

Type of Signals         Double Blink         Single Blink          Up or Down
Before optimization     Best case 100%       Best case 100%        Not enough time;
                        Worst case 97.5%     Worst case 98.5%      not performed
After optimization      100%                 100%

Table 4.6 Online SVM performance

4.11 Offline discriminability test using K-means in wavelet domain


The feature analysis was performed once again in the wavelet domain as well as in the wavelet-
packet domain using K-means. K-means clustering is a grouping algorithm that splits the input
data matrix into different groups.

Figure 4.19 The true labels and the test labels generated using K-means. The upper plot shows the
labels of the 3 types of signals: 104 single blinks, followed by 104 looking-down and finally
104 looking-up signals. The lower plot shows the clustering obtained using K-means.

From the figure, it is clear that the signals can be classified with sufficiently high accuracy.
The drawback is that the K-means problem is NP-hard in general, which makes the algorithm
unsuitable for online processing, even though several fast heuristic variants have been proposed.
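K-means itself is easy to sketch; the cost lies in the repeated distance computations and in the restarts needed to avoid bad local optima. Below is a minimal Lloyd's-algorithm version with deterministic farthest-first seeding (Python/NumPy; the 2-D toy vectors merely stand in for the wavelet-domain feature vectors of the three classes):

```python
import numpy as np

def farthest_first(X, k):
    """Deterministic farthest-first seeding: greedily pick spread-out points."""
    idx = [0]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2), axis=1)
        idx.append(int(np.argmax(d)))
    return X[idx]

def kmeans(X, k, iters=100):
    """Minimal Lloyd's algorithm with farthest-first initialization."""
    c = farthest_first(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else c[j]
                        for j in range(k)])
        if np.allclose(new, c):
            break
        c = new
    return labels, c

# Toy stand-in for 104 wavelet-feature vectors per EOG class (blink, down, up).
rng = np.random.default_rng(5)
X = np.vstack([rng.standard_normal((104, 2)) * 0.3 + m
               for m in ([0, 0], [5, 0], [0, 5])])

labels, _ = kmeans(X, 3)
# Purity: fraction of signals whose cluster matches their group's majority cluster.
purity = sum(np.bincount(labels[i * 104:(i + 1) * 104], minlength=3).max()
             for i in range(3)) / len(X)
print(purity)
```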

4.12 Discriminability on wavelet domain with maximum sparsity


It was found that the signals can also be classified with very high accuracy in the sparse
wavelet-domain representation, by clustering the sparse wavelet coefficients.

4.13 Analysis of computational speed of certain steps


Comparing (norm( x , 2))^2 and sum( x.^2 ) for computing the energy of the signal, the
second was used since it is relatively faster.

Figure 4.20 The computational times of the two methods.
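Such micro-benchmarks are environment-specific: the MATLAB result above need not carry over elsewhere, and the only portable claim is that both expressions compute the same energy. A Python `timeit` sketch of how one would rerun the comparison:

```python
import timeit
import numpy as np

x = np.random.default_rng(6).standard_normal(512)

# Both expressions compute the same signal energy ...
e_norm = np.linalg.norm(x) ** 2
e_sum = np.sum(x * x)

# ... so only their speed can differ, and that is environment-dependent.
t_norm = timeit.timeit(lambda: np.linalg.norm(x) ** 2, number=5000)
t_sum = timeit.timeit(lambda: np.sum(x * x), number=5000)
print(e_norm, e_sum, t_norm, t_sum)
```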

In the packet-parsing stage, there is a step that requires summing a vector of small length. It
was found that, for summing a small vector, a for loop is even faster than the sum( ) function.

Figure 4.21 The computational times of the two methods.

4.14 Possible extensions and unsolved problems


Although the project has ended due to the time limit, that is not the end of the system; there
are many things left to do. This section discusses possible extensions of the system and presents
certain unsolved problems encountered in the project that are worth mentioning.

Possible Extensions
(a) Design a wavelet that is more suitable for the problem
Currently the wavelet used is the D wavelet, which may not be the best wavelet to approximate
the signal. In general, almost any function satisfying the wavelet admissibility conditions can
serve as a wavelet, and it is therefore better to design a more problem-oriented wavelet rather
than picking one from the standard wavelet dictionary.

(b) More electrodes


In facing the usability-versus-ability dilemma, usability was initially ranked higher. After the
project, it is believed that this viewpoint can be adjusted slightly, in the sense that one more
electrode could be added. Introducing one more electrode should not decrease the usability much,
but in terms of system performance the addition of one more electrode could boost the system to
a very high level.

Unsolved problems raised in the project.


During the project, several failures were encountered; some were solved and some remain unsolved.
This section discusses some problems that give genuine insight into the fields of signal
processing and machine learning.

(a) Finding a transformation that can increase the distance between groups of data
During the project, it was found that the “distances” between the looking-up and looking-down
signals are very small. The RSSM (Residual Sum of Squares Matrix, as mentioned in section
4.7) between the directional EOGs is plotted below:

Figure 4.23 The RSSM of looking down and looking up.

Although the order of magnitude in the plot above is 6, that is contributed by the noise instead
of the signal. The plot shows that the distances between the directional EOGs are very small,
which again suggests that the addition of one more electrode could be very useful for separating
the EOGs.

The RSS is calculated in the Euclidean sense, and the question is whether the Euclidean distance
is a suitable measure for these signals. Thus the following problem is proposed:

Given two groups of signals X, Y that are close, in the sense that there is an upper bound on
all the pairwise distances,

    sup_{i,j} d( x_i, y_j ) = U

the goal is to find a transform A such that the upper bound can be increased further:

    find A  s.t.  max_A sup_{i,j} d( A x_i, A y_j )

(b) Can we know a classifier is over-fitted without using testing data in the high-dimensional
space?
In training an SVM, one approach is automatic training: repeated cross-validation on the training
set, selecting the most suitable classifier parameters after, say, 10,000 trials. But the poor
performance in online testing shows that there was “over-fitting” in the offline training
process: we over-believed that the trained SVM was fine. Over-fitting is a serious problem in
machine learning. One way to test whether a classifier is over-fitted is to apply cross-
validation. It is believed that cross-validation can reduce over-fitting, but not remove it
completely [15]. Thus there is still the possibility that the model over-fits without this being
observed during model selection. Then the following, very difficult, problem arises:

How can one know that a classifier is totally or absolutely not over-fitted,

in the sense that it does not require the help of model selection such as cross-validation?
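For reference, plain k-fold cross-validation, the model-selection tool discussed above, can be sketched as follows (Python/NumPy; a simple nearest-centroid classifier stands in for the SVM to keep the example dependency-free, and the data are synthetic):

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Train: one centroid per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(model, X):
    """Predict the class of the nearest centroid."""
    classes, cents = model
    d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

def kfold_accuracy(X, y, k=5, seed=0):
    """Plain k-fold cross-validation: average held-out accuracy over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = nearest_centroid_fit(X[train], y[train])
        accs.append(np.mean(nearest_centroid_predict(model, X[test]) == y[test]))
    return float(np.mean(accs))

# Toy data: two well-separated classes; CV estimates out-of-sample accuracy.
rng = np.random.default_rng(7)
X = np.vstack([rng.standard_normal((60, 2)) + [0, 0],
               rng.standard_normal((60, 2)) + [6, 6]])
y = np.repeat([0, 1], 60)

acc = kfold_accuracy(X, y)
print(acc)
```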

4.15 Conclusion
In this project, a prototype HCI system based on EOG and EEG (mainly EOG) was implemented.
The system achieves sufficiently high accuracy offline for 5 different kinds of signals, while
for the online application it achieves sufficiently high accuracy for 3 kinds of signals. There
are several things to be improved and extended.

References

1. Malmivuo, J., and R. Plonsey. Bioelectromagnetism, 1995.

2. Basso, M. A., Powers, A. S., and Evinger, C. "An explanation for reflex blink hyperexcitability in
Parkinson's disease. I. Superior colliculus." The Journal of Neuroscience 16.22 (1996): 7308-7317.

3. Bentivoglio, A. R., Bressman, S. B., Cassetta, E., Carretta, D., Tonali, P., and Albanese, A.
"Analysis of blink rate patterns in normal subjects." Movement Disorders 12.6 (1997): 1028-1034.

4. Carney, L. G., and Hill, R. M. "The nature of normal blinking patterns." Acta
Ophthalmologica 60.3 (1982): 427-433.

5. Yolton, D. P., Yolton, R. L., Lopez, R., Bogner, B., Stevens, R., and Rao, D. "The effects of
gender and birth control pill use on spontaneous blink rates." Journal of the American Optometric
Association 65.11 (1994): 763-770.

6. Bulling, Andreas, Daniel Roggen, and Gerhard Tröster. "Wearable EOG goggles: Seamless
sensing and context-awareness in everyday environments." Journal of Ambient Intelligence
and Smart Environments 1.2 (2009): 157-171.

7. Hori, Junichi, Koji Sakano, and Yoshiaki Saitoh. "Development of a communication
support device controlled by eye movements and voluntary eye blink." IEICE Transactions on
Information and Systems 89.6 (2006): 1790-1797.

8. Addison, Paul S. "Wavelet transforms and the ECG: a review." Physiological Measurement
26.5 (2005): R155.

9. Kohler, B.-U., Carsten Hennig, and Reinhold Orglmeister. "The principles of software QRS
detection." IEEE Engineering in Medicine and Biology Magazine 21.1 (2002): 42-57.

10. Boggess, Albert, and Francis J. Narcowich. A First Course in Wavelets with Fourier Analysis.
John Wiley & Sons, 2009.

11. Donoho, D. L., and Johnstone, I. M. "Ideal spatial adaptation via wavelet
shrinkage." Biometrika 81 (1994): 425-455.

12. Donoho, D. L., and Johnstone, I. M. "Adapting to unknown smoothness via wavelet
shrinkage." Journal of the American Statistical Association 90 (1995): 1200-1224.

13. Rousseeuw, Peter J., and Katrien Van Driessen. "A fast algorithm for the minimum
covariance determinant estimator." Technometrics 41.3 (1999): 212-223.

14. NeuroSky®, ThinkGear Communication Protocol.

15. Cawley, Gavin C., and Nicola L. C. Talbot. "On over-fitting in model selection and subsequent
selection bias in performance evaluation." The Journal of Machine Learning Research 11 (2010):
2079-2107.

16. LaCourse, John R., and Francis C. Hludik Jr. "An eye movement communication-control
system for the disabled." IEEE Transactions on Biomedical Engineering 37.12 (1990): 1215-1220.

17. Norris, Gregg, and Eric Wilson. "The eye mouse, an eye communication device."
Proceedings of the IEEE 1997 23rd Northeast Bioengineering Conference. IEEE, 1997.

18. Chen, Yingxi, and Wyatt S. Newman. "A human-robot interface based on electrooculography."
Proceedings of the 2004 IEEE International Conference on Robotics and Automation (ICRA '04),
Vol. 1. IEEE, 2004.

19. Usakli, Ali Bülent, et al. "On the use of electrooculogram for efficient human computer
interfaces." Computational Intelligence and Neuroscience 2010 (2010): 1.

20. Venkataramanan, S., et al. "Biomedical instrumentation based on electrooculogram
(EOG) signal processing and application to a hospital alarm system." Proceedings of the 2005
International Conference on Intelligent Sensing and Information Processing. IEEE, 2005.

21. Lv, Zhao, et al. "Implementation of the EOG-based human computer interface system."
The 2nd International Conference on Bioinformatics and Biomedical Engineering (ICBBE 2008).
IEEE, 2008.

22. Usakli, Ali Bulent, and Serkan Gurkan. "Design of a novel efficient human-computer
interface: An electrooculogram based virtual keyboard." IEEE Transactions on Instrumentation
and Measurement 59.8 (2010): 2099-2108.

23. Krupiński, Robert, and Przemysław Mazurek. "Convergence improving in evolution-based
technique for estimation and separation of electrooculography and blinking signals."
Information Technologies in Biomedicine. Springer Berlin Heidelberg, 2010. 293-302.

Appendix Plots

The superposition of 105 single-blink EOGs

The superposition of 118 looking-down EOGs

The superposition of 67 looking-up EOGs

The superposition of double-blink EOGs

The raw signals, compressed signals, residues, and the percentage of coefficients needed to
reconstruct the signal with 97.5% of the energy, for the blink EOGs

The raw signals, compressed signals, residues, and the percentage of coefficients needed to
reconstruct the signal with 97.5% of the energy, for the looking-down EOGs

The raw signals, compressed signals, residues, and the percentage of coefficients needed to
reconstruct the signal with 97.5% of the energy, for the looking-up EOGs

The superposition of raw blink EOG (in red), raw looking-up EOG (in green) and looking-
down EOG (in blue)

The superposition of the raw time-synchronized single-blink EOGs, the template and the noises.

The superposition of the raw time-synchronized looking-down EOGs, the template and the noises.

The superposition of the raw time-synchronized looking-up EOGs, the template and the noises.

The energy-entropy plots of all types of EOGs.

The energy / entropy / number-of-zero-crossings-at-600 plot of all types of EOGs.

The SVM training accuracies with different polynomial kernels and different values of the
regularization parameter C. The x-axis is the order of the kernel. The plot shows that for
C = 0.5 and a quadratic kernel, the SVM training accuracy is 93.6%.

