
Computer Methods and Programs in Biomedicine 60 (1999) 183 – 196

www.elsevier.com/locate/cmpb

Context related artefact detection in prolonged EEG recordings
Maarten van de Velde a,*, I. Robert Ghosh b, Pierre J.M. Cluitmans a
a Eindhoven University of Technology, Medical Electrical Engineering Group, PO Box 513, 5600 MB Eindhoven, The Netherlands
b Department of Clinical Neurophysiology, St. Bartholomew’s Hospital, London, UK

Received 30 September 1998; received in revised form 12 February 1999; accepted 15 February 1999

Abstract

The need for reliable detection of artefacts in raw and processed EEG is widely acknowledged. Although different
EEG analysis systems have been described, only a few generally applicable artefact recognition techniques have emerged.
This paper tackles the problem of artefact detection in seven 24 h EEG recordings in the intensive care unit. ICU
recordings have received less attention than, e.g. epilepsy monitoring, although recordings in this environment present
an interesting application area. The EEG data used here were recorded under the difficult conditions of an exploratory
ICU study. The data set includes a diverse set of EEG patterns, as well as EEG artefacts. The study investigates objective
artefact detection methods based on statistical differences between signal parameters, using time-varying autoregressive
modelling (AR) and Slope detection. In addition to matching the performance of artefact detection against two human
observers, the study focuses on the optimal settings for context incorporation by testing the algorithms for different
time windows and epoch lengths. Results indicate that a relatively short period (20 – 40 s) provides sufficient context
information for the methods used. The combined AR and Slope detection parameters yielded good performance,
detecting approximately 90% of the artefacts as indicated by the consensus score of the human observers. © 1999 Elsevier
Science Ireland Ltd. All rights reserved.

Keywords: EEG; ICU; Artefact detection; Validation; Amplitude analysis; Autoregressive modelling

* Corresponding author. Tel.: +31-40-2473288; fax: +31-40-2466508.

0169-2607/99/$ - see front matter © 1999 Elsevier Science Ireland Ltd. All rights reserved.
PII: S0169-2607(99)00013-9
M. van de Velde et al. / Computer Methods and Programs in Biomedicine 60 (1999) 183–196

1. Introduction

The occurrence of artefacts in the EEG hinders the reliable use of automatic analysis techniques. Pre-processing by manual screening and marking of artefacts is a time-consuming and tedious task, especially in prolonged recordings, although viewing events detected by automation and subsequently confirming them as artefact is much less onerous for human observers. A major problem in computerised processing of the EEG is the non-stationary behaviour of the non-artefactual signal and the fact that some types of artefacts can resemble EEG activity. In addition, a waveform of exactly the same morphology may be correctly categorised as artefactual in one record and as non-artefactual in another. Artefacts must, therefore, be assessed in clinical context.

From a practical point of view, the problem of non-stationarity and artefact identification may actually lie in the basic differences between human screening and screening by a computer. Visual evaluation is usually performed on relatively long segments of 10–60 s, where artefacts are observed in relation to the ongoing signal. A computerised screening process, on the other hand, should always be based on EEG features obtained from a stationary signal, which requires the use of short epochs of only 1–2 s [1]. Somewhat longer stationary epochs may be found when using adaptive segmentation of EEGs, but in general the segments are still rather short compared to those used in human screening (e.g. [2–4]).

The signal's behaviour may be modelled by analysing the behaviour of features during segment transitions, thus incorporating the temporal context of the EEG. Constraints (rules) can then be applied to restrict the permitted sequence of segments and to identify distinct segments accordingly [5,6]. A drawback of these methods is the amount of heuristics involved in feature selection and the difficulty of composing an optimal set of rules [7]. An alternative approach to artefact detection is to compare parameters with thresholds derived from statistics of a preceding EEG period. For instance, Flooh et al. [8] took a short period as referential context, using an amplitude threshold of six times the average amplitude in the preceding 10 s. A relatively long context period was used in a study by Brunner et al. [9], where the median of spectral power was calculated over 3 min for the detection of muscle artefact in sleep recordings.

The present study further explores the concept of temporal context in relation to artefact detection, using two complementary detection methods. A time-varying autoregressive (AR) model targets EEG-like artefacts and is expected, in particular, to identify low-frequency artefacts [10,11]. Artefacts in the higher frequency range are detected by Slope analysis (first derivative), which has been successful, for instance, in the detection of muscle artefact [12,13]. For both methods, temporal context is modelled by reference to the EEG period immediately preceding the test epochs; detection of significant changes is based on statistical principles for variability tracking (AR) and outlier detection (Slope). Different lengths of the context period are investigated.

2. Methods

2.1. Autoregressive modelling

Autoregressive (AR) modelling of a discrete time series consists of computing the coefficients that represent the correlation of the series s(n) with its preceding samples at sampling times (n − 1) to (n − p):

$$ s(n) = \theta_0 + \sum_{k=1}^{p} \theta_k\, s(n-k) + e(n) \qquad (1) $$

where θ0 represents the DC component of s(n), θ1, …, θp are the AR model coefficients, and e(n) is the residual error. The order p determines the number of unknown variables in the model.

An optimal solution can be found for a signal period of length N by minimising the residual errors, which can be done with the ordinary least squares (OLS) method [14]. The N equations used in the calculation are first written in vector notation:

$$ \mathbf{S} = \boldsymbol{\Theta}_0 + \mathbf{Z}\boldsymbol{\theta} + \mathbf{e} \qquad (2) $$

where S and e are vectors of N elements, and

$$ \boldsymbol{\Theta}_0 = \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_0 \end{bmatrix}, \qquad \mathbf{Z} = \begin{bmatrix} 0 & \cdots & \cdots & 0 \\ s(1) & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ s(N-1) & \cdots & \cdots & s(N-p) \end{bmatrix}, \qquad \boldsymbol{\theta} = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_p \end{bmatrix} $$
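The least-squares fit of Eqs. (2)–(4) and the AIC order selection of Eq. (5) in Section 2.1.1 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not say how the DC term θ0 is obtained, so here it is estimated jointly with θ1, …, θp by appending a column of ones to Z; R0 is taken as the mean power of the period; `p_max` defaults to 15, the upper end of the EEG model orders cited in the text; and all function names are hypothetical.

```python
import numpy as np


def build_regressor(s, p):
    """Prewindowed regressor matrix Z (N x p) of Eq. (2).

    Row n holds s(n-1), ..., s(n-p); samples before the start of the
    period are taken as zero, giving the zero upper-right triangle.
    """
    s = np.asarray(s, dtype=float)
    Z = np.zeros((len(s), p))
    for k in range(1, p + 1):
        Z[k:, k - 1] = s[:-k]
    return Z


def fit_ar_ols(s, p):
    """Least-squares AR(p) fit with a DC term; returns (theta0, theta, e)."""
    s = np.asarray(s, dtype=float)
    Z = build_regressor(s, p)
    # Assumption: theta0 is estimated jointly with theta via a column of ones.
    A = np.column_stack([np.ones(len(s)), Z])
    coef, *_ = np.linalg.lstsq(A, s, rcond=None)
    theta0, theta = coef[0], coef[1:]
    e = s - Z @ theta - theta0           # residual vector, Eq. (3)
    return theta0, theta, e


def aic(s, p):
    """Normalised AIC of Eq. (5): ln(sigma_e^2 / R0) + 2p/N."""
    s = np.asarray(s, dtype=float)
    _, _, e = fit_ar_ols(s, p)
    sigma_e2 = np.mean(e ** 2)           # residual error variance
    r0 = np.mean(s ** 2)                 # R0: power of s(n), lag-0 autocorrelation
    return np.log(sigma_e2 / r0) + 2.0 * p / len(s)


def select_order(s, p_max=15):
    """Return the order in 1..p_max with the smallest AIC."""
    return min(range(1, p_max + 1), key=lambda p: aic(s, p))
```

For instance, `fit_ar_ols(epoch, 5)` on a 100-sample (1 s) epoch returns θ0, the five AR coefficients and the residual vector, and `select_order(epoch)` returns the AIC-minimising order over the chosen test range.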

The OLS method consists of minimising the quadratic cost function

$$ J = \mathbf{e}^{\mathsf T}\mathbf{e} = \sum e^2 $$

through the following set of equations:

$$ \mathbf{e} = \mathbf{S} - \mathbf{Z}\boldsymbol{\theta} - \boldsymbol{\Theta}_0 \qquad (3) $$

and consequently

$$ J = [\mathbf{S} - \mathbf{Z}\boldsymbol{\theta} - \boldsymbol{\Theta}_0]^{\mathsf T}\,[\mathbf{S} - \mathbf{Z}\boldsymbol{\theta} - \boldsymbol{\Theta}_0] $$

and

$$ \frac{\partial J}{\partial \boldsymbol{\theta}} = -2\,\mathbf{Z}^{\mathsf T}[\mathbf{S} - \mathbf{Z}\boldsymbol{\theta} - \boldsymbol{\Theta}_0] = \mathbf{0} $$

The minimum of J is therefore found where the normal equations hold,

$$ \mathbf{Z}^{\mathsf T}\mathbf{Z}\,\boldsymbol{\theta} = \mathbf{Z}^{\mathsf T}(\mathbf{S} - \boldsymbol{\Theta}_0) $$

resulting in the 'least squares estimate' of the model coefficients [14]:

$$ \boldsymbol{\theta} = (\mathbf{Z}^{\mathsf T}\mathbf{Z})^{-1}\,\mathbf{Z}^{\mathsf T}(\mathbf{S} - \boldsymbol{\Theta}_0) \qquad (4) $$

2.1.1. Optimal model estimation and order selection

An adequate fit of the original signal is characterised by a residual error term that has the statistical properties of an independent white-noise process. This can be checked by testing for normality [15] using the Shapiro–Wilk statistic [16]. This test is more powerful than other alternatives, and provides a sensitive measure of non-normality [17,18]. Minimum power of the residual error process is guaranteed by the OLS method, and the actual values can be calculated from Eq. (3).

Higher model orders will generally result in smaller errors but, apart from the expense of increased computing time, they introduce a problem of overfitting [19,20]. As a compromise, Akaike's information criterion (AIC [21]) includes the log-likelihood of the normalised error while penalising increasing orders p:

$$ \mathrm{AIC}(p) = \ln\!\left(\frac{\sigma_e^2}{R_0}\right) + \frac{2p}{N} \qquad (5) $$

where σe² is the error variance, R0 the autocorrelation, or power, of s(n), and N the length in samples of the EEG period. The best model order p minimises the AIC for 1 ≤ p < 3√N, which is a practical test range [19]. Previous investigations in EEG reveal optimal model orders between p = 2 and p = 15 (e.g. [22,23]). Relatively low orders of p = 5 or 6 have been reported consistently in various types of EEG [24–26].

2.1.2. EEG variability detection

Theoretically, the AR coefficients describe the EEG signal in each epoch. However, the requirements of independence and normality of the error term will most likely not be met during abnormal and artefact periods. The AR model will try to adapt to changes and to any disturbance, which will be reflected in both the coefficients and the residuals. Therefore, using coefficients or errors alone is not enough for the detection of changes in the EEG [26,27].

We calculate the multidimensional vector F_i in every epoch of 1 s, which is assumed to be stationary [1]. The vector F_i consists of the AR coefficients θ1, …, θp and also includes the mean μerr and standard deviation σerr of the residual errors, normalised relative to 1 μV. All vector components are weighted equally in the model. Initial explorations in the current data set confirmed that the model's individual components were approximately of the same order of magnitude (varying below 5 μV in normal EEG; see also, e.g. [28]).

Variability tracking is performed on two consecutive EEG periods that shift forward in time, designated the 'context' window and the 'test' window (see Fig. 1). In both periods, the variance of the Euclidean distances between the vectors F_i and their average is calculated. A high variance is expected when artefacts are encountered; its statistical significance is examined by comparing the 'test' variance with the 'context' variance. For two independent normal processes, the ratio of variances λ2²/λ1² follows an F-distribution, having N2 − 1 and

N1 − 1 degrees of freedom for the test and context periods, respectively. The one-sided 100(1 − α) percentage upper confidence limit is found from a standard table of the F-distribution [29]:

$$ \frac{\lambda_2^2}{\lambda_1^2} \le f_{\alpha,\,N_1-1,\,N_2-1} \qquad (6) $$

The length of the test window is fixed at N2 = 10 [s]; the context length is varied over N1 = 10, 20, 40, 80 and 160 [s], corresponding to $f_{0.01,\,N_1-1,\,N_2-1}$ = 5.35, 4.81, 4.57, 4.40 and 4.31, respectively. This approach incorporates all parameters of the AR estimation and tests Eq. (6) at significance level α = 0.01.

2.2. Detection of short transients

Another statistical approach to EEG validation is based on the assumption that the occurrence of artefacts is reflected in changing statistical properties of amplitude parameters. In the current study, we used the Slope parameter to target short transients. This parameter is simple to implement, yet very sensitive to high-frequency artefact (see, e.g. [30–32]).

A straightforward statistical implementation has been used here. In each epoch, the maximum Slope (first derivative) between all pairs of successive samples is calculated, resulting in the Slope histogram over a 'context' window of epochs. The histogram is expected to follow a normal distribution during 'normal' data conditions [33]. A high-amplitude threshold can then be set at (μ + 3σ), based on the mean (μ) and standard deviation (σ), for outlier detection in the 'test' window (see Fig. 2). The confidence interval ⟨−∞, μ + 3σ⟩ defines the range in which parameter values are considered 'normal'. In a normal distribution, this range encompasses 99.9% of the probability mass, which promises high specificity (few false detections).

The unit epoch length for processing was chosen as 1 s, identical to the autoregressive method. Apart from this epoch length being accepted as stationary [1], 1 s is also optimal for the accuracy of detection, as shown, e.g., for muscle artefact [34]. The Slope detection process was performed analogously to the AR approach: the reference histogram was obtained from the context window, for context lengths of N = 10, 20, 40, 80 and 160 [s]. As N increases, the precision of the threshold estimate (μ + 3σ) increases, which should lead to improved hypothesis testing.

Fig. 1. Variability tracking: an autoregressive model of order p is fitted in every ith epoch, yielding vectors F_i that consist of the AR coefficients θ1, …, θp and the mean μerr and standard deviation σerr of the residual errors (the arrows depict the (p + 2)-dimensional vectors F_i and their average C(N)). Next, the variance of the Euclidean distances between F_i and C(N) is calculated. This procedure is performed in both the context window and the test window. The statistic λ2²/λ1² is used to detect significant changes in signal variability.

Fig. 2. Detection of slope outliers: in every ith epoch the maximum slope is calculated, resulting in a distribution with mean μS and standard deviation σS for the slopes in the context window. The threshold (μS + 3σS) is then used in the test window to detect 'short' transients.

2.3. Data set

The data used here are EEG registrations measured in a feasibility study in the intensive care unit (ICU) of Kuopio University Hospital, Finland. The recordings were approved by the Medical Ethics Committee; informed assent was obtained from the patients' relatives. Five patients (male, age range 19–78 years) were included in this study; two were monitored twice, resulting in seven 24 h recordings. These data are publicly available from the fully annotated data library (DL) that was acquired in an international collaboration, the IMPROVE DL [35]. The EEG data in the DL present a wide range of patterns, and may be considered reasonably representative of EEG recordings in the ICU.

The EEG investigations were restricted to two channels, C3-P3 and C4-P4 (10–20 system), as only globally representative cerebral changes were being assessed. As a minimal set, these parasagittal derivations are also known to show the fewest artefacts in a clinical setting [36]. Standard Ag–AgCl electrodes were used. Electrode impedance was kept low, and electrodes were reapplied when checks or sustained artefacts suggested deterioration. The input amplitude range was ±200 μV, at a sampling frequency of 100 Hz, with a second-order low-pass filter at a cut-off frequency of 25 Hz. A comprehensive review of procedures and technical details is given by Thomsen et al. [37].

2.4. Evaluation

2.4.1. Visual artefact assessment

Two experienced human observers were involved in the visual screening of all data, which was performed on a high-resolution computer display showing only one channel in pages of 10 s. This means that channels C3 and C4 were each scored without bias from the other channel. The observers were well trained in artefact assessment of clinical EEGs and worked independently, browsing through the data page by page. Artefact pages were classified as 'moderate artefact' or 'severe artefact', and scored accordingly by a button-push. The evaluation included on average 7,500 pages per channel per patient, amounting to a total of more than 100,000 pages.

Scoring was performed according to the following guidelines:
• No score was given to pages showing a distinct EEG signal, allowing for minor artefacts (i.e. very short duration or low amplitude, e.g. minor 50 Hz/muscle activity).
• Moderate artefact was scored in signal pages showing artefacts of relatively small amplitude, a total artefact time of less than 1 s, or the presence of only one or two short 'electrode-pop' artefacts.
• Severe artefact was assigned to pages other than the above: large-amplitude artefacts, 50 Hz interference and muscle activity of larger amplitudes (twice the 'background' amplitude).

In general, artefact scoring is not a sharply defined procedure; indeed, these guidelines were designed to allow for some subjectivity while trying to capture most of the artefacts. In view of the amount of data, the exercise was kept relatively simple, while still obtaining accurate artefact markings.

2.4.2. Performance measures

The above procedure and methods allow us to evaluate the performance of both observers and computer in percentages of time. In brief:

Sensitivity is defined as the percentage of true artefacts (true according to the observer) that are marked correctly by the detection algorithm, indicating the detection power of the method. Positive prediction is the accompanying measure: the percentage of automatic markings that are considered by the observer(s) to be true artefacts. A comparable measure is specificity, which assesses the reliability of leaving unmarked those pages that do not contain any artefacts; i.e. a low false detection rate results in a high specificity.

Subjective differences in the interpretation of lengthy phenomena may be magnified in the performance measures. However, this evaluation is objective with respect to the question 'what average length of EEG context is adequate for the detection of artefacts?'

3. Results

3.1. Evaluation by human observers

3.1.1. Matching the observers' artefact scoring

The observers marked approximately 1,000 artefacts (80 min per channel) in each of the patients, an average of 7% of total recording time. Observer 1 was the more critical of the two, and scored significantly more artefact pages than observer 2, especially in recordings 34, 35 and 36. This is also evident from the inter-observer comparison depicted in Fig. 3: observer 2 scored less than 60% of the artefacts of observer 1 in those recordings.

The agreement score, or consensus, represents the sensitivity towards the other observer's scoring. Mean consensus was 76%, including all artefact markings. The differences in artefact assessment were mostly caused by the subjective interpretation of 50 Hz interference. Some lengthy periods of this type of artefact were marked by observer 1 as 'severe' (recordings 34, 36) or 'moderate' (35) and were not marked by observer 2 because of relatively distinct EEG patterns. When correcting for these periods, the agreement score reached well over 80% (correction not shown in graph). Lengthy periods of serious distortion in patient 38, including 50 Hz interference, were marked by both observers.

The general agreement of the observers, as well as the subjective interpretation of signals and guidelines, is further illustrated by comparing the scores for 'severe' artefact periods. The resulting higher overlap of the observers' markings is also indicated in Fig. 3. This effect was largest in

the scores for patient 35. In this patient, only 2% of the recording was scored as artefact by both observers, whereas an extra 2 h of 50 Hz interference in channel C3 was marked only by observer 1 (adding 4% artefact time).

The consensus about valid EEG periods was very high: typically, 95–99% of the periods left unmarked by one observer were also left unmarked by the other. In part this is explained by the low occurrence of artefacts relative to the length of the recordings. The number of markings that did not match was rather small compared to the 7,500 pages in an average recording.

Fig. 3. Inter-observer comparison: the consensus or agreement score for marking of artefacts in the different patients (recordings 32/34 and 33/36 are from the same patient). Consensus was high for 'severe' artefacts.

3.1.2. Artefacts and patients

A previous exploration of the data set had resulted in an initial classification of artefacts. The annotations had been made on a 1 min time scale. Artefact occurrence was found to consist of sustained artefacts (71%), brief electrode artefacts (21%), 50 Hz interference (6%), and scalp muscle potentials (2%). The absence of eye movement artefacts and the relative paucity of scalp EMG potentials reflected the chosen electrode derivations and the medication or pathologically obtained state of the patients. Nursing and medical interventions and patient coughing were responsible for 78% of the artefacts. Most artefacts resolved rapidly without attending to electrodes [38]. Although no direct comparison could be performed because of the different methodology, the observers of the current study acknowledged those earlier findings. The current study focused on the time resolution of artefact detection, using a higher resolution for scoring. Therefore, scores and derived measures are necessarily different (see also Refs. [37,39]).

The patients had been admitted to the ICU based on a diagnosis of multiple organ failure (definitions in Ref. [40]). Recordings 32 and 34 were of the same patient (age 69), showing a generally attenuated EEG; the patient eventually died 7 days after the second recording. Recordings 33 and 36 were of a cardiac patient (age 78), without gross abnormalities in the EEG. A presumed drug effect resulted in a burst-suppression (BS) pattern in patient 35 (age 19), who received a loading dose of thiopental before the recording. His ICU diagnosis was 'status epilepticus, suspected encephalitis', and the EEG generally showed high-amplitude, irregular patterns. Neither of the observers scored the BS pattern as artefact, nor did the automatic methods. Patient 37 (age 39) showed a low-amplitude EEG (diagnosis: meningitis (Escherichia coli), hydrocephalus, septic shock). He died 10 days

after the recording. Patient 38 (age 29) did not show any grossly abnormal EEG features.

3.2. Performance of automatic methods

3.2.1. Autoregressive modelling

Order selection and model validation. Before the evaluation of the variability tracking method was started, the autoregressive model was examined for optimal order and normality of the residual errors. These analyses were performed on the first 3 min of every recording, testing both channels. Fig. 4 shows the grand-averaged data for the AIC, showing a minor but clear inclination towards AR order 5 as optimal. Therefore, this order was used in all subsequent calculations.

Fig. 4. Optimal order estimation for the AR model using the Akaike information criterion (AIC).

The normality test was performed using a C translation of Royston's implementation of the Shapiro–Wilk test [41]. Overall, 80% of the error series were accepted as statistically normal (α = 0.05). Normality was lower in patients 35 and 36, where the recording started with artefact periods. This further validates the inclusion of the error parameters in the AR variability tracking.

AR detection results. The detection of statistically different EEG pages was performed by testing the F-statistic as described in the Methods section. The average variance of the AR vectors in the artefactual pages was significantly higher than in the unmarked pages, but the method proved to be rather insensitive to artefact detection in general. Two observations were made: (1) the method was most successful for artefacts of higher amplitude, and (2) the detected artefacts were generally in the lower frequency range and were often spread over several pages. Therefore, the performance measures in terms of time were found to underestimate the true detection power, since the variability tracking only detected the start of multiple-page artefacts. For instance, the variability in a prolonged 50 Hz signal reduces to zero. This problem could be solved by an algorithm that 'halts' the context window until the artefact is over. However, this was found to be a rather intricate addition to the current model. Moreover, it would invalidate the investigation of different 'context' lengths: the period between artefacts often did not allow the context model to be reinstated.

Fig. 5 shows the performance for detection of artefact onset by the AR method versus the consensus of the observers. The consensus incorporated all artefacts that were marked by both observers, regardless of whether they were 'moderate' or 'severe'. The average sensitivity exceeded 50% only for context lengths of 10 and 20 s. The positive prediction for these contexts was significantly different from the neighbouring series (ANOVA, α = 0.01). The increasing overlap for 40, 80 and 160 s context was also evident from the increasingly high non-significance.

AR detection of only the 'severe' artefacts was characterised by approximately 20–40% higher sensitivity values. However, the corresponding predictive accuracy was below 10%.

3.2.2. Slope detection

Slope detection was very successful in the ICU EEG data. The results are shown in Fig. 6, which plots detection performance versus the consensus of the observers.

The sensitivity showed acceptably high values in all patients except patient 38. In this patient, the relatively long periods of (consensus) interference artefact resulted in a very wide distribution of Slope values, causing an insensitive threshold calculation.

Overall, Slope detection performance was not different for artefact onset alone: foremost, the method detected short-duration, transient artefacts. Average sensitivity was highest when using a 20 s context length (76%), rising to 84% when patient 38 was excluded.
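The Slope outlier test of Section 2.2 can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' code, and it rests on assumptions the text leaves open: the per-epoch Slope is taken as the largest absolute difference between successive samples (scaled to units per second), differences across epoch boundaries are ignored, and each epoch is tested against the (μ + 3σ) threshold of the `n_context` epochs immediately preceding it, i.e. a one-epoch test window. The function names are hypothetical.

```python
import numpy as np


def epoch_max_slopes(x, fs=100, epoch_s=1.0):
    """Maximum absolute slope per epoch, in signal units per second."""
    n = int(round(fs * epoch_s))
    n_epochs = len(x) // n
    epochs = np.asarray(x[: n_epochs * n], dtype=float).reshape(n_epochs, n)
    # First difference between successive samples within each epoch only.
    return np.max(np.abs(np.diff(epochs, axis=1)), axis=1) * fs


def slope_outliers(x, fs=100, epoch_s=1.0, n_context=20, k=3.0):
    """Flag epochs whose maximum slope exceeds mean + k*std of the context window."""
    slopes = epoch_max_slopes(x, fs, epoch_s)
    flags = np.zeros(len(slopes), dtype=bool)   # first n_context epochs stay unflagged
    for i in range(n_context, len(slopes)):
        context = slopes[i - n_context:i]       # preceding epochs only ('history')
        threshold = context.mean() + k * context.std(ddof=1)
        flags[i] = slopes[i] > threshold
    return flags
```

With 1 s epochs, `n_context=20` corresponds to the 20 s context that gave the highest average sensitivity above. The sketch also shows why a long stretch of interference inflates σ in the context window and raises the threshold, as observed for patient 38.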

The variance of positive prediction increased with longer context windows. At the same time, average prediction decreased: the performance did not improve. However, no statistical significance was found.

3.2.3. Combined AR and Slope detection

The results of the combined methods are given in Table 1, for a context of 20 s. The selection of a 20 s context was based on the observations above: an AR sensitivity of 51% (at an acceptable true-artefact prediction rate of 0.3) and the highest Slope performance. The detection process was generally characterised by Slope detection of high-frequency artefacts and AR detection of 'lower' frequency artefacts. The average sensitivity increased to 89%, which is 5% higher than the average indicated in Fig. 6 for Slope detection alone. In the individual patients, the AR method contributed a 2–10% improvement in detection power.

The specificity of detection was generally very high: 93–99% of valid EEG pages were left unmarked by both the Slope and the AR method.

4. Discussion

Signal monitoring in the ICU frequently presents a good mix of biologic, technologic, and extrinsic artefacts [42]. Validation of EEG data acquired under such difficult conditions is imperative for automatic analysis and incorporation into routine practice [43].

The current study aimed at the detection of all artefacts in the EEG subset of the IMPROVE data, focusing on context resolution. The methods were based on statistical rules, designed for objective detection of outlier phenomena in the EEG. Two observers were involved in scrutinising the 24 h recordings at a 10 s time resolution. Observer 1 scored a total artefact time of 7.7%; observer 2 scored 5.7% as artefact.

Subjective interpretation is a general problem in EEG evaluation studies [44]. For instance, small artefacts in the delta frequency range amid a (normal) background of larger amplitude can be underestimated even by experienced observers [11]. Therefore, the consensus score of the observers was used to test the performance of the automatic algorithms. The performance measures were defined to reflect the percentage of time that was correctly detected.

In general, detection was highly specific, partly because performance was defined in terms of time and artefacts occurred infrequently. In retrospect, we acknowledged that valid EEG periods were sufficiently left unmarked by the automatic methods, implying high specificity.
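The measures defined in Section 2.4.2 can be illustrated on per-page markings. The sketch below is an illustration under assumptions rather than the evaluation code used in the study: each 10 s page is treated as one boolean decision (so percentages of pages equal percentages of time), the consensus reference is taken as the pages marked by both observers, and the function names are hypothetical.

```python
import numpy as np


def consensus(obs1, obs2):
    """Pages marked as artefact by both observers."""
    return np.asarray(obs1, dtype=bool) & np.asarray(obs2, dtype=bool)


def performance(reference, detected):
    """Sensitivity, positive prediction and specificity, in percent."""
    ref = np.asarray(reference, dtype=bool)
    det = np.asarray(detected, dtype=bool)
    tp = int(np.sum(ref & det))     # artefact pages that were detected
    fn = int(np.sum(ref & ~det))    # artefact pages that were missed
    fp = int(np.sum(~ref & det))    # valid pages that were marked
    tn = int(np.sum(~ref & ~det))   # valid pages left unmarked

    def pct(num, den):
        return 100.0 * num / den if den else float("nan")

    return {
        "sensitivity": pct(tp, tp + fn),
        "positive_prediction": pct(tp, tp + fp),
        "specificity": pct(tn, tn + fp),
    }
```

If, as an assumption, a page counts as detected by the combined method when either detector flags it, the combined result of Section 3.2.3 would correspond to passing the element-wise OR of the two boolean page arrays as `detected`.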

Fig. 5. Artefact detection using time-varying autoregressive variability tracking. Ellipses indicate the (μ + σ) probability contours (mean + standard deviation) for each series. (*) denotes a significant difference in positive prediction for the 10 s and 20 s context lengths.
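The variability tracking of Section 2.1.2 (Fig. 1, Eq. (6)) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: epochs are 1 s long; θ0 is fitted jointly by least squares through an extra column of ones (with that choice the residual mean μerr is zero by construction, so it carries no information here); features are used unscaled, i.e. the signal is assumed to be expressed in μV; the test window advances in non-overlapping steps of N2 epochs; and the critical values are the f0.01 values quoted in the text for N2 = 10 s. Function names are hypothetical.

```python
import numpy as np

# f_0.01 critical values quoted in Section 2.1.2 for a 10-epoch test window,
# keyed by context length N1 in 1 s epochs.
F_CRIT_TEXT = {10: 5.35, 20: 4.81, 40: 4.57, 80: 4.40, 160: 4.31}


def ar_features(epoch, p=5):
    """F_i: AR coefficients theta_1..theta_p, residual mean and residual std."""
    x = np.asarray(epoch, dtype=float)
    Z = np.zeros((len(x), p))
    for k in range(1, p + 1):
        Z[k:, k - 1] = x[:-k]                     # prewindowed lags
    A = np.column_stack([np.ones(len(x)), Z])     # DC term fitted jointly (assumption)
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    e = x - A @ coef
    return np.concatenate([coef[1:], [e.mean(), e.std()]])


def distance_variance(F):
    """Variance of Euclidean distances between the rows F_i and their average C(N)."""
    d = np.linalg.norm(F - F.mean(axis=0), axis=1)
    return d.var(ddof=1)


def variability_flags(x, fs=100, p=5, n_context=20, n_test=10, f_crit=None):
    """Start epochs of test windows whose ratio lambda2^2 / lambda1^2 exceeds f_crit."""
    if f_crit is None:
        f_crit = F_CRIT_TEXT[n_context]           # quoted values assume n_test == 10
    x = np.asarray(x, dtype=float)
    n_epochs = len(x) // fs                       # 1 s epochs of fs samples each
    F = np.array([ar_features(x[i * fs:(i + 1) * fs], p) for i in range(n_epochs)])
    flagged = []
    # Context window: n_context epochs before `start`; test window: the next n_test.
    for start in range(n_context, n_epochs - n_test + 1, n_test):
        var_context = distance_variance(F[start - n_context:start])   # lambda1^2
        var_test = distance_variance(F[start:start + n_test])         # lambda2^2
        ratio = var_test / var_context if var_context > 0 else np.inf
        if ratio > f_crit:
            flagged.append(start)
    return flagged
```

Because the context window follows the test window forward in time, a window that has just absorbed a disturbance has an inflated λ1², which illustrates why the method mainly flags artefact onsets, as reported in Section 3.2.1.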

Fig. 6. Performance of detection for the Slope amplitude method: detection versus context length. Ellipses indicate the (μ + σ) probability contours (excluding the outlier values of patient 38). No change in performance was observed beyond 80 s context length.

The results also show that the Slope parameter detected most of the artefacts, and indicate that long context lengths were not needed for the investigated data set. The time-varying autoregressive variability tracking method was only partly successful. Nevertheless, when both methods were combined, AR contributed up to 10% sensitivity by detecting low-frequency artefacts. The overall performance reached 89% sensitivity and 53% positive prediction. The latter figure implies that approximately half of the automatic markings do not indicate artefacts. However, positive prediction is somewhat adversely influenced by the use of consensus data from the human observers, which may also have excluded some true artefacts. In addition, it would seem sensible to err towards high sensitivity (at the cost of lower positive prediction); this would allow observers to visually analyse events detected by automation and categorise them as artefact or non-artefact, in the knowledge that very few artefacts were missed by automation. If the eventual aim were to develop 'event detection' as opposed to 'artefact detection', the positive prediction would be greatly increased.

Based on the current findings, the EEG-like deviations found by AR variance detection in particular may be defined as 'events' rather than artefacts. Therefore, in clinical recordings 'event detection' not only includes the artefacts, but may also highlight the most interesting parts of the recording. As a discriminating method, higher AR-variability scores will more likely indicate (low-frequency) artefacts. Interestingly, AR-based analysis combined with variance testing was also used in an early method by Vachon et al. (1978) [45]. They used an F-ratio of only the error variances, calculated within the residual array of the AR model (1 s epoch, p = 5). At significance levels α = 0.05 and 0.10, they concluded that the detected non-stationary waveforms also needed additional pattern recognition. The current approach incorporated all parameters and residuals of the AR estimation and tested Eq. (6) at significance level α = 0.01, while incorporating longer context periods.

Context-related detection was implemented here as 'history'-based detection, and therefore still differs from human screening. Human screening often also involves 'going back' in the data, which influences the decision about whether the EEG is artefactual. In the current implementation, the automatic methods were designed for objective on-line processing, testing for statistical significance. As an illustration, Figs. 1 and 2 represent true data from the current study. Both figures indicate automatically detected EEG 'events' in the test window that were not marked by the

observers, while clearly displaying deviating phenomena in the EEG.

Artefacts often occur in several channels simultaneously; therefore, a detected 'event' (or 'candidate' artefact) is usually checked visually in all channels displayed together. This was also observed in the current data set, but was not incorporated in the algorithms or the evaluation. Combining channels has been described by various authors (e.g. [4,46,47]), who implemented such spatial (cross-channel) processing mainly for the identification of eye artefacts using rule-based systems. Another recently described system [48] used artificial neural networks to pre-process EEG features, and discriminated between (eye) artefacts, muscle artefacts and electrode artefacts in an additional knowledge-based stage. The system correctly identified 90% of artefacts in the initial evaluation. Unfortunately, the system was not evaluated on a large clinical data set, and temporal context was not evaluated systematically.

The current study provides some starting points for choosing the optimal length of the context periods in automatic analysis. The optimal context was concluded to be as short as 20–40 s.

Table 1
Artefact detection using a context length of 20 s: Slope detection and autoregressive variability tracking combined

Patient                32   33   34   35   36   37   38   Overall
Sensitivity (%)        89   79   94   97   87   98   75   89
Pos. prediction (%)    49   61   57   50   58   50   49   53

Nomenclature

N                        length in samples of EEG period
p                        order of autoregressive (AR) model
S                        signal vector
e                        residual error vector
Σ                        summation
θ1, …, θp                AR coefficients
θ                        AR vector (coefficients)
Z                        matrix of p × N elements
J                        quadratic cost (error power)
σe²                      error amplitude variance
R0                       autocorrelation, or power, of s(n)
F_i                      vector of θ1, …, θp, mean μerr and standard deviation σerr of the residual errors
N1, N2                   number of epochs in the 'context' and 'test' window, respectively
C(N1), C(N2)             average of F_i
λ1², λ2²                 variance of F_i (Euclidean distance to C(N1), C(N2))
$f_{\alpha,\,N_1-1,\,N_2-1}$    significance of the ratio of variances λ1², λ2²

Slope distribution
−∞                       minus infinity
μS                       mean
σS                       standard deviation

Acknowledgements

This project was supported by the Co-operation References


Centre of the Brabant Universities, project 94CH.
We are also very grateful for the co-operation
[1] J.A. McEwen, G.B. Anderson, Modeling the stationarity
with colleagues from the IMPROVE project: Dr and gaussianity of spontaneous electroencephalographic
P. Prior, Dr C.E. Thomsen and Mr R. Pottinger. activity, IEEE Trans. Biomed. Eng. 22 (1975) 361 – 369.
[2] B.H. Jansen, A. Hasman, R. Lenten, Piecewise EEG
analysis: an objective evaluation, Int. J. Biomed. Comput.
Appendix A. Nomenclature 12 (1981) 17 – 27.
[3] G. Bodenstein, W. Schneider, C.V.D. Malsburg, Comput-
Autoregressi6e model erized EEG pattern classification by adaptive segmenta-
s(n) discrete time signal tion and probability-density-function classification.
M. 6an de Velde et al. / Computer Methods and Programs in Biomedicine 60 (1999) 183–196 195

Description of the method, Comput. Biol. Med. 15 (1985) [21] H. Akaike, A new look at the statistical model identifica-
297 – 313. tion, IEEE Trans. Autom. Control 19 (1974) 716 – 723.
[4] A. Värri, K. Hirvonen, J. Hasan, P. Loula, V. Häkkinen, [22] C.W. Anderson, E.A. Stolz, S. Shamsunder, Multivariate
A computerized analysis system for vigilance studies, autoregressive models for classification of spontaneous
Comp. Meth. Progr. Biomed. 39 (1992) 113–124. electroencephalographic signals during mental tasks,
[5] V. Jagannathan, J.R. Bourne, B.H Jansen, J.W. Ward, IEEE Trans. Biomed. Eng. 45 (1998) 277 – 286.
Artificial intelligence methods in quantitative electroen- [23] S. Cerutti, D. Liberati, G. Avanzini, S. Franceschetti, F.
cephalogram analysis, Comput. Prog. Biomed. 15 (1982) Panzica, Classification of the EEG during neurosurgery.
249 – 258. Parametric identification and Kalman filtering compared,
[6] B.H. Jansen, B.M. Dawant, Knowledge-based approach J. Biomed. Eng. 8 (1986) 244 – 254.
to sleep EEG analysis—A feasibility study, IEEE Trans. [24] L.H. Zetterberg, Estimation of parameters for a linear
Biomed. Eng. 36 (1989) 510–518. difference equation with application to EEG analysis,
[7] B.H. Jansen, Quantitative analysis of electroencephalo- Math. Biosciences 5 (1969) 205 – 226.
grams: is there chaos in the future?, Int. J. Biomed. [25] B.H. Jansen, J.R. Bourne, J.W. Ward, Autoregressive
Comput. 27 (1991) 95–123. estimation of short segment spectra for computerized
[8] E. Flooh, E. Körner, G. Ladurner, H. Lechner, EEG- EEG analysis, IEEE Trans. Biomed. Eng. 28 (1981) 630 –
Nachtschlafableitungen: auswertung mittels automatis- 638.
cher Datenanalyse (EEG-night-sleep-recordings: [26] J. Pardey, S. Roberts, L. Tarassenko, A review of para-
automatic analysis. In German), Z. EEG-EMG 13 (1982) metric modeling techniques for EEG-analysis, Med. Eng.
157 – 160. Physics 18 (1996) 2 – 11.
[9] D.P. Brunner, R.C. Vasko, C.S. Detka, J.P. Monahan, [27] F.D.J. Dunstan, R.W. Marshall, The detection of arte-
C.F. Reynolds III, D.J. Kupfer, Muscle artifacts in the facts in EEG series, Stat. Med. 10 (1991) 1719 – 1731.
[28] S. Cerutti, D. Liberati, P. Mascellani, Parameter extrac-
sleep EEG: automated detection and effect on all-night
tion in EEG processing during riskful neurosurgical oper-
EEG power spectra, J. Sleep Res. 5 (1996) 155–164.
ations, Signal Proc. 9 (1985) 25 – 35.
[10] B.H. Jansen, J.R. Bourne, J.W. Ward, Identification and
[29] D.C. Montgomery, G.C. Runger, Applied statistics and
labelling of EEG graphic elements using autoregressive
probability for engineers, Wiley, New York, 1994.
spectral estimates, Comput. Biol. Med. 12 (1982) 97–106.
[30] M. Scherg, Simultaneous recording and separation of
[11] J.S. Barlow, Artifact processing (rejection and minimiza-
early and middle latency auditory evoked potentials, Elec-
tion) in EEG data processing, in: F.H. Lopes da Silva,
troenceph. Clin. Neurophysiol. 54 (1982) 339 – 341.
W.H. Storm van Leeuwen (Eds.), Handbook of Elec-
[31] H. Hinrichs, H.J. Heinze, M.R. Gaab, Neurophysiologis-
troencephalography and Clinical Neurophysiology, Re-
ches monitoring bei neurochirurgischen gefäßoperationen:
vised edition, Vol. 3B: Applications of Analytical
spezifische technische anforderungen und deren umset-
Techniques, Elsevier, Amsterdam, 1986, pp. 15–62.
zung (Neurophysiological monitoring of neurosurgical
[12] J.S. Barlow, Muscle spike artifact minimization in EEGs vessel-operations: technical specification and implementa-
by time-domain filtering, Electroenceph. Clin. Neuro- tion. In German), Z. EEG-EMG 23 (1992) 195 – 202.
physiol. 55 (1983) 487–491. [32] H. Hinrichs, H. Feistner, H.J. Heinze, A trend-detection
[13] J.S. Barlow, Automatic elimination of electrode-pop arti- algorithm for intraoperative EEG monitoring, Med. Eng.
facts in EEG’s, IEEE Trans. Biomed. Eng. 33 (1986) Physics 18 (1996) 626 – 631.
517 – 521. [33] P.J.M. Cluitmans, J.W. Jansen, J.E.W. Beneken, Artefact
[14] V. Strejc, Least squares parameter estimation, Automat- detection and removal during auditory evoked potential
ica 16 (1980) 535 – 550. monitoring, J. Clin. Mon. 9 (1993) 112 – 120.
[15] D.A. Pierce, Testing normality in autoregressive models, [34] M. van de Velde, G. van Erp, P.J.M. Cluitmans, Muscle
Biometrika 72 (1985) 293–297. artefact detection in the normal human awake EEG,
[16] S.S. Shapiro, M.B. Wilk, An analysis of variance test for Electroenceph. Clin. Neurophysiol. 107 (1998) 149 – 158.
normality (complete samples), Biometrika 52 (1965) 591– [35] I. Korhonen, J. Ojaniemi, K. Nieminen, M. van Gils, A.
611. Heikelä, A. Kari, Building the IMPROVE data Library,
[17] S. Shapiro, M.B. Wilk, H.J. Chen, A comparitive study of IEEE Eng. Med. Biol. 16 (1997) 25 – 32.
various tests for normality, Am. Stat. Ass. J. 63 (1968) [36] B. Schultz, R. Bender, A. Schultz, I. Pichlmayr, Reduk-
1343 – 1372. tion der anzahl von EEG-ableitungen für ein rou-
[18] R. Bender, B. Schultz, A. Schultz, I. Pichlmayr, Testing tinemäßiges monitoring auf der intensivstation
the gaussianity of the human EEG during anaesthesia, (Electroencephalographic monitoring in the ICU — Re-
Meth. Inf. Med. 31 (1992) 56–59. duction of the number of recorded channels. In German),
[19] J. Makhoul, Linear prediction: a tutorial review, Proc. Biomed. Technik 37 (1992) 194 – 199.
IEEE 63 (1975) 561 –580. [37] C.E. Thomsen, J. Gade, K. Nieminen, R.M. Langford,
[20] G.E.P. Box, G.M. Jenkins, Time series analysis, forecast- I.R. Ghosh, K. Jensen, M. van Gils, A. Rosenfalck, P.F.
ing and control, Revised edition, Holden-Day, London, Prior, S. White, Collecting EEG signals in the IMPROVE
1976. data library, IEEE Eng. Med. Biol. 16 (1997) 33 – 40.

[38] I.R. Ghosh, P.F. Prior, S.R. White, J. Gade, K. Jensen, R.M. Langford, A. Rosenfalck, C.E. Thomsen, Artefact assessment in prolonged EEG-polygraphic recordings in intensive care, Electroenceph. Clin. Neurophysiol. (in press).
[39] M. van Gils, A. Rosenfalck, S. White, P. Prior, J. Gade, L. Senhadji, C.E. Thomsen, I.R. Ghosh, R.M. Langford, K. Jensen, Signal processing in prolonged EEG recordings during intensive care, IEEE Eng. Med. Biol. 16 (1997) 56–63.
[40] K. Nieminen, R.M. Langford, C.J. Morgan, J. Takala, A. Kari, A clinical description of the IMPROVE data library, IEEE Eng. Med. Biol. 16 (1997) 21–24.
[41] P. Royston, Shapiro–Wilk W test and its significance level. Algorithm AS R94, Appl. Stat. 44 (1995) 4.
[42] D.W. Klass, The continuing challenge of artifacts in the EEG, Am. J. EEG Technol. 35 (1995) 239–269.
[43] P. Prior, The rationale and utility of neurophysiological investigations in clinical monitoring for brain and spinal cord ischaemia during surgery and intensive care, Comp. Meth. Prog. Biomed. 51 (1996) 13–27.
[44] G.W. Williams, H.O. Lüders, A. Brickner, M. Goormastic, D.W. Klass, Interobserver variability in EEG interpretation, Neurology 35 (1985) 1714–1719.
[45] B. Vachon, B. Dubuisson, D. Samson-Dollfus, Etude automatique de l'EEG: une méthode de détection des non-stationnarités (Automatic EEG processing: a method for detection of non-stationarities. In French), Int. J. Biomed. Comput. 9 (1978) 147–162.
[46] T. Pietilä, S. Vapaakoski, U. Nousiainen, A. Värri, H. Frey, V. Häkkinen, Y. Neuvo, Evaluation of a computerized system for recognition of epileptic activity during long-term EEG recording, Electroenceph. Clin. Neurophysiol. 90 (1994) 438–443.
[47] M. Nakamura, T. Sugi, A. Ikeda, R. Kagigi, H. Shibasaki, Clinical application of automatic integrative interpretation of awake background EEG: quantitative interpretation, report making, and detection of artifacts and reduced vigilance level, Electroenceph. Clin. Neurophysiol. 98 (1996) 103–112.
[48] J. Wu, E.C. Ifeachor, E.M. Allen, W.K. Wimalaratna, N.R. Hudson, Intelligent artefact identification in electroencephalography signal processing, IEE Proc. Sci. Meas. Technol. 144 (1997) 193–201.
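The context-related test summarised in the Discussion and in Appendix A (feature vectors $F_i$ built from the AR coefficients and residual statistics, a 'context' window of $N_1$ epochs preceding a 'test' window of $N_2$ epochs, and a variance ratio judged against $f_{\alpha,N_1-1,N_2-1}$) can be illustrated with the minimal Python sketch below. It is not the authors' implementation, and Eq. (6) itself is not reproduced; several details are assumptions. The AR model is taken as $s(n) = \sum_k \theta_k s(n-k) + e(n)$ and fitted per epoch by least squares; $\lambda^2$ is taken as the sum of squared Euclidean distances to the window centroid divided by the number of epochs minus one; the ratio $\lambda_1^2/\lambda_2^2$ is compared two-sidedly with critical values supplied by the caller (for example `scipy.stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)` and `scipy.stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)` with $\alpha = 0.01$); and successive test windows do not overlap. The helper names `solve`, `ar_features`, `spread` and `flag_test_windows` are hypothetical.

```python
import math
from statistics import mean, pstdev


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ar_features(epoch, p):
    """Hypothetical helper: F_i = [theta_1..theta_p, residual mean, residual std] for one epoch.

    Assumes s(n) = theta_1 s(n-1) + ... + theta_p s(n-p) + e(n), fitted by least squares.
    """
    lagged = [epoch[n - p:n][::-1] for n in range(p, len(epoch))]  # [s(n-1), ..., s(n-p)]
    target = epoch[p:]
    # Normal equations: (sum of lag outer products) theta = sum of (lags * target)
    a = [[sum(z[i] * z[j] for z in lagged) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * t for z, t in zip(lagged, target)) for i in range(p)]
    theta = solve(a, b)
    resid = [t - sum(th * v for th, v in zip(theta, z)) for z, t in zip(lagged, target)]
    return theta + [mean(resid), pstdev(resid)]


def spread(features):
    """Assumed lambda^2: squared Euclidean distances to the window centroid C, over (count - 1)."""
    k = len(features)
    centroid = [sum(col) / k for col in zip(*features)]
    sq_dist = [sum((f - c) ** 2 for f, c in zip(fv, centroid)) for fv in features]
    return sum(sq_dist) / (k - 1)


def flag_test_windows(features, n1, n2, f_lower, f_upper):
    """History-based scan: each test window of n2 epochs is compared with the n1 epochs before it.

    f_lower and f_upper are two-sided critical values of F(n1 - 1, n2 - 1), supplied by the caller.
    Returns (index of first test epoch, flagged) pairs; test windows do not overlap (assumption).
    """
    flags = []
    for t in range(n1, len(features) - n2 + 1, n2):
        lam1 = spread(features[t - n1:t])  # context window, N1 epochs
        lam2 = spread(features[t:t + n2])  # test window, N2 epochs
        ratio = lam1 / lam2 if lam2 > 0 else math.inf  # zero test-window spread treated as significant
        flags.append((t, ratio < f_lower or ratio > f_upper))
    return flags
```

In use, one feature vector would be computed per epoch, e.g. `flag_test_windows([ar_features(e, p) for e in epochs], n1, n2, f_lo, f_hi)` with hypothetical `epochs`, `p`, `n1`, `n2`, `f_lo` and `f_hi`. Only epochs preceding each test window enter its context, which mirrors the 'history'-based, on-line design rather than the look-back of human screening.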
