Abstract—Existing approaches for quantifying mental workload using electroencephalography often rely on probe stimuli to elicit stereotyped neural responses such as the P300 wave. Here we explore probe-independent algorithms for classifying three levels of task-complexity in a flight simulator experiment. Using input features derived from estimates of the average power in five frequency bands, we test a variety of classifiers, using 10-fold cross-validation to estimate test set error. Classification accuracy was above 50% (chance performance: 33.33%) in 13 of 20 subjects on at least one of the four recorded channels, and reached as high as 87.35%. There was strong variability across subjects in both the strength and direction of the relationships between the input features and task-complexity labels, suggesting that classifiers using these input features must be trained to the individual to be useful.

*Research partially supported by the Lockheed Martin Corporation, USA (project # 13051318) and DARPA Service Academies Challenge Award HR0011411994.
M. K. Johnson and J. A. Blanco are with the United States Naval Academy, Annapolis, MD 21402 USA (phone: 817-721-9303 and 410-293-6184; e-mail: michaeljohnson015@gmail.com and blanco@usna.edu).
R. J. Gentili, K. J. Jacquess, H. Oh, and B. D. Hatfield are with the University of Maryland, College Park, MD 20742 USA (phone: 301-405-2485; email: bhatfiel@umd.edu).
978-1-4673-6389-1/15/$31.00 ©2015 IEEE

I. INTRODUCTION

Mental workload can be described as a ratio between task-complexity and a person’s cognitive capacity to meet task demands [1]. This description captures the intuitive idea that mental workload depends both on external factors such as the objective difficulty of required tasks, and internal factors such as a person’s past experiences and skill set. There is a growing body of research focused on developing quantitative methods to assess mental workload in order to improve the mental resiliency of people in high stress environments. Various metrics derived from physiological signals such as heart rate, blood pressure, galvanic skin response, and eye-gaze have been investigated as biomarkers of mental workload [2-4]. These signals have been used to distinguish mental workload levels with accuracies significantly better than chance, but there are still no widely accepted standards or commercial products for mental workload monitoring.

With recent improvements in the ease-of-use, reliability, and cost of portable electroencephalography (EEG) systems, there has been increasing interest in using brain signals to measure mental workload [5]. It is hypothesized that EEG offers a more direct assay of mental workload than other physiological biomarkers because of the proximity of EEG sensors to the neural substrates of cognitive stress [6].

A common method of using EEG to assess mental workload involves delivering an auditory probe to evoke event-related potentials (ERPs) such as the P300 wave [7-8] while a subject undergoes a cognitive challenge. It has been demonstrated that the amplitude of the P300 varies inversely with task-complexity [9]. Although measuring ERPs using auditory stimuli has been successful in evaluating mental workload in controlled settings, this approach may be less practical when the cognitive tasks themselves involve a strong auditory component, and hence the introduction of extraneous sounds could be disruptive. Therefore, an EEG-based metric of mental workload that exploits information from non-evoked “background” neural activity is desired.

The goal of this research was to develop an EEG-based algorithm to classify different levels of task complexity that does not rely upon ERPs. By choosing subjects with a similar level of task-experience, we partially control for differences in the capacity to perform the experimental task and therefore use task-complexity as a surrogate for mental workload. As we were particularly interested in understanding the response of aircraft pilots to the cognitive demands imposed by their flight-missions, we used flight simulator tasks of varying challenge-level as our experimental paradigm. Furthermore, since pilots are typically in persistent radio or intercom communications via headset during flight, this also represents a scenario that would be particularly well-suited to a non-ERP-based index of cognitive workload. Signal processing methods were used to extract computational features from the EEG, and machine learning techniques were used to classify the data and assess algorithm performance.

II. METHOD

A. Data Acquisition

EEG data were collected from 20 United States Naval Academy (Annapolis, MD, USA) midshipmen between the ages of 19 and 23, all with basic flight training, while they performed visuo-motor tasks in a flight simulator (Prepar3D® v1.4, Lockheed Martin Corporation, Orlando, FL, USA) under three levels of task-complexity. The three tasks were selected from predefined flight training exercises developed with advice from experienced United States Navy pilots and distinguished in challenge-level by differences in weather intensity and mission requirements. Specifically, the three tasks were: 1) Easy: maintain aircraft’s current altitude (4000 ft), heading (180°), and airspeed (180 kn). The weather was defined by no clouds, precipitation, or wind, and unlimited visibility; 2) Medium: maintain the aircraft’s current heading (180°), airspeed (180 kn), and a “wings-level” attitude, while continuously making altitude changes between 4000 and 3000 ft, with ascent and descent rates of 1000 feet per minute (fpm). The sky was completely overcast (1/16 mi of visibility), but there was no precipitation and no wind; and 3) Hard: maintain the aircraft’s current airspeed (180 kn), while changing heading between 180 and 090° at a 15-degree angle of bank, ascending while turning right and descending while turning left at 1000 fpm. The sky was completely overcast as in the Medium task, with no precipitation, but with the presence of a moderate (16 kn) easterly wind. One trial per task difficulty was conducted, in random order, consisting of a 1-minute setup period followed by a 10-minute flight segment. Additionally, for use in a separate analysis, audible stimuli were administered to participants via ear-bud speakers with random inter-stimulus intervals between 6 and 30 seconds to evoke the P300 response. Since the goal of this work was to analyze background EEG only, data surrounding these stimuli were excluded using a procedure described below. Fig. 1 illustrates the experimental setup.

Figure 1. An individual performs a flight task in the simulator while wearing an EEG cap. The three experimental tasks required subjects to operate a T-6A Texan II SP2 United States Navy aircraft using the control stick, throttle, and rudder pedals.

Four active, dry (gel-free) electrodes were used to measure EEG signals from sites along the frontal (Fz), fronto-central (FCz), central (Cz), and parietal (Pz) midline, based on the International 10-20 System [10]. The EEG cap was connected to an amplifier with an online band-pass filter from 0.01 to 60 Hz (g.USBamp®, g.tec medical engineering GmbH, Schiedlberg, Austria) through a driver-interface box (g.SAHARAsys®, g.tec medical engineering GmbH, Schiedlberg, Austria). Electrode impedances were measured below 5 kOhm. The right mastoid was used as ground for the system and the left ear as the hardware reference. Data were also collected from the right ear for later re-referencing. EEG signals were sampled at a rate of 512 Hz, re-referenced in EEG analysis software (BrainVision Analyzer 2, Brain Products GmbH, Munich, Germany) to an average-ear montage, and digitally lowpass filtered (in forward and reverse to give zero phase response) with a Butterworth filter with a cutoff frequency of 50 Hz and 48 dB/octave rolloff.

Data were visually inspected for the presence of eye-blink and muscle-activity artifacts, and these segments (47% of all recorded data) were excluded from analysis. In addition, all data within a 600 ms window following the onset of an auditory stimulus were excluded in order to eliminate P300 responses. Fig. 2 shows a typical P300 waveform, generated by averaging 30 post-stimulus intervals in a single trial.

Figure 2. A typical P300 waveform generated by averaging 30 post-stimulus intervals. The peak of the response is located at approximately 250 ms post-stimulus, and a return to baseline is seen by around 600 ms, supporting the choice of the 600 ms exclusion window.

B. Feature Extraction

The remaining data set was segmented into 1-second (512-sample) epochs for analysis, based primarily on a desire to have no more than 1-second lag in an envisioned real-time monitoring system. Linear trends were removed from each segment by subtracting the least squares line of best fit. We estimated the power spectral density (PSD) of each segment using Welch’s method (8 sections with 50% overlap; Hamming window applied to each section) [11]. The average powers in each of the Delta (1-4 Hz), Theta (4-8 Hz), Alpha (8-13 Hz), Beta (13-30 Hz), and Gamma (30-40 Hz) [12] bands were then computed by integrating this PSD estimate over the corresponding frequency range. Finally, feature vectors were formed for each 1-second epoch by concatenating the average power estimates for each of the five frequency bands.

C. Classification

Several different classifiers were then trained to predict task-complexity based on EEG features. Specifically, the classification techniques used were: 1) k-Nearest Neighbors (kNN); 2) Linear Discriminant Analysis (LDA); 3) Quadratic Discriminant Analysis (QDA); 4) Naïve Bayes; 5) Decision Trees (with and without pruning); and 6) Support Vector Machines (SVM). Test set error was estimated using 10-fold cross-validation, with percent correct classification used as the performance metric.

Additionally, in an attempt to improve classifier performance by deemphasizing potentially irrelevant features, we performed principal component analysis on the unnormalized five-element feature data, reducing the number of dimensions to three (by taking the first three principal component projections as the new features), accounting for nearly 90% of the data variance on average.

III. RESULTS AND DISCUSSION

A. Average Power Computations

As a preliminary assessment of the relationship between frequency band power and task-complexity, for each channel within a subject, the powers in each frequency band were averaged over all one-second segments for each task. Fig. 3 shows the average power in each frequency band for channel FCz in Subject 6, Subject 8, Subject 10, and Subject 11.
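
The band-power features of Section II-B and the per-task averages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `band_powers`, `task_average`, `FS`, and `BANDS` are hypothetical, the section length (epoch length divided by 4.5) is an assumption chosen so that a 512-sample epoch yields 8 Hamming-windowed sections at 50% overlap, the FFT length of 512 points (1 Hz bin spacing) is an assumption, and the integration uses a simple rectangle rule over bins with lower edge included and upper edge excluded.

```python
import numpy as np
from scipy.signal import detrend, welch

FS = 512  # sampling rate in Hz, as stated in the text
# Band edges in Hz, as listed in Section II-B.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 40)}


def band_powers(epoch, fs=FS):
    """Return a five-element feature vector for one 1-second epoch."""
    # Remove the least-squares line of best fit from the whole epoch.
    x = detrend(np.asarray(epoch, dtype=float), type="linear")
    # Assumption: section length chosen to give 8 sections at 50% overlap.
    nperseg = int(len(x) / 4.5)
    freqs, psd = welch(x, fs=fs, window="hamming", nperseg=nperseg,
                       noverlap=nperseg // 2, nfft=len(x), detrend=False)
    df = freqs[1] - freqs[0]
    # Rectangle-rule integration of the PSD over each band.
    return np.array([psd[(freqs >= lo) & (freqs < hi)].sum() * df
                     for lo, hi in BANDS.values()])


def task_average(epochs, fs=FS):
    """Average the band powers over all epochs recorded for one task."""
    return np.mean([band_powers(e, fs) for e in epochs], axis=0)


if __name__ == "__main__":
    # Synthetic check: a 10 Hz tone should put most power in the alpha band.
    t = np.arange(FS) / FS
    epochs = [np.sin(2 * np.pi * 10 * t) for _ in range(3)]
    print(dict(zip(BANDS, task_average(epochs))))
```

With 113-sample sections the spectral resolution is roughly 4.5 Hz, so neighbouring bands share some leakage; the sketch makes no attempt to correct for that.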
Figure 4. Example of a Linear Discriminant Analysis classifier (Subject
8, channel FCz). The input features were the first, second, and third
principal components of the average power of the five frequency bands
considered. The accuracy for this classifier was 82.91%.
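
A classifier of the kind shown in Fig. 4 (three principal components fed to LDA, scored by 10-fold cross-validation) might be set up as in the following scikit-learn sketch. It rests on several assumptions that the text does not state: the function name `cv_accuracy` is hypothetical, the folds are stratified and shuffled with a fixed seed, and PCA is refit inside each training fold rather than once on the full data set. The PCA step centers the features but does not rescale them, in keeping with the "unnormalized" description in Section II-C.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def cv_accuracy(features, labels, n_components=3, n_folds=10, seed=0):
    """Percent correct classification estimated by k-fold cross-validation.

    features: (n_epochs, 5) array of band powers.
    labels:   (n_epochs,) array of task-complexity labels.
    """
    # Assumption: PCA is fit on each training fold only.
    model = make_pipeline(PCA(n_components=n_components),
                          LinearDiscriminantAnalysis())
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = cross_val_score(model, features, labels,
                             cv=folds, scoring="accuracy")
    return 100.0 * scores.mean()


if __name__ == "__main__":
    # Synthetic three-class data standing in for Easy/Medium/Hard epochs.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=m, scale=1.0, size=(100, 5))
                   for m in (0.0, 5.0, 10.0)])
    y = np.repeat([0, 1, 2], 100)
    print(f"cross-validated accuracy: {cv_accuracy(X, y):.2f}%")
```

Swapping `LinearDiscriminantAnalysis()` for another scikit-learn estimator would give a comparable sketch for the other classifier families named in Section II-C.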