2021 2nd International Conference on Artificial Intelligence and Computer Engineering (ICAICE)

EEG-based Confusion Recognition Using Different Machine Learning Methods

978-1-6654-2186-7/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICAICE54393.2021.00160

Shuwei He†
School of Electronic Engineering, Xidian University, Xi'an, China
hsw07251998@163.com

Yanran Xu*†
College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
*20196102@stu.neu.edu.cn

Lanyi Zhong*†
International School, Beijing University of Posts and Telecommunications, Beijing, China
*1145293663@bupt.edu.cn

†These authors contributed equally.

Abstract: Massive Open Online Courses (MOOCs) have emerged as a key trend. As a form of online teaching, the main shortcoming of MOOCs is the lack of feedback, because teachers and students are separated in both time and space. This study proposes a confusion recognition system based on electroencephalography (EEG). We apply four machine learning methods (Naive Bayes, KNN, Random Forest, and XGBoost) and one deep learning method (LSTM) to an EEG data set to detect whether a student feels confused. We find that LSTM performs better than any of the machine learning methods we use. The average accuracy of the LSTM classifier is 78.1%. This study shows the significance of detecting confusion through EEG and of helping students improve their learning efficiency.

Keywords: EEG, Brain-computer interface (BCI), Emotion recognition, Classification, Machine learning, LSTM

I. INTRODUCTION

Under the influence of multiple factors, such as the development of network technology, changes in the social environment, and lifestyle changes brought about by the epidemic, online courses and tele-education are developing rapidly and may continue to do so. MOOCs can serve many students of different ages at the same time, but they also have shortcomings. One of the most serious is the lack of timely feedback on teaching. On most online education platforms, teachers cannot see the expressions and actions of students, so it is difficult to judge through traditional methods whether students understand the knowledge points, and it is not easy for teachers to control the teaching speed. Therefore, clearly locating the key and difficult points of knowledge in teaching has become an important task. We try to judge students' emotions through EEG signals in order to identify difficult knowledge points accurately and quantitatively.

Rhythmic fluctuations in the EEG signal occur within several particular frequency bands, and the relative level of activity within each frequency band has been associated with brain states such as focused attentional processing, engagement, and frustration, all of which are important for teaching. Recently, the invention of simple, low-cost, portable EEG monitoring equipment has made it possible to detect students' EEG signals from the surface of the scalp. This technology is ready to move from the laboratory into the school [1]. For example, the NeuroSky Mindset is an audio headset equipped with a single-channel EEG sensor.

In this research, we propose a brain-computer interface system based on real-time EEG to solve some problems of webcast courses. The main contributions of the research are as follows:

• Exploration of machine learning and deep learning for the classification of confusion.
• A data mining strategy with high performance, achieving 78.1% accuracy.
• A visualization of emotion classification results through Raspberry Pi.

The rest of this paper is organized as follows. A concise summary of related work on EEG-based learning emotion recognition is presented in Section 2. Section 3 describes the mechanism, including EEG, confusion in learning, and the algorithms. The data source, preprocessing, and results are presented in Section 4. The discussion, covering effectiveness, significance, application, limitations, and future work, and the conclusion are given in Sections 5 and 6, respectively.

II. RELATED WORK

Silvia regards confusion as a knowledge emotion that arises during learning [2]. In the process of learning, the learner thinks about and tries to understand the problem, and confusion arises when the student does not understand the current material or does not know how to carry out the current activity, which in turn creates a conflict or dilemma. D'Mello analyzed the emotions generated by different learners in various learning environments and found that learning confusion was the second most common of the fifteen emotions in the learning process [3]. Therefore, learning confusion is an important issue that exists widely in the learning process. Current detection of learning confusion focuses on four sources: text-based information left by learners, learner behavior data, learner physiological signals (e.g. EEG), and pictures of learners' facial expressions. Researchers at Stanford University applied natural language processing to identify confused posts by capturing textual data such as comments left by learners on discussion boards of the MOOC platform and questions asked after class [4]. They developed YouEDU, a teaching aid system that classifies messages to find confused learners and recommends corresponding video clips to those students. Baker built classification models to identify emotions such as frustration,
boredom, and confusion using machine learning algorithms, based on learning behavior data left by learners on a math tutoring platform called ASSISTment [5]. Building on this study, Botelho et al. applied RNNs and improved sequence models such as LSTM and GRU to implement deep learning-based recognition of learning confusion [6]. They found that the deep learning-based approach could detect emotions such as confusion better than the traditional machine learning approach. Wang collected EEG signals from students while they watched MOOC videos and processed the EEG data to build an EEG-based classification model of learning confusion, with which they could predict when students felt confused while watching MOOC videos [7]. Because EEG data is difficult to process and has high dimensionality, the identification accuracy was low, only 51%-56%, which was not significantly different from the method of direct observation by applied educational researchers.

III. MECHANISM

A. Electroencephalography

Emotion can be expressed through facial expression, tone, text, and many other channels. Among all these expressions, brain activity is very difficult to fake [8]. Brain activity takes the form of brain waves, and EEG is a common way to capture them. An EEG is a graph obtained by amplifying the spontaneous biological potentials of the brain using invasive or noninvasive devices [9]. Conventionally, it is recorded by electrodes connected to the scalp with conductive gel.

According to frequency and amplitude range, EEG can be divided into 5 different waves (Delta, Theta, Alpha, Beta, and Gamma), as shown in TABLE I [10]. Each wave carries different information. Delta and Theta waves are related to sleep. The appearance of Delta waves generally indicates that a person has entered a period of deep sleep. Theta waves show up when a person feels sleepy or is in the early stage of sleep. The Alpha wave is the most significant of all rhythmic waves and can be detected when a person feels relaxed or closes their eyes. The Beta wave indicates a state of concentration, active thinking, or emotional fluctuation. The Gamma wave occurs when a person is excited by strong stimuli or during cross-modal perceptual processing.

TABLE I. FREQUENCY AND AMPLITUDE OF RHYTHMIC EEG ACTIVITY PATTERNS

Wave         Frequency (Hz)    Amplitude (μV)
δ (Delta)    0-4               20-100
θ (Theta)    4-7               10
α (Alpha)    8-13              2-100
β (Beta)     14-30             5-10
γ (Gamma)    >30               -

B. Confusion in learning activities

In recent years, more and more researchers have started to pay attention to the role of emotion in learning activities. S. D'Mello [11] conducted a selective analysis of 24 studies and concluded that engagement/flow, confusion, frustration, and boredom are the main emotional states experienced during the learning process. Kort [12] proposed a novel model for science, math, engineering and technology (SMET) learners. The model is summarized in Figure 1.

Figure 1. Kort's model relating learning and emotions

The learning process starts in the first quadrant and reaches the fourth quadrant after one counterclockwise rotation. However, if students cannot overcome the negative emotions in the third quadrant in time, they will drop out of learning completely. As a common learning emotion, confusion is important. If learners are always confused during the learning process, their confidence and enthusiasm will suffer, which eventually leads to poor learning outcomes.

C. Machine Learning Methods

1) Naive Bayes
The naive Bayesian classifier is a simple probabilistic classifier based on Bayes' theorem. It assumes that the features of the data set are independent of each other, calculates the probability that the subject belongs to each category, and completes the classification by selecting the category with the greatest probability. Put simply, the Bayesian classifier starts from the prior probability and calculates the posterior probability through Bayes' formula. Therefore, the assumed probability distribution also affects the accuracy of classification. We choose the Gaussian distribution and the Bernoulli distribution and compare their classification accuracy. Bayes' theorem [13] uses conditional probabilities to determine the likelihood of class A based on evidence B, as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (1)$$

The Bayesian classifier does not have high requirements for the training data. If we assume different probability distributions, the resulting classification equations will also be different. We use two different distributions, the Gaussian distribution and the Bernoulli distribution, to generate classifiers and compare the classification results.
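The naive Bayes decision rule with a Gaussian distribution can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions: the function names (`fit_gaussian_nb`, `log_posterior`, `predict`) and the toy feature values are hypothetical, and the code is not the authors' implementation or data. A Bernoulli variant would replace the Gaussian log-likelihood with a Bernoulli one over binarized features.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def log_posterior(model, x):
    """Unnormalised log P(class | x): log prior plus summed log likelihoods.

    Summing per-feature terms is exactly the 'naive' independence assumption.
    """
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = score
    return scores

def predict(model, x):
    """Return the class with the greatest posterior probability."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)

# Hypothetical toy features (two values per sample); not the paper's EEG data.
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 0.5], [3.2, 0.7], [2.9, 0.4]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]), predict(model, [3.1, 0.6]))
```

The denominator P(B) of Bayes' theorem is the same for every class, so the sketch compares unnormalised log posteriors instead of computing it.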

2) KNN
The K-nearest neighbor (KNN) classifier assigns an unlabeled subject to the class of its most similar labeled subjects. Similarity is represented by the distance between the characteristics of the subjects. The parameter K sets the number of neighbors. The subject to be classified is assigned to the class that wins the most votes among its K nearest neighbors. Figure 2 shows a simple example of how the KNN classifier works.
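The voting rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `knn_predict`, the use of Euclidean distance, and the toy points are assumptions chosen so that the predicted class changes when K goes from 1 to 5.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of (point, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points: one class-A point is closest to the query,
# but three class-B points lie within the five nearest neighbours.
train = [
    ((0.5, 0.0), "A"),
    ((1.0, 0.0), "B"),
    ((0.0, 1.1), "B"),
    ((-1.2, 0.0), "B"),
    ((0.0, -1.5), "A"),
    ((3.0, 3.0), "A"),
]
query = (0.0, 0.0)
print(knn_predict(train, query, 1))  # A
print(knn_predict(train, query, 5))  # B
```

With K=1 only the single closest point votes, while with K=5 the three class-B neighbours outvote the two class-A neighbours, which is why the choice of K matters.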

Figure 2. An example of KNN. When K=1, the subject is classified as class A. When K changes to 5, the subject is labeled as class B. K is therefore a very important parameter.

There are three commonly used ways to calculate the distance between subjects: the Euclidean metric, the Manhattan distance, and the Minkowski distance. The following equation gives the mathematical formula of the Euclidean metric [14].

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \qquad (2)$$

In (2), p and q are two subjects, and n is the number of characteristics of the two subjects. Different distance calculations find different neighbors, which affects the classification results.

3) Random Forest
Random Forest was proposed by Breiman in 2001 [15]. It is an ensemble learning method that integrates bootstrap aggregation (bagging) and CART decision trees. Samples of size n are randomly drawn from the data set N by bagging to generate sub-data sets. Multiple CART decision trees grow freely, without pruning, on the different training sets. These trees can be regarded as weak classifiers. They are combined to form a random forest, which is a strong classifier, and the classification results are generated by voting. Fig. 3 shows a simplified random forest. For each decision tree, the training subset it is based on is generated by bagging. A sample that is not involved in the construction of the tree is called an Out-Of-Bag (OOB) sample, and the probability that an OOB sample is mispredicted is defined as the OOB error.

Random forest is flexible and practical. Among currently proposed machine learning algorithms, it is easy to implement and has good accuracy. It can deal effectively with large, high-dimensional data sets without dimension reduction.

Figure 3. Simplified random forest

4) XGBoost
XGBoost is a boosting algorithm whose idea is to integrate many weak classifiers to form a strong one. The tree model XGBoost uses is the classification and regression tree. XGBoost is an improvement on the GBDT algorithm that further reduces the computational complexity. Faced with a large amount of data, the XGBoost algorithm can perform operations in parallel and split the data in turn according to different characteristics to form a tree sequence. The algorithm is simpler and more effective, and it can present complex data in an orderly and concise arrangement. The process of the XGBoost algorithm on the target data is as follows.

[Equation (3)]

In (3), f(x) represents the target tree model, F is the set of all data, and ŷ represents the predicted value of the data obtained after the XGBoost operation. The next step is to construct a tree sequence model based on the obtained data. The calculation formula for building the tree sequence model is as follows.

[Equation (4)]

In (4), obj represents the target tree-like construction model, and l represents the error between the predicted value and the actual value. The smaller the error, the more accurate the prediction and the better the data arrangement.

5) LSTM
LSTM, first published by Sepp Hochreiter and Jürgen Schmidhuber in 1997, is an RNN architecture [16]. The core of LSTM lies in the addition of a memory unit and three gates, which are called the forget gate, the input gate, and the output gate. Their function is to control the transmission of information in the evolution direction and to add control gates to solve the
problems of input and output. It is mainly realized through a neural layer and a point-by-point multiplication operation, and it addresses the problems of long-term dependence, vanishing gradients, and exploding gradients. The LSTM network model structure is shown in Figure 4, and the calculation formulas for the forward pass of the model are as follows:

$$f_t = \sigma(W_f \cdot [h_{t-1}, X_t] + b_f) \qquad (5)$$
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, X_t] + b_c) \qquad (6)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \qquad (7)$$
$$O_t = \sigma(W_o \cdot [h_{t-1}, X_t] + b_o) \qquad (8)$$
$$h_t = O_t * \tanh(C_t) \qquad (9)$$

In these formulas, $X_t$ represents the input at time t, $O_t$ represents the output at time t, $h_{t-1}$ represents the memory information from the previous moment, and $h_t$ represents the memory information at time t. $f_t$ is computed jointly from the previous output $h_{t-1}$ and the input at time t; its values range from 0 to 1, and it determines how much of the information from the previous moment, $C_{t-1}$, is kept. $\tilde{C}_t$ represents the new information value generated by the tanh activation layer. $i_t$ represents the output of the input gate at time t. $\sigma$ represents the sigmoid activation layer in the model, which produces an initial output. W is the weight coefficient of the corresponding gate and b is the bias coefficient of the corresponding gate.

Fig. 4 LSTM network model structure

IV. EXPERIMENT

A. Data source and preprocessing
The EEG signal data was collected from ten university students while they watched MOOC video clips [17]. We extract online education videos that are considered not to be confusing for university students, for example videos of simple addition and subtraction. We also prepare videos on topics such as stem cell research and quantum mechanics, which are expected to confuse university students because of their lack of familiarity. Twenty videos are prepared, ten in each category. Each video lasts two minutes. To make the videos more confusing, each two-minute clip is cut from the middle of a topic. The student wears a wireless single-channel Mindset, which measures activity over the frontal lobe, and NeuroSky's API is used to collect the signal streams.

B. Result and analysis
We use machine learning methods and a deep learning method to classify the confusion EEG data. The machine learning methods are Naive Bayes, KNN, Random Forest, and XGBoost. The deep learning method is based on LSTM. The accuracy of each method is shown in Table II.

TABLE II. ACCURACY OF TRAINED MODELS

Model                                   Accuracy
Naive Bayes (Gaussian distribution)     58.4%
Naive Bayes (Bernoulli distribution)    52.4%
KNN                                     56.5%
Random Forest                           66.1%
XGBoost                                 68.4%
LSTM                                    78.1%

The original author of the data set has also trained classifiers. The average classification accuracy of those classifiers for the user-defined confusion state is 56% and 51% respectively [17].

In our research, the accuracy of nearly all classifiers is higher than that of the original author; the exception is the Bernoulli Naive Bayes classifier (52.4%), which exceeds only the 51% baseline. Among the machine learning methods, Random Forest and XGBoost perform better on this data set. Their accuracy is roughly 10 percentage points higher than that of the classifiers using Naive Bayes and KNN.

The accuracy of the Gaussian Bayes classifier is higher than that of the Bernoulli Bayes classifier. In general, however, the results of using the naive Bayes classifier and the KNN classifier for brain wave emotion classification are not ideal. For this two-class problem, a classification result with an accuracy of 0.5 can be regarded as basically invalid. The Bayes classifier assumes that each value in the multi-dimensional sample has an independent effect on classification, which differs from the actual situation. This may be the main reason why this method is not effective on this data set.

The LSTM classifier performs much better than the other classifiers. Compared with the two classifiers proposed by the original author, its classification accuracy is 22.1 and 27.1 percentage points higher, respectively. Therefore, LSTM is more suitable for processing this data set.

V. DISCUSSION

A. Effectiveness of the emotion BCI system
The EEG signal is easily affected by the tester's gender, age, physical state, and even the surrounding environment. The resulting changes make the emotional judgments inaccurate. We propose the following ideas to improve the accuracy of emotion judgment.

1) Group the testers according to age, gender, etc., and let each group use its own collected signals to train a classifier independently.
2) Keep the number of testers in each group small and the number of groups large. A small group allows the teacher to pay better attention to every member's learning status. The data of multiple groups can be compared with each other to improve the accuracy of judgment.
3) Perform certain preprocessing on the collected signals to reduce interference.

B. Significance to learners

The EEG emotion recognition system can remind teachers to slow down the teaching speed when students are confused, so students do not need to interrupt the teacher's speech to ask questions. Teachers can also see their own teaching results visually. Although an emotion recognition system based on EEG signals can be used to detect whether students understand the knowledge being taught in the classroom, the collection of EEG signals requires special equipment. It will be difficult to bring EEG sensors into the classroom for a long time to come. We believe that a feasible plan is to identify the key and difficult points in existing textbooks through small-scale experimental classes. There is a big gap between the knowledge mastery of teachers and students, so it is difficult to determine accurately which formulas and knowledge points will be hard for students to understand. An EEG emotion recognition system can quantify the difficulty of each knowledge point, help teachers improve teaching plans, and make timetables more reasonable in order to improve teaching efficiency. The system still has to wait for further improvement and popularization of EEG sensor hardware before it can play a greater role.

C. Application

To visualize the classification results, we transfer them to a Raspberry Pi (RPI) that uses lights as its output. If the result is "confused", the lights are on. If the result is "not confused", the lights are off. A Python environment is set up and the RPI is configured to run the program automatically. This means that when the RPI is used by other students or teachers, they can simply determine whether the learners are confused without knowing the complex principles and process. If the RPI is integrated into the EEG acquisition equipment, it seems possible to display the classification results vividly on the screen of the device.

D. Limitations and future work

With the development of brain-computer interfaces, EEG emotion recognition has become popular. Adding emotion to the learning system would help adjust the learning content and difficulty reasonably and improve learning efficiency. However, due to the constraints of the data set and other factors, this research has some limitations.

One of the most critical limitations is the weak classifier performance. As the data we used is processed data rather than the original time series, we cannot use different methods to select features. The current features may not have a strong impact on the classifier. Furthermore, because EEG differs among subjects, we suggest collecting more EEG data to improve the generality of the classifier.

Besides, there is still much room for improvement in the recognition and classification of multiple emotions, and further exploration of the intensity recognition of emotions is still needed. We can also consider combining EEG with facial expressions or other physiological information to further improve the classification accuracy.

VI. CONCLUSION

In this paper, we applied different machine learning methods and LSTM to an EEG data set collected while students watched MOOC video clips. We used only the EEG-related features in the data set to train the classifiers, and the classification used only the user-defined labels. In this way, we ensured that our confusion classification results were related only to EEG. Our LSTM classifier performed well, reaching a classification accuracy of 78.1%. Compared with the classifiers trained by the author of the data set, the accuracy of the LSTM classifier is 22.1 and 27.1 percentage points higher, respectively. This shows that LSTM can help a lot in recognizing learners' confusion and is highly feasible for dealing with EEG data. The classifier is intended to help students study more efficiently.

REFERENCES

[1] Ryan S.J.d. Baker, Sidney K. D'Mello, Ma. Mercedes T. Rodrigo, Arthur C. Graesser, “Better to be frustrated than bored: The incidence, persistence, and impact of learners' cognitive-affective states during interactions with three different computer-based learning environments,” International Journal of Human-Computer Studies, 2010.
[2] Silvia, J Paul, “Confusion and interest: The role of knowledge emotions in aesthetic experience,” Psychology of Aesthetics Creativity & the Arts, 2010.
[3] AV Agrawal, J Venkatraman, S Leonard, A Paepcke, “Youedu: Addressing confusion in mooc discussion forums by recommending instructional video clips,” International Educational Data Mining Society, 2015.
[4] RSJD Baker, J Kalka, V Aleven, L Rossi, J Ocumpaugh, “Towards sensor-free affect detection in cognitive tutor algebra,” International Educational Data Mining Society, 2012.
[5] AF Botelho, RS Baker, NT Heffernan, “Improving sensor-free affect detection using deep learning,” International Conference on Artificial Intelligence in Education, 2017.
[6] Yutao Wang, Neil Heffernan, Cristina L Heffernan, “Towards better affect detectors: Effect of missing skills, class features and common wrong answers,” 2015.
[7] S D'Mello, “A Selective Meta-Analysis on the Relative Incidence of Discrete Affective States During Learning With Technology,” Journal of Educational Psychology, 2013.
[8] AF Shedeed, “A new method for person identification in a biometric security system based on brain EEG signal processing,” 2011 World Congress on Information and Communication Technologies, 2011.
[9] W. Mardini, G. A. Ali, E. Magdady, S. Al-momani, “Detecting human emotions using electroencephalography (EEG) using dynamic programming approach,” 2018 6th International Symposium on Digital Forensic and Security (ISDFS), 2018.
[10] H. T. Ocbagabir, K. A. I. Aboalayon, M. Faezipour, “Efficient EEG analysis for seizure monitoring in epileptic patients,” 2013 IEEE Long Island Systems, Applications and Technology Conference (LISAT), 2013.
[11] S D'Mello, “A Selective Meta-Analysis on the Relative Incidence of Discrete Affective States During Learning with Technology,” 2013.
[12] Kort B, Reilly R, Picard R, “An affective model of interplay between emotions and learning,” 2001.

[13] J. Platt, “Sequential minimal optimization: A fast algorithm for training support vector machines,” 1998.
[14] Zhongheng Zhang, “Introduction to machine learning: k-nearest neighbors,” 2016.
[15] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[16] S Hochreiter, J Schmidhuber, “Long Short-Term Memory,” 1997.
[17] Haohan Wang, Yiwei Li, Xiaobo Hu, Yucong Yang, Zhu Meng, Kai-min Chang, “Using EEG to Improve Massive Open Online Courses Feedback Interaction,” 2013.
