EEG-based Confusion Recognition Using Different Machine Learning Methods
Abstract—Massive Open Online Courses (MOOCs) have emerged as a key trend. As a form of online teaching, the main shortcoming of MOOCs is the lack of feedback, because teachers and students are separated in both time and space. This study proposes a confusion recognition system based on electroencephalography (EEG). We apply four machine learning methods (Naive Bayes, KNN, Random Forest, and XGBoost) and one deep learning method (LSTM) to an EEG data set to detect whether a student feels confused. We find that LSTM performs better than any of the machine learning methods we use; the average accuracy of the LSTM classifier is 78.1%. This study shows the significance of detecting confusion through EEG and of helping students improve their learning efficiency.
Keywords—EEG, Brain-computer interface (BCI), Emotion recognition, Classification, Machine learning, LSTM
I. INTRODUCTION
Under the influence of multiple factors, such as the development of network technology, changes in the social environment, and lifestyle changes brought about by the epidemic, online courses and tele-education are developing rapidly and may continue to do so. A MOOC can serve many students of different ages at the same time, but it also has shortcomings. One of the most serious is the lack of timely feedback on teaching. On most online education platforms, teachers cannot see the expressions and actions of students, so it is difficult to judge through traditional methods whether students understand the knowledge points, and it is not easy for teachers to control the teaching speed. Therefore, clearly locating the key and difficult points of knowledge in teaching has become an important task. We try to judge students' emotions through EEG signals in order to discover the difficult points of knowledge accurately and quantitatively.
Rhythmic fluctuations in the EEG signal occur within several particular frequency bands, and the relative level of activity within each frequency band has been associated with brain states such as focused attentional processing, engagement, and frustration, all of which are important for teaching. Recently, the invention of simple, low-cost, portable EEG monitoring equipment has made it possible to detect students' EEG signals from the surface of the scalp, and this technology is ready to move from the laboratory into the school [1]. For example, the NeuroSky Mindset is an audio headset equipped with a single-channel EEG sensor.
In this research, we propose a brain-computer interface system based on real-time EEG to solve some problems of webcast courses. The main contributions of the research are as follows:
• Exploration of machine learning and deep learning for the classification of confusion.
• A data mining strategy with high performance, achieving 78.1% accuracy.
• A visualization of emotion classification results through Raspberry Pi.
The rest of this paper is organized as follows. A concise summary of related work on EEG-based learning emotion recognition is presented in Section II. Section III describes the mechanism, including EEG, confusion in learning, and the algorithms. The data source, preprocessing, and results are presented in Section IV. The discussion, which covers effectiveness, significance, application, limitations, and future work, and the conclusion are given in Sections V and VI, respectively.
II. RELATED WORK
Silvia considers confusion to be a learning emotion that encompasses knowledge [2]. Learning involves thinking about and understanding a problem, and confusion arises when the student does not understand the current material or does not know how to carry out the current activity, which in turn creates a conflict or dilemma. D'Mello analyzed the emotions generated by different learners in various learning environments and found that confusion was the second most common of the fifteen emotions in the learning process [3]. The study of learning confusion is therefore an important issue that arises widely in the learning process. Current detection of learning confusion focuses on four sources of information: text left by learners, learner behavior data, learner physiological signals (e.g., EEG), and pictures of learners' facial expressions. Researchers at Stanford University applied natural language processing to identify confused posts by capturing textual data such as comments left by learners on the discussion boards of a MOOC platform and questions asked after class [4]. They developed the YouEDU teaching aid system, which classifies messages with classification algorithms to find confused learners and at the same time recommends relevant video clips to those students. Baker built classification models to identify emotions such as frustration,
Authorized licensed use limited to: Vivekananda Institute of Professional Studies. Downloaded on July 01,2024 at 08:34:44 UTC from IEEE Xplore. Restrictions apply.
2) KNN
The K-nearest neighbor (KNN) classifier assigns an unlabeled subject to the class of its most similar labeled subjects. Similarity is represented by the distance between the characteristics of the subjects. The parameter K sets the number of neighbors, and the target subject is assigned to the class that wins the most votes among its K nearest neighbors. Figure 2 shows a simple example of how the KNN classifier works.
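The voting rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the classifier used in the experiments: the helper names `euclidean` and `knn_predict` are hypothetical, the distance is assumed to be Euclidean, and the six 2-D points are invented toy data chosen so that the predicted class changes with K, as in Figure 2.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_points, train_labels, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training indices by distance to the query, nearest first.
    ranked = sorted(range(len(train_points)),
                    key=lambda i: euclidean(train_points[i], query))
    # Count the class labels of the k nearest neighbors.
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D toy data (not taken from the EEG data set).
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.2), (2.0, 2.0), (5.0, 5.0), (-3.0, -3.0)]
labels = ["A", "B", "B", "B", "A", "A"]
query = (1.2, 1.2)

print(knn_predict(points, labels, query, k=1))  # A: the single nearest point is class A
print(knn_predict(points, labels, query, k=5))  # B: three of the five nearest are class B
```

With K=1 only the closest point votes, so the query is labeled A; with K=5 the three nearby class B points outvote the two class A points. In practice K is usually chosen by validation rather than fixed in advance.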
Figure 2. An example of KNN. When K=1, the subject will be classified to class A. But when K changes to 5, the subject will be labeled as class B. So K is a very important parameter.
There are three commonly used ways to calculate the distance between subjects: the Euclidean metric, the Manhattan distance, and the Minkowski distance. The following equation shows the mathematical formula of the Euclidean metric [14].
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2} \tag{2}$$
In (2), p and q are two subjects and n is the number of characteristics of each subject. Different distance calculations can select different neighbors, which affects the classification results.
3) Random Forest
Random Forest was proposed by Breiman in 2001 [15]. It is an ensemble learning method that integrates bootstrap aggregation (bagging) and the CART decision tree. Bagging randomly draws n samples from the data set N to generate sub-data sets. Multiple CART decision trees grow freely, without pruning, on the different training sets. These trees can be regarded as weak classifiers. They are combined to form a random forest, which is a strong classifier, and the classification results are generated by voting. Fig. 3 shows a simplified random forest. For each decision tree, the training subset it is based on is generated by bagging. A sample that is not involved in the construction of a given tree is therefore called an out-of-bag (OOB) sample, and the probability that OOB samples are mispredicted is defined as the OOB error.
Random forest is flexible and practical. Among currently proposed machine learning algorithms, random forest is easy to implement and has good accuracy. It can deal effectively with large, high-dimensional data sets without dimension reduction.
4) XGBoost
XGBoost is a boosting algorithm whose idea is to integrate many weak classifiers into a strong one. The tree model XGBoost uses is the classification and regression tree. XGBoost is an improvement on the GBDT algorithm that further reduces the complexity of calculation. Faced with a large amount of data, the XGBoost algorithm can perform operations in parallel and split the data in turn according to different characteristics to form a tree sequence. The algorithm is simple and effective, and it can arrange complex data in an orderly and concise way. The process of the XGBoost algorithm on the target data is as follows.
In (3), f(x) represents the target tree model, F is the set of all data, and the remaining term represents the predicted value of the data obtained after the XGBoost operation. The next step is to construct a tree sequence model based on the obtained data. The calculation formula for building a tree sequence model is as follows,
In (4), obj represents the target tree-like construction model of the data, and l represents the error between the predicted value and the actual value. The smaller the error, the more accurate the prediction and the better the data arrangement.
5) LSTM
LSTM, first published by Jürgen Schmidhuber and Sepp Hochreiter in 1997, is an RNN architecture [16]. The core of LSTM lies in the addition of a memory unit and three gates, which are called the forget gate, the input gate, and the output gate. Their function is to control the transmission of information in the evolution direction and increase control gates to solve the
problems of input and output. It is mainly realized through a neural layer and a point-by-point multiplication operation, and it alleviates the problems of long-term dependence, vanishing gradients, and exploding gradients. The LSTM network model structure is shown in Figure 4, and the formulas for the forward pass of the model are as follows:
$$f_t = \sigma(W_f \cdot [h_{t-1}, X_t] + b_f) \tag{5}$$
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, X_t] + b_c) \tag{6}$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{7}$$
IV. EXPERIMENT
A. Data source and preprocessing
The EEG signal data is collected from ten university students while they watch MOOC video clips [17]. We extract online education videos that are considered not to be confusing for university students, for example videos on simple addition and subtraction. At the same time, we prepare videos on topics such as stem cell research and quantum mechanics, which are expected to confuse university students because of their lack of familiarity with the material. Twenty videos are prepared, ten in each category, and each video lasts two minutes. To make the videos more confusing, each two-minute clip is cut from the middle of a topic. The student wears a wireless single-channel Mindset, which measures activity over the frontal lobe, and NeuroSky's API is used to collect the signal streams.
B. Result and analysis
We use machine learning methods and a deep learning method to classify the confusion EEG data. The machine learning methods are Naive Bayes, KNN, Random Forest, and XGBoost. The deep learning method is based on LSTM. The accuracy of each method is shown in Table I.
The LSTM classifier shows much better performance than the other classifiers. Compared with the two classifiers proposed by the original author, the classification accuracy is 22.1% and 27.1% higher, respectively. Therefore, LSTM is more suitable for processing this data set.
V. DISCUSSION
A. Effectiveness of the emotion BCI system
The EEG signal is easily affected by the tester's gender, age, physical state, and even the surrounding environment, and the resulting changes make emotional judgments inaccurate. We propose the following ideas to improve the accuracy of emotion judgment.
1) The testers are grouped according to age, gender, etc., and each group uses its collected signals to train the classifier independently.
2) Make the number of testers in each group small and the number of groups large. A small group allows the teacher to pay closer attention to every member's learning status. The data of multiple groups can be compared with each other to improve the accuracy of judgment.
3) Perform preprocessing on the collected signals to reduce interference.
B. Significance to learners
The EEG emotion recognition system can remind teachers to slow down when students are confused, so students do not need to interrupt the teacher's speech to ask questions. Teachers can also see their own teaching results visually. Although an emotion recognition system based on EEG signals can detect whether students understand the knowledge being taught in the classroom, collecting EEG signals requires special equipment, and it will be difficult to bring EEG sensors into the classroom for a long time to come. We believe that a feasible plan is to identify the key and difficult points in existing textbooks through small-scale experimental classes. There is a big gap between teachers' and students' mastery of knowledge, so it is difficult to determine accurately which formulas and knowledge points students will find hard to understand. An EEG emotion recognition system can quantify the difficulty of each knowledge point and help teachers improve teaching plans and arrange timetables more reasonably, which improves teaching efficiency. The system still needs further improvement and popularization of EEG sensor hardware before it can play a greater role.
C. Application
To visualize the classification results, we transfer them to a Raspberry Pi (RPI) and display them with its lights. If the result is confused, the lights are on. If the result is not confused, the lights are off. A Python environment is set up and the RPI is configured to run the program automatically. This means that other students or teachers who use the RPI can find out whether learners are confused without knowing the underlying principles and process. If the RPI were built into the EEG acquisition equipment, it seems possible to show the classification results vividly on the screen of the device.
D. Limitations and future work
With the development of brain-computer interfaces, EEG emotion recognition has become popular. Adding emotion to the learning system would help adjust the learning content and difficulty reasonably and improve learning efficiency. However, due to the constraints of the data set and other factors, this research has some limitations.
One of the most critical limitations is the weak classifier performance. Because the data we used is already processed rather than the original time series, we cannot apply different feature selection methods, and the current features may not have a strong impact on the classifier. Furthermore, because EEG differs among subjects, we suggest collecting more EEG data to improve the generality of the classifier.
Besides, there is still much room for improvement in the recognition and classification of multiple emotions, and further exploration of the intensity of emotions is still needed. We can also consider combining EEG with facial expressions or other physiological information to further improve the classification accuracy.
VI. CONCLUSION
In this paper, we applied different machine learning methods and LSTM to a data set of EEG collected while students watched MOOC video clips. We used only the EEG-related features in the data set to train the classifiers, and the classification targets used only the user-defined labels. In this way, we ensured that our confusion classification results were related only to EEG. Our LSTM classifier performed well, reaching a classification accuracy of 78.1%. Compared with the classifiers trained by the author of the data set, the accuracy of the LSTM classifier is 22.1% and 27.1% higher, respectively. This shows that LSTM can help considerably in recognizing learners' confusion and is highly feasible for dealing with EEG data. The classifier is intended to help students study with higher efficiency.
REFERENCES
[1] Ryan S.J.d. Baker, Sidney K. D'Mello, Ma. Mercedes T. Rodrigo, Arthur C. Graesser, “Better to be frustrated than bored: The incidence, persistence, and impact of learners' cognitive-affective states during interactions with three different computer-based learning environments,” International Journal of Human-Computer Studies, 2010.
[2] Paul J. Silvia, “Confusion and interest: The role of knowledge emotions in aesthetic experience,” Psychology of Aesthetics, Creativity & the Arts, 2010.
[3] AV Agrawal, J Venkatraman, S Leonard, A Paepcke, “YouEDU: Addressing confusion in MOOC discussion forums by recommending instructional video clips,” International Educational Data Mining Society, 2015.
[4] RSJD Baker, J Kalka, V Aleven, L Rossi, J Ocumpaugh, “Towards sensor-free affect detection in cognitive tutor algebra,” International Educational Data Mining Society, 2012.
[5] AF Botelho, RS Baker, NT Heffernan, “Improving sensor-free affect detection using deep learning,” International Conference on Artificial Intelligence in Education, 2017.
[6] Yutao Wang, Neil Heffernan, Cristina L Heffernan, “Towards better affect detectors: Effect of missing skills, class features and common wrong answers,” 2015.
[7] S D’Mello, “A selective meta-analysis on the relative incidence of discrete affective states during learning with technology,” Journal of Educational Psychology, 2013.
[8] AF Shedeed, “A new method for person identification in a biometric security system based on brain EEG signal processing,” 2011 World Congress on Information and Communication Technologies, 2011.
[9] W. Mardini, G. A. Ali, E. Magdady, S. Al-momani, “Detecting human emotions using electroencephalography (EEG) using dynamic programming approach,” 2018 6th International Symposium on Digital Forensic and Security (ISDFS), 2018.
[10] H. T. Ocbagabir, K. A. I. Aboalayon, M. Faezipour, “Efficient EEG analysis for seizure monitoring in epileptic patients,” 2013 IEEE Long Island Systems, Applications and Technology Conference (LISAT), 2013.
[11] S D’Mello, “A selective meta-analysis on the relative incidence of discrete affective states during learning with technology,” 2013.
[12] B. Kort, R. Reilly, R. Picard, “An affective model of interplay between emotions and learning,” 2001.
[13] J. Platt, “Sequential minimal optimization: A fast algorithm for training support vector machines,” 1998.
[14] Zhongheng Zhang, “Introduction to machine learning: k-nearest neighbors,” 2016.
[15] L. Breiman, “Random forests,” Machine Learning, 45(1):5–32, 2001.
[16] S Hochreiter, J Schmidhuber, “Long Short-Term Memory,” 1997.
[17] Haohan Wang, Yiwei Li, Xiaobo Hu, Yucong Yang, Zhu Meng, Kai-min Chang, “Using EEG to Improve Massive Open Online Courses Feedback Interaction,” 2013.