Article
Special Issue "Multimodal Emotion Recognition in Artificial Intelligence"
Edited by Prof. Dr. Valentina Franzoni, Dr. Giulio Biondi, Dr. Alfredo Milani and Prof. Dr. Jordi Vallverdú
Sensors 2022, 22, 1937. https://doi.org/10.3390/s22051937
Emotional Speech Recognition Method Based on
Word Transcription
Gulmira Bekmanova 1, Banu Yergesh 1,*, Altynbek Sharipbay 1 and Assel Mukanova 1,2
Abstract: The emotional speech recognition method presented in this article was applied to recognize
the emotions of students during online exams in distance learning due to COVID-19. The purpose of
this method is to recognize emotions in spoken speech through the knowledge base of emotionally
charged words, which are stored as a code book. The method analyzes human speech for the presence
of emotions. To assess the quality of the method, an experiment was conducted for 420 audio
recordings. The accuracy of the proposed method is 79.7% for the Kazakh language. The method
can be used for different languages and consists of the following tasks: capturing a signal, detecting
speech in it, recognizing speech words in a simplified transcription, determining word boundaries,
comparing a simplified transcription with a code book, and constructing a hypothesis about the
degree of speech emotionality. If emotions are present, complete word recognition and
identification of the emotions in speech are performed. The advantage of this method is the possibility of its
widespread use since it is not demanding on computational resources. The described method can
be applied when there is a need to recognize positive and negative emotions in a crowd, in public
transport, schools, universities, etc. The experiment carried out has shown the effectiveness of this
method. The results obtained will make it possible in the future to develop devices that begin to
record and recognize a speech signal, for example, in the case of detecting negative emotions in
sounding speech and, if necessary, transmitting a message about potential threats or riots.

Citation: Bekmanova, G.; Yergesh, B.; Sharipbay, A.; Mukanova, A. Emotional Speech Recognition Method Based on Word Transcription. Sensors 2022, 22, 1937. https://doi.org/10.3390/s22051937

Keywords: emotion recognition; speech recognition; crowd emotion recognition; affective computing;
distance learning; e-learning; artificial intelligence
(CNN) with a specially designed multi-attention module (MAM) from speech signals to
solve the problem of classifying native speakers according to their age and gender in speech
processing. Subsequent works use hidden Markov models for speech recognition when
recognizing emotions [30–33].
According to the results of the analysis of studies, emotions play a vital role in e-
learning [5,34,35]. It is also possible to determine users’ moods using affective computing
methods [36] and use the result of moods to support the teacher by providing feedback
relevant to learning.
For example, paper [37] describes a model that determines users' moods using affective
computing methods and uses the mood results to support the teacher by providing
feedback related to teaching. Experienced teachers can change their teaching
style according to the needs of the students.
Emotion detection makes it possible to analyze the students' interests and the learning
outcomes of the course. Emotions can be detected by facial
expressions [37,38]. Article [39] proposed a multimodal emotion recognition system that
relies on speech and facial information. For the speech-based modality, they fine-tuned the
CNN-14 of the PANNs framework, and for facial emotion recognizers, they proposed a
framework that consists of a pre-trained Spatial Transformer Network on saliency maps and
facial images followed by a bi-LSTM with an attention mechanism. Results on sentiment
analysis and emotion recognition in the Kazakh language are published in [40–44].
The rest of the article is organized as follows: the introduction presents the most
recent studies on the recognition of speech emotions. Section 2 describes the problem and
proposes a solution for our methods. In Sections 3 and 4, the experimental results and
discussion are presented. Finally, in Section 5, we conclude our work.
given by adjectives, verbs, and some types of nouns, as well as interjections (exclamations)
expressing some emotions.
Shown below is a method for recognizing students’ emotional speech during online
exams in distance learning—a model for determining emotions.
The use of a codebook allows one to use this method with low computing power. The
described method can be applied when there is a need to recognize positive and negative
emotions in a crowd, in public transport, schools, universities, etc. This makes it possible to
develop devices that begin to record and recognize a speech signal, for example, in the case
of detecting negative emotions in sounding speech and, if necessary, transmit a message
about a potential threat or a riot.
In this work, we tested the method on multi-user datasets to find solutions to the
following research question:
Is it possible to effectively recognize emotions in speech based on a database of
emotionally charged Kazakh words?
Figure 2. The result of automatic splitting of the voice section of the signal into quasi-periods.
This figure demonstrates vertical marks placed by a program that calculates successive
quasi-periods according to the following algorithm.
The value

    L_k := Σ_{i=0}^{k−1} |x_{n0+i} − x_{n0+i+k}|  (3)

is calculated, where n0 is the number of the sample from which the current quasi-period
began in the buffer, and

    MIN ≤ k ≤ MAX.  (4)
This difference makes it possible to distinguish quasi-periodic sections from
non-quasi-periodic ones and to use it when checking the signal for the presence of speech.
Based on this method, the beginning and end of speech are determined.
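As an illustration of Equations (3)–(4), the search for the quasi-period can be sketched as follows (a minimal sketch; the buffer contents and the MIN/MAX bounds are toy assumptions, not values from the article):

```python
# Quasi-period search per Eqs. (3)-(4): for every admissible period
# length k (MIN <= k <= MAX), L_k accumulates the absolute differences
# between the buffer and its copy shifted by k samples, starting at
# sample n0; the k with the smallest L_k is taken as the quasi-period.
def find_quasi_period(x, n0, k_min, k_max):
    best_k, best_l = None, float("inf")
    for k in range(k_min, k_max + 1):
        # L_k = sum_{i=0}^{k-1} |x[n0+i] - x[n0+i+k]|
        l_k = sum(abs(x[n0 + i] - x[n0 + i + k]) for i in range(k))
        if l_k < best_l:
            best_k, best_l = k, l_k
    return best_k, best_l

# Toy "voiced" buffer that repeats exactly every 8 samples:
buffer = [n % 8 for n in range(200)]
period, residual = find_quasi_period(buffer, n0=0, k_min=4, k_max=20)
```

For a truly periodic buffer the residual drops to zero at the period; on real voiced speech the minimum is merely small, which is what separates quasi-periodic sections from non-periodic ones.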
Next, the speech recognition algorithm works up to the level of a simplified transcription.
To construct a simplified transcription, it is necessary to consider the operation of
the transcriptor.
The phonetic alphabet is the basis of the speech recognition unit. The symbols of the
phonetic alphabet must unambiguously correspond to those sounds, the distinction of
which is essential in the process of recognition. Therefore, the minimization of the size of
the phonetic alphabet without compromising the quality of recognition should be carried
out by identifying in the alphabet only those sounds that are the closest in sounding from
the person’s point of view [45].
Therefore, for the correct construction of the transcriptor and the use of the phonetic
alphabet, we first need to translate the record of the word into an intermediate alphabet
of 31 letters, which are characteristic of the Kazakh language. At the second stage, the
word recording is translated according to the formalized linguistic (phonological) rules
of the Kazakh language [50]. At the third stage, the received words are translated into
transcriptional notation in symbols of international transcription.
The current Cyrillized alphabet of the Kazakh language contains 42 letters instead of
31 letters [53]. Of these, 11 letters were mistakenly introduced in 1940 only so that the
writing and reading of Russian words in the Kazakh text was carried out in accordance
with the norms of the Russian language. These include: Ё, И, Ч, Щ, Ц, Һ,
Э, Ю, Я, Ь, Ъ.
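A rough sketch of this first-stage substitution is given below (the letter pairs are read off Table 1; rendering И uniformly as ІЙ is a simplification, since the article chooses ІЙ or ЫЙ by vowel context):

```python
# Illustrative stage-one rewrite: the 11 letters introduced in 1940 are
# replaced by intermediate-alphabet equivalents; Ь and Ъ are dropped.
# NOTE: this is a simplified sketch, not the article's full transcriptor.
STAGE1 = {
    "ё": "йо", "и": "ій", "ч": "ш", "щ": "ш", "ц": "с",
    "һ": "х", "э": "е", "ю": "йу", "я": "йа", "ь": "", "ъ": "",
}

def to_intermediate(word):
    """Rewrite a lowercase word into the 31-letter intermediate alphabet."""
    return "".join(STAGE1.get(ch, ch) for ch in word.lower())
```

For instance, the borrowed spelling "чай" is rewritten as "шай" and "цех" as "сех".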
Therefore, for the correct construction of the transcriptor and the use of the phonetic
alphabet, we first need to translate the record of the word into an intermediate alphabet of
31 letters. Correspondence of symbols with transcription is given in Table 1.

Table 1. Symbols of the current alphabet and intermediate alphabet.

Current Alphabet | Intermediate Alphabet | Transcription
А | А | (ɑ)
Ә | Ә | (æ)
Б | Б | (b)
В | В | (v)
Г | Г | (g)
Ғ | Ғ | (ɣ)
Д | Д | (d)
Е | Е | (е)
Ж | Ж | (Ʒ)
З | З | (z)
Й | Й | (y)
К | К | (k)
Қ | Қ | (q)
Л | Л | (l)
М | М | (m)
Н | Н | (n)
Ң | Ң | (ŋ)
О | О | (ɔ)
Ө | Ө | (ɵ)
П | П | (p)
Р | Р | (r)
С | С | (s)
Т | Т | (t)
У | У | (w)
Ұ | Ұ | (ʊ, u)
Ү | Ү | (ү)
Ф | Ф | (f)
Х | Х | (h)
Ш | Ш | (ʃ)
Ы | Ы | (ɯ)
І | І | (ɪ)
Ё | ЙО | (yɔ)
И | ІЙ / ЫЙ | (iy) / (ɯy)
Ч | Ш | (tʃ)
Щ | Ш | (ʃ)
Ц | С | (ts)
Һ | Х | (h)
Э | Е | (е)
Ю | ЙУ | (yw)
Я | ЙА | (yɑ)
Ь | – | –
Ъ | – | –

2.1.3. Formalization Language of Phonological Rules of Sound Combinations in the
Kazakh Language

There are nine vowel and 22 consonant sounds in the Kazakh language.
The formalization of phonological rules is described in detail in [50]; in general, in the
Kazakh language, there is a law of synharmonicity, when soft affixes are attached to the
soft stem of a word (with soft vowels), and hard affixes are attached to the hard stem of
the word. In addition, in the Kazakh language, there are rules for voicing and stunning
consonants; for example, the word "aq manday" sounds similar to "agmanday". There are
seven similar rules for the Kazakh language.
A simplified transcription is automatically built based on the complete transcription
of a word.
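The voicing rule behind the "aq manday" example can be sketched as a string rewrite (the voiced-onset class below is an assumption for this Latin transliteration, illustrating just one of the seven rules, not the article's formal rule set):

```python
import re

# Illustrative voicing assimilation: a word-final voiceless "q" is voiced
# to "g" when the following word begins with a voiced consonant, and the
# two words fuse in pronunciation ("aq manday" -> "agmanday"). The onset
# class is an assumption made for this sketch.
VOICED_ONSETS = "bvgdzmnlr"

def apply_voicing(phrase):
    return re.sub(rf"q\s+(?=[{VOICED_ONSETS}])", "g", phrase)
```

Before a voiceless onset the rule does not fire, so "aq tas" is left unchanged.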
Classes | Symbols | Meaning
W | а ұ ы о е ә ү і ө у | vowels and the consonant «У»
C | б в г ғ д ж з й л м н ң р | voiced consonants
F | с ш | voiceless hush consonants
P | к қ п т ф х | voiceless consonants
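With the classes above, computing a simplified (generalized) transcription reduces to a per-letter lookup; a minimal sketch (letters outside the four classes are passed through unchanged):

```python
# Simplified transcription over the class table: W = vowels and «У»,
# C = voiced consonants, F = voiceless hush consonants, P = other
# voiceless consonants.
CLASSES = {
    **{ch: "W" for ch in "аұыоеәүіөу"},
    **{ch: "C" for ch in "бвгғджзйлмнңр"},
    **{ch: "F" for ch in "сш"},
    **{ch: "P" for ch in "кқптфх"},
}

def simplify(word):
    return "".join(CLASSES.get(ch, ch) for ch in word.lower())
```

For instance, бұлдану maps to CWCCWCW and уай maps to WWC, matching the structures discussed in this section.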
As expected, there are few words with the same structure; for example, for a dictionary
of 41,791 words, there are only three words with the structure CWCCWCW (Table 3).

Table 3. Example of the words with a structure CWCCWCW.

Kazakh Word | Transcription | Generalized Transcription
бұлдану | b(ʊ)ld(ɑ)nw | CWCCWCW
бұлдыра | b(ʊ)ld(ɯ)r(ɑ) | CWCCWCW
бүлдіру | b(ү)ld(ɪ)rw | CWCCWCW
Next, the signal is segmented to create a generalized transcription as described
in [46–48,54].
This algorithm could help to develop an even more generalized transcription, namely,
to divide all the sounds into two natural classes: vowels and consonants. Such a division
gives good results in small vocabularies as well.

Construction of Averaged Standards

In order to reduce the dependence of the recognition system on the speaker, we applied
the procedure of averaging the standards spoken by several speakers. For example,

    E = (e1, e2, . . . , e27)  (5)

    A = (a1, a2, . . . , a27)  (6)

are two standards of the same word, and for the sake of generality, we will assume that the
standard E has already been obtained by averaging the standards spoken by n speakers,
and A is the standard of the (n + 1)-st speaker. We take the vector sum a_j + . . . + a_{j+k}
of all the corresponding vectors from set (5) in the meaning described above. Then we place

    e′_i = n/(n + 1) · e_i + 1/(n + 1) · (a_j + . . . + a_{j+k})/k  (7)
Calculating this for all i = 1, 2, . . . 27, we obtain the result of averaging the standards
for n + 1 speakers:
    E′ = (e′1, e′2, . . . , e′27)  (8)
The coefficients

    n/(n + 1), 1/(n + 1)  (9)
were introduced in order to make all speakers equal. However, as the number n increases,
the changes introduced by new speakers become smaller and smaller. The same procedure
allows for averaging several standards of one speaker in order to increase their reliability.
The effectiveness of this procedure becomes especially clear if we apply it to averaging the
Sensors 2022, 22, 1937 8 of 17
standards of different words. Thus, for example, it allows one to teach the computer to
perceive each word of the line
    y1 = (1 + ε)·y  (10)

    y2 = (1 − ε)·y  (11)

Here, ε is the splitting parameter with a value from 0.01 to 0.05.
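Numerically, the update (7) (taken with k = 1 for brevity) and the splitting step (10)–(11) look as follows; the 27-component vectors mirror (5), and the toy values are assumptions:

```python
# Eq. (7) with k = 1: fold the (n+1)-st speaker's standard A into the
# running average E of n speakers, so all speakers get equal weight.
def average_in(e, a, n):
    return [n / (n + 1) * ei + 1 / (n + 1) * ai for ei, ai in zip(e, a)]

# Eqs. (10)-(11): split one standard into two slightly perturbed copies.
def split(y, eps=0.01):
    return ([yi * (1 + eps) for yi in y],
            [yi * (1 - eps) for yi in y])

e = [1.0] * 27      # average standard over n = 2 speakers (toy values)
a = [4.0] * 27      # third speaker's standard
e_new = average_in(e, a, n=2)
```

Each component becomes 2/3·1 + 1/3·4 = 2, i.e. exactly the mean of three equally weighted speakers, as the coefficients in (9) intend.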
3. Use the k-means algorithm to obtain the best possible set of code vectors for a
double-size codebook.
4. Repeat steps 2 and 3 until we obtain the codebook of the required size.
The dimension of a codebook constructed in this way is a power of 2.
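Steps 2–4 amount to the classic binary-split (LBG-style) codebook construction; a compact sketch on one-dimensional data (the real method clusters 27-component standards, and the splitting parameter reuses ε from (10)–(11)):

```python
# LBG-style codebook doubling: start from the global centroid, split
# every code vector into two perturbed copies, refine with k-means, and
# repeat until the codebook reaches the target size (a power of 2).
def kmeans(data, codebook, iters=20):
    for _ in range(iters):
        buckets = [[] for _ in codebook]
        for x in data:
            nearest = min(range(len(codebook)), key=lambda j: abs(x - codebook[j]))
            buckets[nearest].append(x)
        codebook = [sum(b) / len(b) if b else codebook[i]
                    for i, b in enumerate(buckets)]
    return codebook

def build_codebook(data, size, eps=0.01):
    codebook = [sum(data) / len(data)]                    # one global centroid
    while len(codebook) < size:
        codebook = ([c * (1 + eps) for c in codebook] +
                    [c * (1 - eps) for c in codebook])    # step 2: split
        codebook = kmeans(data, codebook)                 # step 3: k-means
    return codebook                                       # step 4: repeat

cb = build_codebook([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], size=2)
```

On this toy data the two code vectors settle near the two cluster centers, 0.1 and 10.1.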
The works in [41] describe methods for determining the polarity of the text. When
analyzing sentiment and emotion, the results of morphological [55,56] and syntactic
analysis of the Kazakh language [57–59] are used.
To apply this method to other languages, it is necessary to determine the list of
words that give emotional coloring to speech; the proposed method will then, after
performing simplified speech recognition, find words that have the same structure as
words with emotional coloring, and a hypothesis will be put forward that the sounding
speech is emotional and there is a need to perform full-fledged speech recognition, which
is demanding on computing resources.

2.2.1. Construction of Emotion Vocabulary Generalized Transcriptions
The emotion vocabulary allows for constructing its generalized transcriptions and
applying them to search for words from the emotion vocabulary. Examples of emotional
Kazakh words with generalized transcription are shown in Table 5.

Table 5. Example of words from emotion vocabulary with generalized transcription.

Word | Transcription | Translation | POS | Emotion | Generalized Transcription
қорқақтық | q(ɔ)rq(ɑ)qt(ɯ)q | cowardice | N | fear | PWCPWPPWP
ақылсыз | (ɑ)q(ɯ)lswz | stupid | N | anger | WPWCFWC
көз жасы | k(ɵ)z Ʒ(ɑ)s(ɯ) | tear | N | sadness | PWC CWFW
құрмет | q(ʊ)rmеt | honor | N | happiness | PWCCWP
шапшаң | ʃ(ɑ)pʃ(ɑ)ŋ | quick | Adv | happiness | FWPFWC
шарасыздан | ʃ(ɑ)r(ɑ)s(ɯ)zd(ɑ)n | involuntarily | Adv | sadness | FWCWFWCCWC
сөзқұмар | s(ɵ)zq(ʊ)m(ɑ)r | garrulous, chatty | Adj | disgust | FWCPWCWC
тату | t(ɑ)tw | amicably | Adj | happiness | PWPW
тиянақсыз | tiy(ɑ)n(ɑ)qs(ɯ)z | fragile | Adj | anger | PWCCWCWPFWC
пішту! | p(ɪ)ʃtw! | my gosh | Intj | disgust | PWFPW!
туу | tww | Holy | Intj | sadness | PWW
уай | w(ɑ)y | Wow | Intj | happiness | WWC
бұзықтық істеу | b(ʊ)z(ɯ)qt(ɯ)q (ɪ)stеw | roughhouse | V | anger | CWCWPPWP WFPWW
бәрекелді | bærеkеld(ɪ) | Bravo | Intj | happiness | CWCWPWCCW
әй | æy | hey | Intj | anger | WC
әттеген-ай | ættеgеn-(ɑ)y | What a pity | Intj | sadness | WPPWCWC-WC
қап | q(ɑ)p | it's a shame | Intj | sadness | PWP
масқарай | m(ɑ)sq(ɑ)r(ɑ)y | What a mess | Intj | sadness | CWFPWCWC
мәссаған | mæss(ɑ)ɣ(ɑ)n | Gee | Intj | fear | CWFFWCWC

An example of a database of emotional words of the Kazakh language and speech
recognition demo code can be downloaded here.

The simplified transcription in this article has two main purposes. First, simplified
transcription is used in the speech recognition task. A simplified transcription is built;
for example, the word ata (ata—grandfather) corresponds to CPC (vowel + voiceless
consonant + vowel). In the task of speech recognition, this simplified transcription is used
to search for possible candidates for recognition from the dictionary of all words of the
Kazakh language that have simplified transcriptions (in our work, these are more than
three million, two hundred thousand word forms of the Kazakh language). For our
example, the word apa (apa—grandmother) has the same simplified transcription CPC
(vowel + voiceless consonant + vowel). A list of candidates for recognition is compiled,
and then a recognition algorithm based on the code book is run. Second, simplified
transcription is used to determine the presence of emotionally colored words in speech
and can significantly reduce the search time. At the initial stage, the algorithm allows for
understanding of whether a speech, audio recording or text contains emotionally charged
words. Thus, for example, the word "уай" (way) is the equivalent of "wow" in English; it
has a simplified transcription of WWC, which occurs only once in the entire dictionary of
emotionally charged words.
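The first-pass emotion filter described above can be sketched as a lookup keyed by simplified transcription; the tiny index below reuses a few rows in the spirit of Table 5 (a real system indexes the full vocabulary of emotionally charged words):

```python
# Index emotionally charged words by generalized transcription; during
# recognition, a word's simplified transcription is first matched here,
# and only on a hit is full (expensive) word recognition performed.
EMOTION_INDEX = {
    "WWC": [("уай", "happiness")],
    "PWPW": [("тату", "happiness")],
    "PWW": [("туу", "sadness")],
}

def emotion_candidates(simplified):
    """Emotion-word candidates sharing this simplified transcription."""
    return EMOTION_INDEX.get(simplified, [])
```

An empty candidate list means no emotionally charged word shares the structure, so the costly full recognition step can be skipped.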
Designation | Purpose
α, β, γ, . . . , ζ, ξ, . . . | Many words in the language—Variables
ω | ω = ζ·α·β·ξ—Lexical units (non-empty word or phrase)
L | Set of sentences in the language
N | Set of nouns
Adj | Set of adjectives
Pron | Set of pronouns
V_Post | Set of positive verb forms
V_Negt | Set of negative verb forms
Intj | Set of interjections
AdvIntens | Set of enhancing adverbs
emo | Emotion establishment—Predicate
@ | Negation words "емес/жoқ" (no)—Constants
¬ | Transformation to negative form—Operation
· | Concatenation—Operation

Below is a model based on formalized production rules for determining the emotion
of lexical units (phrases) in a Kazakh text. The following are examples of the rules for
defining emotions in the Kazakh language.

1. If a lexical unit contains a noun with the emotional color of happiness and the next
word after it is a verb (of a positive form) with a neutral tone, then the emotional
description of this phrase is happiness.

    ω ∈ L, ω = ζ·α·β·ξ, α ∈ N, emo(α) = happiness, β ∈ V_Post, emo(β) = 0 ⇒
    emo(ω) = happiness  (12)

Here and below, ζ, ξ are any strings of words, including empty ones. For example,
шабыт келді (ʃ(ɑ)b(ɯ)t kеld(ɪ)).

2. If a lexical unit contains an adjective describing the emotion anger and the next word
after it is a verb (positive form) with a neutral tonality, then the emotional description
of this phrase is anger.

    ω ∈ L, ω = ζ·α·β·ξ, α ∈ Adj, emo(α) = anger, β ∈ V_Post, sent(β) = 0 ⇒
    emo(ω) = anger  (13)

For example, тиянақсыз болды (tiy(ɑ)n(ɑ)qs(ɯ)z b(ɔ)ld(ɯ)).

3. If a lexical unit contains an interjection with an emotional connotation of sadness,
then the emotional description of this phrase is sadness.

    ω ∈ L, ω = ζ·α·β·ξ, α ∈ Intj, emo(α) = sadness, sent(β) = 0 ⇒
    emo(ω) = sadness  (14)

For example, масқара (m(ɑ)sq(ɑ)r(ɑ)), қап (q(ɑ)p).

The use of formal rules for determining emotions allows for extracting emotions from
recognized speech in the text based on a codebook.
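Rules (12)–(14) can be sketched as predicates over a tagged word sequence; the POS and emotion tags are assumed to come from prior morphological and sentiment analysis, so this is an illustration rather than the article's implementation:

```python
# Production rules (12)-(14) over a tagged phrase: a noun or adjective
# with emotional color followed by a neutral positive-form verb assigns
# its emotion to the phrase (rules 12-13); an emotionally colored
# interjection does so on its own (rule 14). Tags are assumed inputs.
def phrase_emotion(words):
    for i, w in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else None
        if w["pos"] in ("N", "Adj") and w["emo"] and nxt is not None \
                and nxt["pos"] == "V_Post" and not nxt["emo"]:
            return w["emo"]                      # rules (12) and (13)
        if w["pos"] == "Intj" and w["emo"]:
            return w["emo"]                      # rule (14)
    return None

# "шабыт келді": happiness-colored noun + neutral positive-form verb.
tagged = [{"pos": "N", "emo": "happiness"}, {"pos": "V_Post", "emo": None}]
```

Applied to the tagged example, rule (12) fires and labels the whole phrase as happiness.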
3. Experiment

In our research, words that express emotions are highlighted and divided into six
classes (Table 7).
The table shows the classes of emotions, the main ones of which correspond to Ekman's
classification, according to which the students' speech was classified. This classification
has been applied in previous works and has shown its effectiveness.
For the experiment, 420 audio recordings were studied in which the greatest vocal
activity was found during the online exam.
At the moment, there are no sufficient published results for the Kazakh language on
emotion recognition or synthesis of Kazakh speech. Therefore, it is currently not possible
to evaluate the proposed method in the traditional way. Thus, the evaluation was carried
out as follows: accuracy was evaluated for the Kazakh language using the Emotional
Speech Recognition Method and the DNN model presented in [60] (Table 8).
Table 8. Quantitative evaluation of the different methods for speech emotion recognition.
4. Results
To recognize emotions, we used the audio from video recordings of online exams taken during examination sessions. The database contains about 20,000 videos with the speech of students and their comments during and after the exam. However, many audio recordings do not contain emotional speech, since during the online computer test students restrain themselves and do not express emotions.
We studied 420 audio recordings of exams, which were automatically processed by the speech recognition program and converted into text.
All converted text was translated into the simplified transcription. The spoken text was then searched against the dictionary of emotionally colored words using this simplified transcription; that is, we searched the audio recordings for emotionally charged words. If the text of an audio recording contained emotionally colored words, it was further processed with the Emotion Recognition Model: the program assigned emotion labels to the text automatically, and the resulting emotion tags on the audio recordings were then checked manually by human annotators for the purity of the experiment. In 79.7% of cases, emotionally charged words were found correctly.
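The processing pipeline just described can be sketched as follows. The transcription function and the codebook contents below are hypothetical stand-ins for the authors' components, kept deliberately minimal.

```python
# Sketch of the lookup pipeline: recognized text -> simplified
# transcription -> codebook search -> emotion tags.
# All names and data here are illustrative assumptions.

CODEBOOK = {"qap": "sadness", "masqara": "surprise"}  # hypothetical entries

def to_simplified_transcription(word):
    """Placeholder: a real system maps letters to generalized phoneme classes."""
    return word.lower()

def tag_emotions(recognized_words):
    """Return emotion tags for emotionally charged words found in the text."""
    tags = []
    for word in recognized_words:
        key = to_simplified_transcription(word)
        if key in CODEBOOK:  # only emotionally charged words get tags
            tags.append((word, CODEBOOK[key]))
    return tags
```

In this arrangement, recordings whose text yields no codebook hits are skipped entirely, which is what keeps the method light on computational resources.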
The confusion matrix of the Emotional Speech Recognition Method for the Kazakh language is presented in Figure 4.
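A confusion matrix such as the one in Figure 4 can be tallied from the automatic tags and the manually checked labels. This is a generic sketch, not the authors' evaluation code; the label names are invented for illustration.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs; accuracy is the share on the diagonal."""
    counts = Counter(zip(true_labels, predicted_labels))
    total = len(true_labels)
    correct = sum(n for (t, p), n in counts.items() if t == p)
    return counts, correct / total

# Hypothetical labels for four recordings:
true = ["joy", "sadness", "joy", "anger"]
pred = ["joy", "sadness", "anger", "anger"]
matrix, accuracy = confusion_matrix(true, pred)
# accuracy -> 0.75
```

Off-diagonal cells such as `matrix[("joy", "anger")]` show which emotion pairs the method confuses, which is exactly the information Figure 4 conveys.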
5. Discussion
The results of this work comprise a method for recognizing emotional speech based on the emotion recognition model and an emotional vocabulary of emotionally charged words of the Kazakh language with generalized transcription, together with a code book.
There are various works on recognizing emotions in speech, but they do not rely on the proposed model for determining emotions. This work also showed for the first time the construction and use of an emotional vocabulary of emotionally charged words of the Kazakh language with generalized transcription, as well as a code book, which significantly reduced search time and was not resource-intensive, thereby expanding the scope of the proposed method. The method can be adapted to other languages as well, provided an emotional vocabulary is available.
The main disadvantage of methods for direct recognition of emotions in speech is the need to process speech before recognizing it. Thus, most often, speech in
6. Conclusions
Speech recognition technologies and emotion detection contribute to the development
of intelligent systems for human–machine interaction. The results of this work can be used
when it is necessary to recognize positive and negative emotions in a crowd, in public
transport, schools, universities, etc. The experiment carried out has shown the effectiveness
of this method. The results obtained will make it possible in the future to develop devices that start recording and recognizing a speech signal, for example, upon detecting negative emotions in speech, and, if necessary, transmit a message about a potential threat or a riot.
In this paper, we presented a method for recognizing emotional speech based on an
emotion recognition model using the technique of constructing a code book. The code book
allows for significantly narrowing the search for candidates for recognizing emotionally
charged words. Answering the research question, we can say that it is possible to effectively
recognize the emotions of the Kazakh language based on a database of emotionally charged
words. An emotional vocabulary of the Kazakh language was built in the form of a code
book. The use of this method has shown its effectiveness in recognizing the emotional
speech of students during an exam in a distance format with an accuracy of 79.7%. At the same time, there are certain risks concerning the speech recognition itself, which must be effective: its effectiveness directly depends on the noise level and on the use of grammatically correct language (no slang, no foreign words in speech, etc.). There are also risks of incorrect recognition of emotions, which may result from the incompleteness of the emotional vocabulary. In addition, the polarity values in the emotional vocabulary may need to be automatically reconfigured for particular subject areas: for example, the polarity values appropriate for recognizing the emotions of a crowd at a rally should differ from those for a student taking an exam. This raises the question of verifying the correctness of automatically configured polarity and suggests new research topics.
In the future, we can consider using other types of features, applying our system to other databases, including larger ones, and using a different method to reduce the feature dimensionality. Finally, we can also consider emotion recognition using an audiovisual database, benefiting in that case from descriptors derived from speech as well as from images. This would allow improving the recognition rate for each emotion.
Author Contributions: All authors contributed equally to this work. All authors have read and
agreed to the published version of the manuscript.
Funding: This research is funded by the Science Committee of the Ministry of Education and Science
of the Republic of Kazakhstan (grant no. BR11765535).
Institutional Review Board Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Franzoni, V.; Milani, A.; Nardi, D.; Vallverdú, J. Emotional machines: The next revolution. Web Intell. 2019, 17, 1–7. [CrossRef]
2. Majumder, N.; Poria, S.; Hazarika, D.; Mihalcea, R.; Gelbukh, A.; Cambria, E. DialogueRNN: An attentive RNN for emotion
detection in conversations. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1
February 2019; AAAI Press: Palo Alto, CA, USA, 2019; Volume 33.
3. Biondi, G.; Franzoni, V.; Poggioni, V. A deep learning semantic approach to emotion recognition using the IBM Watson Bluemix Alchemy Language. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics); Springer: Cham, Switzerland, 2017; Volume 10406, pp. 719–729. [CrossRef]
4. Stappen, L.; Baird, A.; Cambria, E.; Schuller, B.W.; Cambria, E. Sentiment Analysis and Topic Recognition in Video Transcriptions.
IEEE Intell. Syst. 2021, 36, 88–95. [CrossRef]
5. Yang, D.; Alsadoon, A.; Prasad, P.W.C.; Singh, A.K.; Elchouemi, A. An Emotion Recognition Model Based on Facial Recognition
in Virtual Learning Environment. Procedia Comput. Sci. 2018, 125, 2–10. [CrossRef]
6. Gupta, O.; Raviv, D.; Raskar, R. Deep video gesture recognition using illumination invariants. arXiv 2016, arXiv:1603.06531.
7. Kahou, S.E.; Pal, C.; Bouthillier, X.; Froumenty, P.; Gülçehre, Ç.; Memisevic, R.; Vincent, P.; Courville, A.; Bengio, Y.; Ferrari, R.C.;
et al. Combining modality specific deep neural networks for emotion recognition in video. In Proceedings of the 2013 ACM
International Conference on Multimodal Interaction, Sydney, Australia, 9–13 December 2013; pp. 543–550.
8. Özdemir, M.; Elagöz, B.; Alaybeyoglu, A.; Akan, A. Deep Learning Based Facial Emotion Recognition System (Derin Öğrenme
Tabanlı Yüz Duyguları Tanıma Sistemi). In Proceedings of the 2020 Medical Technologies Congress (TIPTEKNO), Antalya, Turkey,
19–20 November 2020. [CrossRef]
9. Kahou, S.E.; Michalski, V.; Konda, K.; Memisevic, R.; Pal, C. Recurrent neural networks for emotion recognition in video. In
Proceedings of the ACM International Conference on Multimodal Interaction, ICMI 2015, Seattle, WA, USA, 9–13 November 2015;
pp. 467–474. [CrossRef]
10. Hossain, M.S.; Muhammad, G. Emotion recognition using deep learning approach from audio–visual emotional big data. Inf.
Fusion 2019, 49, 69–78. [CrossRef]
11. Rao, K.S.; Koolagudi, S.G. Recognition of emotions from video using acoustic and facial features. Signal Image Video Process. 2015,
9, 1029–1045. [CrossRef]
12. Cruz, A.; Bhanu, B.; Thakoor, N. Facial emotion recognition in continuous video. In Proceedings of the 21st International
Conference on Pattern Recognition, ICPR 2012, Tsukuba, Japan, 11–15 November 2012; pp. 1880–1883.
13. Tamil Selvi, P.; Vyshnavi, P.; Jagadish, R.; Srikumar, S.; Veni, S. Emotion recognition from videos using facial expressions. Adv.
Intell. Syst. Comput. 2017, 517, 565–576. [CrossRef]
14. Mehta, D.; Siddiqui, M.F.H.; Javaid, A.Y. Recognition of emotion intensities using machine learning algorithms: A comparative
study. Sensors 2019, 19, 1897. [CrossRef]
15. Franzoni, V.; Biondi, G.; Milani, A. Emotional sounds of crowds: Spectrogram-based analysis using deep learning. Multimed.
Tools Appl. 2020, 79, 36063–36075. [CrossRef]
16. Salekin, A.; Chen, Z.; Ahmed, M.Y.; Lach, J.; Metz, D.; De La Haye, K.; Bell, B.; Stankovic, J.A. Distant Emotion Recognition. Proc.
ACM Interact. Mob. Wearable Ubiquitous Technol. 2017, 1, 1–25. [CrossRef]
17. Fayek, H.M.; Lech, M.; Cavedon, L. Towards real-time speech emotion recognition using deep neural networks. In Proceedings
of the 9th International Conference on Signal Processing and Communication Systems, ICSPCS 2015, Cairns, Australia, 14–16
December 2015. [CrossRef]
18. Mirsamadi, S.; Barsoum, E.; Zhang, C. Automatic speech emotion recognition using recurrent neural networks with local attention.
In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, LA, USA, 5–9
March 2017. [CrossRef]
19. Franzoni, V.; Biondi, G.; Milani, A. A web-based system for emotion vector extraction. In Lecture Notes in Computer Science
(Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2017;
Volume 10406, pp. 653–668. [CrossRef]
20. Franzoni, V.; Li, Y.; Mengoni, P. A path-based model for emotion abstraction on facebook using sentiment analysis and taxonomy
knowledge. In Proceedings of the 2017 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2017, Leipzig,
Germany, 23–26 August 2017; pp. 947–952. [CrossRef]
21. Canales, L.; Martinez-Barco, P. Emotion detection from text: A survey. In Proceedings of the Processing in the 5th Information
Systems Research Working Days, JISIC 2014, Hague, The Netherlands, 24–26 September 2014; pp. 37–43. [CrossRef]
22. Abdulsalam, W.H.; Alhamdani, R.S.; Abdullah, M.N. Facial emotion recognition from videos using deep convolutional neural
networks. Int. J. Mach. Learn. Comput. 2014, 9, 14–19. [CrossRef]
23. Gervasi, O.; Franzoni, V.; Riganelli, M.; Tasso, S. Automating facial emotion recognition. Web Intell. 2019, 17, 17–27. [CrossRef]
24. Gharavian, D.; Bejani, M.; Sheikhan, M. Audio-visual emotion recognition using FCBF feature selection method and particle
swarm optimization for fuzzy ARTMAP neural networks. Multimed. Tools Appl. 2017, 76, 2331–2352. [CrossRef]
25. Sinith, M.S.; Aswathi, E.; Deepa, T.M.; Shameema, C.P.; Rajan, S. Emotion recognition from audio signals using Support Vector
Machine. In Proceedings of the IEEE Recent Advances in Intelligent Computational Systems, RAICS 2015, Trivandrum, Kerala,
India, 10–12 December 2015; pp. 139–144. [CrossRef]
26. Kwon, S. A CNN-assisted enhanced audio signal processing for speech emotion recognition. Sensors 2020, 20, 183. [CrossRef]
27. Kannadaguli, P.; Bhat, V. Comparison of hidden Markov model and artificial neural network based machine learning techniques
using DDMFCC vectors for emotion recognition in Kannada. In Proceedings of the 5th IEEE International WIE Conference on
Electrical and Computer Engineering, WIECON-ECE 2019, Bangalore, India, 15–16 November 2019.
28. Tursunov, A.; Choeh, J.Y.; Kwon, S. Age and gender recognition using a convolutional neural network with a specially designed
multi-attention module through speech spectrograms. Sensors 2021, 21, 5892. [CrossRef] [PubMed]
29. Shahin, I. Emotion recognition based on third-order circular suprasegmental hidden Markov model. In Proceedings of the IEEE
Jordan International Joint Conference on Electrical Engineering and Information Technology, JEEIT 2019, Amman, Jordan, 9–11
April 2019; pp. 800–805. [CrossRef]
30. Abo Absa, A.H.; Deriche, M. A two-stage hierarchical multilingual emotion recognition system using hidden Markov models and
neural networks. In Proceedings of the 9th IEEE-GCC Conference and Exhibition, GCCCE 2017, Manama, Bahrain, 8–11 May 2017.
[CrossRef]
31. Quan, C.; Ren, F. Weighted high-order hidden Markov models for compound emotions recognition in text. Inf. Sci. 2016, 329,
581–596. [CrossRef]
32. Sidorov, M.; Minker, W.; Semenkin, E.S. Speech-based emotion recognition and speaker identification: Static vs. dynamic mode of
speech representation. J. Sib. Fed. Univ.-Math. Phys. 2016, 9, 518–523. [CrossRef]
33. Immordino-Yang, M.H.; Damasio, A. We feel, therefore we learn: The relevance of affective and social neuroscience to education.
Mind Brain Educ. 2007, 1, 3–10. [CrossRef]
34. Durães, D.; Toala, R.; Novais, P. Emotion Analysis in Distance Learning. In Educating Engineers for Future Industrial Revolutions;
Auer, M.E., Rüütmann, T., Eds.; Springer: Cham, Switzerland, 2021; Volume 1328, pp. 629–639. [CrossRef]
35. Baker, M.; Andriessen, J.; Järvelä, S. Affective Learning Together. Social and Emotional Dimension of Collaborative Learning; Routledge:
London, UK, 2013; 312p.
36. van der Haar, D.T. Student Emotion Recognition in Computer Science Education: A Blessing or Curse? In Lecture Notes in
Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham,
Switzerland, 2019; Volume 11590, pp. 301–311.
37. Krithika, L.B.; Lakshmi, G.G. Student Emotion Recognition System (SERS) for e-learning Improvement Based on Learner
Concentration Metric. Procedia Comput. Sci. 2016, 85, 767–776. [CrossRef]
38. Franzoni, V.; Biondi, G.; Perri, D.; Gervasi, O. Enhancing Mouth-Based Emotion Recognition Using Transfer Learning. Sensors
2020, 20, 5222. [CrossRef] [PubMed]
39. Luna-Jiménez, C.; Griol, D.; Callejas, Z.; Kleinlein, R.; Montero, J.M.; Fernández-Martínez, F. Multimodal emotion recognition on
RAVDESS dataset using transfer learning. Sensors 2021, 21, 7665. [CrossRef]
40. Yergesh, B.; Bekmanova, G.; Sharipbay, A.; Yergesh, M. Ontology-based sentiment analysis of kazakh sentences. In Lecture Notes
in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham,
Switzerland, 2017; Volume 10406, pp. 669–677. [CrossRef]
41. Yergesh, B.; Bekmanova, G.; Sharipbay, A. Sentiment analysis of Kazakh text and their polarity. Web Intell. 2019, 17, 9–15.
[CrossRef]
42. Zhetkenbay, L.; Bekmanova, G.; Yergesh, B.; Sharipbay, A. Method of Sentiment Preservation in the Kazakh-Turkish Machine
Translation. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics); Springer: Cham, Switzerland, 2020; Volume 12250, pp. 538–549. [CrossRef]
43. Yergesh, B.; Bekmanova, G.; Sharipbay, A. Sentiment analysis on the hotel reviews in the Kazakh language. In Proceedings of
the 2nd International Conference on Computer Science and Engineering, UBMK 2017, Antalya, Turkey, 5–8 October 2017; pp.
790–794. [CrossRef]
44. Bekmanova, G.; Yelibayeva, G.; Aubakirova, S.; Dyussupova, N.; Sharipbay, A.; Nyazova, R. Methods for Analyzing Polarity of
the Kazakh Texts Related to the Terrorist Threats. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial
Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2019; Volume 11619, pp. 717–730. [CrossRef]
45. Shelepov, V.J.; Nicenko, A.V. Recognition of the continuous-speech Russian phrases using their voiceless fragments. Eurasian J.
Math. Comput. Appl. 2016, 4, 54–59. [CrossRef]
46. Shelepov, V.Y.; Nitsenko, A.V. On the recognition of Russian words using generalized transcription. Probl. Artif. Intell. 2018, 1,
50–56. (In Russian)
47. Nitsenko, A.V.; Shelepov, V.Y. Algorithms for phonemic recognition of words for a given dictionary. Artif. Intell. [Iskusstv. Intell.]
2004, 4, 633–639. (In Russian)
48. Shelepov, V.Y. The concept of phonemic recognition of separately pronounced Russian words. Recognition of syntactically
related phrases. In Materials of the International Scientific-Technical Conference. Artif. Intell. 2007, 162–170. (In Russian)
49. Shelepov, V.Y.; Nitsenko, A.V. To the problem of phonemic recognition. Artif. Intell. [Iskusstv. Intell.] 2005, 4, 662–668. (In Russian)
50. Sharipbayev, A.A.; Bekmanova, G.T.; Shelepov, V.U. Formalization of Phonologic Rules of the Kazakh Language for System
Automatic Speech Recognition. Available online: http://dspace.enu.kz/handle/data/1013 (accessed on 29 December 2021).
51. Bekmanova, G.T.; Nitsenko, A.V.; Sharipbaev, A.A.; Shelepov, V.Y. Algorithms for recognition of the Kazakh word as a whole. In
Structural Classification of Kazakh Language Words; Bulletin of the L.N. Gumilyov Eurasian National University: Astana, Kazakhstan,
2010; pp. 45–51.
52. Shelepov, V.J.; Nitsenko, A.V. The refined identification of beginning-end of speech; the recognition of the voiceless sounds at
the beginning-end of speech. On the recognition of the extra-large vocabularies. Eurasian J. Math. Comput. Appl. 2017, 5, 70–79.
[CrossRef]
53. Kazakh Grammar. Phonetics, Word Formation, Morphology, Syntax; Astana-Poligraphy: Astana, Kazakhstan, 2002; p. 784.
(In Kazakh)
54. Bekmanova, G.; Yergesh, B.; Sharipbay, A. Sentiment Analysis Model Based on the Word Structural Representation. In Lecture
Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer:
Cham, Switzerland, 2021; Volume 12960, pp. 170–178. [CrossRef]
55. Sharipbaev, A.A.; Bekmanova, G.T.; Buribayeva, A.K.; Yergesh, B.Z.; Mukanova, A.S.; Kaliyev, A.K. Semantic neural network
model of morphological rules of the agglutinative languages. In Proceedings of the 6th International Conference on Soft
Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS 2012, Kobe,
Japan, 20–24 November 2012; pp. 1094–1099. [CrossRef]
56. Yergesh, B.; Mukanova, A.; Sharipbay, A.; Bekmanova, G.; Razakhova, B. Semantic hyper-graph based representation of nouns in
the Kazakh language. Comput. Sist. 2014, 18, 627–635. [CrossRef]
57. Sharipbay, A.; Yergesh, B.; Razakhova, B.; Yelibayeva, G.; Mukanova, A. Syntax parsing model of Kazakh simple sentences. In
Proceedings of the 2nd International Conference on Data Science, E-Learning and Information Systems, DATA 2019, Dubai,
United Arab Emirates, 2–5 December 2019. [CrossRef]
58. Razakhova, B.S.; Sharipbaev, A.A. Formalization of Syntactic Rules of the Kazakh Language; Bulletin of the L.N. Gumilyov Eurasian
National University: Astana, Kazakhstan, 2012; pp. 42–50.
59. Yelibayeva, G.; Sharipbay, A.; Mukanova, A.; Razakhova, B. Applied ontology for the automatic classification of simple sentences
of the Kazakh language. In Proceedings of the 5th International Conference on Computer Science and Engineering, UBMK 2020,
Diyarbakir, Turkey, 9–10 September 2020. [CrossRef]
60. Kozhakhmet, K.; Zhumaliyeva, R.; Shoinbek, A.; Sultanova, N. Speech emotion recognition for Kazakh and Russian languages.
Appl. Math. Inf. Sci. 2020, 14, 65–68. [CrossRef]