Article
Emotional Speech Recognition Method Based on
Word Transcription
Gulmira Bekmanova 1 , Banu Yergesh 1, *, Altynbek Sharipbay 1 and Assel Mukanova 1,2

1 Faculty of Information Technologies, L.N. Gumilyov Eurasian National University,


Nur-Sultan 010008, Kazakhstan; gulmira-r@yandex.kz (G.B.); sharalt@mail.ru (A.S.);
asiserikovna@gmail.com (A.M.)
2 Higher School of Information Technology and Engineering, Astana International University,
Nur-Sultan 010000, Kazakhstan
* Correspondence: b.yergesh@gmail.com

Abstract: The emotional speech recognition method presented in this article was applied to recognize
the emotions of students during online exams in distance learning due to COVID-19. The purpose of
this method is to recognize emotions in spoken speech through the knowledge base of emotionally
charged words, which are stored as a code book. The method analyzes human speech for the presence
of emotions. To assess the quality of the method, an experiment was conducted for 420 audio
recordings. The accuracy of the proposed method is 79.7% for the Kazakh language. The method
can be used for different languages and consists of the following tasks: capturing a signal, detecting
speech in it, recognizing speech words in a simplified transcription, determining word boundaries,
comparing a simplified transcription with a code book, and constructing a hypothesis about the
degree of speech emotionality. If emotions are present, complete recognition of words and
determination of emotions in speech is performed. The advantage of this method is the possibility of its

widespread use since it is not demanding on computational resources. The described method can
be applied when there is a need to recognize positive and negative emotions in a crowd, in public
transport, schools, universities, etc. The experiment carried out has shown the effectiveness of this
method. The results obtained will make it possible in the future to develop devices that begin to
record and recognize a speech signal, for example, in the case of detecting negative emotions in
sounding speech and, if necessary, transmitting a message about potential threats or riots.

Keywords: emotion recognition; speech recognition; crowd emotion recognition; affective computing;
distance learning; e-learning; artificial intelligence

Received: 30 December 2021; Accepted: 25 February 2022; Published: 2 March 2022

1. Introduction

Automatic recognition of human emotions is a new and interesting area of research.
Achievements in the field of artificial intelligence are used in the development of affective
computing and the creation of emotional machines [1–3].
Today, research is being carried out on recognizing facial emotions from videos [4–14],
on determining emotions by voice rhythm from audio information [15–18], and by writing
style from texts [19–21]. In the following works [22,23], a deep convolutional neural
network is used to recognize facial emotions from videos and images.
Today, the speech emotion recognition system (SER) assesses the emotional state of the
speaker by examining his/her speech signal [24–26]. Work [27] proposes key technologies
for recognition of speech emotions based on neural networks and recognition of facial
emotions based on SVM, and paper [28] shows a system of emotion recognition
based on an artificial neural network (ANN) and its comparison with a system based on
the Hidden Markov Modeling (HMM) scheme. Both systems were built on the basis of
probabilistic pattern recognition and acoustic phonetic modeling approaches. Work [29]
presents a novel end-to-end convolutional neural network for age and gender recognition

(CNN) with a specially designed multi-attention module (MAM) from speech signals to
solve the problem of classifying native speakers according to their age and gender in speech
processing. In the following works, when recognizing emotions, they use hidden Markov
models for speech recognition [30–33].
According to the results of the analysis of studies, emotions play a vital role in e-
learning [5,34,35]. It is also possible to determine users’ moods using affective computing
methods [36] and use the result of moods to support the teacher by providing feedback
relevant to learning.
For example, paper [37] describes a model that determines the mood of users using
methods of affective computation and uses the results of mood to support the teacher by
providing feedback related to teaching. Experienced teachers can change their teaching
style according to the needs of the students.
With the help of the definition of emotion, it is possible to analyze the interests of
the students and the learning outcomes of the course. Emotions can be detected by facial
expressions [37,38]. Article [39] proposed a multimodal emotion recognition system that
relies on speech and facial information. For the speech-based modality, they fine-tuned the
CNN-14 of the PANNs framework, and for facial emotion recognizers, they proposed a
framework that consists of a pre-trained Spatial Transformer Network on saliency maps and
facial images followed by a bi-LSTM with an attention mechanism. Results on sentiment
analysis and emotion recognition in the Kazakh language are published in [40–44].
The rest of the article is organized as follows: the introduction presents the most
recent studies on the recognition of speech emotions. Section 2 describes the problem and
proposes a solution for our methods. In Sections 3 and 4, the experimental results and
discussion are presented. Finally, in Section 5, we conclude our work.

2. Problem Description and Proposed Solution


With the development of modern speech technologies, a fundamental possibility has
emerged: a transition from formal intermediary languages between human and machine
to natural language, in oral form, as a universal means of expressing the goals and desires of
a person.
Recognizing emotions in speech is a difficult problem. The opposite task is the
synthesis of emotional speech. Moreover, the same approaches can often be used both for
recognizing emotions in speech and for synthesizing emotional speech. The solution to the
problem of recognizing emotions in speech is difficult to separate from the problem of
speech recognition itself, since the recognition of emotions is often associated with deter-
mining the meaning of what has been said. At the same time, there is currently no effective
system for recognizing emotional Kazakh speech. In simplified terms, we can divide the solution
of the problem into two parts: speech recognition and emotion recognition.
Previously, work was carried out on speech recognition based on a simplified transcription
for the Russian language under the leadership of Shelepov V.Yu. in works [45–49], and for
Kazakh speech in works [50,51]. The method was tested and published for the Russian
and Kazakh languages, and was also tested, but not published, by the authors for English.
After acoustic tuning, it allows one to find words of a given structure in any language that
has vowel sounds (pronounced with the participation of the vocal cords) and consonant
sounds (voiced and sonorant, pronounced with the participation of the vocal cords;
voiceless, pronounced without the participation of the vocal cords). This method allows
one to find the given words in recognizable speech; in the case of emotion recognition, it
makes it possible to find words that convey emotions. Therefore, this article discusses the
issue of determining emotions from audio data. The method considered in this work is
applied to the Kazakh language, but it can also be applied to any other language, provided
that a list of words that carry emotions is defined.
The Kazakh language (ISO 639-3, kaz) is an agglutinative language belonging to the
Kypchak group of Turkic languages. The emotional coloring of Kazakh speech is
given by adjectives, verbs, and some types of nouns, as well as interjections (exclamations)
expressing some emotions.
Shown below is a method for recognizing students’ emotional speech during online
exams in distance learning—a model for determining emotions.

2.1. Emotional Speech Recognition Method


The emotional speech recognition method analyzes speech for the presence of emo-
tions. The proposed method first assumes signal recognition, detection of speech in it,
recognition in simplified transcription, definition of word boundaries, comparison of sim-
plified transcription with a code book, and a hypothesis about the degree of emotionality
of speech. If emotions are present, full recognition of words and detection of emotions in
speech is performed (Figure 1).

Figure 1. Emotional speech recognition process.

The use of a codebook allows one to use this method with low computing power. The
described method can be applied when there is a need to recognize positive and negative
emotions in a crowd, in public transport, schools, universities, etc. This makes it possible to
develop devices that begin to record and recognize a speech signal, for example, in the case
of detecting negative emotions in sounding speech and, if necessary, transmit a message
about a potential threat or a riot.
In this work, we tested the method on multi-user datasets to find solutions to the
following research question:
Is it possible to effectively recognize emotions in speech based on a database of
emotionally charged Kazakh words?

2.1.1. Stages of Speech Signal Recognition


Before recognizing an emotion, there is a need to recognize the speech signal, i.e.,
convert spoken speech into text.
Recognition of a speech signal can consist of several stages: signal capture; definition
of word boundaries; calculation of a sequence of quasi-periods; checking the recorded
material for the presence of speech; clarification of the boundary of the speech end [46].

Capturing an Audio Signal


Capturing and digitizing an audio signal is carried out in accordance with the follow-
ing parameters: 8 bit audio signal with a sampling rate of 22,050 Hz, such that its values
have gradations from 0 to 255.
When a speech signal is input, 30,000 samples of “silence” are recorded, and successive
segments of 300 samples each are analyzed in the recorded signal. For each of them, the
following ratio is calculated [52]:

V/C (1)

where

V = ∑_{i=0}^{298} |x_{i+1} − x_i| (2)

This is a numerical analogue of full variation; C is the number of points of constancy,
that is, points in time for which, at the next moment, the magnitude of the signal remains
unchanged. The value of ratio (1), typical for the used sound card, is automatically
determined as the most frequently encountered value in the array of values. Further, this value,
increased by 0.1, is used as a characteristic of the "silence" for a particular sound card and
noise level.
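As a rough, assumed illustration of relations (1) and (2), the sketch below computes the V/C measure for consecutive 300-sample segments of the recorded "silence" and derives the silence characteristic; rounding the ratio before counting frequencies is an added assumption to make "most frequently encountered" well defined for floating-point values.

```python
from collections import Counter

def vc_measure(segment):
    """V/C for one segment: total variation divided by the number of constancy points."""
    v = sum(abs(segment[i + 1] - segment[i]) for i in range(len(segment) - 1))
    c = sum(1 for i in range(len(segment) - 1) if segment[i + 1] == segment[i])
    return v / c if c else float("inf")

def silence_characteristic(silence_samples, segment_len=300):
    """Estimate the 'silence' value from the ~30,000 samples recorded at start-up."""
    values = [
        round(vc_measure(silence_samples[i:i + segment_len]), 2)  # rounding: added assumption
        for i in range(0, len(silence_samples) - segment_len + 1, segment_len)
    ]
    most_frequent = Counter(values).most_common(1)[0][0]
    return most_frequent + 0.1  # segments above this value are treated as non-silence
```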

Checking a Signal for Speech


The recorded audio signal needs to be processed such that its boundaries (beginning
and end) can be determined.
The method for determining the beginning and end of speech is based on the fact that
speech is quasiperiodic, since it is formed by trembling of the vocal cords [52]. Vowel and
voiced consonants are formed using the vocal cords, which, under the action of the air flow
from the lungs, vibrate quite periodically. Therefore, the corresponding signals (amplitude-
time representations) are quasiperiodic. For example, this is how the amplitude-time
representation of sound A in the word “ata” (“ata”—grandfather) appears (Figure 2):

Figure 2. The result of automatic splitting of the voice section of the signal into quasi-periods.

This figure demonstrates vertical marks placed by a program that calculates successive
quasi-periods according to the following algorithm.
The following value is calculated:

L_k := ∑_{i=0}^{k−1} |x_{n0+i} − x_{n0+i+k}| (3)

where n0 is the number of the sample in the buffer from which the current quasiperiod began, and

MIN ≤ k ≤ MAX (4)

Here, we determine k = k0 at which the value L_k takes its minimum. By definition, k0 is the
length of the quasiperiod. MIN and MAX are determined by the speaker's pitch;
for example, for a tenor, they are respectively equal to 60 and 200.
Taking the end of the found quasiperiod as the beginning of the next one, we find
the second quasiperiod, and so on. This makes it possible to distinguish quasi-periodic
sections from non-quasi-periodic ones and to use this difference when checking the signal
for the presence of speech.
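A minimal sketch, under the assumptions stated in the comments, of the quasi-period search in relations (3) and (4): for every candidate length k between MIN and MAX it computes L_k and takes the minimizing k as the quasi-period length; the default MIN/MAX values follow the tenor example above.

```python
def quasi_period_length(x, n0, k_min=60, k_max=200):
    """Return k0, the quasi-period length starting at sample n0 (Equations (3) and (4))."""
    best_k, best_l = None, None
    for k in range(k_min, k_max + 1):
        if n0 + 2 * k > len(x):  # not enough samples to compare x[n0+i] with x[n0+i+k]
            break
        l_k = sum(abs(x[n0 + i] - x[n0 + i + k]) for i in range(k))
        if best_l is None or l_k < best_l:
            best_k, best_l = k, l_k
    return best_k

def split_into_quasi_periods(x, start=0, k_min=60, k_max=200):
    """Chain quasi-periods: the end of one is taken as the beginning of the next."""
    marks, n0 = [], start
    while True:
        k0 = quasi_period_length(x, n0, k_min, k_max)
        if k0 is None:
            return marks
        n0 += k0
        marks.append(n0)
```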
Based on this method, the beginning and end of speech is determined.
Next, the speech recognition algorithm works up to simplified transcription.
To construct a simplified transcription, it is necessary to consider the operation of
the transcriptor.

2.1.2. Development of an Automatic Transcriptor


An automatic transcriptor is a program for translating any spelling text into a tran-
scriptional record and vice versa based on linguistic rules.
A transcriptor of words to a transcriptional notation is present in all natural language
processing systems such as translators, speech recognition or speech synthesis systems.
However, depending on the intended purpose of the system, one or another transcriptor
model is selected. Thus, for example, for speech recognition systems, a simplified transcrip-
tion can be used, while for speech synthesis systems, it is necessary to use the most detailed
transcription containing the acoustic characteristics of sounds.
This transcriptor model was developed for automatic Kazakh speech recognition
systems (Figure 3).

Figure 3. Kazakh speech recognition system.

The phonetic alphabet is the basis of the speech recognition unit. The symbols of the
phonetic alphabet must unambiguously correspond to those sounds, the distinction of
which is essential in the process of recognition. Therefore, the minimization of the size of
the phonetic alphabet without compromising the quality of recognition should be carried
out by identifying in the alphabet only those sounds that are the closest in sounding from
the person’s point of view [45].
Therefore, for the correct construction of the transcriptor and the use of the phonetic
alphabet, we first need to translate the record of the word into an intermediate alphabet
of 31 letters, which are characteristic of the Kazakh language. At the second stage, the
word recording is translated according to the formalized linguistic (phonological) rules
of the Kazakh language [50]. At the third stage, the received words are translated into
transcriptional notation in symbols of international transcription.
The current Cyrillized alphabet of the Kazakh language contains 42 letters instead of
31 letters [53]. Of these, 11 letters were mistakenly introduced in 1940 only such that the
writing and reading of Russian words in the Kazakh text was carried out in accordance
with the norms of the Russian language. These include: Ё, И, Ч, Щ, Ц, Һ, Э, Ю, Я, Ь, Ъ.
Therefore, for the correct construction of the transcriptor and the use of the phonetic
alphabet, we first need to translate the record of the word into an intermediate alphabet of
31 letters. Correspondence of symbols with transcription is given in Table 1.

Table 1. Symbols of the current alphabet and intermediate alphabet.

[Table 1 maps each of the 42 letters of the current alphabet to a letter of the 31-letter intermediate alphabet together with its transcription symbol. The recoverable mappings for the 11 letters excluded from the intermediate alphabet are: Ё → ЙО, И → ІЙ/ЫЙ, Ч → Ш, Щ → Ш, Ц → С, Һ → Х, Э → Е, Ю → ЙУ, Я → ЙА, while Ь and Ъ have no counterpart; the remaining 31 letters map to themselves.]

2.1.3. Formalization of Phonological Rules of Sound Combinations in the Kazakh Language

There are nine vowel and 22 consonant sounds in the Kazakh language.
The formalization of phonological rules is described in detail in [50]. In general, the
Kazakh language has a law of synharmonicity: soft affixes are attached to the soft stem of
a word (with soft vowels), and hard affixes are attached to the hard stem of the word. In
addition, the Kazakh language has rules for voicing and devoicing consonants; for example,
the word "aq manday" sounds similar to "agmanday". There are seven similar rules for the
Kazakh language.
A simplified transcription is automatically built based on the complete transcription
of a word.
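As a hedged illustration of how one such phonological rule can be formalized, the sketch below applies a single word-boundary voicing rule of the "aq manday" → "agmanday" kind to romanized text; the rule table and trigger set are simplified assumptions and do not reproduce the seven rules from [50].

```python
VOICING = {"q": "g", "k": "g", "p": "b"}  # voiceless consonant -> voiced counterpart
TRIGGERS = set("aeiouylmnr")              # a following vowel or sonorant triggers voicing

def apply_boundary_voicing(first_word, second_word):
    """Voice the final voiceless consonant of a word before a vowel/sonorant."""
    if first_word and second_word and first_word[-1] in VOICING and second_word[0] in TRIGGERS:
        first_word = first_word[:-1] + VOICING[first_word[-1]]
    return first_word + second_word

print(apply_boundary_voicing("aq", "manday"))  # -> "agmanday"
```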

2.1.4. Structural Classification of Kazakh Words and Use of Generalized Transcriptions


For good representation, in the structural classification, the symbols of the language
are divided into classes, presented below (Table 2).

Table 2. Symbols of the current alphabet and intermediate alphabet.

Classes | Symbols | Meaning
W | a ұ ы o е ә ү i ө у | vowels and the consonant «У»
C | б в г ғ д ж з й л м н ң р | voiced consonants
F | с ш | voiceless hush consonants
P | к қ п т ф х | voiceless consonants

As expected, there are few words with the same structure; for example, for a dictionary
of 41,791 words, there are only three words with the CWCCWCW structure (Table 3).

Table 3. Example of the words with a structure CWCCWCW.

Kazakh Word | Transcription | Generalized Transcription
бұлдaну | b(ʊ)ld(ɑ)nw | CWCCWCW
бұлдырa | b(ʊ)ld(ɯ)r(ɑ) | CWCCWCW
бүлдiру | b(ү)ld(ɪ)rw | CWCCWCW

Next, the signal is segmented to create a generalized transcription as described
in [46–48,54].
This algorithm could help to develop an even more generalized transcription, namely,
to divide all the sounds into two natural classes: vowels and consonants. Such a division
gives good results in small vocabularies as well.

Construction of Averaged Standards

In order to reduce the dependence of the recognition system on the speaker, we applied
the procedure of averaging the standards spoken by several speakers. For example, let

E = (e_1, e_2, . . . , e_27) (5)

A = (a_1, a_2, . . . , a_27) (6)

be two standards of the same word, and for the sake of generality, we will assume that the
standard E has already been obtained by averaging the standards spoken by n speakers, and
A is the standard of the n + 1st speaker. We take a vector e_i and then a_j + . . . + a_{j+k}, which
are all the corresponding vectors from set (5) in the meaning described above. Then we place

e′_i = n/(n + 1) · e_i + 1/(n + 1) · (a_j + . . . + a_{j+k})/k (7)

Calculating this for all i = 1, 2, . . . , 27, we obtain the result of averaging the standards
for n + 1 speakers:

E′ = (e′_1, e′_2, . . . , e′_27) (8)

The coefficients

n/(n + 1), 1/(n + 1) (9)

were introduced in order to make all speakers equal. However, as the number n increases,
the changes introduced by new speakers become smaller and smaller. The same procedure
allows for averaging several standards of one speaker in order to increase their reliability.
The effectiveness of this procedure becomes especially clear if we apply it to averaging the
standards of different words. Thus, for example, it allows one to teach the computer to
perceive each word of the line

“Aқ киiмдi, денелi, aқ сaқaлды,”

as symbol 0, and each word of the line

“Coқыр мылқaу тaнымaс тiрi жaнды”

as symbol 1, constructing for them the corresponding averaged standards.
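A small sketch of the averaging step in Equations (5)–(9), under the assumption that standards are stored as plain lists of feature vectors and that the correspondence between the vectors of the new standard and the averaged one is given explicitly:

```python
def average_standards(e, n, a, groups):
    """Update standard E (average of n speakers) with standard A of speaker n+1 (Equation (7)).

    e      : list of 27 feature vectors (the averaged standard E)
    a      : list of feature vectors of the new speaker (standard A)
    groups : for each i, the indices j, ..., j+k of the vectors in `a` corresponding to e[i]
    """
    e_new = []
    for i, idx in enumerate(groups):
        dim = len(e[i])
        mean_a = [sum(a[j][d] for j in idx) / len(idx) for d in range(dim)]
        e_new.append([n / (n + 1) * e[i][d] + 1 / (n + 1) * mean_a[d] for d in range(dim)])
    return e_new  # E' = (e'_1, ..., e'_27) for n + 1 speakers (Equation (8))
```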

2.1.5. Codebook and Its Construction Technique


When solving the problem of determining emotions in speech, it is proposed to use
the technique of constructing a code book, which allows for significantly narrowing the
search for candidates for recognizing emotionally charged words. The knowledge base of
emotionally charged words is presented in the form of a code book.
Storing the patterns described above containing arbitrary vectors requires a lot of
memory. It can be significantly reduced, and at the same time, a significant gain in recogni-
tion speed is obtained by using a relatively small set of so-called code vectors instead of
arbitrary vectors. These latter are used to approximate arbitrary vectors and construct a
code book. Code vectors are also referred to as codebook words [46].
To construct a codebook of size M, the so-called k-means method is used.
1. Initialization:
From the number of L training vectors, we arbitrarily choose M vectors as the initial
set of words in the code book.
2. Finding the nearest neighbor:
For each training vector, we find the closest codebook vector. The set of training vectors
“gravitating” in this sense to the same code vector will be called its corresponding cell.
3. Modernization with a centroid:
For each cell, we replace the corresponding code vector with the centroid (average) of
the set of training vectors that fall into this cell.
4. Iteration:
We repeat steps 2 and 3 until the sum of the distances of all training vectors to the
corresponding code words stops decreasing by more than a predetermined threshold.
Although the described method of constructing a codebook works quite well, it has
been shown that it is more expedient to build a codebook by increasing its dimension step
by step, starting with a book with one code vector and successively doubling the number
of code vectors using the splitting method. This procedure is called a binary splitting
algorithm and can be described as follows:
1. We create a codebook from one word, taking for it the centroid of the set of all
training vectors.
2. Then, double the size of the codebook by splitting each code vector according to
the rule.

y1 = (1 + ε ) y (10)

y2 = (1 − ε ) y (11)
here, ε is the splitting parameter with a value from 0.01 to 0.05.
3. Use the k-means Algorithm to obtain the best possible set of code vectors for a
double-size codebook.
4. Repeat steps 2 and 3 until we obtain the codebook of the required size.
The dimension of a codebook constructed in this way is a power of 2.
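The sketch below, assuming numpy arrays of training vectors and an l1 distance, illustrates the binary splitting construction: start from the centroid of all training vectors, split each code vector with the factors (1 ± ε), and refine with the k-means steps 2–4 until the total distortion stops improving; the tolerance and iteration limits are assumptions.

```python
import numpy as np

def kmeans_refine(train, codebook, tol=1e-3, max_iter=50):
    """Steps 2-4 of the k-means procedure: nearest neighbour, centroid update, iterate."""
    prev = None
    for _ in range(max_iter):
        d = np.abs(train[:, None, :] - codebook[None, :, :]).sum(axis=2)  # l1 distances
        nearest = d.argmin(axis=1)
        total = d[np.arange(len(train)), nearest].sum()
        for m in range(len(codebook)):               # replace each code vector by its cell centroid
            cell = train[nearest == m]
            if len(cell):
                codebook[m] = cell.mean(axis=0)
        if prev is not None and prev - total <= tol:
            break
        prev = total
    return codebook

def build_codebook(train, size, eps=0.01):
    """Binary-splitting construction; `size` should be a power of 2."""
    codebook = train.mean(axis=0, keepdims=True)      # 1. centroid of all training vectors
    while len(codebook) < size:
        codebook = np.vstack([(1 + eps) * codebook, (1 - eps) * codebook])  # 2. split
        codebook = kmeans_refine(train, codebook)     # 3. k-means refinement
    return codebook                                   # 4. repeat until the required size
```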
2.1.6. Recognizer Using a Codebook


The method for constructing standards using a codebook consists of replacing each of
the 27 vectors included in the standard with the nearest code vector (in the meaning of the
l1 metric described above). Then, it becomes possible to store the standard in the form of
a sequence of numbers of the corresponding code vectors. This, taking into account the
need to store the codebook, gives a significant memory savings with a sufficiently large
vocabulary. Further, the recognition process is built as follows. The word to be recognized
is written as a set of 27 arbitrary (non-code) vectors. Then, a table of the distances of
these vectors to all the vectors of the codebook is built. Next, the DTW-distances of the
considered word to all standards are calculated. In this case, the distances between the
vectors are taken from the mentioned table and are not calculated every time as it was
when the codebook was not used. This consumes significantly less time. Thus, a significant
gain is achieved, both in recognition speed and in the amount of required memory.
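A compact sketch of this recognition scheme: the distances from the input vectors to all code vectors are precomputed once into a table, and the DTW distance to every standard (stored as a sequence of code-vector numbers) then only looks distances up in that table; the particular DTW recursion used here is an assumption.

```python
import numpy as np

def dtw_from_table(dist_table, standard_indices):
    """DTW distance between the input word (rows of dist_table) and one stored standard.

    dist_table       : (T, M) precomputed l1 distances from T input vectors to M code vectors
    standard_indices : sequence of code-vector numbers encoding the standard
    """
    t, s = dist_table.shape[0], len(standard_indices)
    dp = np.full((t + 1, s + 1), np.inf)
    dp[0, 0] = 0.0
    for i in range(1, t + 1):
        for j in range(1, s + 1):
            cost = dist_table[i - 1, standard_indices[j - 1]]  # looked up, not recomputed
            dp[i, j] = cost + min(dp[i - 1, j], dp[i, j - 1], dp[i - 1, j - 1])
    return dp[t, s]

def recognize(word_vectors, codebook, standards):
    """Return the name of the closest standard for a word given as 27 arbitrary vectors."""
    table = np.abs(word_vectors[:, None, :] - codebook[None, :, :]).sum(axis=2)
    return min(standards, key=lambda name: dtw_from_table(table, standards[name]))
```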

2.1.7. Step Recognition Algorithm


If a large vocabulary is recognized and the number of standards is large, then recognition
by full comparison of what has been said with each of them is too long of a process.
Accelerating it and at the same time increasing the recognition reliability is possible with
our proposed "Stepwise Recognition Algorithm". Let us describe it using a vocabulary
of 12,000 emotional words. Its essence is as follows. First, the spoken word is compared
with all DTW-based standards, but only the first two thousand samples are involved in
the recognition. The result is a list of 180 closest candidate words (this number may vary
depending on the size of the original vocabulary; for the mentioned vocabulary, it seems to
be optimal). Further, recognition is carried out within this list using the first four thousand
samples, as a result of which the list of candidates is halved. Then the same is performed
sequentially for segments of six thousand, eight thousand and, finally, ten thousand samples.
We arrived at this algorithm, which provides faster and more reliable recognition of large
vocabularies, as follows. At first, a system was made that worked with voice input with
preliminary selection of a sufficiently short segment of the recognized vocabulary by typing
one, two or three initial characters of the word being entered on the keyboard. Convinced of
the high reliability of this method, we noticed that close words (words with a similar beginning)
are recognized. Recognition of words with different origins should be even more reliable, and
recognition on a shortened initial segment is sufficient to isolate this beginning [49,50].

2.2. Defining Emotions

The classification uses the six main classes of emotions that correspond to Ekman's
classification, such as anger, fear, disgust, happiness, sadness, and one neutral class. The
neutral class includes all cases that could not be defined for other classes.
Table 4 shows a fragment of the emotional vocabulary of the Kazakh language:

Table 4. Fragment of the emotional vocabulary.

Word | Transcription | Translation | POS | Emotion
дiрiл | d(ɪ)r(ɪ)l | trembling | N | fear
қoрқaқтық | q(ɔ)rq(ɑ)qt(ɯ)q | cowardice | N | fear
aқылсыз | (ɑ)q(ɯ)lswz | stupid | N | anger
қызғaныш | q(ɯ)z(ɣ)(ɑ)n(ɯ)ʃ | jealousy | N | anger
құрмет | q(ʊ, u)rmеt | honor | N | happiness
нәзiктiк | næz(ɪ)kt(ɪ)k | tenderness | N | happiness
шaпшaң | ʃ(ɑ)pʃ(ɑ)(ŋ) | quick | Adv | happiness
шaрaсыздaн | ʃ(ɑ)r(ɑ)s(ɯ)zd(ɑ)n | involuntarily | Adv | sadness
сөзқұмaр | s(ɵ)zq(ʊ, u)m(ɑ)r | garrulous, chatty | Adj | disgust
The works in [41] describe methods for determining the polarity of the text. When
analyzing sentiment and emotion, the results of morphological [55,56] and syntactic analysis
of the Kazakh language [57–59] are used.
To apply this method to other languages, it is necessary to determine the list of
words that give emotional coloring to speech. After performing simplified speech recognition,
the proposed method will then find words that have the same structure as words with
emotional coloring, and a hypothesis will be put forward that the sounding speech is
emotional and that there is a need to perform full-fledged speech recognition, which is
demanding on computing resources.

2.2.1. Construction of Emotion Vocabulary Generalized Transcriptions

The emotion vocabulary allows for constructing its generalized transcriptions and
applying them to search for words from the emotion vocabulary. Examples of emotional
Kazakh words with generalized transcription are shown in Table 5.

Table 5. Example of words from emotion vocabulary with generalized transcription.

Word | Transcription | Translation | POS | Emotion | Generalized Transcription
қoрқaқтық | q(ɔ)rq(ɑ)qt(ɯ)q | cowardice | N | fear | PWCPWPPWP
aқылсыз | (ɑ)q(ɯ)lswz | stupid | N | anger | WPWCFWC
көз жaсы | k(ɵ)z Ʒ(ɑ)s(ɯ) | tear | N | sadness | PWC CWFW
құрмет | q(ʊ, u)rmеt | honor | N | happiness | PWCCWP
шaпшaң | ʃ(ɑ)pʃ(ɑ)(ŋ) | quick | Adv | happiness | FWPFWC
шaрaсыздaн | ʃ(ɑ)r(ɑ)s(ɯ)zd(ɑ)n | involuntarily | Adv | sadness | FWCWFWCCWC
сөзқұмaр | s(ɵ)zq(ʊ, u)m(ɑ)r | garrulous, chatty | Adj | disgust | FWCPWCWC
тaту | t(ɑ)tw | amicably | Adj | happiness | PWPW
тиянaқсыз | tiy(ɑ)n(ɑ)qs(ɯ)z | fragile | Adj | anger | PWCCWCWPFWC
пiшту! | p(ɪ)ʃtw! | my gosh | Intj | disgust | PWFPW!
туу | tww | Holy | Intj | sadness | PWW
уaй | w(ɑ)y | Wow | Intj | happiness | WWC
бұзықтық iстеу | b(ʊ, u)z(ɯ)qt(ɯ)q (ɪ)stеw | roughhouse | V | anger | CWCWPPWP WFPWW
бәрекелдi | bærеkеld(ɪ) | Bravo | Intj | happiness | CWCWPWCCW
әттеген-aй | ættеgеn-(ɑ)y | What a pity | Intj | sadness | WPPWCWC-WC
қaп | q(ɑ)p | it's a shame | Intj | sadness | PWP
мaсқaрaй | m(ɑ)sq(ɑ)r(ɑ)y | What a mess | Intj | sadness | CWFPWCWC
мәссaғaн | mæss(ɑ)(ɣ)(ɑ)n | Gee | Intj | fear | CWFFWCWC

An example of a database of emotional words of the Kazakh language and speech
recognition demo code can be downloaded here.
The simplified transcription in this article has two main purposes. First, simplified
transcription is used in the speech recognition task. A simplified transcription is built;
for example, the word ata (ata—grandfather) corresponds to CPC (vowel + voiceless
consonant + vowel). In the task of speech recognition, this simplified transcription is used
to search for possible candidates for recognition from the dictionary of all words of the
Kazakh language that have simplified transcriptions (in our work, these are more than three
million, two hundred thousand word forms of the Kazakh language). For our example,
the word apa (apa—grandmother) has the same simplified transcription CPC (vowel +
voiceless consonant + vowel). A list of candidates for recognition is compiled, and then a
recognition algorithm based on the code book is run. Second, simplified transcription is
used to determine the presence of emotionally colored words in speech and can significantly
reduce the search time. At the initial stage, the algorithm allows for understanding of
whether a speech, audio recording or text contains emotionally charged words. Thus, for
example, the word "уaй" (way) is the equivalent of "wow" in English; it has a simplified
transcription of WWC, which occurs only once in the entire dictionary of emotionally
charged words.
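To illustrate how the class symbols of Table 2 and the generalized transcriptions of Table 5 support this screening stage, the sketch below builds the W/C/F/P transcription of a word written in the intermediate alphabet and checks the words of an utterance against a toy, assumed emotion vocabulary keyed by generalized transcription:

```python
# Symbol classes from Table 2 (intermediate alphabet); the tiny vocabulary is a toy assumption.
CLASSES = {
    **{ch: "W" for ch in "aұыoеәүiөу"},      # vowels and the consonant "у"
    **{ch: "C" for ch in "бвгғджзйлмнңр"},    # voiced consonants
    **{ch: "F" for ch in "сш"},               # voiceless hush consonants
    **{ch: "P" for ch in "кқптфх"},           # voiceless consonants
}

EMOTION_VOCABULARY = {                        # generalized transcription -> (word, emotion)
    "PWCPWPPWP": ("қoрқaқтық", "fear"),
    "WWC": ("уaй", "happiness"),
    "PWP": ("қaп", "sadness"),
}

def generalized_transcription(word):
    """Map a word in the intermediate alphabet to its W/C/F/P structure (Tables 2 and 3)."""
    return "".join(CLASSES.get(ch, "") for ch in word.lower())

def emotional_word_candidates(utterance_words):
    """Screening stage: return possibly emotionally charged words before full recognition."""
    hits = []
    for w in utterance_words:
        entry = EMOTION_VOCABULARY.get(generalized_transcription(w))
        if entry:
            hits.append((w, entry[1]))
    return hits

print(emotional_word_candidates(["уaй", "емтихaн"]))  # -> [('уaй', 'happiness')]
```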

2.2.2. Emotion Recognition Model


A production model is proposed for modeling emotion recognition from Kazakh texts.
To build a production model, the following meta-designations are introduced (Table 6):
Table 6. Meta-designations.

Designation | Purpose
α, β, γ, . . . , ζ, ξ, . . . | Many words in the language—Variables
ω | ω = ζ·α·β·ξ—Lexical units (non-empty word or phrase)
L | Set of sentences in the language
N | Set of nouns
Adj | Set of adjectives
Pron | Set of pronouns
V_Post | Set of positive verb forms
V_Negt | Set of negative verb forms
Intj | Set of interjections
AdvIntens | Set of adverbs or enhancing adverbs
emo | Emotion establishment—Predicate
@ | Negation words "емес/жoқ" (no)—Constants
¬ | Transformation to negative form—Operation
· | Concatenation—Operation

Below is a model based on formalized production rules for determining the emotion
of lexical units (phrases) in a Kazakh text. Below are examples of the rules for defining
emotions in the Kazakh language.
1. If a lexical unit contains a noun with the emotional color of happiness and the next
word after it is a verb (of a positive form) with a neutral tone, then the emotional
description of this phrase is happiness:

ω ∈ L, ω = ζ·α·β·ξ, α ∈ N, emo(α) = happiness, β ∈ V_Post, emo(β) = 0 ⟹ emo(ω) = happiness (12)

Here and below, ζ, ξ are any strings of words, including empty ones. For example,
шaбыт келдi (ʃ(ɑ)b(ɯ)t kеld(ɪ)).
2. If a lexical unit contains an adjective describing the emotion anger and the next word
after it is a verb (positive form) with a neutral tonality, then the emotional description
of this phrase is anger:

ω ∈ L, ω = ζ·α·β·ξ, α ∈ Adj, emo(α) = anger, β ∈ V_Post, sent(β) = 0 ⟹ emo(ω) = anger (13)

For example, тиянaқсыз бoлды (tiy(ɑ)n(ɑ)qs(ɯ)z b(ɔ)ld(ɯ)).
3. If a lexical unit contains an interjection with an emotional connotation of sadness, then
the emotional description of this phrase is sadness:

ω ∈ L, ω = ζ·α·β·ξ, α ∈ Intj, emo(α) = sadness, β ∈ V_Post, sent(β) = 0 ⟹ emo(ω) = sadness (14)

For example, мaсқaрa (m(ɑ)sq(ɑ)r(ɑ)), қaп (q(ɑ)p).
The use of formal rules for determining emotions allows for extracting emotions from
recognized speech in the text based on a codebook.

3. Experiment
In our research, words that express emotions are highlighted and divided into six
classes (Table 7).

Table 7. Emotion classes.

Emotion Classes | Polarity | Example
happiness | positive | Aлaқaй! Мен сәттi aяқтaдым (Hooray! I finished successfully)
fear | negative | Жaуaптaрды ұмытып қaлдым (I forgot the answers)
disgust | negative | Туу, oйдaғыдaй бaғa aлмaдым (Tuu, didn't get the expected grade)
sadness | negative | Қaп! кейбiр жaуaпты бiлмей қaлдым. (Qap, I didn't know the answer.)
anger | negative | Кедергi жaсaмa! Уaқыт тығыз (do not bother, time is running out)
neutral | neutral | Бүгiн бaрлығы емтихaн тaпсырaды (everyone is taking exams today)

The table shows the classes of emotions, the main ones of which correspond to Ekman,
according to which students’ speeches were classified. This classification has been applied
in previous works and has shown its effectiveness.
For the experiment, 420 audio recordings were studied, in which the greatest activity
(voice acting) was found during the online exam.
At the moment, there are not enough published results on emotion recognition or
synthesis of Kazakh speech. Therefore, it is currently
not possible to evaluate the proposed method in the traditional way. Thus, the evaluation
was carried out as follows. Accuracy was evaluated for the Kazakh language using the
Emotional Speech Recognition Method and DNN model presented in [60] (Table 8).

Table 8. Quantitative evaluation of the different methods on the speech emotion recognizer.

Method | Dataset Language | Number of Classes | Accuracy
Emotional Speech Recognition Method | Kazakh | 6 | 79.7%
DNN model [60] | Kazakh, Russian | 3 | 82.07%

When recognizing emotions in the Kazakh language, an existing dictionary of


12,000 words was used. The emotionally colored dictionary passed an expert evalua-
tion of the polarity values by the employees of the Research Institute “Artificial Intel-
ligence”, taking into account the context of the task. The Research Institute employs
linguists/philologists, experts in the field of artificial intelligence, as well as students of the
Faculty of Information Technology who perform research practice at this Research Institute.
The model in [60] recognized the emotions with an accuracy of 82.07%. The model
often confuses the emotions anger with happiness, which is related to extremely high
emotional coloring. According to the published data, the reported accuracy was 2.37% higher,
while the number of emotion classes was three fewer compared to our work.

4. Results
To recognize emotions, we used audio from video recordings of taking online exams
during examination sessions. The database contains about 20,000 videos, which contain
the speech of students and their comments during and after the exam. However, many
audio recordings do not contain emotional speech, since during the online computer test,
students restrain themselves and do not express emotions.
We have studied 420 audio recordings. Audio recordings of exams were used, which
were automatically processed by the speech recognition program and converted into text.
All converted text was translated into a simplified transcription. Further, in the dictionary
of emotionally colored words, a search for the spoken text was carried out using a simplified
transcription. That is, we looked for emotionally charged words in audio recordings. If
the text of the audio recording contained emotionally colored words, the text was further
processed using the Emotion Recognition Model. That is, the emotions of the text were
automatically placed by the program, and the audio recordings received tags with emotions,
which were then manually checked by human taggers for the purity of the experiment. In
79.7% of cases, emotionally charged words were found correctly.
The confusion matrix of Emotional Speech Recognition Method for the Kazakh lan-
guage is presented in Figure 4.

Figure 4. Confusion matrix of Emotional Speech Recognition Method.

With the proposed Emotional Speech Recognition Method, we reached an accuracy of


79.7%. According to the results of the experiment, 105 students who received the expected
good grade expressed happiness; there were students who were afraid during the exam
or worried, and 28 were classified as fear. Further, after the output of the exam results,
students clearly expressed sadness and anger if they did not obtain a good mark on the
exam. During the experiment, the proposed recognition method confused the classes of
fear, sadness and anger emotions. This was due to the semantic proximity of the used
emotionally colored words. Moreover, 172 students were assigned as neutral, since they
did not voice their reactions. In view of the fact that during the computer test, students
did not speak much and did not express emotions much, the emotion classes proposed in
Table 7 were considered sufficient by the authors of the article.

5. Discussion
The main result of this work is a method for recognizing emotional speech based
on the emotion recognition model and an emotional vocabulary of emotionally charged
words of the Kazakh language with generalized transcriptions, as well as a code book.
There are various works on recognizing emotions in speech, but they do not rely on
the proposed model for determining emotions. This work also showed, for the first time,
the construction and use of an emotional vocabulary of emotionally charged words of the
Kazakh language with generalized transcriptions, as well as a code book, which significantly
reduced the search time and was not resource consuming, thereby expanding the scope of the
proposed method. This method can be adapted for other languages as well if an emotional
vocabulary is available.
The main disadvantage of methods for direct recognition of emotions in speech can be
considered as the need to process speech before its recognition. Thus, most often, speech in
preprocessing before recognition undergoes a smoothing procedure to eliminate random


inclusions, noise, and also reduces the signal amplitude. The value of the amplitude
of a speech signal, in addition to the loudness of speech, also reflects intonation. Such
normalization of speech leads to the fact that the intonation of speech is smoothed out,
which is also a means of expressing emotions. Consequently, an increase in the accuracy of
recognizing what was said almost always leads to a decrease in the accuracy of recognizing
emotions in spoken speech. Thus, methods that allow speech recognition to be carried out
with high accuracy must be able to recognize emotions in some way.
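The sketch below illustrates this trade-off under simple assumptions: moving-average smoothing plus per-frame amplitude normalization removes exactly the loudness contrast that can carry intonation. The window and frame sizes and the test signal are arbitrary illustrative choices, not parameters of any particular recognizer.

```python
# Minimal sketch: typical preprocessing (smoothing + amplitude normalization)
# flattens the amplitude variation that also carries intonation.
import numpy as np

def smooth(signal: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing to suppress random inclusions and noise."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def frame_normalize(signal: np.ndarray, frame: int = 200) -> np.ndarray:
    """Bring every frame to the same RMS level (amplitude normalization)."""
    out = signal.copy()
    for start in range(0, len(out), frame):
        seg = out[start:start + frame]
        rms = np.sqrt(np.mean(seg ** 2))
        if rms > 0:
            out[start:start + frame] = seg / rms
    return out

# Illustrative signal: a quiet segment followed by a louder, emphatic one.
t = np.linspace(0, 1, 1000, endpoint=False)
raw = np.concatenate([0.2 * np.sin(2 * np.pi * 5 * t),
                      0.9 * np.sin(2 * np.pi * 5 * t)])
proc = frame_normalize(smooth(raw))

# The loudness contrast between the two halves (an intonation cue) disappears:
print(np.std(raw[:1000]), np.std(raw[1000:]))    # clearly different
print(np.std(proc[:1000]), np.std(proc[1000:]))  # nearly identical
```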
In the future, it is planned to explore the possibility of recognizing emotions from
images and videos accompanying the recognized audio signal. In addition, the results
obtained will make it possible to solve the problem of synthesizing emotional speech on the
basis of the emotional vocabulary of emotionally charged words.

6. Conclusions
Speech recognition technologies and emotion detection contribute to the development
of intelligent systems for human–machine interaction. The results of this work can be used
when it is necessary to recognize positive and negative emotions in a crowd, in public
transport, schools, universities, etc. The experiment carried out has shown the effectiveness
of this method. The results obtained will make it possible in the future to develop devices
that begin to record and recognize a speech signal, for example, upon detecting negative
emotions in speech, and, if necessary, transmit a message about a potential threat or
a riot.
In this paper, we presented a method for recognizing emotional speech based on an
emotion recognition model using the technique of constructing a code book. The code book
allows for significantly narrowing the search for candidates for recognizing emotionally
charged words. Answering the research question, we can say that it is possible to effectively
recognize emotions in Kazakh speech based on a database of emotionally charged
words. An emotional vocabulary of the Kazakh language was built in the form of a code
book. The use of this method has shown its effectiveness in recognizing the emotional
speech of students during an exam in a distance format with an accuracy of 79.7%. At the
same time, there are certain risks related to speech recognition, which must be effective: its
effectiveness directly depends on the noise level and on the use of grammatically correct
language (no slang, no foreign words in speech, etc.). There are also risks associated with
the incorrect recognition of emotions, which may result from the incompleteness of the
emotional vocabulary. In addition, the polarity values in the emotional vocabulary may
need to be automatically reconfigured for particular subject areas. For example, the polarity
values used when recognizing the emotions of a crowd at a rally should differ from those
used for a student taking an exam. This raises questions about verifying the correctness of
automatically configured polarity values and gives food for thought for the formulation of
new research topics.
In the future, we can think about using other types of features, applying our system
to other databases, including larger ones, and using a different method to reduce the
feature dimensionality. Finally, we can also consider emotion recognition using an
audiovisual database and, in this case, benefit from descriptors taken from speech as well as
from images. This would allow improving the recognition rate of each emotion.

Author Contributions: All authors contributed equally to this work. All authors have read and
agreed to the published version of the manuscript.
Funding: This research is funded by the Science Committee of the Ministry of Education and Science
of the Republic of Kazakhstan (grant no. BR11765535).
Institutional Review Board Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
