ScienceDirect
ICT Express 7 (2021) 403–413
www.elsevier.com/locate/icte
Abstract
Air traffic controller (ATC) fatigue has received considerable attention in recent studies because it represents a major cause of air traffic
incidents. Research has revealed that the presence of fatigue can be detected by analysing speech utterances. However, constructing a
complete labelled fatigue data set is very time-consuming. Moreover, a manually constructed speech collection will often contain too little
key information to be used effectively in fatigue recognition, while multilevel deep models based on such speech materials often have
overfitting problems due to an explosive increase in the number of model parameters. To address these problems, a novel deep learning framework is
proposed in this study to integrate active learning (AL) into complex speech features selected from a large set of unlabelled speech data
in order to overcome the loss of information. A shallow feature set is first extracted using stacked sparse autoencoder networks, in which
fatigue state challenge features from a manually selected speaker set are exploited as the input vector. A densely connected convolutional
autoencoder (DCAE) is then proposed to learn advanced features automatically from spectrograms of the selected data to supplement the
fatigue features. The network can be effectively trained using a relatively small number of labelled samples with the help of AL sampling
strategies, and the addition of a dense block to the convolutional autoencoder can decrease the number of parameters and make the
model easier to fit. Finally, the two above-mentioned features are combined using multiple kernel learning with a support-vector-machine
classifier. A series of comparative experiments using the Civil Aviation Administration of China radiotelephony corpus demonstrates that the
proposed method provides a significant improvement in the detection precision compared to current state-of-the-art approaches.
© 2021 The Korean Institute of Communications and Information Sciences (KICS). Publishing services by Elsevier B.V. This is an open access
article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: Air traffic control; Fatigue; SSAE; Active learning; Dense block; Spectrogram
Fig. 1. Spectrograms of fatigued speech (a) and normal speech (b). (For
interpretation of the references to colour in this figure legend, the reader is
referred to the web version of this article.)
Fig. 2. The basic architecture of AE.
Table 2
Constituents of a speaker state feature set.
INTERSPEECH 2011 feature set
LLDs | 59
Functions | 39
Features | 4368
LLDs
RMS energy | Spectral kurtosis
Sum of the auditory spectrum (loudness) | Spectral slope
Sum of the RASTA-style filtered auditory spectrum | MFCC 1–10
Zero-crossing rate | MFCC 11 or 12
Energy in bands from 250 to 650 Hz and from 1 to 4 kHz | RASTA-style auditory spectrum bands 1–26
Spectral roll-off points of 25%, 50%, 75% and 90% | F0 (SHS based followed by Viterbi smoothing)
Spectral flux | Probability of voicing
Spectral entropy | Jitter
Spectral variance | Variation in jitter
Spectral skewness | Shimmer
where

$$r_m = g_\theta(f_\theta(x_m)) \tag{3}$$
$$f_\theta(x) = s_f(W x + b_1) \tag{4}$$
$$g_\theta(h) = s_g(W^T h + b_2) \tag{5}$$

where $s_f$ and $s_g$ are both nonlinear activation functions. In the parameter set $\theta = \{W, b_1, W^T, b_2\}$, $W$ and $W^T$ are the weight matrices of the encoder and decoder, respectively, and $b_1$ and $b_2$ are the offset vectors of the encoder and decoder, respectively.
The nonlinear activation function is generally set to be a sigmoid function, which can be formulated as follows:

$$s_f(x) = s_g(x) = \frac{1}{1 + e^{-x}} \tag{6}$$

3.2. Stacked sparse AE

A sparse autoencoder (SAE) is a special autoencoder in which sparse penalty terms are added so that only a small fraction of the hidden units is active for a given input [35]. It not only allows a larger number of hidden units to be used, but is also robust to the signal-to-noise ratio and other effects.
In the design of an SAE structure, the number of hidden layer units is usually smaller than the number of units in the input layer or the previous hidden layer. This compresses the data dimension so that the output is a low-dimensional representation. The effect is similar to that of PCA (Principal Component Analysis) [36], which is of great benefit to the design of subsequent classifiers.
The loss function of the SAE can be defined as [24]:

$$J_{sparse}(W, b) = J(W, b) + \beta \sum_{j=1}^{s_2} KL(\rho \,\|\, \hat{\rho}_j) \tag{7}$$
$$J(W, b) = \sum_{i=1}^{m} \|x_i - r_i\|^2 + \lambda \left( \|W\|^2 + \|W^T\|^2 \right) \tag{8}$$

where $J(W, b)$ denotes the loss function that measures the difference between $x_m$ and $r_m$. The first term of $J(W, b)$ is the reconstruction loss using the $l_2$ norm, while the second term is a regularization term to avoid over-fitting. Parameter $\beta$ is the weight of the sparse penalty, and $KL(\rho \,\|\, \hat{\rho}_j)$ is the KL divergence or relative entropy, which can be written as follows:

$$KL(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \tag{9}$$
$$\hat{\rho}_j = \frac{1}{N} \sum_{n=1}^{N} z_j^{(n)} \tag{10}$$

Here, $z_j^{(n)}$ denotes the activation of the jth hidden neuron for the nth input, so $\hat{\rho}_j$ represents the average activation of the jth neuron in the hidden layer. $\rho$ is a network parameter that is usually set to a small value. The KL value decreases as $\hat{\rho}_j$ approaches $\rho$ (it is zero when the two are equal) and grows as the deviation between $\hat{\rho}_j$ and $\rho$ increases.

Fig. 3. Construction of an SSAE with two hidden layers.

Fig. 3 shows the construction of an SSAE with two hidden layers. ‘Input’ is the original input layer, ‘Feature1’ is the hidden layer of the first SAE, and ‘Feature2’ is the hidden layer of the second SAE, where Feature1 is regarded as the input to the second SAE. In short, an SSAE is a deep architecture that stacks the hidden layers of several basic SAEs, meaning that the output of each layer is regarded as the input to the subsequent layer in the SSAE [24].

3.3. Architecture of the proposed DCAE

The convolutional autoencoder (CAE) [37] simply changes the full connection into a convolution operation between the encoding layer and the decoding layer, which is better for
extracting the hierarchical features. The architecture of our proposed DCAE is shown in Fig. 4. Considering that spectrograms usually have abundant features, we specifically added the dense blocks to analyse and extract spectrogram features. Experiments performed on densely connected convolutional layers with dense blocks have shown that the proposed DCAE exhibits better convergence and a higher detection rate. Meanwhile, the redundancy of feature maps can also be greatly reduced. Moreover, the structure of the dense connection requires fewer parameters and eases the vanishing-gradient problem.

Fig. 4. Architecture of the proposed DCAE.

The construction of a dense convolutional network (DenseNet) consisting of a dense block and a transition layer is shown in Fig. 5, while a stereotypical dense block is shown in Fig. 6. In Fig. 6, the notation 3×3@4 refers to a 3×3 convolutional kernel with 4 channels. In a dense block, each convolutional layer is connected not only to the next layer but also to all of the remaining layers in the block. The transition layer is cascaded after the dense block, with the aim of reducing the number of channels generated by the dense block. Consequently, the nth layer receives the feature maps of all preceding layers $x_0, x_1, \ldots, x_{n-1}$ as input [38]:

$$x_n = H_n([x_0, x_1, \ldots, x_{n-1}]) \tag{11}$$

where $[x_0, x_1, \ldots, x_{n-1}]$ indicates the concatenation of the feature maps produced in layers $0, \ldots, n-1$, and $H_n(\cdot)$ can be a composite function of operations such as batch normalization, pooling or convolution.

Fig. 5. Construction of a DenseNet.

Fig. 6. A stereotypical dense block.

A dense block with a growth rate of k = 32 was designed in the proposed AE. The designed dense block contained three consecutive operations: batch normalization, a 3×3 convolution and a rectified linear unit activation function. In the encoder, the first dense block consisted of four layers, while the second block had eight layers. Since the input images of the second dense block have been twofold downsampled (compared with the previous block), we doubled the number of layers in the second block to balance the complexity of each dense block. The decoder has the same architecture.
The remaining parts of the proposed AE are now described. For the normalization, consider an input panorama image consisting of RGB values in the range of [0, 255]. Since CNNs perform better for data ranging from 0 to 1, the proposed encoder normalizes each channel of the input images as

$$\hat{I}(x, y) = \frac{I(x, y)}{255} \tag{12}$$

where $I(x, y)$ is the original pixel value at position $(x, y)$, and $\hat{I}(x, y)$ is the normalized value. Correspondingly, the encoder and decoder have normalization and denormalization layers, respectively. The transition layer is a convolutional layer with 3×3 convolutional kernels with 32 channels in the encoder, while it has 128 channels in the decoder. The convolutional layers use 3×3 convolutional kernels of stride 1 to analyse images. Meanwhile, 1-padding is adopted to ensure that the scale of images does not change after convolution. Maxpooling layers are utilized in the encoder to twofold downsample the images, with 2×2 filters of stride 2 and 32 channels. Finally, the upsample layer is the opposite of the maxpooling layer, upsampling images to the required size without changing the number of channels.

4. Unified AE fatigue-feature-extraction model

4.1. Architecture of the AE fatigue-feature-extraction method

Fig. 7 shows the architecture of the proposed method. Some effective labelled data are first screened using AL to reduce the cost of manual marking. The SSAE is utilized to extract shallow features $F_{shallow}$, and advanced features $F_{advanced}$ are then extracted using the proposed DCAE. The two obtained features are combined and then input to the SVM (Support Vector Machine) classifier as follows. This method achieves a joint deep–shallow–advanced feature that is more suitable for our speech fatigue detection task.
From a manual marking perspective, an SSAE network is designed to extract the shallow feature $F_{shallow}$ from the given manually selected speaker state challenge feature set. Unlike the artificial feature, spectrogram information is represented as a three-dimensional structure, which contains a considerable
amount of useful extra information. If we directly exploit a CAE to extract the advanced features, the deep convolution network will usually lead to the loss of the former information. Moreover, such a deep-learning model generally depends on a large quantity of training parameters, which makes optimization more difficult. Inspired by previous work [39], the dense block is utilized in our process of extracting the advanced feature $F_{advanced}$ to obtain rich information, as detailed in Section 3.3.

4.2. Multiple kernel learning strategy

We have now obtained two deep features, $F_{shallow}$ and $F_{advanced}$, in our feature-extraction subnetworks. Simply concatenating these two fatigue features would not make full use of them, whereas research shows that multiple kernel learning (MKL) [40] can combine the features in a more flexible manner.
The general SVM usually uses a single kernel function to map the sample features to the Hilbert space, which turns the linearly inseparable problem of the original feature space into a linearly separable one by maximizing the interval between positive and negative samples.
MKL is in fact an optimization strategy for the SVM: it selects different kernels for different features and then trains the weight of each kernel to synthesize a multi-kernel matrix that fuses the types of features better. The complete structure of MKL is shown in Fig. 8.
In this paper, we utilize the multiscale Gaussian kernel function as the basic kernel:

$$k(x, z) = \exp\left( -\frac{\|x - z\|^2}{2\sigma^2} \right) \tag{13}$$

When $\sigma$ is small, the Gaussian kernel function can fit sharply changing samples well. Otherwise, it can fit gently changing samples. Then both extracted feature corresponds

4.3. Active sampling strategy

Training the DNN to achieve impressive performance requires the use of a large amount of labelled training samples in supervised learning. However, labelling enough speech samples takes a long time in practice, and small training sets are prone to overfitting. With the help of the AL strategy, the proposed model can be trained more effectively by using a relatively small number of labelled samples.
In this paper, we choose the MCLU (Multiclass Level Uncertainty) [41] technique as the query criterion. This method applies a difference function $c_{diff}(x)$ to record the uncertainty of unlabelled samples. For logistic regression, $c_{diff}(x)$ considers the difference between the largest and second-largest class-conditional probability density using the following objective function [24]:

$$c_{diff}(x) = p^{(i)}(x \mid \omega_{max1}) - p^{(i)}(x \mid \omega_{max2}) \tag{16}$$
$$\omega_{max1} = \arg\max_{\omega_n \in \Omega} \left\{ p^{(i)}(x \mid \omega_n) \right\} \tag{17}$$
$$\omega_{max2} = \arg\max_{\omega_m \in \Omega \setminus \{\omega_{max1}\}} \left\{ p^{(i)}(x \mid \omega_m) \right\} \tag{18}$$

When $c_{diff}(x)$ is large, x is assigned to the predicted class $\omega_{max1}$. Otherwise, it is treated as an uncertain sample that should be classified manually. In other words, the MCLU strategy selects the data corresponding to the minimum value of $c_{diff}(x)$ from the candidate unlabelled samples. The detailed description is shown in Algorithm 1.
Fig. 9 shows the detailed construction of the AL sampling strategy for the SSAE. We first train the SSAE with a few labelled training samples, and the extracted features are then used to train a softmax classifier with supervised fine-tuning. The initial labelled samples are enough to train the SSAE to extract a robust feature representation. Subsequently, a subset of unlabelled data regarded as the candidate set is then classified using
softmax regression [42]. Finally, AL iteratively selects the most-uncertain unlabelled samples, adds them to the training set with their true labels and simultaneously removes them from the candidate set [43].

Fig. 9. AL sampling strategy for SSAE.

5. Experiments

Three experiments were used to verify the performance of the proposed method. The first experiment selected some unlabelled data from the candidate set using the MCLU query method, and used the selected data to pretrain the SSAE. The second experiment verified the performance of the proposed DCAE. The third experiment compared the combined feature classification method with current state-of-the-art methods. The results are presented in detail in the following subsections.
All of the experimental results were obtained on a Windows 10 personal computer equipped with a 64-bit Intel Core i5-9300H CPU running at 2.4 GHz and with 8 GB of RAM. All of the proposed methods were implemented using Python (version 3.7) and TensorFlow (version 1.14.0) software.

5.1. AL and SSAE pretraining

5.1.1. Data sets and setting

In the experiments, the fatigue data set consisted of two parts: labelled and unlabelled. The labelled data set [44] is reported in Table 3. We also collected 3000 unlabelled data samples. The radiotelephony communications were obtained from the Air Traffic Management Shandong Bureau of China.

Table 3
The fatigue data set utilized in this study.
Data set | Unlabelled speech data (N = 3000) | Labelled speech data (N = 1606) | Total (N = 4606)
Labelled speech data numbering
Number | Expression | Explanation
1 | Control category | R, area control; A, approach control; T, tower control
2 | ATC rank | 5, level 5; 4, level 4; 3, level 3; 2, level 2; 1, level 1; 0, trainee
3–10 | Time (UTC) | 3–6, time of starting work; 7–10, time of ending work
11 | Sex | F, female; M, male
12 and 13 | Age | Arabic numeral (age in years)
14 and 15 | Order | Nn, N is a digital indicator and n is an Arabic numeral indicating the nth instruction issued by the ATC while working
16 and 17 | Status | 14th, ‘-’; 15th, voice command; 1, error; 2, ambiguity; 3, hesitation or pause; 4, fatigue

5.1.2. SSAE depth effect

Because the features are learned automatically, the number of hidden layers in the SSAE significantly affects the classification performance. In the experiments we fixed the other parameters and only changed the number of hidden layers in order to assess the effect of this change on the performance. We then used the complete set of labelled samples to train the network. We tested several SSAEs with depths varying from one to three layers.
After several experiments, we found that the optimal classification performance was achieved when the numbers of units in the hidden layers of the SSAE model were set to 512, 256 and 128. During training, the Adam algorithm was used for backpropagation. The number of epochs was set as 30, the

Algorithm 1: Active Learning With MCLU
Required:
L0: initial labelled dataset
U0: candidate unlabelled samples
Us: selected samples to be labelled by the ATC instructor
Ix: the index of candidate unlabelled samples
1. Pretrain the SSAE with the initial labelled dataset L0;
2. Extract the shallow features and use them to train a softmax classifier with supervised fine-tuning;
3. Learn the shallow features of the candidate unlabelled samples in U0;
4. Iteration
5. Classify the candidate data with the softmax classifier and calculate the value of cdiff(x) of each candidate with Eq. (16), storing the values in Cd(i);
6. Sort Cd(i) in ascending order;
7. Get the sample index Ix with the minimum value of cdiff(x) and remove that sample from U0;
8. Update the label of the selected data and add the labelled data to Us;
9. Until iteration over.
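The query step of Algorithm 1 (computing cdiff(x) with Eq. (16), sorting, and picking the smallest values) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that the softmax classifier returns one probability per class for every candidate sample, and the names `mclu_margin` and `select_queries` as well as the example probabilities are hypothetical.

```python
def mclu_margin(class_probs):
    """Difference between the largest and second-largest class probability (cf. Eq. (16))."""
    if len(class_probs) < 2:
        raise ValueError("at least two classes are required")
    top1, top2 = sorted(class_probs, reverse=True)[:2]
    return top1 - top2


def select_queries(candidate_probs, batch_size):
    """Return the indices of the candidates with the smallest margins.

    A small margin means the classifier is uncertain, so these samples
    are the ones passed to the human annotator for labelling.
    """
    margins = [mclu_margin(p) for p in candidate_probs]
    order = sorted(range(len(margins)), key=lambda i: margins[i])  # ascending
    return order[:batch_size]


# Hypothetical softmax outputs (fatigued, normal) for four unlabelled utterances.
candidate_probs = [
    [0.90, 0.10],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]
queries = select_queries(candidate_probs, batch_size=2)
print(queries)  # indices of the two most uncertain candidates
```

In a full loop, the selected indices would be removed from the candidate set, labelled, and added to the training set before the classifier is retrained.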
Table 4
Classification results for different depths.
Number of hidden layers Classification result
1 0.8243
2 0.8525
3 0.8473
Table 5
Spectrogram set parameters in the transformation.
Parameter Value
Number of FFT samples 512
Sampling frequency 16 kHz
Window length 512
Frame overlap 256
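As a rough check of what the parameters in Table 5 imply, the following Python sketch derives the hop size, frequency resolution and frame count of the short-time Fourier transform. It assumes that the window length and frame overlap are given in samples and that no padding is applied (neither is stated in the table); the function name `stft_layout` and the 2 s utterance length are hypothetical.

```python
# Values taken from Table 5.
N_FFT = 512
SAMPLE_RATE_HZ = 16_000
WINDOW_LENGTH = 512   # assumed to be in samples
FRAME_OVERLAP = 256   # assumed to be in samples


def stft_layout(num_samples):
    """Describe the time-frequency grid produced by the settings above."""
    hop = WINDOW_LENGTH - FRAME_OVERLAP
    if num_samples < WINDOW_LENGTH:
        frames = 0
    else:
        frames = 1 + (num_samples - WINDOW_LENGTH) // hop  # no padding assumed
    return {
        "hop_samples": hop,
        "hop_ms": 1000.0 * hop / SAMPLE_RATE_HZ,
        "window_ms": 1000.0 * WINDOW_LENGTH / SAMPLE_RATE_HZ,
        "freq_bins": N_FFT // 2 + 1,          # one-sided spectrum
        "bin_width_hz": SAMPLE_RATE_HZ / N_FFT,
        "frames": frames,
    }


# Hypothetical 2 s utterance sampled at 16 kHz.
layout = stft_layout(num_samples=2 * SAMPLE_RATE_HZ)
for key, value in layout.items():
    print(f"{key}: {value}")
```

Under these assumptions each frame spans 32 ms with a 16 ms hop, and each spectrogram column has 257 frequency bins spaced 31.25 Hz apart.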
Fig. 12. Information lost for the proposed DCAE and CAE.
Fig. 13. Accuracy for proposed DCAE method and competing methods.
Fig. 14. Results for different fatigue-detection methods based on the SVM.
Table 8
Results for different fatigue-detection methods (824 test samples).
Method | pH | SWFF | Shallow features from SSAE | Advanced features from DCAE | Combined features without MKL | Proposed combined features
Accuracy rate | 85.63% | 92.82% | 87.86% | 93.47% | 95.58% | 98.35%
Improvement | Baseline | 8.39% | 2.60% | 9.15% | 11.6% | 12.72%
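The "Improvement" row of Table 8 can be recomputed from the accuracy row. The Python sketch below is only a worked calculation under the assumption that the improvement is measured relative to the pH baseline; the table does not state the formula, and the function name `improvements` is hypothetical.

```python
# Accuracy rates (%) copied from Table 8; pH is the baseline column.
accuracy = {
    "pH": 85.63,
    "SWFF": 92.82,
    "Shallow (SSAE)": 87.86,
    "Advanced (DCAE)": 93.47,
    "Combined without MKL": 95.58,
    "Proposed combined": 98.35,
}


def improvements(acc, baseline_key="pH"):
    """Return (absolute gain in points, relative gain in %) for each method."""
    base = acc[baseline_key]
    gains = {}
    for name, value in acc.items():
        if name == baseline_key:
            continue
        absolute = value - base
        gains[name] = (absolute, 100.0 * absolute / base)
    return gains


gains = improvements(accuracy)
for name, (absolute, relative) in gains.items():
    print(f"{name}: +{absolute:.2f} points, +{relative:.2f}% relative")
```

Under this reading, the relative gains of the first four methods (about 8.40%, 2.60%, 9.16% and 11.62%) agree with the listed values up to rounding, whereas the value listed for the proposed combined features (12.72%) coincides with the absolute difference in percentage points; the corresponding relative gain would be about 14.85%.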
kernel SVM (support vector machine) as a classifier, which selected the Gaussian kernel function. In the experiment we selected 300 data sets obtained by AL as training sets and the fatigue data sets as test sets. Because the number of fatigued speech samples in the fatigue data set was far smaller than the number of normal speech samples, in order to ensure the accuracy of the experimental results, we selected all 412 fatigued speech samples together with 412 randomly selected normal speech samples. Then, based on the proposed model, we compared the recognition performance for shallow, advanced and combined features, and also with the state-of-the-art pH [45] and SWFF [44] fatigue features. Both the pH and SWFF features are nonlinear speech features, which have been demonstrated to achieve effective recognition in the field of fatigue detection.
The experimental results are presented in Fig. 14 and Table 8. The detection accuracy was calculated as

$$A_{acc} = N_{cor} / N_{test} \tag{19}$$

where $N_{cor}$ is the number of correct detections and $N_{test}$ is the number of samples in the test set.
The red points in Fig. 14(a)–(d) indicate the true categories, while the green crosses indicate the predicted ones. More overlap between the two markers implies a higher accuracy. The proposed method showed the fewest mismatches. The experimental results show that the data set obtained using AL was more clearly differentiated. The accuracy obtained with the features extracted by adding a dense block to the CAE was 5.61 percentage points higher than that obtained with the ordinary SSAE (93.47% versus 87.86% in Table 8), and the final accuracy for the combined fatigue features was 98.35%. These findings demonstrate that the multilevel combined fatigue features proposed in this study provide performance that is superior to those of other advanced fatigue-detection technologies.

6. Conclusion

This paper has presented a novel unified deep-learning network in which two subnetworks are applied to extract the shallow and advanced features, and MKL is used to combine the features in a more generic and robust manner. The AL sampling strategy is exploited to select a subset of the most informative unlabelled samples for labelling and to use them to train the SSAE, which can improve the performance of the proposed network when relatively few labelled samples are available. Meanwhile, adding the dense block to the CAE network improves the ability to extract deep features from spectrograms.
The experimental results demonstrate that the proposed method exhibits promising performance compared with many current state-of-the-art approaches. The present research findings can provide theoretical guidance for air traffic management authorities attempting to detect ATC fatigue, and they might also be useful as a reference in fatigue assessments performed in other professional fields of civil aviation.

CRediT authorship contribution statement

Zhiyuan Shen: Conceptualization, Methodology, Software, Validation, Supervision, Project administration, Writing - review & editing. Yitao Wei: Conceptualization, Methodology, Software, Visualization, Formal analysis, Data curation, Writing - original draft.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors thank the Shandong Air Traffic Management Sub-Bureau of the Civil Aviation Administration of China for supplying raw speech data of air traffic controllers, and Professor Chin-hui Lee at the Georgia Institute of Technology for helpful guidance and suggestions.

References

[1] Yu-Hern Chang, Hui-Hua Yang, Wan-Jou Hsu, Effects of work shifts on fatigue levels of air traffic controllers, J. Air Transp. Manag. 76 (2019) 1–9.
[2] J. Shen, J. Barbera, C.M. Shapiro, Distinguishing sleepiness and fatigue: focus on definition and measurement, Sleep. Med. Rev. 10 (1) (2006) 63–76.
[3] S. Lee, J.K. Kim, Factors contributing to the risk of airline pilot fatigue, J. Air Transp. Manag. 67 (2018) 197–207.
[4] X. Wang, C. Xu, Driver drowsiness detection based on non-intrusive metrics considering individual specifics, Accid. Anal. Prev. 95 (2016) 350–357.
[5] L.L. Di Stasi, R. Renner, A. Catena, J.J. Cañas, B.M. Velichkovsky, S. Pannasch, Towards a driver fatigue test based on the saccadic main sequence: A partial validation by subjective report data, Transp. Res. C 21 (1) (2012) 122–133.
[6] T. Chalder, G. Berelowitz, T. Pawlikowska, L. Watts, E.P. Wallace, Development of a fatigue scale, J. Psychosom. Res. 37 (2) (1993) 147–153.
[7] V. Riethmeister, Ute Bültmann, M. Gordijn, S. Brouwer, M.D. Boer, Investigating daily fatigue scores during two-week offshore day shifts, Applied Ergon. 71 (2018).
[8] S. Arnau, T. Möckel, G. Rinkenauer, E. Wascher, The interconnection of mental fatigue and aging: an eeg study, Int. J. Psychophysiol. 117 (2017) 17–25.
[9] Shitong Huang, Jia Li, Pengzhu Zhang, et al., Detection of mental fatigue state with wearable ecg devices, Int. J. Med. Inform. (2018).
[10] H. Mansikka, P. Simola, K. Virtanen, D. Harris, L. Oksama, Fighter pilots’ heart rate, heart rate variation and performance during instrument approaches, Ergonomics (2016) 1–9.
[11] M.L. Chen, S.Y. Lu, I.F. Mao, Subjective symptoms and physiological measures of fatigue in air traffic controllers, Int. J. Ind. Ergon. 70 (2019) 1–8.
[12] Baisheng Nie, Xin Huang, Yang Chen, Anjin Li, Ruming Zhang, Jinxin Huang, Experimental study on visual detection for fatigue of fixed-position staff, Applied Ergon. 65 (2017) 1–11.
[13] J. Whitmore, S. Fisher, Speech during sustained operations, Speech Commun. 20 (1996) 55–70.
[14] M. Vollrath, Automatic measurement of aspects of speech reflecting motor coordination, Behav. Res. Methods Instrum Comput. 26 (1) (1989) 35–40.
[15] H.P. Greeley, E. Friets, J.P. Wilson, S. Raghavan, J. Picone, J. Berg, Detecting fatigue from voice using speech recognition, in: 2006 IEEE International Symposium on Signal Processing and Information Technology, Vancouver, BC, 2006, pp. 567–571, http://dx.doi.org/10.1109/ISSPIT.2006.270865.
[16] E.M. Albornoz, M. Sánchez-Gutiérrez, F. Martinez-Licona, H.L. Rufiner, J. Goddard, Spoken emotion recognition using deep learning, in: Proc. Iberoamer. Congr. Pattern Recognit, Springer, Cham, Switzerland, 2014, pp. 104–111.
[17] Salaheddine Bendak, Hamad S.J. Rashid, Fatigue in aviation: A systematic review of the literature, Int. J. Ind. Ergon. (2020).
[18] D. Yu, M.L. Seltzer, J. Li, J.-T. Huang, F. Seide, Feature learning in deep neural networks - Studies on speech recognition tasks, 2013, arXiv:1301.3605.
[19] N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural network for modelling sentences, 2014, arXiv:1404.2188 [Online]. Available: https://arxiv.org/abs/1404.2188.
[20] S. Prasomphan, Improvement of speech emotion recognition with neural network classifier by using speech spectrogram, in: International Conference on Systems, Signals and Image Processing, IEEE, 2015, pp. 73–76.
[21] Abdul Malik Badshah, Jamil Ahmad, Nasir Rahim, Sung Wook Baik, Speech emotion recognition from spectrograms with deep convolutional neural network, in: 2017 International Conference on Platform Technology and Service (PlatCon), 2017.
[22] J. Zhu, Z. Liu, Analysis of hybrid feature research based on extraction LPCC and MFCC, in: Tenth International Conference on Computational Intelligence and Security, IEEE, 2014, pp. 732–735.
[23] B. Schuller, F. Burkhardt, Learning with synthesized speech for automatic emotion recognition, in: Proc. of the 2010 IEEE Int’L Conf. on Acoustics Speech and Signal Processing (ICASSP), IEEE, 2010, pp. 5150–5153.
[24] C. Deng, Y. Xue, X. Liu, C. Li, D. Tao, Active transfer learning network: A unified deep joint spectral–spatial feature learning model for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 57 (3) (2019) 1741–1754, http://dx.doi.org/10.1109/TGRS.2018.2868851.
[25] A.I. Schein, L.H. Ungar, Active learning for logistic regression: An evaluation, Mach. Learn. 68 (3) (2007) 235–265.
[26] F. Eyben, M. Wöllmer, B. Schuller, Openear - introducing the munich open-source emotion and affect recognition toolkit, in: Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on, IEEE, 2009.
[27] F. Eyben, M. Wöllmer, B. Schuller, Opensmile - the munich versatile and fast open-source audio feature extractor, in: Proc. ACM Multimedia (MM), ACM, Florence, Italy, 2010, pp. 1459–1462.
[28] Marie-José Caraty, Claude Montacié, Vocal fatigue induced by prolonged oral reading: Analysis and detection, Comput. Speech Lang. (2014).
[29] L. Deng, M.L. Seltzer, D. Yu, et al., Binary coding of speech spectrograms using a deep auto-encoder, in: Interspeech, Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September, DBLP, 2011.
[30] L.O. Chua, Roska, et al., The CNN paradigm, IEEE Trans. Circuits Syst. I Fundam. Theory Appl. (1993).
[31] G. Huang, Z. Liu, K.Q. Weinberger, L. van der Maaten, Densely connected convolutional networks, in: Proc. IEEE Conf. Comput. Vision & Pattern Recognition, vol. 1, 2017, p. 3.
[32] Mark A. Kramer, Nonlinear principal component analysis using autoassociative neural networks, AIChE J. 37 (2) (1991) 233–243, http://dx.doi.org/10.1002/aic.690370209.
[33] J. Deng, Z. Zhang, E. Marchi, et al., Sparse autoencoder-based feature transfer learning for speech emotion recognition, in: Affective Computing & Intelligent Interaction, IEEE, 2013.
[34] J. Li, Active learning for hyperspectral image classification with a stacked autoencoders based neural network, in: Proc. IEEE Int. Conf. Image Process. (ICIP), Phoenix, AZ, USA, Sep. 2016, pp. 1062–1065.
[35] Pascal Vincent, Hugo Larochelle, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. 11 (2010) 3371–3408.
[36] I.M. Mohammed, M.Z.N. Al-Dabagh, M.I. Ahmad, et al., Face Recognition Using PCA Implemented on Raspberry Pi, in: Proceedings of the 11th National Technical Seminar on Unmanned System Technology, 2019, p. 2021.
[37] J. Masci, U. Meier, D. Ciresan, et al., Stacked convolutional auto-encoders for hierarchical feature extraction, in: International Conference on Artificial Neural Networks, Springer-Verlag, 2011.
[38] Schuller Appendix, Computational paralinguistics emotion affect and personality in speech and language processing, 2013.
[39] Shengwei Wang, Hongkui Wang, Sen Xiang, Li Yu, Densely connected convolutional network block based autoencoder for panorama map compression, Signal Process., Image Commun. 80 (2020) 115678.
[40] M. Gönen, E. Alpaydın, Multiple kernel learning algorithms, J. Mach. Learn. Res. 12 (2011) 2211–2268.
[41] C. Persello, L. Bruzzone, Active learning for domain adaptation in the supervised classification of remote sensing images, IEEE Trans. Geosci. Remote Sens. 50 (11) (2012) 4468–4483.
[42] A.I. Schein, L.H. Ungar, Active learning for logistic regression: An evaluation, Mach. Learn. 68 (3) (2007) 235–265.
[43] C. Deng, X. Liu, C. Li, D. Tao, Active multi-kernel domain adaptation for hyperspectral image classification, Pattern Recognit. 77 (2018) 306–315.
[44] Shen Zhiyuan, Pan Guozhuang, A High-Precision Fatigue Detecting Method for Air Traffic Controllers Based on Revised Fractal Dimension Feature, Hindawi, 2020, 1024-123X.
[45] V.V. Nishawala, M. Ostoja-Starzewski, Acceleration waves on random fields with fractal and hurst effects, Wave Motion 74 (2017).