Speech Emotion Recognition Based on Deep Learning and Kernel Nonlinear PSVM
HAN Zhiyan, WANG Jian
College of Engineering, Bohai University, Jinzhou 121013
E-mail: hanzyme@126.com
Abstract: To improve the precision of speech emotion recognition, this paper puts forward a new emotion recognition technique based on Deep Learning and a Kernel Nonlinear PSVM (Proximal Support Vector Machine) to discern four fundamental human emotions (anger, joy, sadness, surprise). First, the speech signal is preprocessed. Then a DBN (Deep Belief Network) is used to extract emotional features from the speech signal automatically. After that, the DBN automatic features and the traditional features (prosody features and quality features) are integrated as the total features. Finally, six Nonlinear Proximal Support Vector Machines are used to recognize the emotion, and the majority voting principle is used to obtain the final identification result. To assess the new method, we compare the total features, the DBN automatic features, and the traditional features. The experimental results indicate that the total features outperform the other two feature sets.
Key Words: Emotion Recognition, Speech Signal, Deep Learning, PSVM
Step 4: Integrate the DBN automatic features and the traditional features as the total features.
Step 5: Use six Nonlinear Proximal Support Vector Machines to recognize the emotion.
Step 6: Use the majority voting principle to obtain the final identification result.

[Figure: flowchart of the proposed method: preprocess speech; extract DBN features; extract traditional features]

Assuming visible variables v ∈ {0,1}^M and hidden variables h ∈ {0,1}^N, we can get the joint distribution over the visible and hidden units:

p(\mathbf{v}, \mathbf{h}; \theta) = \frac{1}{Z(\theta)} \exp(-E(\mathbf{v}, \mathbf{h}; \theta))    (1)

where \theta = \{\mathbf{a}, \mathbf{b}, \mathbf{W}\} are the parameters, Z(\theta) is the normalization constant, and E(\mathbf{v}, \mathbf{h}; \theta) is the energy function:

Z(\theta) = \sum_{\mathbf{v}} \sum_{\mathbf{h}} \exp(-E(\mathbf{v}, \mathbf{h}; \theta))    (2)

E(\mathbf{v}, \mathbf{h}; \theta) = -\sum_{i=1}^{M} \sum_{j=1}^{N} W_{ij} v_i h_j - \sum_{i=1}^{M} b_i v_i - \sum_{j=1}^{N} a_j h_j    (3)
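For concreteness, here is a tiny numpy sketch of Eqs. (1)-(3): it enumerates all binary configurations of a very small RBM to compute Z(theta) exactly and then evaluates the joint probability of one (v, h) pair. The sizes and random parameters are illustrative only; training a real DBN would instead use contrastive divergence rather than exact enumeration.

import numpy as np

rng = np.random.default_rng(0)
M, N = 6, 4                                 # number of visible / hidden units
W = rng.normal(scale=0.1, size=(M, N))      # visible-hidden weights
b = rng.normal(scale=0.1, size=M)           # visible biases
a = rng.normal(scale=0.1, size=N)           # hidden biases

def energy(v, h):
    # E(v, h; theta) = -v' W h - b' v - a' h, Eq. (3)
    return -(v @ W @ h) - (b @ v) - (a @ h)

def all_binary(n):
    # enumerate all binary vectors of length n (only feasible for tiny n)
    return (np.array(list(np.binary_repr(i, n)), dtype=float) for i in range(2 ** n))

# normalization constant Z(theta), Eq. (2)
Z = sum(np.exp(-energy(v, h)) for v in all_binary(M) for h in all_binary(N))

v = rng.integers(0, 2, M).astype(float)
h = rng.integers(0, 2, N).astype(float)
p_vh = np.exp(-energy(v, h)) / Z            # joint probability p(v, h; theta), Eq. (1)
print(p_vh)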
3 FEATURE EXTRACTION
Feature 1    …corresponding calm statement
Feature 2    the pitch frequency average value
Feature 3    the maximum pitch frequency average value
Feature 4    the difference of pitch frequency average value and corresponding calm statement pitch frequency average value
Feature 5    the difference of maximum pitch frequency and corresponding calm statement maximum pitch frequency
Feature 6    amplitude average energy
Feature 7    amplitude energy dynamic range
Feature 8    the difference of amplitude average energy and corresponding calm statement amplitude average energy
Feature 9    the difference of amplitude energy dynamic range and corresponding calm statement value

The latter seven features are shown in Table 2.

Table 2. Quality Features
Feature 10   the first resonance peak frequency average value
Feature 11   the second resonance peak frequency average value
Feature 12   the third resonance peak frequency average value
Feature 13   the harmonic noise ratio mean
Feature 14   the maximum of harmonic noise ratio
Feature 15   the minimum of harmonic noise ratio
Feature 16   the harmonic noise ratio variance
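As an illustration of how per-utterance statistics such as Features 2-9 can be computed, the following numpy sketch frames the waveform, tracks pitch with a crude autocorrelation estimator, and measures frame energy. The frame sizes, the voicing threshold, and the calm-statement reference signal are simplifying assumptions for illustration, not the paper's exact procedure.

import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # split the waveform into overlapping frames (25 ms frames, 10 ms hop at 16 kHz)
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def pitch_track(frames, sr=16000, fmin=60, fmax=400):
    # crude per-frame autocorrelation pitch estimate, keeping voiced frames only
    f0 = []
    for f in frames:
        f = f - f.mean()
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]
        lo, hi = int(sr / fmax), int(sr / fmin)
        lag = lo + int(np.argmax(ac[lo:hi]))
        if ac[0] > 0 and ac[lag] > 0.3 * ac[0]:   # simple voicing decision
            f0.append(sr / lag)
    return np.array(f0)

def prosody_features(x, calm_x, sr=16000):
    # analogues of Features 2-9: pitch and energy statistics relative to a calm utterance
    frames, calm = frame_signal(x), frame_signal(calm_x)
    f0, calm_f0 = pitch_track(frames, sr), pitch_track(calm, sr)
    en, calm_en = (frames ** 2).mean(axis=1), (calm ** 2).mean(axis=1)
    return {
        "F2_pitch_mean": f0.mean(),
        "F3_pitch_max": f0.max(),
        "F4_pitch_mean_diff": f0.mean() - calm_f0.mean(),
        "F5_pitch_max_diff": f0.max() - calm_f0.max(),
        "F6_energy_mean": en.mean(),
        "F7_energy_range": en.max() - en.min(),
        "F8_energy_mean_diff": en.mean() - calm_en.mean(),
        "F9_energy_range_diff": (en.max() - en.min()) - (calm_en.max() - calm_en.min()),
    }

The quality features of Table 2 (formant frequencies and the harmonic-to-noise ratio) would be obtained analogously, for example from LPC-based formant estimates and harmonic analysis of the same frames.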
4 NONLINEAR PROXIMAL SUPPORT VECTOR MACHINE

The Proximal Support Vector Machine (PSVM) classifies points according to their proximity to one of two parallel planes. Obtaining a linear or nonlinear PSVM classifier requires nothing more complicated than solving a single system of linear equations [12-16].

In order to get our nonlinear proximal support vector machine classifier, we revise the optimization problem Eq. (4) into Eq. (5) by replacing the primal variable w with its dual equivalent w = A'Du:

\min_{(w,\gamma,y)\in R^{n+1+m}} \; \frac{\nu}{2}\|y\|^2 + \frac{1}{2}(w'w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y = e    (4)

\min_{(u,\gamma,y)\in R^{m+1+m}} \; \frac{\nu}{2}\|y\|^2 + \frac{1}{2}(u'u + \gamma^2) \quad \text{s.t.} \quad D(AA'Du - e\gamma) + y = e    (5)

If we now use a nonlinear kernel K(A, A') instead of the linear kernel AA', we get:

\min_{(u,\gamma,y)\in R^{m+1+m}} \; \frac{\nu}{2}\|y\|^2 + \frac{1}{2}(u'u + \gamma^2) \quad \text{s.t.} \quad D(K(A,A')Du - e\gamma) + y = e    (6)

Here, we denote K(A, A') by the shorthand notation K. We will use the following Gaussian kernel for the PSVM:

(K(A,B))_{ij} = \varepsilon^{-\mu\|A_i' - B_{\cdot j}\|^2}, \quad i = 1,\ldots,m, \; j = 1,\ldots,k    (7)

where A ∈ R^{m×n}, B ∈ R^{n×k}, and μ is a positive constant.

The gradients with respect to (u, γ, y, v) of the Lagrangian are needed, where the Lagrangian is:

L(u,\gamma,y,v) = \frac{\nu}{2}\|y\|^2 + \frac{1}{2}\left\|\begin{bmatrix} u \\ \gamma \end{bmatrix}\right\|^2 - v'\big(D(KDu - e\gamma) + y - e\big)    (8)

Setting the Lagrangian gradients with respect to (u, γ, y, v) to zero gives the following KKT optimality conditions:

u - DK'Dv = 0
\gamma + e'Dv = 0
\nu y - v = 0
D(KDu - e\gamma) + y - e = 0    (9)

We can get:

u = DK'Dv
\gamma = -e'Dv
y = \frac{v}{\nu}
v = \left(\frac{I}{\nu} + D(KK' + ee')D\right)^{-1} e = \left(\frac{I}{\nu} + GG'\right)^{-1} e    (10)

where G is as follows:

G = D[K \;\; -e]    (11)

The nonlinear separating surface is as follows:

x'w = \gamma, \quad \text{i.e.} \quad x'A'Du - \gamma = 0    (12)

We can obtain the nonlinear separating surface by replacing x'A' with K(x', A'):

K(x',A')Du - \gamma = K(x',A')DDK(A,A')'Dv + e'Dv = \big(K(x',A')K(A,A')' + e'\big)Dv = 0    (13)
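To make the closed-form solution concrete, here is a small numpy sketch of Eqs. (7), (10), (11) and the kernelized decision value of Eqs. (12)-(13). The function names, the toy data, and the parameter values nu and mu are illustrative assumptions, not the paper's implementation.

import numpy as np

def gaussian_kernel(A, B, mu=1.0):
    # rows of A and B are data points; (K)_ij = exp(-mu * ||A_i - B_j||^2), cf. Eq. (7)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * d2)

def train_psvm(A, d, nu=1.0, mu=1.0):
    # A: m x n training matrix, d: labels in {+1, -1}
    m = A.shape[0]
    D = np.diag(d.astype(float))
    e = np.ones((m, 1))
    K = gaussian_kernel(A, A, mu)                      # K = K(A, A')
    G = D @ np.hstack([K, -e])                         # G = D[K  -e], Eq. (11)
    v = np.linalg.solve(np.eye(m) / nu + G @ G.T, e)   # v = (I/nu + GG')^{-1} e, Eq. (10)
    u = D @ K.T @ D @ v                                # u = DK'Dv
    gamma = -(e.T @ D @ v).item()                      # gamma = -e'Dv
    return u, gamma

def psvm_decision(x, A, d, u, gamma, mu=1.0):
    # decision value K(x', A') D u - gamma, Eqs. (12)-(13); its sign gives the class, Eq. (14)
    D = np.diag(d.astype(float))
    Kx = gaussian_kernel(x[None, :], A, mu)            # 1 x m row vector K(x', A')
    return (Kx @ D @ u - gamma).item()

# toy usage: two 2-D Gaussian blobs labeled +1 / -1
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(2.0, 1.0, (20, 2)), rng.normal(-2.0, 1.0, (20, 2))])
d = np.array([1] * 20 + [-1] * 20)
u, gamma = train_psvm(A, d, nu=1.0, mu=0.5)
print(psvm_decision(np.array([2.0, 2.0]), A, d, u, gamma, mu=0.5) > 0)   # expected: True

Because v comes from a single linear solve of an m x m system, training reduces to one call to a linear-system solver, which is exactly the property the section emphasizes.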
The corresponding nonlinear classifier for this nonlinear separating surface is then:

\big(K(x',A')K(A,A')' + e'\big)Dv \; \begin{cases} > 0, & \text{then } x \in A+, \\ < 0, & \text{then } x \in A-, \\ = 0, & \text{then } x \in A+ \text{ or } x \in A-. \end{cases}    (14)

Emotion categories   Joy    Anger   Surprise   Sadness
Joy                  98%    1%      0          1%
Anger                2%     97%     1%         0
Surprise             5%     4%      91%        0
Sadness              2%     1%      3%         94%
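Step 5 and Step 6 combine six pairwise classifiers of the form of Eq. (14) by majority voting over the four emotion classes. Below is a hypothetical sketch of that combination, reusing the illustrative train_psvm and psvm_decision helpers from the previous sketch (not the paper's actual code).

from itertools import combinations
from collections import Counter
import numpy as np

EMOTIONS = ["joy", "anger", "surprise", "sadness"]

def train_pairwise_psvms(features, labels, nu=1.0, mu=0.5):
    # one PSVM per unordered emotion pair: C(4, 2) = 6 classifiers (Step 5)
    models = {}
    for pos, neg in combinations(EMOTIONS, 2):
        mask = np.isin(labels, [pos, neg])
        A = features[mask]
        d = np.where(labels[mask] == pos, 1, -1)
        u, gamma = train_psvm(A, d, nu, mu)
        models[(pos, neg)] = (A, d, u, gamma)
    return models

def predict_emotion(x, models, mu=0.5):
    # each pairwise PSVM casts one vote; the majority wins (Step 6)
    votes = Counter()
    for (pos, neg), (A, d, u, gamma) in models.items():
        winner = pos if psvm_decision(x, A, d, u, gamma, mu) > 0 else neg
        votes[winner] += 1
    return votes.most_common(1)[0][0]

Ties are possible with one-vs-one voting; this sketch simply returns the first most common label, whereas a practical system might break ties using the raw decision values.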
… Computer Science & Information Security, Vol.7, No.3, 1552-1556, 2016.
[7] P. Liu, S. Han, Z. Meng, Y. Tong, Facial expression recognition via a boosted deep belief network, IEEE Conference on Computer Vision & Pattern Recognition, 1805-1812, 2014.
[8] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, Posts & Telecom Press, Beijing, China, 2017.
[9] G. E. Hinton, Learning multiple layers of representation, Trends in Cognitive Sciences, Vol.11, No.10, 428-434, 2007.
[10] G. E. Hinton, S. Osindero, Y. W. Teh, A fast learning algorithm for deep belief nets, Neural Computation, Vol.18, No.7, 1527-1554, 2006.
[11] Z. Han, J. Wang, Speech emotion recognition based on Gaussian Kernel Nonlinear Proximal Support Vector Machine, Proceedings of the Chinese Automation Congress, 2513-2516, 2017.
[12] G. Fung, O. L. Mangasarian, Proximal support vector machine classifiers, International Conference on Knowledge Discovery & Data Mining, 77-86, 2001.
[13] L. H. Chiang, M. E. Kotanchek, A. K. Kordon, Fault diagnosis based on Fisher discriminant analysis and support vector machines, Computers & Chemical Engineering, Vol.28, No.8, 1389-1401, 2004.
[14] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 2000.
[15] P. S. Bradley, O. L. Mangasarian, Massive data discrimination via linear support vector machines, Optimization Methods and Software, Vol.13, No.1, 1-10, 2000.
[16] D. D. Aroor, C. S. Chellu, GMM-based intermediate matching kernel for classification of varying length patterns of long duration speech using support vector machines, IEEE Trans. on Neural Networks and Learning Systems, Vol.25, No.8, 1421-1432, 2014.