Peeters
2004
!#"%$'&($')+*,"!-#./0$#&($12/!354(67#89:*-#!;&#<$'="0
><?7@A@BDCE& $(F0E!A/
peeters@ircam.fr
http://www.ircam.fr/
1 Introduction
1.1 Features taxonomy
Steadiness or dynamicity of the feature
The time extent of the description provided by the features
23/04/04
1.2
2 Pre-computing
2.1 Energy envelope
rms
2.2
2.3
2.4 Perceptual model
[Figure: amplitude response plot. Axes: Frequency [Hz], Amplitude [db20].]
[Figure: plots against Frequency [Hz] (axis scaled by 10^4).]
2.5
[Figure: spectrum panels. Axis labels: Freq, Log-freq; Ampl, Log-ampl, Power.]
2.6
Envelope characterization
[Figure: envelope models. Sustained sound: attack, decay, sustain, release. Non-sustained sound: attack, rest.]
[Figure: attack estimation with fixed levels. Axes: time, energy. The start and end of the attack are marked at the 20% and 90% levels.]
[Figure: attack estimation with successive thresholds. Axes: time, energy. Labels: threshold 1, threshold 2, ...; effort 12, effort 23, ...; start and end of the attack.]
(mpeg7:LogAttackTime) DT.g_lat
(cuidado:TemporalIncrease) DT.g_incr
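LogAttackTime is the base-10 logarithm of the attack duration. The sketch below is a minimal illustration, assuming the attack runs from the first envelope sample at 20% of the maximum to the first sample at 90% (the two levels marked in the attack figure). The function name, its parameters and the test envelope are hypothetical, and the successive-threshold ("effort") variant is not covered.

```python
import math


def log_attack_time(envelope, sample_rate, start_ratio=0.2, end_ratio=0.9):
    """Return log10 of the attack duration (in seconds) of an energy envelope.

    start_ratio and end_ratio are fractions of the envelope maximum. The
    20% / 90% defaults are an assumption taken from the figure labels.
    """
    peak = max(envelope)
    if peak <= 0:
        raise ValueError("envelope must contain positive values")
    # Index of the first sample at or above each level.
    start = next(i for i, e in enumerate(envelope) if e >= start_ratio * peak)
    end = next(i for i, e in enumerate(envelope) if e >= end_ratio * peak)
    duration = (end - start) / sample_rate
    if duration <= 0:
        raise ValueError("attack is shorter than one sample")
    return math.log10(duration)


# Hypothetical envelope: linear ramp over 100 samples, then a plateau (1000 Hz).
envelope = [i / 100 for i in range(101)] + [1.0] * 100
lat = log_attack_time(envelope, sample_rate=1000)
```

Here the attack lasts about 0.07 s, so `lat` is close to `log10(0.07)`.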
3.1.4 Example
F:\data\class\sol\sust\bowedstring\alto\mf\alto_a_gref_mf_si3_12.wav
[Figure: analysis of the example file. Panel titles: "Dlat: -0.53981 - threshold: 0.15 - Dincr: 3.265 - Ddecr: -0.28535" and "MODam: 0.060872 - MODfr: 5.3833". Legends: incr (r-), incr2 (r--), desc (r-); satt_posn, eatt_posn, maxenv_posn; envelop-v, polyfit, hatenvelop-v; fft(envelopv-polyfit).]
3.2 Others
(mpeg7:TemporalCentroid) DT.g_tc
(cuidado:TemporalEffectiveDuration) DT.g_ed
[Figure: effective duration measured against a threshold on the energy envelope. Axes: time, energy.]
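The two descriptors above can be sketched as follows. The temporal centroid is the energy-weighted mean time of the envelope; the effective duration is taken here as the time the envelope spends at or above a threshold. The threshold ratio (0.4 of the maximum), the function names and the test envelope are assumptions made for illustration only.

```python
def temporal_centroid(envelope, sample_rate):
    """Energy-weighted mean time of the envelope, in seconds."""
    total = sum(envelope)
    if total <= 0:
        raise ValueError("envelope must have positive total energy")
    return sum(i * e for i, e in enumerate(envelope)) / (total * sample_rate)


def effective_duration(envelope, sample_rate, ratio=0.4):
    """Time (seconds) during which the envelope is >= ratio * max(envelope).

    ratio is a hypothetical default, not a value taken from the text.
    """
    threshold = ratio * max(envelope)
    return sum(1 for e in envelope if e >= threshold) / sample_rate


# Hypothetical symmetric triangular envelope sampled at 10 Hz.
envelope = list(range(11)) + list(range(9, -1, -1))
tc = temporal_centroid(envelope, sample_rate=10)
ed = effective_duration(envelope, sample_rate=10)
```

For this symmetric envelope the centroid falls at its midpoint (1.0 s), and 13 of the 21 samples lie at or above the threshold (1.3 s).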
Auto-correlation
(cuidado:AudioXcorr) DT.i_xcorr_m
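A minimal sketch of frame-wise auto-correlation coefficients follows. It keeps the first 12 lags, matching the 12 values listed for this feature in the summary table; normalising by the zero-lag value and skipping lag 0 are assumptions, as are the function name and the test signal.

```python
import math


def autocorrelation(frame, n_coeffs=12):
    """Auto-correlation at lags 1..n_coeffs, normalised by the zero-lag value."""
    energy = sum(x * x for x in frame)
    if energy == 0:
        return [0.0] * n_coeffs
    return [
        sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)) / energy
        for lag in range(1, n_coeffs + 1)
    ]


# Hypothetical test signal: a sine with a period of 8 samples.
frame = [math.sin(2 * math.pi * n / 8) for n in range(256)]
coeffs = autocorrelation(frame)
```

For this signal the coefficient at lag 8 (one period) is close to 1 and the coefficient at lag 4 (half a period) is close to -1.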
[Figure: a signal and its auto-correlation (xcorr). Panels: Amplitude versus Time; Amplitude versus Frequency.]
4.2 Zero-crossing rate
(cuidado:AudioZcr) DT.i_zcr_v
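The zero-crossing rate counts sign changes of the waveform per unit of time. A minimal sketch, assuming the rate is expressed in crossings per second and that a sample equal to zero counts as non-negative; the function name and test frame are hypothetical.

```python
def zero_crossing_rate(frame, sample_rate):
    """Number of sign changes per second within the frame."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings * sample_rate / len(frame)


# Hypothetical square-like frame: 100 samples at 100 Hz, period of 4 samples.
frame = [1, 1, -1, -1] * 25
zcr = zero_crossing_rate(frame, sample_rate=100)
```

The frame contains 49 sign changes in one second of signal, so `zcr` is 49.0.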
5 Energy features
5.1 Total Energy
(mpeg7:AudioPower) DE.i_tot_v
5.2
(cuidado:AudioHarmonicPower) DE.i_harmo_v
(cuidado:AudioNoisePower) DE.i_noise_v
5.3
6 Spectral features
6.1
(mpeg7:AudioSpectrumCentroid) DS.i_sc_v
(mpeg7:AudioSpectrumSpread) DS.i_ss_v
(cuidado:AudioSpectrumSkewness) DS.i_skew_v
[Figure: three distributions, each shown as data with a Gaussian fit. Panel titles: "mean: 7.872e-017 std: 5 skew: -8.3254e-017 kurt: 3"; "mean: 16.67 std: 23.5714 skew: -0.56569 kurt: 2.4"; "mean: -16.67 std: 23.5714 skew: 0.56569 kurt: 2.4".]
(cuidado:AudioSpectrumKurtosis) DS.i_kurto_v
[Figure: three distributions, each shown as data with a Gaussian fit. Panel titles: "mean: 7.872e-017 std: 5 skew: -8.3254e-017 kurt: 3"; "mean: -2.1597e-015 std: 28.8704 skew: 3.1204e-016 kurt: 1.8"; "mean: 0.0049998 std: 1.4142 skew: 5.3032e-007 kurt: 6.0003".]
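The centroid, spread, skewness and kurtosis can all be read as moments of the spectrum treated as a distribution over frequency. The sketch below is a minimal illustration under that reading; taking the spread as the standard deviation (rather than the variance) and weighting by linear amplitude are assumptions, and the function name is hypothetical.

```python
import math


def spectral_moments(freqs, amplitudes):
    """Return (centroid, spread, skewness, kurtosis) of a spectrum.

    The amplitudes are normalised to sum to 1 and used as weights.
    """
    total = sum(amplitudes)
    if total <= 0:
        raise ValueError("spectrum must have positive total amplitude")
    weights = [a / total for a in amplitudes]
    centroid = sum(f * w for f, w in zip(freqs, weights))
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights))
    spread = math.sqrt(variance)
    if spread == 0:
        return centroid, 0.0, 0.0, 0.0
    skewness = sum((f - centroid) ** 3 * w for f, w in zip(freqs, weights)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w for f, w in zip(freqs, weights)) / spread ** 4
    return centroid, spread, skewness, kurtosis


# Hypothetical flat distribution on the integers -50..50.
freqs = list(range(-50, 51))
centroid, spread, skewness, kurtosis = spectral_moments(freqs, [1.0] * 101)
```

A flat distribution is symmetric, so its skewness is zero, and its kurtosis is close to 1.8, below the value of 3 obtained for a Gaussian.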
(cuidado:AudioSpectrumSlope) DS.i_slope_v
(cuidado:AudioSpectrumDecrease) DS.i_decr_v
(cuidado:AudioSpectrumRollOff) DS.i_rolloff_v
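The roll-off is the frequency below which a given fraction of the spectral energy is contained. A minimal sketch, assuming a fraction of 0.95 and energy computed as squared amplitude; both choices, the function name and the test spectrum are assumptions for illustration.

```python
def spectral_rolloff(freqs, amplitudes, fraction=0.95):
    """Lowest frequency at which the cumulated energy reaches `fraction` of the total."""
    energies = [a * a for a in amplitudes]
    target = fraction * sum(energies)
    cumulative = 0.0
    for f, e in zip(freqs, energies):
        cumulative += e
        if cumulative >= target:
            return f
    return freqs[-1]


# Hypothetical flat spectrum: 10 bins spaced by 100 Hz.
freqs = [100.0 * k for k in range(10)]
amplitudes = [1.0] * 10
rolloff = spectral_rolloff(freqs, amplitudes)
```

With a flat spectrum the 95% point is only reached in the last bin, so `rolloff` is 900.0; with `fraction=0.5` the same call returns 400.0.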
6.2
6.3
[Figure: MFCC computation. Processing chain: s(n), FFT, MelBand, Log, DCT, MFCC. Panels: spectrum and mid-ear spectrum versus Frequency; Log-amplitude versus Mel band; MFCC Value versus MFC coefficient.]
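The processing chain FFT, Mel bands, Log, DCT can be sketched as below, starting from a power spectrum that is assumed to be already computed. The Mel formula (2595 log10(1 + f/700)), the triangular filter shape, the 20 bands, the log floor and keeping coefficient 0 are all assumptions chosen for illustration; the 12 output coefficients match the count given in the summary table. All function names are hypothetical.

```python
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_bands, n_bins, sample_rate):
    """Triangular filters equally spaced on the Mel scale.

    The n_bins spectrum bins are assumed to cover 0 .. sample_rate / 2.
    """
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_bands + 2 band edges, expressed as fractional bin positions.
    edges = [
        mel_to_hz(top * i / (n_bands + 1)) * (n_bins - 1) / nyquist
        for i in range(n_bands + 2)
    ]
    bank = []
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct_ii(values, n_coeffs):
    """Unnormalised type-II DCT, first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
        for c in range(n_coeffs)
    ]


def mfcc(power_spectrum, sample_rate, n_bands=20, n_coeffs=12, floor=1e-10):
    bank = mel_filterbank(n_bands, len(power_spectrum), sample_rate)
    band_energy = [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
    log_energy = [math.log(max(e, floor)) for e in band_energy]
    return dct_ii(log_energy, n_coeffs)


# Hypothetical flat power spectrum with 257 bins at 16 kHz.
coeffs = mfcc([1.0] * 257, sample_rate=16000)
```

The DCT decorrelates the log band energies: a constant input puts everything into coefficient 0 and leaves the higher coefficients at zero.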
7 Harmonic features
7.1.1 Fundamental frequency
(mpeg7:AudioFundamentalFrequency) DH.i_f0_v
7.1.2 Noisiness
(mpeg7:AudioHarmonicity) DH.i_noisiness_v
7.1.3 Inharmonicity
(cuidado:AudioInharmonicity) DH.i_inharmo_v
[Figure: inharmonicity. Energy versus frequency, with the partial frequency f(1) shown against the harmonic positions f0, 2 f0, 3 f0, 4 f0, 5 f0, 6 f0, 7 f0.]
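Inharmonicity measures how far the partial frequencies depart from exact multiples of the fundamental frequency f0. The sketch below shows one possible formulation, assuming an energy-weighted mean of the absolute departures scaled by 2 / f0; this weighting and scaling are assumptions, not a formula recovered from the text, and the function name is hypothetical.

```python
def inharmonicity(f0, partial_freqs, partial_amps):
    """Energy-weighted departure of partial h from h * f0, scaled by 2 / f0.

    partial_freqs[0] and partial_amps[0] describe the first partial (h = 1).
    """
    energy = sum(a * a for a in partial_amps)
    if f0 <= 0 or energy == 0:
        raise ValueError("f0 and the partial energy must be positive")
    deviation = sum(
        abs(f - h * f0) * a * a
        for h, (f, a) in enumerate(zip(partial_freqs, partial_amps), start=1)
    )
    return 2.0 * deviation / (f0 * energy)


# Hypothetical partials: exactly harmonic, then with a mistuned second partial.
harmonic = inharmonicity(100.0, [100.0, 200.0, 300.0], [1.0, 0.5, 0.25])
stretched = inharmonicity(100.0, [100.0, 205.0, 300.0], [1.0, 1.0, 1.0])
```

A purely harmonic series gives 0; the value grows as partials move away from the harmonic positions.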
(mpeg7:HarmonicSpectralDeviation) DH.i_devs_v
[Figure: harmonic spectral deviation (panel title: "Spectral deviation: 0.15374"). Legend: spectral envelop, harmonics. Axes: Frequency [harm number], Amplitude.]
[Figure: harmonic amplitudes. Axes: Frequency [harmonic number], Amplitude.]
[Figure: tristimulus. Legend: tristimulus1, tristimulus2, tristimulus3.]
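The three tristimulus values describe how the harmonic amplitudes are shared between the first harmonic, harmonics 2 to 4, and harmonics 5 and above. A minimal sketch of that standard definition, assuming linear amplitudes as input; the function name and the example amplitudes are hypothetical.

```python
def tristimulus(harmonic_amps):
    """Return (T1, T2, T3) from harmonic amplitudes ordered from h = 1."""
    total = sum(harmonic_amps)
    if total <= 0:
        raise ValueError("harmonic amplitudes must have a positive sum")
    t1 = harmonic_amps[0] / total          # first harmonic
    t2 = sum(harmonic_amps[1:4]) / total   # harmonics 2, 3, 4
    t3 = sum(harmonic_amps[4:]) / total    # harmonics 5 and above
    return t1, t2, t3


# Hypothetical amplitudes for six harmonics.
t1, t2, t3 = tristimulus([4.0, 2.0, 1.0, 1.0, 1.0, 1.0])
```

The three values always sum to 1; here they are 0.4, 0.4 and 0.2.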
8 Perceptual features
8.1 Features
8.1.3 Sharpness
(cuidado:AudioSharpness) DP.i_sharp_v
8.1.4 Spread
(cuidado:AudioRelativeSpecificLoudness):
(cuidado:AudioSpread) DP.i_spread_v
9 Various features
9.1
(mpeg7:AudioSpectrumFlatness) DP.sfm_m
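Spectral flatness compares the geometric and arithmetic means of the spectrum inside a frequency band; the related crest measure (cuidado:AudioSpectrumCrest in the summary table) compares the maximum with the arithmetic mean. The sketch below handles a single band and assumes the band edges and the choice between amplitude and power values are decided by the caller; function names and test values are hypothetical.

```python
import math


def spectral_flatness(band_values):
    """Geometric mean divided by arithmetic mean of the values in one band."""
    n = len(band_values)
    arithmetic = sum(band_values) / n
    if arithmetic <= 0 or min(band_values) <= 0:
        return 0.0
    geometric = math.exp(sum(math.log(v) for v in band_values) / n)
    return geometric / arithmetic


def spectral_crest(band_values):
    """Maximum divided by arithmetic mean of the values in one band."""
    arithmetic = sum(band_values) / len(band_values)
    return max(band_values) / arithmetic if arithmetic > 0 else 0.0


# Hypothetical bands: a flat one and one dominated by a single peak.
flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])
peaky = spectral_flatness([1.0, 1e-6, 1e-6, 1e-6])
crest = spectral_crest([4.0, 0.0, 0.0, 0.0])
```

Flatness is 1 for a flat (noise-like) band and tends towards 0 for a band dominated by one peak, while the crest moves in the opposite direction.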
10 Temporal modeling
10.1.1 Mean
10.1.2 Variance
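Temporal modeling summarises a sequence of frame-based feature values by statistics taken over time. A minimal sketch of the mean and variance, assuming unweighted frames and the population (divide by N) variance; the function name and the example frames are hypothetical.

```python
def temporal_mean_variance(frames):
    """Per-dimension mean and variance over a sequence of feature vectors."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)
    ]
    return means, variances


# Hypothetical two-dimensional feature observed over three frames.
frames = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
means, variances = temporal_mean_variance(frames)
```

The first dimension varies over time (mean 2, variance 2/3) while the second is constant (mean 10, variance 0).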
10.1.3 Deviation
Element Name
weight
element name
Mpeg-7
yes
Mean
Yes
Variance
Extension
Derivative
extension
Modulation
frame based | number of features | acronym | xml tag

n | 1 | DTg_lat | mpeg7:LogAttackTime
n | 1 | DTg_incr | cuidado:TemporalIncrease
n | 1 | DTg_decr | cuidado:TemporalDecrease
n | 1 | DTg_tc | mpeg7:TemporalCentroid
n | 1 | DTg_ed | cuidado:TemporalEffectiveDuration

y | 12 | DTi_xcorr_m | cuidado:AudioXcorr
y | 1 | DTi_zcr | cuidado:AudioZcr

y | 1 | DEi_tot_v | mpeg7:AudioPower
n | 2 | DTg_mod_fr, DTg_mod_am | ScalableSeriesType element name="Modulation"
y | 1 | DEi_harmo_v | cuidado:AudioHarmonicPower
y | 1 | DEi_noise_v | cuidado:AudioNoisePower

y | 6 | DSi_sc_m | mpeg7:AudioSpectrumCentroid (mpeg7:SpectralCentroid)
y | 6 | DSi_ss_m | mpeg7:AudioSpectrumSpread
y | 6 | DSi_skew_m | cuidado:AudioSpectrumSkewness
y | 6 | DSi_kurto_v | cuidado:AudioSpectrumKurtosis
y | 6 | DSi_slope_v | cuidado:AudioSpectrumSlope
y | 1 | DSi_decs_c | cuidado:AudioSpectrumDecrease
y | 1 | DSi_rolloff_v | cuidado:AudioSpectrumRollOff
y | 3 | DSi_variation_v | cuidado:AudioSpectrumVariation

y | 12 | DPi_mfcc_m | cuidado:AudioMFCC
y (post) | 12 | DPi_Dmfcc_m |
y (post) | 12 | DPi_DDmfcc_m |

y | 1 | DHi_f0_v | mpeg7:AudioFundamentalFrequency
n | 2 | F0 Mod AM, FR | ScalableSeriesType element name="Modulation"
y | 1 | DHi_noisiness_v | mpeg7:AudioHarmonicity
y | 1 | DHi_inharmo_v | cuidado:AudioInharmonicity
y | 3 | DHi_devs_v | mpeg7:HarmonicSpectralDeviation
y | 3 | DHi_oeratio_v | cuidado:HarmonicSpectralOERatio
y | 9 | DHi_tri_v | cuidado:HarmonicSpectralTristimulus

y | 6 | DHi_sc_m | mpeg7:HarmonicSpectralCentroid
y | 6 | DHi_ss_m | mpeg7:HarmonicSpectralSpread
y | 6 | DHi_skew_m | cuidado:HarmonicSpectralSkewness
y | 6 | DHi_kurto_v | cuidado:HarmonicSpectralKurtosis
y | 6 | DHi_slope_v | cuidado:HarmonicSpectralSlope
y | 1 | DHi_decs_c | cuidado:HarmonicSpectralDecrease
y | 1 | DHi_rolloff_v | cuidado:HarmonicSpectralRollOff
y | 3 | DHi_variation_v | mpeg7:HarmonicSpectralVariation

y | 1 | DPi_loud_v | AudioLoudness
y | 24 | DPi_specloud_m | cuidado:AudioRelativeSpecificLoudness
y | 1 | DPi_sharp_v | cuidado:AudioSharpness
y | 1 | DPi_spread_v | cuidado:AudioSpread

y | 6 | DPi_sc_m | cuidado:AudioFilterbankCentroid
y | 6 | DPi_ss_m | cuidado:AudioFilterbankSpread
y | 6 | DPi_skew_m | cuidado:AudioFilterbankSkewness
y | 6 | DPi_kurto_v | cuidado:AudioFilterbankKurtosis
y | 6 | DPi_slope_v | cuidado:AudioFilterbankSlope
y | 1 | DPi_decs_c | cuidado:AudioFilterbankDecrease
y | 1 | DPi_rolloff_v | cuidado:AudioFilterbankRolloff
y | 3 | DPi_variation_v | cuidado:AudioFilterbankVariation
y | 3 | DPi_oeratio_v | cuidado:AudioFilterbankOERatio
y | 3 | DPi_devs_v | cuidado:AudioFilterbankDeviation
y | 9 | DPi_tri_v | cuidado:AudioFilterbankTristimulus

y | 4 | DPi_sfm_m | mpeg7:AudioSpectrumFlatness
y | 4 | DPi_scm_m | cuidado:AudioSpectrumCrest

  | 166 | |
12 Acknowledgement