Individualization of Head Related Transfer Functions Using Principal Component Analysis - Applied Acoustics - 2015 - Fink, Ray

Applied Acoustics 87 (2015) 162–173
Individualization of head related transfer functions using principal component analysis
Kimberly J. Fink, Laura Ray ⇑
Thayer School of Engineering, Dartmouth College, 14 Engineering Drive, Hanover NH 03755, United States
Article info

Article history:
Received 25 February 2014
Received in revised form 3 July 2014
Accepted 4 July 2014
Available online 25 July 2014

Keywords:
Head related transfer functions
Principal component analysis
Virtual auditory display

Abstract

Prior research investigates virtual auditory displays (VADs) using models of HRTFs as a function of a finite number of principal components (PCs) and associated weights (PCWs). This paper studies the effect of PCWs on horizontal plane HRTFs derived from a database of HRIRs and provides a principled approach to PCW tuning. Tuning is first evaluated numerically to determine how variation of PCWs from an average PC model affects HRTF spectral characteristics. An average PC model at 50 azimuths in the horizontal plane is developed from a database of HRIRs of 34 subjects. HRIRs of nine additional subjects are used to test the validity of the average model and to conduct numerical optimization experiments, in which a cost function of spectral distortion is minimized by sequentially tuning PCWs. Sequential tuning mimics how a human would tune a VAD. Numerical results show that sequential tuning of a subset of PCWs reduces spectral distortion metrics when tuning an average HRTF to match an individual HRTF. These experiments show that tuning PCWs can change the shape and frequency location of the pinna notch. The numerical experiments also aid in developing a tuning method that is amenable to human tuning. Several variants of subject tuning experiments are conducted to verify that sequential tuning reduces listening errors. Results of a head steering task show an improvement of 30% in large heading errors when using a tuned VAD relative to an untuned VAD.

© 2014 Elsevier Ltd. All rights reserved.
⇑ Corresponding author. Tel.: +1 603 646 1243.
E-mail address: lray@dartmouth.edu (L. Ray).
http://dx.doi.org/10.1016/j.apacoust.2014.07.005
0003-682X/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Spatial localization of sound is known to depend on interaural time delay and level difference, and more generally on the interaction of the source with the head, torso, and pinna. Head-related transfer functions (HRTFs) model the acoustic transfer functions between the source and the listener’s tympanic membrane from which listeners derive spatial information from binaural signals and have been used to create a virtual auditory display (VAD) through headphones [1,2]. A high quality VAD typically requires a large number of HRTFs, as well as a means of continuous representation or interpolation of HRTFs [3–5]. Non-individualized HRTFs are generally used to create a VAD owing to the difficulty of measuring individual HRTFs [6–8]. These, however, can create errors in sound localization such as front-back reversals, up-down confusions, and lateralization or inside of head localization [6–9]. This paper presents and evaluates a method that enables listeners to tune a VAD in the horizontal plane originating from an HRTF database. The method incorporates principal component analysis (PCA) to reduce dimensionality of head-related impulse responses (HRIRs) identified from HRIRs of 34 subjects in a public database and shows that principal component weights (PCWs) can be tuned sequentially in order to individualize the HRTFs.

Methods for HRTF customization are important in reducing listening errors. Principal component analysis (PCA) has been used by several authors to model HRIRs or HRTFs and to reduce the dimensionality of an HRIR or HRTF dataset. PCA reduces dimensionality by transforming a number of potentially correlated variables into a smaller number of uncorrelated principal components (PCs), where a small number of PCs recover a large percentage of the variability in the database [10–12]. Kistler and Wightman [12] use frequency-domain PCA to reduce dimensionality of 5300 HRTFs (10 subjects at 265 locations for both ears) measured at 11 elevations from −48° to 72° and 24 azimuths from −165° to 180° [2]. They found that the first five PCs retain 90% of the variation in the original log-magnitude HRTF dataset. Middlebrooks and Green [13] also perform PCA in the frequency domain on the HRTFs of eight subjects at 360 locations for both ears. They also found that the first five PCs covered approximately 90% of the variation of the original HRTFs, and their PCs were similar to those of [12]. By splitting the subjects into two groups according to height and performing
PCA twice, [13] found a dependency of PCs on the physical size of subjects.

The CIPIC database is comprised of the measured HRIRs of 43 subjects (27 male and 16 female) and 2 KEMAR manikins at 1250 locations (25 azimuths and 50 elevations) [14]. It incorporates anthropometric measurements of many subjects. The HRIRs contain 200 samples at a sample frequency of 44.1 kHz. Zotkin et al. [3] use anthropometric database matching to generate semi-personalized HRTFs from the CIPIC database. Seven measurements of each pinna are taken with a camera. Using the CIPIC data for HRTFs and pinna measurements, the best matching set of HRTFs is found for the left and right ears. Although this method reduces localization errors, it requires taking a picture of a listener’s pinna and automatically computing all the desired dimensions. Xu et al. [15] followed a similar approach finding that anthropometric data can improve HRTF personalization.

Refs. [16–21] develop customization methods that do not involve anthropometric data. They use PCA in the time domain, rather than in the frequency domain, on portions of the HRIRs in the CIPIC database in the median plane. Shin and Park [16] perform PCA on the first 0.2 ms of HRIRs of 45 subjects in the median plane at 45 elevations. Shin and Park [16] found that five PCs for each ear at each elevation recover more than 90% of the variation in the original 0.2 ms of the HRIRs. They also found that the PCs were similar for each ear, so ear symmetry was assumed. They allowed subjects to tune all five PCWs for the left ear HRIRs at each elevation, modifying the first 0.2 ms of the HRIR, and for the remainder, a KEMAR HRIR was used. Testing was done on four subjects with the customized and KEMAR HRIRs. The HRIRs of the four subjects were also measured and included in the subject testing. The front-back reversal rate is 28.1% for the KEMAR HRIRs, 13.1% for measured HRIRs, and 10.6% for tuned HRIRs.

Hwang et al. [18] perform PCA on the HRIRs of 45 subjects from the CIPIC database in the median plane. They, however, only perform a single analysis that includes HRIRs for 49 elevations in the median plane and every subject. The first 1.5 ms of the HRIRs are used, and once again, ear symmetry is assumed. With the first 12 PCs they reconstruct the original HRIRs with 4.8% modeling error and retain 90.2% of the variation. For customization, subjects tune the three weights with the highest standard deviations at each elevation; therefore, each elevation has its own set of PCWs being tuned. Customization of the HRIRs and subject testing is reported for three subjects. The average front-back reversal rate is 22.6% with KEMAR HRIRs, 0.37% with measured HRIRs, and 5.9% with customized HRIRs [18]. Refs. [17,19–21] present similar methods and results.

While these studies provide evidence that PCW tuning gives customized HRIRs with improved front-back reversal rate, there are no studies that quantitatively relate tuning to changes in the HRTF or that relate tuning procedures amenable to human use to achievable variations in the HRTF through tuning. The objectives of this paper are (1) to investigate the effect of tuning PCWs on horizontal plane HRTFs derived from a PC model of the HRIR, and (2) to identify a procedure for individualizing an HRTF through tuning PCWs in a manner that is amenable to use by human subjects to reduce characteristic anomalies in average HRTFs, e.g., front-back reversals and lateralization. Specifically, we wish to determine how varying PCWs from an average PC model affects HRTF spectral characteristics, such as the pinna notch. Additionally, we identify a subset of PCWs that can be used to tune a VAD in order to individualize an HRTF. To do so, we construct a numerical nonlinear optimization problem in which PCWs are tuned sequentially, in a specified order, to minimize an objective function of spectral distortion. We use sequential tuning as a model of how a human would tune a VAD; while more complex nonlinear optimization results exist, it is too complex for a human to tune multiple parameters simultaneously.

We first develop a PC model of an average subject at each of 50 azimuths in the horizontal plane using HRIRs reported for 34 subjects in the CIPIC database. We identify a set of PCs that models the original, measured HRIRs with less than 5% error, averaged over 34 subjects. We hold out nine subjects from the CIPIC database in order to test the validity of the PC model derived from these 34 subjects, and we use HRIR data for these nine subjects with the optimization procedure to develop a tuning procedure using a small number of PCWs, in order to match the average PC model to the subject’s HRTF derived from a measured HRIR. Two optimization experiments are reported. In the first, an average HRTF is constructed from the CIPIC database for a specified direction, and a subset of weights are tuned to match a specific subject’s HRTF. In the second experiment, a subset of PCWs of the average HRTF for a specified direction are tuned to match the HRTF of a specific subject for that direction reflected to the back of the head. These numerical experiments thus investigate customization of an average HRTF for a given direction to a specific subject, and customization of an average HRTF to eliminate a front-back reversal. Following numerical experiments, results of subject testing, in which subjects tune PCWs sequentially to customize a VAD, are presented. A head steering task is conducted with the tuned VAD to demonstrate effectiveness in a listening scenario.

2. Theory of principal component analysis of HRIRs

2.1. Principal component analysis (PCA)

PCA reduces the dimensionality of a dataset while keeping as much of the original variation as possible [10,11]. One PCA approach is based on singular value decomposition (SVD), in which the data are assembled into an N × M matrix, X, where each row is an observation. The mean of X is calculated as

\[ \bar{x}_m = \frac{1}{N} \sum_{n=1}^{N} X_{n,m} \tag{1} \]

where the subscript denotes the element. $\bar{x}_m$ is subtracted from X, providing matrix $B = X - u\bar{x}$, where u is an N × 1 vector of ones. SVD of B gives

\[ B = U S V^{T} \tag{2} \]

The original data X can be reconstructed by

\[ X = U S V^{T} + u\bar{x} \tag{3} \]

V is an M × M matrix and the columns are the PCs. The PCWs are contained in the N × M matrix US, with each row corresponding to the PCWs for one observation from the original matrix X. If all the PCWs are used, the original data is reconstructed without error. If the number of PCs is truncated to reduce the size of the data, error is incurred.

2.2. Selection of the number of PCWs

PCA should provide a set of basis functions that can represent the HRIRs of a general population. While it is unknown whether the CIPIC or any other existing database is sufficiently large to provide a set of basis functions, we investigate this question by using a large number of subjects from the database and holding out subjects based on anthropometric data for validation, setting a maximum error threshold of 5% [17–21]. We perform PCA on the HRIRs of 34 subjects from the CIPIC database for all 50 azimuths in the horizontal plane. HRIRs of KEMAR manikins are removed
from the dataset along with HRIRs of nine subjects (numbers 20, 40, 50, 59, 61, 131, 153, 155, and 162), with three each representative of ‘‘large’’, ‘‘small’’, and ‘‘average’’ anthropometric measurements of head width, pinna, and chin to shoulder, respectively. Subject HRIRs are reserved for validation of the PC model and for use in numerical optimization experiments. Before PCA, the initial delay and interaural time delay (ITD) are removed using procedures described within the CIPIC database documentation. Each HRIR is then padded with zeros to make the length 200. The HRIRs for the left and right ears are combined into an N × M matrix X, where N = 1700 (34 subjects × 50 azimuths) and M = 400. Each row of X corresponds to a particular azimuth at 0° elevation and subject and contains 400 samples: the 200-point HRIRs for left and right ears, respectively. The HRIRs for the left and right ears are combined so that at each azimuth the PCWs found model both ears. PCA produces 400 PCs, and each of the original HRIRs is a weighted linear combination of the PCs.

2.3. Validation

A goal of PCA is to keep as much of the variation of the original dataset as possible, while minimizing the number of weights needed to recreate the original HRIRs. The percent error between the original HRIRs from the database and the HRIRs reconstructed from PCs is

\[ \%\ \text{error} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{M} \left( X_{i,j} - \hat{X}_{i,j} \right)^{2}}{\sum_{i=1}^{N} \sum_{j=1}^{M} X_{i,j}^{2}} \tag{4} \]

where $\hat{X}$ contains the HRIRs for each subject and azimuth reconstructed with k PCs [17]. The total modeling error over all subjects is 4.8% when 25 PCs are used. Therefore 25 PCs are chosen to represent the original HRIRs of the 34 subjects, which is comparable to the number of PCs used in [17].

The HRIRs of the nine held-out subjects are used for validation. These HRIRs are modeled by the 25 PC model, and modeling error and percent variation are computed. The HRIRs for the nine subjects are organized into matrix X_val where N = 450 (9 subjects × 50 azimuths) and M = 400, and the mean is subtracted to obtain B_val, with weights given by W_val = B_val V, where V is a matrix of the 25 PCs found from PCA of the HRIRs of the original 34 subjects. The reconstructed HRIRs are given by

\[ \hat{X}_{val} = W_{val} V^{T} + u\bar{x} \tag{5} \]

where u is a 450 × 1 vector of ones [17]. The total percent error using the 25 PCs is 5.2%. Thus, the PCs found from the HRIRs of 34 subjects can be used to model subjects that were not included in the PCA.

2.4. Spectral distortion

Truncation in the number of PCs used to model the HRIRs also introduces error in the identified HRTF found using the Fast Fourier Transform (FFT). A common measure of error between the identified and measured HRTF is the spectral distortion (SD) score

\[ SD = \sqrt{ \frac{1}{I} \sum_{i=1}^{I} \left( 20 \log_{10} \frac{|H(f_i)|}{|\hat{H}(f_i)|} \right)^{2} } \tag{6} \]

$H(f_i)$ and $\hat{H}(f_i)$ are the frequency response of the original HRTF and the HRTF reconstructed from the truncated principal component model, and f is the frequency [22].

3. Numerical methods and results of numerical optimization of PCWs

We investigate how the HRTF is modified when PCWs are changed, and specifically whether HRTF spectral characteristics, such as the pinna notch, can be tuned through optimization of PCWs. Additionally, we develop an optimization approach in which a small number of PCWs are tuned in sequence to create individualized HRTFs. Sequential tuning of weights provides a methodology that is amenable to human subject tuning. If the weights can be changed sequentially within a numerical optimization, with evidence of a progression towards minimizing an objective function, then sequential tuning should be possible for a person to perform. The numerical study also provides insight on the number of weights to be tuned.

3.1. Objective function

The objective function is comprised of weighted spectral distortion SD_l and SD_r for each ear for octave bands with center frequencies from 31.5 Hz to 16 kHz and additional narrowband weighted SD_pinna,l and SD_pinna,r in the frequency range that the pinna notch typically is found:

\[ F = SD_{l} + SD_{r} + SD_{pinna,l} + SD_{pinna,r} \tag{7} \]

where

\[ SD = \sqrt{ \frac{1}{I} \sum_{i=1}^{I} \left( w_i \, 20 \log_{10} \frac{|H(f_i)|}{|\hat{H}(f_i)|} \right)^{2} } \tag{8} \]

Weighting factors w_i for each band for SD_l and SD_r are w_i = 2, w_i = 3, and w_i = 1 for bands with centers between 31.5 Hz and 2 kHz, at 4 and 8 kHz, and at 16 kHz, respectively. Bands with center frequencies of 4–8 kHz cover the frequency range 2.8–11.2 kHz and are assigned larger weights owing to spectral characteristics that typically occur in this band. The narrowband weighting is 0.5 between 4.8 and 12 kHz.

3.2. Numerical tuning procedures and results

In the numerical optimization experiments, the average of the 25 PCWs for each direction over the 34 subjects are found, providing an average HRIR for each direction. These average HRTFs provide the starting HRTF for tuning, and the tuned HRTF is compared with this average. While some authors compare tuned HRTFs to manikin HRTFs as an indication of tuned HRTF performance, we find that there are far more listening errors, particularly front-back reversals, when listening through a manikin HRTF than when listening through an average HRTF derived from PCA; hence, comparison of the starting HRTF and the final HRTF are provided here. The bounds of each weight are set to the average weight ±3 standard deviations. Each tuning experiment starts with an average HRIR for a particular direction and a given CIPIC database subject (from among the nine held out subjects) whose HRTF is to be matched through optimization. In the first experiment, a subset of the average PCWs for a particular direction are tuned by minimizing the objective function such that the average HRTF for that direction matches the chosen subject’s HRTF for that direction. In the second experiment, the average PCWs for a particular direction are tuned through optimization such that the average HRTF for that direction matches the subject’s HRTF for that direction reflected to the opposite side of the head. This simulates how a subject might tune PCWs to eliminate a front-back confusion. For this numerical study, sequential optimization is chosen rather than a nonlinear
optimization, in which weights are changed simultaneously, because in practice, a subject will change one PCW at a time when tuning PCWs.

Fig. 1. Subject 40 objective function and spectral distortion.
Fig. 2. PCW variation in tuning 0° azimuth and elevation for Subject 40.
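
The PC model that supplies the average starting weights and the ±3 standard deviation bounds can be illustrated with a minimal sketch. It follows Eqs. (1)–(5) on synthetic random data standing in for the HRIR matrix; the function names, the small matrix sizes, the subject-major row ordering, the factor of 100 in the percent error, and the use of the training mean for held-out data are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def fit_pca(X):
    """PCA via SVD, following Eqs. (1)-(3)."""
    x_bar = X.mean(axis=0)                       # Eq. (1): column means
    B = X - x_bar                                # B = X - u x_bar
    U, s, Vt = np.linalg.svd(B, full_matrices=False)  # Eq. (2)
    W = U * s                                    # rows of US are the PCWs
    V = Vt.T                                     # columns of V are the PCs
    return x_bar, W, V

def reconstruct(W, V, x_bar, k):
    """Rebuild the data from the first k PCs (truncated Eq. (3))."""
    return W[:, :k] @ V[:, :k].T + x_bar

def percent_error(X, X_hat):
    """Ratio of Eq. (4); the factor 100 is an assumption to express it in percent."""
    return 100.0 * np.sum((X - X_hat) ** 2) / np.sum(X ** 2)

# Synthetic stand-in for the HRIR matrix (the paper uses N = 1700, M = 400, k = 25).
rng = np.random.default_rng(0)
n_subjects, n_azimuths, M, k = 6, 5, 40, 8
X = rng.standard_normal((n_subjects * n_azimuths, M))  # rows assumed subject-major

x_bar, W, V = fit_pca(X)
err_full = percent_error(X, reconstruct(W, V, x_bar, W.shape[1]))
err_k = percent_error(X, reconstruct(W, V, x_bar, k))

# Average PCWs per azimuth over subjects, and the +/-3 standard deviation bounds.
W_by_az = W[:, :k].reshape(n_subjects, n_azimuths, k)
w_avg = W_by_az.mean(axis=0)
w_std = W_by_az.std(axis=0, ddof=1)
lower, upper = w_avg - 3.0 * w_std, w_avg + 3.0 * w_std
avg_hrir = w_avg @ V[:, :k].T + x_bar            # one average response per azimuth

# Held-out data projected onto the same PCs (Eq. (5)), using the training mean.
X_val = rng.standard_normal((3, M))
W_val = (X_val - x_bar) @ V[:, :k]
X_val_hat = W_val @ V[:, :k].T + x_bar
```

With all components retained the reconstruction error is zero up to rounding, while truncating to k components leaves a nonzero residual, which is the trade-off that Section 2.3 quantifies for the CIPIC data.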
The number of PCWs tuned should be minimized so as to decrease the time and complexity in customizing a VAD. Hwang et al. [18] tune the three weights with the largest standard deviations between subjects in the CIPIC database for each elevation. This method therefore tunes the weights that exhibit the largest variation between subjects for a given direction. We choose to tune the five weights with the highest standard deviations. The weights with the five highest standard deviations at 0° azimuth are 2, 4, 7, 3, and 8, and these are used for both experiments. Sequential tuning of these five weights is performed on the average HRTF to match the measured HRTFs of each of the nine validation subjects for 0° azimuth. Weights are tuned in order, with tuning of each weight sequentially constituting one ‘‘round’’ of tuning. Additional rounds of tuning are performed beginning again with weight 2.
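
The round-based procedure described above can be sketched as a coordinate-wise search in which one weight is changed at a time, within its bounds, for a fixed number of rounds. The sketch below is illustrative only: the grid search over each weight, the unweighted two-ear cost (which omits the octave-band weights and the pinna terms of Eqs. (7) and (8)), the synthetic PCs, and all function names are assumptions, since the paper does not state which one-dimensional optimizer is used.

```python
import numpy as np

def spectral_distortion(h_ref, h_model, n_fft=512):
    """RMS log-magnitude difference in dB over all FFT bins, in the form of Eq. (6)."""
    H_ref = np.abs(np.fft.rfft(h_ref, n_fft))
    H_mod = np.abs(np.fft.rfft(h_model, n_fft))
    eps = 1e-12                                   # avoids log of zero
    ratio_db = 20.0 * np.log10((H_ref + eps) / (H_mod + eps))
    return np.sqrt(np.mean(ratio_db ** 2))

def objective(w, V, x_bar, target):
    """Simplified stand-in for Eq. (7): unweighted SD_l + SD_r only."""
    h = w @ V.T + x_bar                           # left and right responses, concatenated
    half = h.size // 2
    return (spectral_distortion(target[:half], h[:half])
            + spectral_distortion(target[half:], h[half:]))

def sequential_tune(w0, order, lower, upper, cost, rounds=3, n_grid=61):
    """Tune one weight at a time, in the given order, for several rounds."""
    w = w0.copy()
    history = [cost(w)]
    for _ in range(rounds):
        for idx in order:
            # Candidate values within the bounds, plus the current value,
            # so a tuning step can never increase the cost.
            candidates = np.append(np.linspace(lower[idx], upper[idx], n_grid), w[idx])
            costs = []
            for c in candidates:
                trial = w.copy()
                trial[idx] = c
                costs.append(cost(trial))
            w[idx] = candidates[int(np.argmin(costs))]
            history.append(cost(w))
    return w, history

# Synthetic example: orthonormal stand-in PCs and a hidden "subject" weight vector.
rng = np.random.default_rng(1)
M, k = 64, 10
V, _ = np.linalg.qr(rng.standard_normal((M, k)))
x_bar = 0.1 * rng.standard_normal(M)
w_avg, w_std = np.zeros(k), np.ones(k)
target = rng.uniform(-2.0, 2.0, k) @ V.T + x_bar
order = [1, 3, 6, 2, 7]                           # zero-based indices of weights 2, 4, 7, 3, 8

def cost(w):
    return objective(w, V, x_bar, target)

w_tuned, history = sequential_tune(w_avg, order, w_avg - 3 * w_std, w_avg + 3 * w_std, cost)
```

Recording the cost after every single-weight step gives a trace comparable in spirit to Fig. 1, and it makes visible how much of the total reduction comes from the first weight and the first round.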
Following this procedure, at the end of three rounds of tuning, the CIPIC subject with the smallest objective function is number 40 (the subject with the largest head width). Fig. 1 shows the objective function and SD variation through three rounds of sequential tuning for Subject 40, Fig. 2 shows variation in the five PCWs through each round of tuning, and Fig. 3 shows the true HRTFs (blue¹), average HRTFs (red), tuned HRTFs (green), and the HRTF after tuning the first PCW (purple). The SD as a function of octave band for the left ear is given in Fig. 4. The objective function is reduced by 59% over three rounds of tuning. The SD is reduced 47% and 50%, respectively, for the left and right ears. 49% of the reduction in SD occurs in the first round of tuning compared to 10% in rounds two and three. Fig. 3 shows that by changing just one PCW we shift the pinna notch to more closely match the subject’s actual pinna notch frequency. Fig. 3 also shows that the basic shape of the left ear HRTF is matched through sequential optimization and that some detail and ripple in the original HRTFs is smoothed. Kulkarni and Colburn [23] show that these fine details in HRTFs are not important for sound localization and externalization, by smoothing HRTFs using a truncated Fourier series. The tuning procedure shapes the pinna notch for the left ear to better match the pinna notch for Subject 40 in both frequency and magnitude. Fig. 4 also shows that SD decreases in most bands through tuning. A large reduction in SD is seen in the octave band centered at 8 kHz for both ears.

Fig. 3. Subject 40 left ear HRTF tuning results.
Fig. 4. Subject 40 left ear spectral distortion tuning results.
¹ For interpretation of color in Fig. 3, the reader is referred to the web version of this article.

Of the nine validation subjects, Subject 162 has the largest objective function after three rounds of tuning. This subject has large pinna. While the objective function for this subject decreases 36% over three rounds of tuning, smaller reductions in the SD of each ear result, compared with Subject 40. Subject 162 also saw most of the reduction in the objective function and SDs in the first round of tuning. The optimization procedure has more trouble matching the HRTFs of Subject 162. This is attributed to the observation that the 0° azimuth and elevation HRTFs of this subject show notches at other frequencies in addition to the pinna notch. Nonetheless, both ears show large reductions in SD over the octave band with a center at 8 kHz. For the left ear, a large reduction in spectral distortion is also seen in the 4 kHz octave band. For the remaining seven subjects the average percent reduction in the objective function was 41% with a range of 19–58% reduction.
To simulate tuning out a front-back confusion, sequential optimization is performed starting with the average weights for 0° azimuth and trying to match the 180° azimuth HRTF for each of the nine subjects. CIPIC Subject 155 (average pinna size) shows the best results, with an initial value of the objective function of 29.2 reduced by 58% to 12.1 at the end of three rounds of tuning. The initial left ear SD is 8.2 dB and it is reduced 52% to 3.9 dB, and for the right ear the initial SD is 7.1 dB and it is reduced by 46% to 3.9 dB. Subject 155 also saw most of the reduction in the objective function and left and right ear spectral distortions in the first round of tuning. Of the nine validation subjects, Subject 50 (smallest chin-to-shoulder measurement) has the largest objective function after three rounds of tuning. Figs. 5 and 6 show Subject 50’s results for tuning out a reversal. The objective function is reduced by 29% with 28% of the reduction in the objective function a result of tuning the first weight. The initial SD for the left ear is 11.5 dB and it is reduced 30% to 8.1 dB; for the right ear the initial SD is 10.8 dB and it finished at 7.3 dB (32% reduction). Fig. 5 shows that sequential optimization is able to capture the first notch, while a second notch at higher frequencies is smoothed out. The spectral distortion for the left ear is reduced or approximately the same in all octave bands. For the remaining seven subjects the average percent reduction in the objective function is 37% with a range of 25–54%.

Fig. 5. Subject 50 left ear HRTF tuning results.
Fig. 6. Subject 50 left ear spectral distortion tuning results.

These experiments are performed at all 50 azimuths in the horizontal plane. In tuning HRTFs at a specific azimuth, for the nine validation subjects, the first round of tuning produces the largest reduction in the objective function compared to rounds two and three on average over all held out subjects at all 50 azimuths; there are five directions for the left ear and seven directions for the right out of 50 that have larger reductions in spectral distortion in tuning rounds two and three. Similar results are obtained for the second experiment. These results indicate that a sequential tuning approach in which a few PCWs are varied provides significant reduction in spectral distortion and that some of this reduction is from matching the pinna notch of an average HRTF to that of an individual.

Prior to human subject tuning, a test of the smallest variation in a tuned weight, ΔPCW, that is noticeable to a human listener is conducted. The experiment evaluates the sensitivity of changes in the average HRIR PC model to variations in the PCW with the highest standard deviation. Since tuning the first weight in the first round of tuning provides a large change in spectral distortion on average in the optimization study, we restrict the sensitivity analysis to this single PCW and determine if changes in SD that result from numerical experiments are perceivable by a listener. The azimuths included in this experiment are 0°, 180°, −45°, and 45°. At each direction, the PCW with the highest standard deviation is modeled as PCW ± ΔPCW, where ΔPCW = 0 corresponds to the average weight. We determine the change in spectral distortion that is perceivable by a listener, SD_Δ,

\[ SD_{\Delta} = \sqrt{ \frac{1}{I} \sum_{i=1}^{I} \left( 20 \log_{10} \frac{|H(f_i)|}{|\hat{H}_{\Delta}(f_i)|} \right)^{2} } \tag{9} \]

where $H(f_i)$ is the response of the average HRTF from the principal component model with 25 PCs and $\hat{H}_{\Delta}(f_i)$ is the response of the HRTF from the principal component model with the PCW with the highest standard deviation increased or decreased by ΔPCW.

Five subjects, two female and three male, participated in this experiment. Using bursts of white noise shaped by the PCW-modeled HRTFs and played through headphones (Beyerdynamic DT 990 Pro with a Beyerdynamic A1 headphone amplifier), the subject determines a positive and negative ΔPCW with the smallest magnitude at each azimuth for which he or she can hear a difference between the average and modified HRTF. The subject types a ΔPCW into a text box and then compares listening with the ΔPCW to the original HRIR by toggling between the two HRIRs. If a difference is not heard, ΔPCW is increased and if a difference is heard ΔPCW is decreased until the listener finds the smallest value for which a difference is perceived. The headphone transfer function was measured on eight subjects in [24]. The frequency response was flat within ±3.6 dB for the left ear and ±2.6 dB for the right ear between 117 and 4641 Hz across these eight subjects. Because of this, a headphone compensation filter is not used in our study. Table 1 shows the perceivable changes (positive and negative) in the PCW with the largest standard deviation averaged over five subjects along with the associated spectral distortion. On average the 0° azimuth requires the largest positive change in PCW and 45° (right side of the head) requires the smallest change in PCW and results in the largest average SD_Δ for the left ear. Changing PCW 3 for an azimuth of −45° (left side of the head) results in the largest average SD_Δ for the right ear.

These values are compared to the numerical optimization results for the nine held out subjects from the CIPIC database for the first numerical experiment. Because the PCWs are tuned to match a specific HRTF and minimize the objective function, held out subjects can have either positive or negative ΔPCW and thus
Table 1
Average perceivable differences in HRIRs from subject testing.

Azimuth (°)   PCW   Avg ΔPCW   σ_ΔPCW   Avg SD_l,Δ (dB)   Avg SD_r,Δ (dB)
0             2      0.15      0.10     1.7               1.7
                    −0.11      0.07     1.1               1.2
180           6      0.10      0.02     1.7               2.1
                    −0.18      0.09     2.9               2.7
−45           3      0.13      0.09     0.8               3.7
                    −0.09      0.07     0.6               2.9
45            4      0.08      0.05     3.2               0.8
                    −0.08      0.03     3.1               0.8
subjects are separated into two groups for positive and negative values of ΔPCW. Table 2 shows weights with the highest standard deviation at each azimuth, the average ΔPCW for each group and the number of subjects in each group. Comparing Table 2 to Table 1, for each azimuth the average ΔPCW for the nine CIPIC subjects (column 4, Table 2) is larger than the average ΔPCW for which the five subjects can perceive a change in the HRIR (column 3, Table 1). Thus, changes in the first PCWs tuned in the optimization experiments should be perceivable by listeners. Also, changes in PCWs that occur in later rounds might not be perceived by a listener. Fink [25] provides additional results of these experiments.

After tuning, a reduced order HRTF is identified. The tuned HRIRs are FIR filters and are transformed to a balanced state space form that orders states according to their contribution to the system response [26,27]. The Hankel singular values (HSVs) give insight into the order of the reduced system [26,27]. We reduce the 200th order FIR filters to 15th order IIR filters off-line after tuning to reduce the size of the VAD. Fig. 7 shows the full order average left ear HRTF found by averaging the PCWs of the 34 subjects at 0° along with the reduced 15th order IIR HRTF. This figure shows that reduced order modeling is able to match the original pinna notch. Model-order reduction is applied to the average HRTFs at all locations in the horizontal plane. The average spectral distortion over all 50 azimuths is 3.0 dB for the left ear and 2.9 dB for the right ear.

Fig. 7. Full and reduced order average left ear HRTFs at 0° azimuth and elevation.

A listening experiment is conducted to determine if subjects can perceive a difference between the full and reduced order models. Four subjects participated, each listening to a minimum of four azimuths. All reported hearing no difference between the FIR and IIR models.

3.3. Interpolation of HRIRs

Tuned HRIRs at ten azimuths in the horizontal plane are used to generate custom HRTFs at the other 40 azimuths where HRIRs are measured in the CIPIC database. Interpolating between tuning locations is based on a method that allows the HRIRs to be interpolated at lower spatial resolutions than typical methods by interpolating the PCWs [28]. Because different PCWs are tuned at each azimuth, sometimes we interpolate between two tuned PCWs and other times between a tuned and average PCW. Linear interpolation is used. The value of the objective function (Eq. (7)) is calculated to compare the interpolated, average, and tuned HRTFs to the measured HRTFs of the nine held out subjects from the CIPIC database as a function of azimuth in [25]. Using tuned HRTFs at every azimuth provides the lowest value of the objective function; this method, however, is not practical. The average HRTFs have a larger value of the objective function for azimuths in the front of the head than do the interpolated HRTFs. The interpolated HRTFs provide a good compromise between tuning all locations and using average HRTFs.

4. Subject testing methods and results

This section presents the procedure and results of subject testing performed to validate the modeling and tuning techniques. The numerical optimization experiments show that PCWs can be tuned sequentially to reduce errors in the HRTFs of subjects from the CIPIC database and that tuning PCWs can change spectral characteristics of the HRTF. Subject testing determines whether listeners have fewer localization errors with tuned HRTFs than with the average HRTFs after following a sequential tuning procedure. The tests also investigate how subjects perceive the tuning interface, as well as the number of variables (PCWs) to tune and their limits. Owing to the large number of PCWs and their potential ranges, it is not clear from the outset of subject testing exactly how many PCWs should be tuned, and over what allowed range. Shin and Park [16] allow subjects to tune five PCWs, and [18] allows subjects to tune three. The numerical analysis of Section 3 provides some insights; based on these experiments, tuning of the PCW with the largest standard deviation between subjects provides the largest change in spectral distortion for most, but not all, held out subjects; additional gains are seen when tuning additional parameters, but with diminishing returns. Through a number of variants of the experiments derived with feedback from participants, we evolve a
Table 2
Average changes in HRIRs from numerical optimization experiments after tuning the PCW with the highest standard deviation in the first round.

Azimuth (°)   PCW   #sub   Avg ΔPCW   σΔPCW   Avg SD_l,Δ (dB)   Avg SD_r,Δ (dB)
0             2     5      0.79       0.26    4.7               5.3
                    4      0.62       0.33    4.6               5.5
180           6     4      0.30       0.28    3.6               3.8
                    5      0.39       0.13    4.5               4.3
45            3     5      0.34       0.08    1.7               6.5
                    4      0.14       0.12    0.9               3.8
45            4     4      0.31       0.12    7.1               2.1
                    5      0.31       0.11    6.6               1.9
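As a rough illustration of how Hankel singular values can guide the choice of a reduced order for an FIR filter such as an HRIR, the sketch below uses the standard result that the HSVs of an FIR filter are the singular values of the Hankel matrix built from the impulse response samples after the first one. The function names, the toy impulse response, and the relative threshold are assumptions made for illustration; this is not the balanced model reduction algorithm of [26,27] in full, only the step that inspects the HSVs.

```python
import numpy as np

def hankel_singular_values(h):
    """HSVs of an FIR filter with impulse response h[0], h[1], ..., h[N]."""
    h = np.asarray(h, dtype=float)
    tail = h[1:]                      # h[0] is the direct feedthrough term
    n = tail.size
    H = np.zeros((n, n))
    for i in range(n):
        H[i, : n - i] = tail[i:]      # H[i, j] = h[1 + i + j], zero past the filter end
    return np.linalg.svd(H, compute_uv=False)   # returned in descending order

def suggested_order(hsv, rel_tol=1e-3):
    """Count HSVs above rel_tol times the largest one (hypothetical rule)."""
    return int(np.sum(hsv > rel_tol * hsv[0]))

# Toy example (not an HRIR): a decaying exponential truncated to 31 taps
# behaves almost exactly like a first-order IIR filter.
h = 0.5 ** np.arange(31)
hsv = hankel_singular_values(h)
order = suggested_order(hsv)
```

For a measured HRIR the HSVs typically decay more gradually, so the threshold (or a fixed order such as the 15 used in the paper) is a design choice rather than something the HSVs dictate.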
tuning procedure in which both the number of PCWs and the range over which they are tuned provide statistically significant reduction in listening errors. We report the results of each variant of the tuning experiments here.

4.1. Procedure

The experiments present stimuli filtered with modeled HRTFs through headphones to a listener. Beyerdynamic DT 990 Pro headphones with a Beyerdynamic A1 headphone amplifier present the stimuli to the listener. Two sources are used: bursts of white noise and a recording of a female voice. In general, in the tuning phase, subjects listen to a source through the PC modeled average HRIR at a known azimuth and are asked to move sliders in a GUI controlling a subset of PCWs and another slider controlling the interaural time delay (ITD) to make the source sound like it is coming from the given azimuth. Fig. 8 shows the GUI from the first variant of the tuning experiments, and GUIs for remaining experiments are similar, with buttons and sliders to aid in navigating through tuning parameters. Fink [25] provides additional details. The PC model of the average HRIRs is created in Simulink. The model is compiled onto a dSPACE DS1103 PPC Controller Board and the user tunes the PCWs through a GUI made in ControlDesk. After tuning, reduced-order tuned HRTFs are identified using the procedure described in Section 3. On another day, a listening test is performed using both the average HRTFs and tuned HRTFs. Additional details regarding the user interface and GUIs constructed to support tuning are described in [25]. While the overall experiment remained consistent throughout testing, specifics evolved with input from subjects, e.g., the number of PCWs available to tune and their ranges, in order to investigate human factors associated with tuning and effectiveness of the tuning procedure.

In the first variant of the experiment, listeners tune ten azimuths (0, 45, 80, 100, 135, 180, −135, −100, −80, −45 degrees) with positive azimuths on the right side of the head. Note that additional azimuths could be tuned to provide less sparse azimuthal sampling, but the objective here is to identify a tuning procedure and not to build the full virtual auditory display. Subjects adjust the volume to a comfortable level using the headphone amplifier and then tune the first five PCWs sequentially in order of descending standard deviation and use the ITD for fine tuning. Note that the five PCWs with the highest standard deviations differ for each azimuth. Participants are instructed to tune the azimuths in front-back pairs. The limits on the PCWs are set to the average PCW for that azimuth ±3 standard deviations as in [17]. Subjects move the PCW sliders sequentially until what they hear sounds like it is coming from the given direction. Once the subject is satisfied with tuning a particular azimuth, it is then tested by playing each stimulus as unfiltered sound to clear the listener’s memory and then playing the tuned azimuth. The subject can re-tune and test that direction again if desired. Tuning can be performed with either or both sources and the subject tests their tuned HRIRs with both sources. The process is repeated until all ten directions are tuned. After tuning, the subject plays all 10 directions in order around the head using both white noise and the voice recording. If any source does not sound as if it originates from the given direction, he or she re-tunes that direction and repeats playing all directions.

The second part of the experiment is completed on a different day, typically within a week of tuning. Before this experiment, reduced-order HRTFs are identified as described in Section 3. At the start of the experiment the subject adjusts the volume to a comfortable level for both the white noise and the voice recording. Three tuned azimuths, −80°, 0°, and 80°, are played as references using both stimuli. After the references are played, unknown azimuths are presented to the subject in a semi-random order so that each azimuth is played the same number of times and the same azimuth is not played twice in a row in the same trial; these details are unknown to the subjects. In total, 12 trials, each consisting of 26 or 27 directions, are presented to the listener. Six trials use the tuned HRTFs and six use the average HRTFs. The trials alternate between using the white noise and voice recordings. There are two semi-random orders of directions, and one of the two is randomly assigned to the subject. Each azimuth is played 32 times, 16 with the tuned HRTFs and 16 with the average HRTFs (eight with white noise, eight with the voice recording). During the test, a stimulus from each direction plays for three seconds followed by five seconds of silence in which the listener records the perceived direction on a response sheet as a forced choice between the ten azimuths. Repeat listening of a direction after the three second playing time is not allowed. Two measures of performance are derived from the data. The azimuth angle error is the absolute
Fig. 8. Graphical user interface for subject tuning experiments one and two.
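A minimal sketch of how the slider set and slider limits used in the tuning procedure could be derived from the PCWs of a subject database at one azimuth: rank the PCWs by between-subject standard deviation, start each slider at the average PCW, and limit it to the average ±3 standard deviations. The array `pcw`, the function names, and the synthetic data are assumptions for illustration; the Simulink and ControlDesk implementation used in the experiments is not reproduced here.

```python
import numpy as np

def slider_setup(pcw, n_sliders=5, n_sigma=3.0):
    """Pick the PCWs with the largest between-subject spread and their limits.

    pcw: array of shape (n_subjects, n_components) for a single azimuth.
    Returns (indices, start, lower, upper), ordered by descending std.
    """
    pcw = np.asarray(pcw, dtype=float)
    mean = pcw.mean(axis=0)
    std = pcw.std(axis=0, ddof=1)               # sample standard deviation
    idx = np.argsort(std)[::-1][:n_sliders]     # descending standard deviation
    lower = mean[idx] - n_sigma * std[idx]
    upper = mean[idx] + n_sigma * std[idx]
    return idx, mean[idx], lower, upper

def clamp(value, lower, upper):
    """Keep a slider value inside its allowed range."""
    return float(np.clip(value, lower, upper))

# Synthetic weights for 34 subjects and 6 components (not CIPIC data):
# each column is a scaled copy of the same ramp, so its spread is known.
scales = np.array([0.1, 2.0, 0.5, 1.0, 0.05, 0.3])
pcw = np.outer(np.linspace(-1.0, 1.0, 34), scales)
idx, start, lower, upper = slider_setup(pcw, n_sliders=3)
```

Calling `slider_setup` with `n_sigma=5.0` would give the wider ±5 standard deviation range offered in the last variant of the experiment.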
difference between the perceived and actual azimuth. The front-back confusion (FBC) rate is defined as the number of FBCs over the number of responses, where a front-back confusion occurs if the azimuth angle error is lower when the perceived angle is reflected 180° to the other side of the head.

Some subjects who participated in these tests may be better listeners than others, but we are not able to measure a person’s inherent ability to localize sound. Such tests can be performed in an anechoic room with loudspeakers.

4.2. Results

This section summarizes results of subject tuning of HRTFs for four variants of the tuning experiments described above. Five subjects (A–E) participated in the first variant of the experiment in which five PCWs were tuned for ten azimuths, taking between 20 and 70 min to tune. Subject B had the lowest FBC rate after tuning, with results shown in Fig. 9. The size of the square is proportional to the number of times the subject indicates that response. The line with the positive slope shows a perfect response and the two negatively sloped lines show FBCs. Fig. 9 shows a large confusion at 100° that is reduced through tuning. Listening errors for ±45° sources also decrease with the tuned HRTFs. The composite listening results of all five subjects who participated in this experiment show an overall FBC rate of 37% with the average HRTFs and 28.8% with the tuned HRTFs. This is a statistically significant decrease in FBC with the tuned HRTFs.

Based on feedback from Subjects A–E, a second variant of the tuning experiment was conducted in which all ten tuned azimuths were played as references at the start of the experiment and at the start of each trial instead of three tuned azimuths. Three subjects (F–H) participated in this experiment, averaging 21.3 min to complete the tuning. Subject H had the largest change in overall FBC rate. Fig. 10 shows the results of Subject H’s listening test. This subject had difficulty localizing the exact azimuth of the source, and there is a large confusion at 0°. With tuned HRTFs listening errors are reduced. The overall FBC rate is 46.3% with average HRTFs and 30.6% with the tuned HRTFs.

In the third variant of the experiment the number of weights being tuned is reduced to the three PCWs with the highest standard deviations at each direction, because some subjects felt that there were too many parameters to change at each direction. Also, Subject H, who participated in Experiment 2, found that at each direction, only one or two PCWs were used to tune the HRTFs. All other aspects of this test remained the same as in Experiment 2. Seven subjects participated in this experiment: Subjects I through M, and Subjects E and F, who participated in Experiments 1 and 2 respectively. The average time to tune was 23.7 min. Subjects E, F, and I made significant improvements in the overall FBC rate with the tuned HRTFs, while for Subjects J through M, the overall FBC rate remained approximately the same between the tuned and average HRTFs, with statistically significant decreases in front-back confusion rate for one to two azimuths. The listening test results of Subject E are shown in Fig. 11. This subject had the second lowest FBC rate overall with the tuned HRTFs of the subjects who participated in this experiment. Fig. 11 shows that most errors were on the left side of the head, specifically at −135° and −100°, which saw statistically significant decreases in the front-back confusion rate below the 1% level through tuning. Azimuths 0° and 180° also saw statistically significant decreases in the FBC. Because this subject also participated in Experiment 1, we can compare the errors using the tuned HRTFs and average HRTFs from each experiment. Comparing the FBC rates with the tuned HRTFs from Experiments 1 and 3, the overall front-back confusion rate with tuned HRTFs was 32.5% in Experiment 1 and 22% in Experiment 3. The composite results of all subjects in Experiment 3 are given in Fig. 12, where the largest square is 88 responses. The FBC rate is lower overall and for both the noise and voice sources. The FBC rate is lower at 135°, 100°, 0°, 135°, and 180° below the 1% level.

While eliminating two of five PCWs being tuned in Experiment 3 reduced tuning complexity, some subjects felt that at certain directions more weights or a larger range over which to change weights would have helped them tune. In Experiment 4, the tuning interface GUI remains the same as in Experiment 3, with sliders for three PCWs and the ITD, with the addition of two extra buttons. One button is called “Wider Limits”, which increases the range over which the three PCWs can be varied from ±3 to ±5 standard deviations. The second button is called “More Sliders”. This button brings up a new window with five additional sliders for the PCWs with the next highest standard deviations at each azimuth. The limits on these sliders are again the average PCW ±3 standard deviations. In this experiment subjects are instructed to use the three main sliders first and use the two other options if they feel they need to make additional changes. After tuning, subjects also had the opportunity to test the interface by playing the ten azimuths tuned plus 40 found by interpolation, as described in Section 3.

Nine subjects participated in this final tuning experiment, with four of these subjects having participated in a previous experiment. Subjects N–R tuned for the first time and Subjects C, I, J, and K tuned for a second time. The average tuning time was 28 min. Fig. 13 shows the listening results of Subject I for this experiment. With the average HRTFs at 0° this subject perceived
Fig. 9. Subject B (Experiment 1) listening tests (left) average HRTFs and (right) tuned HRTFs.
Fig. 10. Subject H (Experiment 2) listening results (left) average HRTFs and (right) tuned HRTFs.
Fig. 11. Subject E (Experiment 3) listening results (left) average HRTFs and (right) tuned HRTFs.
Fig. 12. Composite listening results for Experiment 3 (left) average HRTFs and (right) tuned HRTFs.
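The two performance measures used in the listening tests, the azimuth angle error and the front-back confusion (FBC) rate, can be sketched as below. The sketch assumes azimuths in degrees with 0° in front and positive values on the right, wraps differences so that the error lies between 0° and 180°, and interprets the reflection in the FBC definition as a mirror across the interaural axis (0° to 180°, 45° to 135°). These conventions, the function names, and the sample responses are assumptions for illustration.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def azimuth_error(perceived, actual):
    """Absolute angular difference in degrees, between 0 and 180."""
    return abs(wrap_deg(perceived - actual))

def front_back_mirror(azimuth):
    """Reflect an azimuth across the interaural axis."""
    return wrap_deg(180.0 - azimuth)

def is_fbc(perceived, actual):
    """True if the mirrored response is closer to the target than the response."""
    mirrored_error = azimuth_error(front_back_mirror(perceived), actual)
    return mirrored_error < azimuth_error(perceived, actual)

def fbc_rate(responses):
    """responses: iterable of (perceived, actual) azimuth pairs in degrees."""
    responses = list(responses)
    return sum(is_fbc(p, a) for p, a in responses) / len(responses)

# Invented responses: three of the six are front-back confusions.
responses = [(0, 0), (180, 0), (135, 45), (45, 45), (-100, -80), (80, 45)]
rate = fbc_rate(responses)
```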
many of these azimuths at 180° and was able to localize all 0° azimuths with the tuned HRTFs. The FBC rate was reduced with tuned HRTFs at 135°, 80°, 0°, 100°, and 180°. Comparing these results with tuned HRTFs in Experiments 3 and 4, azimuths ±45° showed statistically significantly smaller FBC rates in Experiment 4 than in Experiment 3. These two azimuths were indicated as being the worst from Experiment 3. The azimuth perception errors were lower overall in Experiment 4. This indicates that this subject was getting more consistent and accurate with their responses using tuned HRTFs in this experiment.

Fig. 14 shows the listening test results for all subjects who participated in Experiment 4, where the largest square is 115 responses. Tuning performance improved the most at ±135°, 0° and 45°, where there were statistically significant decreases in front-back confusion rates. This is the only experiment in which statistically significant changes in both front-back confusion and
Fig. 13. Subject I (Experiment 4) listening results (left) average HRTFs and (right) tuned HRTFs.
azimuth perception with tuned HRTFs were seen. In other experiments, statistically significant decreases were seen in one of these error metrics, but not both.

Table 3 summarizes the options for each of the four tuning experiments.

5. Procedure and results of using the VAD to complete a task

5.1. Procedure

With the customized VAD and interpolated HRTFs, we investigate completion of a steering task. The task involves ‘steering’ to locate a sound source and simulates head movement relative to a source when a sound source is presented. A GUI with a knob enables listeners to change a simulated head direction or “heading” shown by an arrow. The initial heading always starts at 0° and the sound is coming from one of the ten locations tuned in the previous experiments. This detail is unknown to the subject. At the start of the experiment the subjects are told that the sound could be coming from anywhere in the horizontal plane, while their head is facing 0°. The subject is asked to use the GUI to virtually turn their head to make what they are hearing sound like it is coming from one of four directions, 45°, 0°, 100° or 180°, by using the knob to change the simulated heading.

Before the experiment begins, subjects familiarize themselves with the interface and the task by listening to five stimuli from known azimuths. For these practice directions subjects are told the sound direction given a target head direction relative to the source. For example, in one practice case the source is coming from an azimuth of 135° and the listener is asked to make it sound like 45°. In this case, the correct heading is 90°. The subjects are given as much time as needed for practice.

As the subject moves the knob, sound filtered through HRTFs is presented to the listener and changes based on the heading and source direction. When the heading is the same as the source direction, this is as if the listener is facing the direction of the source and the 0° HRTF is used. Using the interpolation method presented in Section 3 we create HRTFs at the 50 locations in the horizontal plane where the HRTFs in the CIPIC database are measured. Linear interpolation is used on the ITD to allow us to estimate the ITD at each of the 50 azimuths in the horizontal plane. Linear interpolation of the ITD also allows us to generate ITDs between measurement locations. The ITD changes between each of the 50 azimuths in the horizontal plane. The HRTF remains the same until the relative direction crosses the midpoint between two azimuths, and then it switches to the next direction. Because of how the HRTFs in the CIPIC database are measured, there is no change in ITD or HRTF between ±80° and ±90° or between ±100° and ±90°. These locations are the only azimuths in the horizontal plane where the ITD and HRTF do not change.

During the experiment, forty directions are presented, with each of the ten tuned azimuths presented four times (once for each of the four designated directions). The initial heading is always 0°. The designated direction is always 0°, 180°, 45°, or 100°.
Fig. 14. Composite listening results for Experiment 4 (left) average HRTFs and (right) tuned HRTFs for all nine subjects.
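The rendering rule used in the steering task (the HRTF switches at the midpoint between measured azimuths, while the ITD is interpolated linearly) might be sketched as follows. The azimuth grid and ITD values are placeholders rather than CIPIC data, the function names are hypothetical, and the convention that the relative direction equals source azimuth minus heading is inferred from the practice example in the text (source at 135°, heading 90°, perceived direction 45°).

```python
import numpy as np

def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def relative_azimuth(source, heading):
    """Direction of the source relative to the simulated head direction."""
    return wrap_deg(source - heading)

def nearest_hrtf_index(azimuth, grid):
    """Index of the closest measured azimuth; the choice changes at midpoints."""
    dist = np.abs(wrap_deg(np.asarray(grid, dtype=float) - azimuth))
    return int(np.argmin(dist))

def interpolated_itd(azimuth, grid, itd):
    """Linear interpolation of the ITD; grid must be sorted in ascending order."""
    return float(np.interp(azimuth, grid, itd))

# Placeholder grid (degrees) and made-up ITDs (microseconds).
grid = np.array([-45.0, -30.0, -15.0, 0.0, 15.0, 30.0, 45.0])
itd_us = np.array([-380.0, -260.0, -130.0, 0.0, 130.0, 260.0, 380.0])

rel = relative_azimuth(135.0, 90.0)          # practice example from the text
hrtf_index = nearest_hrtf_index(20.0, grid)  # 20 degrees is closest to 15 degrees
itd = interpolated_itd(7.5, grid, itd_us)    # halfway between 0 and 15 degrees
```

Note that `np.interp` holds the end values outside the grid, so a full implementation would need a grid covering the whole horizontal plane.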
Table 3
Summary of tuning experiment variants.

Experiment   Number of subjects   Number of PCWs and range                                  Notes
1            5                    5, ±3σ                                                     −80°, 0°, and 80° are played as references prior to measuring listening results
2            3                    5, ±3σ                                                     All 10 azimuths are played as references prior to measuring listening results
3            7                    3, ±3σ                                                     All 10 azimuths are played as references prior to measuring listening results
4            9                    3, ±3σ, with access to 5 additional PCWs and ±5σ limits    All 10 azimuths are played as references prior to measuring listening results
Fig. 15. Listening results of (left) Subject V (untuned VAD) and (right) Subject E (tuned VAD).
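The two scores reported for the steering task, the heading error and the percentage of large heading errors, reduce to a wrapped angular difference and a threshold count. The sketch below assumes headings in degrees and wraps the difference so that it always lies between 0° and 180°; the function names and sample trials are invented for illustration.

```python
def heading_error(true_heading, selected_heading):
    """Angular difference between two headings in degrees, from 0 to 180."""
    diff = abs(true_heading - selected_heading) % 360.0
    return min(diff, 360.0 - diff)

def summarize(trials, threshold=20.0):
    """Mean heading error and percentage of errors above the threshold.

    trials: list of (true_heading, selected_heading) pairs in degrees.
    """
    errors = [heading_error(t, s) for t, s in trials]
    mean_error = sum(errors) / len(errors)
    pct_large = 100.0 * sum(e > threshold for e in errors) / len(errors)
    return mean_error, pct_large

# Invented trials: errors of 5, 55, 0, and 55 degrees.
trials = [(90, 85), (-135, 170), (0, 0), (45, 100)]
mean_error, pct_large = summarize(trials)
```

With 40 directions per subject, each response contributes 2.5 percentage points to the large-error percentage.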
Occasionally the listener might not need to make any changes in the heading. For each trial, the location of the arrow, or “head direction”, is recorded on a response sheet and then the next direction is played. The subject can take as much time as needed to do the experiment and can take a break part way through the experiment. For each of the 40 directions, the heading error can be calculated as the difference between the true heading and head direction. It is always between 0° and 180°. The percentage of “large heading errors” is calculated as the number of directions that have heading errors greater than 20° divided by the total number of directions.

5.2. Results

Thirteen subjects participated in this experiment, six who tuned the VAD (C, E, F, H, I) and seven who had never participated in an experiment before (S–Y). These subjects took an average of 24 min to complete the test, including the practice directions. The subjects who had tuned used the VAD they created in the previous experiment and the other seven used average HRTFs.

Of subjects using average HRTFs, Subject V had the lowest heading error and percentage of errors above 20°. Fig. 15a shows Subject V’s results with true heading on the x axis and selected heading on the y axis. Perfect responses lie on the dotted line. The average heading error for this subject is 13°, and 20% of responses have heading errors greater than 20°. Subject T had the highest average heading error at 93° and the highest percentage of errors above 20° (95%). Subjects C, E, F, and I all tuned twice and their VADs from the second tuning are used in this experiment. The subject who has the lowest heading error and percentage of errors above 20° is Subject E. The heading error is 6.6° and the percentage above 20° is 2.5%. Fig. 15b shows the listening results of this subject and shows that only one direction was reversed, either as an error in indicating direction on a response sheet or a listening error. The subject who performed the worst among those using a tuned VAD is Subject H. The heading error is 52.4° with 75% of errors greater than 20°. A period of six months between the tuning and testing transpired for this subject, while the other five subjects had less time between the testing and tuning.

Overall, the average heading error of the subjects who did not tune was 33° compared to 21° for subjects who used a tuned VAD. The error for the subjects who tuned is lower with a p-value of 0.01. For the subjects who used an average VAD the percentage of errors above 20° was 39.3% compared to 30.0% in the group of subjects who tuned before. This is lower with a p-value of 0.0007. These results show that, while most subjects who had not tuned a VAD are able to complete the steering task to some extent, the average errors in the group of subjects who used a personalized VAD are lower. Overall, this task is relatively easy for subjects to perform because as the knob moves the subjects can use other cues besides a stationary HRTF to localize the sound. If there is a reversal with the HRTF at one azimuth, this might be resolved with the HRTF at the adjacent azimuth. To accurately steer a sound source, tuned HRTFs can be used to reduce errors.

6. Conclusion

The numerical optimization results indicate that sequential tuning of PCWs is a reasonable process by which a human may individualize a VAD and that tuning a single PCW can modify spectral features of the HRTF. Results also show that the method of identifying HRTFs from HRIRs can create a reduced order VAD with no difference audible to the listener. Subject testing results show that localization errors are reduced after tuning. The progression of subject experiments shows that a tuning interface that presents options for a larger tuning range for the first three PCWs’ deviation and an interface that provides sliders for additional PCWs with the highest standard deviation permits tuning out front-back reversals for many subjects without an overwhelming number of parameters tuned; in the final tuning experiments, the average tuning time for ten azimuths was 28 min, and the final experiment, which provided options for tuning additional PCWs beyond the first three and for tuning PCWs over a range of five standard deviations
showed statistically significant reduction in both front-back confusion rate and azimuth perception error. Results also show that a tuned VAD can be used to complete a steering task more accurately than with a VAD based on average HRTFs.

Acknowledgement

This research is supported by the Air Force Office of Scientific Research under award No. FA9550-08-1-0366.

References

[1] Wightman FL, Kistler DJ. Headphone simulation of free-field listening II: psychophysical validation. J Acoust Soc Am 1988;85(2):868–78.
[2] Wightman FL, Kistler DJ. Headphone simulation of free-field listening I: stimulus synthesis. J Acoust Soc Am 1988;85(2):858–67.
[3] Zotkin DN, Duraiswami R, Davis LS. Rendering localized spatial audio in a virtual auditory space. IEEE Trans Multimedia 2004;6(4):553–64.
[4] Freeland FP, Biscainho LP, Diniz PR. Efficient HRTF interpolation in 3d moving sound. In: AES 22nd Intern’l Conf on Virtual, Synthetic and Entertainment Audio; 2002.
[5] Wenzel EM, Foster SH. Perceptual consequences of interpolating head-related transfer functions during spatial synthesis. In: Proceed. of IEEE workshop on appl. sig. process. to audio and acoustics; 1993.
[6] Wenzel EM, Arrude M, Kistler DJ. Localization using nonindividualized head-related transfer functions. J Acoust Soc Am 1993;94(1):111–23.
[7] Middlebrooks JC. Virtual localization improved by scaling nonindividualized external-ear transfer functions in frequency. J Acoust Soc Am 1999;106(3):1493–510.
[8] Zahorik P, Bangayan P, Sundareswaran V, Wang K, Tam C. Perceptual recalibration in human sound localization: learning to remediate front-back reversals. J Acoust Soc Am 2006;120(1):343–59.
[9] Hartmann WM, Wittenberg A. On the externalization of sound images. J Acoust Soc Am 1996;99(6):3678–88.
[10] Jackson JE. A user’s guide to principal components. John Wiley and Sons Inc.; 2003.
[11] Jolliffe IT. Principal component analysis. Springer-Verlag; 2002.
[12] Kistler DJ, Wightman FL. A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction. J Acoust Soc Am 1992;91(3):1637–47.
[13] Middlebrooks JC, Green DM. Observations on a principal components analysis of head-related transfer functions. J Acoust Soc Am 1992;92:597–9.
[14] Algazi VR, Duda RO, Thompson DM, Avendano C. The CIPIC HRTF database. IEEE workshop on appl. of sig. proc. to audio and acoust; 2001.
[15] Xu S, Li Z, Salvendy G. Individualized head-related transfer functions based on population grouping. J Acoust Soc Am 2008;124(5):2708–10.
[16] Shin KH, Park Y. Enhanced vertical perception through head-related impulse response customization based on pinna response tuning in the median plane. IEICE trans. fundamentals 2008;E91-A(1):345–56.
[17] Hwang S. Modeling, customization, and interpolation of head-related impulse responses based on principal components analysis. PhD thesis, Korea Advanced Institute of Science and Technology; 2009.
[18] Hwang S, Park Y, sik Park Y. Modeling and customization of head-related impulse responses based on general basis functions in the time domain. Acta Acustica United Acustica 2008;94:965–80.
[19] Hwang S, Park Y. HRIR customization in the median plane via principal components analysis. In: AES 31st International Conference; 2007.
[20] Hwang S, Park Y. Interpretations on principal components analysis of head-related impulse responses in the median plane. J Acoust Soc Am 2008:123.
[21] Hwang S, Park Y, sik Park Y. Modeling and customization of head-related transfer functions using principal component analysis. In: International conference on control, automation and systems; 2008.
[22] Inoue N, Kimura T, Nishino T, Itou K, Takeda K. Evaluation of HRTFs estimated using physical features. Acoust Sci Tech 2005;26(5):453–5.
[23] Kulkarni A, Colburn HS. Role of spectral detail in sound-source localization. Nature 1998;396:747–9.
[24] Wightman FL. Headphone compensation filters. Personal communication; 2011.
[25] Fink K. Modeling and individualization of head-related transfer functions using principal component analysis. PhD thesis, Dartmouth College; May 2012.
[26] Beliczynski B, Kale I, Cain GD. Approximation of FIR by IIR digital filters: an algorithm based on balanced model reduction. IEEE Trans Sig Proc Let 1992;40(3):532–42.
[27] Mackenzie J, Huopaniemi J, Valimaki V, Kale I. Low-order modeling of head-related transfer functions using balanced model truncation. IEEE Signal Process Lett 1997;4(2):39–41.
[28] Hwang S, Park Y, sik Park Y. Customization of spatially continuous head-related impulse responses in the median plane. Acta Acustica United Acustica 2010;96:351–63.
