


Lalan Kumar, Amish Goel, and Rajesh M Hegde

Indian Institute of Technology Kanpur

Department of Electrical Engineering

ABSTRACT

An off-axis pole focusing method for robust Direction of Arrival (DOA) estimation is presented in this paper. The roots (poles) of the root-MUSIC polynomial are generally evaluated over the unit circle in the z-domain. The DOAs can be computed from these poles using a simple geometric relation. Rather than evaluating the root-MUSIC polynomial over the unit circle, it can be evaluated inside the unit circle at different radii. Such an off-axis estimation method, combined with averaging, shows higher robustness in DOA estimation. This is particularly useful in reverberant environments, where the roots (poles) are displaced by the effects of multipath. Experiments on source localization are conducted to evaluate the proposed method. The robustness of the proposed method under reverberation is illustrated using DOA confidence interval plots. Experiments on distant speech recognition are conducted under reverberant conditions on a subset of the spatialized version of TIMIT (S-TIMIT) and on the MONC database. The experimental results obtained for both source localization and distant speech recognition indicate a reasonable improvement over conventional subspace and correlation based methods.

Index Terms: root-MUSIC, Pole focusing, DOA, Off-axis DOA estimation

1. INTRODUCTION

The computational efficiency of the MUltiple SIgnal Classification (MUSIC) method [1] is limited by the requirement of a large number of sensors to resolve closely spaced sources in reverberant environments. Alternatively, the root-MUSIC method [2] is used in this context to obtain the DOA by solving a polynomial derived from the MUSIC spectrum. In this paper, an off-axis pole focusing [3] method based on root-MUSIC is proposed for high resolution DOA estimation in reverberant environments. In the root-MUSIC method, the inverse of the all-pole MUSIC expression is expressed as a polynomial. The roots (poles) of this polynomial are the DOA estimates. Increasing the order of the polynomial by increasing the number of sensors, while reducing the search path radius, provides several off-axis poles. These off-axis pole candidates can be averaged using a beam narrowing factor (ζ) as a weight to provide a more accurate pole location or DOA estimate. This weighted averaging performs well under reverberation and reasonably restores the original pole location (DOA) that is disturbed by multipath.

The rest of the paper is organized as follows. The concept of off-axis pole focusing for DOA estimation is introduced first. A complete description of the pole focusing algorithm is presented next. Experiments on source localization and distant speech recognition are then described, followed by the conclusion.

2. THE OFF-AXIS POLE FOCUSING METHOD FOR DOA ESTIMATION USING ROOT-MUSIC

The MUSIC method decomposes the covariance matrix of the signals received at multiple microphones [1]. In general, for M signals acquired by a uniform linear array (ULA) consisting of N identical sensors, where N > M, the MUSIC spectrum is defined by

$$P_{MUSIC}^{-1}(\theta) = s^{H}(\theta)\, Q_n Q_n^{H}\, s(\theta) \tag{1}$$

where

$$s(\theta) = \left[\,1 \;\; e^{-jkd\sin(\theta)} \;\; e^{-j2kd\sin(\theta)} \;\; \cdots \;\; e^{-j(N-1)kd\sin(\theta)}\,\right]^{T} \tag{2}$$

is the steering vector corresponding to the direction of arrival θ. Note that $Q_n$ is the N × (N − M) noise eigenvector matrix, k is the wave number and d is the distance between two sensors. The denominator of the MUSIC spectrum becomes zero when θ is a signal DOA, which gives M peaks in the spectrum corresponding to the M DOAs. Putting $z = e^{jkd\sin(\theta)}$ in Equation 2 and simplifying Equation 1, it can be written as [2]

$$P_{MUSIC}^{-1}(\theta) = \sum_{l=-(N-1)}^{N-1} C_l\, z^{-l} \tag{3}$$
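As a rough illustration of Equations 1 to 3, the following NumPy sketch forms the coefficients $C_l$ as the diagonal sums of $Q_n Q_n^H$, finds the roots of the resulting polynomial, and maps the M roots nearest the unit circle back to angles by inverting $z = e^{jkd\sin(\theta)}$. The function names `root_music` and `simulate_covariance`, the simulated sample covariance, and the choice kd = π (half-wavelength spacing) are assumptions made for this example. They are not taken from the paper's implementation.

```python
import numpy as np


def root_music(R, M, kd):
    """Return M DOA estimates (radians) from an N x N covariance matrix R.

    kd is the wave number times the sensor spacing (pi for half-wavelength spacing).
    """
    N = R.shape[0]
    # Noise subspace Q_n: eigenvectors of the N - M smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(R)  # eigenvalues are returned in ascending order
    Qn = eigvecs[:, : N - M]
    A = Qn @ Qn.conj().T
    # C_l in Equation 3 is the sum of the l-th diagonal of Q_n Q_n^H.
    # Listing l = -(N-1), ..., N-1 gives descending powers of z once the
    # polynomial is multiplied by z^(N-1), which is what np.roots expects.
    coeffs = np.array([np.trace(A, offset=l) for l in range(-(N - 1), N)])
    roots = np.roots(coeffs)
    # Roots come in pairs (z, 1/z*): keep those inside the unit circle,
    # then take the M roots closest to the circle.
    inside = roots[np.abs(roots) < 1.0]
    signal_roots = inside[np.argsort(1.0 - np.abs(inside))[:M]]
    # Invert z = exp(j * kd * sin(theta)).
    return np.sort(np.arcsin(np.angle(signal_roots) / kd))


def simulate_covariance(doas_deg, N, kd, snapshots=500, snr_db=20.0, seed=0):
    """Hypothetical helper: sample covariance for uncorrelated far-field sources."""
    rng = np.random.default_rng(seed)
    theta = np.deg2rad(np.asarray(doas_deg, dtype=float))
    n = np.arange(N)[:, None]
    S = np.exp(-1j * kd * n * np.sin(theta)[None, :])  # steering vectors (Equation 2)
    shape_s = (len(theta), snapshots)
    shape_n = (N, snapshots)
    sig = (rng.standard_normal(shape_s) + 1j * rng.standard_normal(shape_s)) / np.sqrt(2)
    noise = (rng.standard_normal(shape_n) + 1j * rng.standard_normal(shape_n)) / np.sqrt(2)
    X = S @ sig + 10 ** (-snr_db / 20) * noise
    return X @ X.conj().T / snapshots
```

Under these assumptions, `np.rad2deg(root_music(simulate_covariance([-20, 30], N=8, kd=np.pi), M=2, kd=np.pi))` should return angles near −20° and 30°. Well-separated sources in a noise-only (non-reverberant) model are used here only to keep the example simple.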
Equation 3 defines a polynomial of degree (2N − 2) whose (2N − 2) zeros form the poles of the root-MUSIC expression. Both $z$ and $1/z^{*}$, which have the same phase and reciprocal magnitudes, are poles, one within the unit circle and the other outside. Of the (N − 1) poles within the unit circle, the M poles closest to the unit circle correspond to the M sources. This is illustrated in Figure 1, where the poles in red correspond to the two DOAs. If $z_m$ is a root of the polynomial, then the DOA is given by

$$\theta_m = \sin^{-1}\left[\frac{\arg(z_m)}{kd}\right], \quad m = 1 \text{ to } M \tag{4}$$

$$\arg(z_m) = kd\sin(\theta_m) \tag{5}$$

It is clear from Figure 1 that the poles corresponding to the DOAs are very close to the unit circle. In highly noisy or reverberant environments these poles are no longer close to the unit circle; they are scattered within it. The proposed off-axis pole focusing approach addresses this issue of accurate DOA estimation under reverberation. Additionally, it addresses the issue of using a limited number of sensors to provide higher resolution in DOA estimation.

This work was funded in part by TCS Research Scholarship Program under project number TCS/CS/20110191.

Fig. 1. Plot showing all roots (poles) obtained using root-MUSIC at an SNR of 16 dB in a non-reverberant environment using 12 sensors. The sources are at 30°, 35°. The poles in red correspond to the DOAs.

Fig. 2. Illustration of the pole focusing effect using six sensors and twelve search paths. The solid arc represents the bandwidth of different poles separated by δr = .01. The red pentagram represents the actual pole (DOA).

Fig. 3. Magnified polar plot (axes: real part, imaginary part) showing (a) all DOA (pole) candidates with the number of sensors varying from 3 to 12, shown in blue; (b) comparison of DOA pole estimates from root-MUSIC using 12 sensors (blue), the proposed method (brown), and the actual pole location (red). Experiments conducted at an SNR of 16 dB and a reverberation time of 400 ms. The sources are at 30°, 35°.

2.1. Off-axis pole focusing under reverberant conditions

Under reverberation, root-MUSIC poles are dispersed within the unit circle. Hence searching for the poles within the unit circle is the correct approach. The root-MUSIC polynomial in Equation 3 can be rewritten as

$$P_{MUSIC}^{-1}(\theta) = C_0 + C_1 z^{-1} + C_2 z^{-2} + \cdots + C_{2N-2}\, z^{-(2N-2)} \tag{6}$$

In this case the search path is taken over the unit circle. Assuming $z = re^{jkd\sin(\theta)}$, where 0 ≤ r < 1, implies moving the search path inward, which is equivalent to moving the poles outwards to a larger radius [4]. Hence, such an off-axis search path for the poles (DOAs in root-MUSIC) will enhance the pole (DOA) location. This leads to a reduction in the bandwidth of the pole, causing what is called the Pole Focusing Effect. Figure 2 shows that as the search path approaches the pole, the bandwidth decreases. At each off-axis
search path, increasing the number of sensors along with reducing the search path radius gives more robust DOA estimation. This principle is the basis of the proposed algorithm. With several DOA candidate poles obtained from different sensor data and several off-axis paths, an estimate of the DOA is obtained by weighted averaging of the poles. The beam narrowing factor (ζ) of the poles (defined in Sec 2.2.4) is used as the weight in the averaging process. The method improves the resolution of the DOA estimates under reverberation due to the off-axis pole averaging.

Figure 3(a) shows all DOA candidate poles plotted for 3 to 12 sensors. The plot is for an SNR of 16 dB and a very low direct-to-reverberant ratio (DRR) of 2.41 dB, with an equivalent reverberation time of 400 ms. Initially, with a low number of sensors, root-MUSIC is unable to resolve the DOAs. It is observed that only one pole lies close to the median value of the two DOAs under consideration. Such poles can be seen in the region between the actual DOA pole arguments, indicated by the two lines in the figure. On increasing the number of sensors the DOAs start getting resolved, but their locations are well within the unit circle and away from the actual DOAs. By applying the pole focusing method and averaging over all resolved candidates thus obtained, the resultant DOA is closer to the actual DOA. Figure 3(b) shows that the argument (and hence DOA) of the pole resulting from pole focusing is closer to the actual one when compared to conventional root-MUSIC.

2.2. Algorithm for DOA estimation using off-axis pole focusing

The block diagram of the proposed method to resolve closely spaced sources is illustrated in Figure 4. A detailed discussion of the proposed algorithm, including the description of the individual blocks, is presented in the following subsections.

Fig. 4. Block diagram of the off-axis pole focusing method

2.2.1. Estimating minimum number of sensors to resolve ground truth DOAs

For fewer sensors, root-MUSIC is unable to resolve two closely spaced sources under reverberant conditions. Hence the method starts with a minimum number of sensors satisfying the condition N > M. The number of sensors is increased until the two DOAs are resolved. Up to this number of sensors, the search path is taken on the unit circle.

2.2.2. Computing the pivot pole candidate

All the unresolved DOA pole candidates are averaged as in Section 2.2.5 to get the pivot DOA pole candidate.

2.2.3. Estimating off-axis pole (DOA) candidates

Starting with the number of sensors for which the DOAs are resolved and with a search path over the unit circle, the number of sensors is increased further while the search path radius is reduced. This is equivalent to substituting $z = re^{jkd\sin\theta}$ in Equation 6, where 0 < r < 1. Moving the search path inward results in an enhanced pole radius compared to that obtained with the unit circle search path. The radius value starts from unity and decreases with a certain step size. In this work the step size is taken to be 0.01. It must be ensured that no pole outside the unit circle is considered for pole enhancement. Since closely spaced sources are considered herein, all the resolved DOA poles within ±10° of the pivot DOA candidate are taken as DOA candidate poles. The poles obtained in each off-axis path are first recorded. Among them, the candidate poles having an argument lower than that of the pivot pole are considered, along with their corresponding radius values, for the lower DOA. The remaining poles are used for the higher DOA.

2.2.4. Computing the Beam Narrowing Factor (BNF) of the off-axis pole (DOA) candidates

As explained in Section 2, both $z$ and $1/z^{*}$ are roots of Equation 6, with the same phase and reciprocal magnitudes. Hence, these roots can be written as

$$z_m = e^{\mp\sigma_m T}\, e^{i\omega_m T}, \qquad \omega_m T = kd\sin(\theta_m) \tag{7}$$

where T is the sampling interval. The bandwidth (BW) of the pole will be approximately $2\sigma_m$ [5]. It is related to the magnitude (radius) of the pole by the equation

$$|z_m| = e^{-\sigma_m T} \tag{8}$$

The beam narrowing factor for every pole obtained can be computed as

$$\zeta = \frac{\omega}{BW} \tag{9}$$

where ω is the argument of the pole given by Equation 5.

2.2.5. Computation of final DOA using weighted averaging and scaling

The weighted averaging is performed with the BNF as the weight. The weights are applied to the argument and radius of the off-axis poles to give

$$\omega_p = \frac{\sum_i \omega_i \zeta_i}{\sum_i \zeta_i} \quad \text{and} \quad r_p = \frac{\sum_i r_i \zeta_i}{\sum_i \zeta_i} \tag{10}$$

$\omega_p$ = argument of the pole obtained using pole focusing
$r_p$ = radius of the pole obtained using pole focusing
$\omega_i$ = argument of the ith off-axis pole
$r_i$ = radius of the ith off-axis pole
$\zeta_i$ = the corresponding BNF.

The final DOA is computed by scaling the estimated argument as in Equation 4. The steps in the proposed method are enumerated in Algorithm 1.

Algorithm 1 Algorithm for DOA estimation using pole focusing
1: Set up two closely spaced sources with a separation of 5°.
2: Initialize P, the number of sensors, such that P > M, where M = no. of sources, and set the search path radius r = 1.
3: Compute the DOA pole candidate using root-MUSIC with $z = re^{jkd\sin\theta}$.
4: If only one pole candidate is found, set P = P + 1 and repeat the root-MUSIC computation.
5: Average all single pole candidates to get the pivot pole. The number of sensors required to resolve the ground truth DOAs is P. Set N = P.
6: Set N = N + 1 and r = r − r_step, where r_step = .01. Store each pole candidate found using root-MUSIC with $z = re^{jkd\sin\theta}$, along with its search radius and argument, if it is within ±10° of the pivot.
7: If N < 12, go to step 6.
8: Compute the BNF, $\zeta = \omega / BW$, over all off-axis poles.
9: Separate the off-axis poles into those below and those above the pivot pole, corresponding to the two DOAs.
10: Compute the weighted average on each of the two sets of poles, as $\omega_p = \sum_i \omega_i \zeta_i / \sum_i \zeta_i$ and $r_p = \sum_i r_i \zeta_i / \sum_i \zeta_i$.
11: The final DOA is obtained from Equation 4.

3. PERFORMANCE EVALUATION

In this Section, the performance of the proposed method is evaluated by conducting two experiments. The first experiment is on source localization. The result is presented as a confidence interval (CI) plot. In the second experiment, distant speech recognition experiments in a meeting room environment are conducted over a microphone array. The results are compared to the root-MUSIC and correlation based methods.

Table 1. Source and microphone locations used in the experiments

Source 1   [3.5 2.48 1.4]^T     Source 2   [3.47 2.45 1.4]^T
Mic 1      [3.0 2.5 1.0]^T      Mic 2      [3.0 2.55 1.0]^T

3.1. Experimental Conditions

The proposed algorithm was tested in a typical meeting room environment. A room with dimensions 7.3 m × 6.2 m × 3.4 m was used in the experiments. The positions (with respect to one of the corners of the room) of the two sources and of a two-element Uniform Linear Array (ULA) are shown in Table 1. Additional microphones are placed in a similar fashion, while maintaining linearity and uniform distance between two microphones. Three microphones are used in the beginning and the number is then gradually increased up to twelve. The experiments are performed at different reverberation times (T60). The room impulse response is generated by the image method [6] as implemented in [7]. Figure 5 shows the experimental set-up for the experiments on source localization and distant speech recognition.

Fig. 5. Figure illustrating the experimental conditions

3.2. Experiments on source localization

To check the resolving power of the proposed method with a limited number of sensors, a ULA of nine sensors is considered from the given set-up. Sources spaced 2° apart are used in the experiment. The method is compared with root-MUSIC and root-MUSIC with forward-backward (FB) smoothing [8]. The number of microphones in the subarray for FB smoothing is five. The result is presented as confidence interval (CI) plots. A confidence interval is an interval estimate of a population parameter. It is used to indicate the reliability of an estimate. Figure 6 shows the 95% confidence interval of the mean DOA estimate for pole focusing, root-MUSIC and forward-backward root-MUSIC (FB RM) at different reverberation times (T60). It is clear from the upper and lower limits of the pole focusing confidence interval that the mean DOA estimate of pole focusing is always closer to the actual DOA than that of root-MUSIC or forward-backward root-MUSIC. The CI plot is shown for both the sources.

3.3. Experiments on distant speech recognition

Speaker independent speech recognition experiments were conducted for reconstructed speech acquired over microphone arrays. A filter and sum beamformer was trained using the DOA estimates obtained. The experimental results are presented as % word accuracies ($W_{Acc}$). The % $W_{Acc}$ is calculated as

$$W_{Acc} = \frac{W_n - (W_s + W_d + W_i)}{W_n} \cdot 100$$

where $W_n$ is the total number of words, $W_s$ the total number of substitutions, $W_d$ the total number of deletions, and $W_i$ the total number of insertions. To ensure conformity with standard databases, we used a subset of the spatialized version of TIMIT [9], the S-TIMIT data. Additional experiments are also performed on the Multiple Overlap Numbers Corpus
(MONC) [10]. In the MONC database, recognition experiments were performed for isolated and connected digits. Table 2 lists the recognition performance of various methods under clean and reverberant conditions. The results are also compared with the correlation based methods GCC-PHAT and GCC-ROTH.

Table 2. Distant speech recognition performance as % accuracy at various reverberation times (T60, in ms). MONC I: isolated digit performance; MONC II: connected digit performance.

                    T60=150  T60=400  T60=150  T60=400  T60=150  T60=400  T60=150  T60=400  T60=150  T60=400  T60=150  T60=400
MONC I    98.4%     94%      93.45%   93.45%   92.88%   92.35%   91.10%   92.35%   88.08%   82.74%   81.49%   79.36%   78.47%
MONC II   88.58%    84.33%   83.89%   84.0%    83.29%   83.37%   83.33%   83.19%   81.45%   75.65%   73.58%   74.72%   70.88%
S-TIMIT   93.27%    91.4%    90.55%   90.8%    90.48%   90.54%   88.7%    85.6%    84.14%   79.56%   76.63%   78.04%   76.47%

Fig. 6. Confidence interval plots (axes: T60 in ms, DOA estimate) of the DOA estimation performance of pole focusing, root-MUSIC and FB root-MUSIC at an SNR of 16 dB and various reverberation times, T60. The sources are at (a) 30° and (b) 32°. The total number of sensors used is 9; the number of sensors in the subarray used for FB root-MUSIC is 5.

4. CONCLUSION

An off-axis pole focusing method using root-MUSIC is proposed in this work for robust DOA estimation in reverberant environments. Pole focusing provides reasonably more accurate source localization under reverberation when compared to conventional root-MUSIC and FB root-MUSIC for closely spaced sources. The off-axis DOA estimation with weighted averaging provides robust DOA estimation. An adaptive version of this algorithm for real time implementation is currently being studied for voice-based camera steering in meeting rooms.

REFERENCES

[1] R. O. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE Transactions on Antennas and Propagation, vol. AP-34, pp. 276–280, 1986.

[2] Arthur Barabell, “Improving the resolution performance of eigenstructure-based direction-finding algorithms,” in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP, 1983, vol. 8, pp. 336–339.

[3] G. Duncan and M.A. Jack, “Formant estimation algorithm based on pole focusing offering improved noise tolerance and feature resolution,” Radar and Signal Processing, IEE Proceedings F, vol. 135, no. 1, pp. 18–32, Feb. 1988.

[4] S. McCandless, “An algorithm for automatic formant extraction using linear prediction spectra,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 22, no. 2, pp. 135–141, 1974.

[5] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Springer, 1978.

[6] J.B. Allen and D.A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Am., vol. 65, no. 4, pp. 943–950, 1979.

[7] E.A.P. Habets, “Room impulse response (RIR) generator,” July 2006.

[8] S.U. Pillai and Byung H. Kwon, “Performance analysis of MUSIC-type high resolution estimators for direction finding in correlated and coherent scenes,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 8, pp. 1176–1189, 1989.

[9] John S. Garofolo, TIMIT Acoustic-Phonetic Continuous Speech Corpus, Linguistic Data Consortium, Philadelphia, 1993.

[10] CSLU, Multi Channel Overlapping Numbers Corpus distribution, Linguistic Data Consortium,