
International Workshop on Acoustic Signal Enhancement 2012, 4-6 September 2012, Aachen

DISCRIMINABILITY MEASURE FOR MICROPHONE ARRAY SOURCE LOCALIZATION

L. O. Nunes, W. A. Martins∗, M. V. S. Lima, L. W. P. Biscainho†
Federal University of Rio de Janeiro, DEL/Poli & PEE/COPPE, Rio de Janeiro, Brazil
ABSTRACT

The performance of sound source localization systems based on microphone arrays is dictated by a combination of factors that range from array, source, and environmental characteristics to the nature of the localization algorithm itself. Array geometry is an example of a critical feature for source localizability. This paper proposes a numerical measure of the capability of a microphone array with a specific geometry to distinguish a given point in space from its neighbors. This measure, herein called discriminability index (D), takes into account only the effects of array geometry on spatial resolution, thus providing a way of connecting a microphone array geometry to the region of interest. The proposed measure can be particularly useful for choosing an appropriate array geometry when a sound source is confined to a predefined region. Simulation results using the classic SRP-PHAT method are presented to highlight the correlation between D and the accuracy of the source location estimates.

Index Terms: acoustic signal processing, microphone arrays, sound source localization

1. INTRODUCTION

In recent years, the interest in using microphone arrays to localize a sound source has grown substantially [1, 2, 3]. The field of sound source localization (SSL) is at the core of many microphone array applications, such as teleconferencing, high-quality recording, and computer games [1]. Yet it is still a challenging problem, especially in very reverberant rooms and under low signal-to-noise ratio (SNR) conditions.
In practical implementations, the estimation of the source location depends on the intrinsic features of the source signal, the room characteristics, the quality of the voice-activity detector (VAD) used, the localization method, and other parameters that are usually chosen experimentally, such as frame duration, sampling rate, spatial search grid, and array geometry (comprising the number, spacing, and distribution of the microphones). With so many degrees of freedom, it is very difficult to determine which factors are chiefly responsible when an SSL algorithm fails. Therefore, decoupling the effects of such aspects/parameters is a major step towards a better comprehension of the SSL problem. In particular, an injudicious choice of the array geometry may severely hinder the accuracy of the estimates.
∗ Dr. Martins is also with CEFET/RJ - UnED-NI.
† The authors would like to thank the funding agencies CNPq and FAPERJ for financially supporting their work. This R&D project is a cooperation between Hewlett-Packard Brasil Ltda. and COPPE/UFRJ, being supported with resources of Informatics Law (no. 8.248, from 1991).

B. Lee, A. Said, and R. W. Schafer
Mobile & Immersive Experience Lab, Hewlett-Packard Laboratories, Palo Alto, CA, USA

Some previous works [4, 5] have discussed the spatial resolution of microphone arrays. In [4], the spatial observability function (SOF) is used to account for the level of access of a microphone array to different spatial positions. However, the computation of the SOF takes into account all the degrees of freedom previously mentioned. An effort to assess the effects of sampling rate and array geometry upon spatial resolution was made in [5] within a beamformer context. It assumed far-field sources, such that the source location is determined only by the direction-of-arrival (DOA), and showed that two different DOAs leading to the same array-response vector are indistinguishable.

In this paper, the spatial resolution for localizing near-field sources associated with a given array geometry is assessed. The proposed approach consists of measuring how distinguishable a spatial point is from its neighbors given a specific array geometry, thus disclosing which positions in space are favored or disfavored by that specific array. Such knowledge can be used, for example, to properly design the array, provided the sound source position is known to be confined to a region of interest. An important feature of the proposed measure is that it isolates the effects of array geometry, sampling rate, and distance between neighboring spatial points on the attainable spatial resolution, disregarding any other degrees of freedom.

This paper is organized as follows. Section 2 describes the proposal to assess the spatial resolution of microphone arrays, leading to the discriminability index (D). In Section 3, simulation results are used to validate the proposed measure. Section 4 presents case studies that illustrate how the index can provide some control over spatial resolution. The conclusions are drawn in Section 5.

2. MEASURING THE SPATIAL RESOLUTION INDUCED BY ARRAY GEOMETRIES

Most practical microphone array localization techniques depend upon computations of time-differences of arrival (TDOAs) of acoustic signals acquired by microphone pairs. The generalized cross-correlation (GCC) method [6], for instance, is often used for estimating the TDOA by determining the time-lag which maximizes the cross-correlation between filtered signals from a microphone pair. After the estimation of the TDOAs of all microphone pairs, it is possible to find the spatial point that best fits the estimated TDOAs in some predefined sense [7]. Another example is search space-based methods such as the steered response power (SRP) algorithm [8, 9], whose main idea is to steer the array directionality pattern to many directions while searching for the acoustic source position, indicated by the maximum energy of the array combined output signal.
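As a rough illustration of the TDOA estimation step described above, the sketch below picks the integer lag that maximizes the plain time-domain cross-correlation of two signals. This corresponds to the GCC idea with no prefiltering at all, which is an assumption made here for brevity; the function name `estimate_tdoa` and its parameters are hypothetical and do not come from the paper or from [6].

```python
def estimate_tdoa(sig_a, sig_b, max_lag):
    """Return the integer lag (in samples) that maximizes the
    cross-correlation sum_k sig_a[k] * sig_b[k - lag].

    A positive result means sig_a is a delayed copy of sig_b.
    No prefilter is applied (assumption: unweighted cross-correlation).
    """
    n = min(len(sig_a), len(sig_b))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for k in range(n):
            j = k - lag
            if 0 <= j < n:
                score += sig_a[k] * sig_b[j]
        # Keep the first lag that reaches the highest score.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy usage: the same pulse reaches microphone a three samples after microphone b.
sig_b = [0.0] * 16
sig_b[5] = 1.0
sig_a = [0.0] * 16
sig_a[8] = 1.0
lag = estimate_tdoa(sig_a, sig_b, max_lag=6)
```

The exhaustive lag search is quadratic in the frame length; a practical implementation would typically compute the correlation in the frequency domain, where the GCC weighting is also applied.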

The SRP cost function $W(\cdot) : \mathbb{R}^3 \to \mathbb{R}$ can be written as [8, 9]

$$W(\mathbf{x}) = \sum_{m=1}^{M-1} \sum_{n=m+1}^{M} \phi_{m,n}[\xi_{m,n}(\mathbf{x})], \qquad (1)$$

where $\mathbf{x} \in \mathbb{R}^3$ denotes a possible spatial position for the acoustic source, $M \in \mathbb{N}$ is the number of microphones in the array, and $\phi_{m,n}[\cdot]$ is the generalized cross-correlation function for a given TDOA $\xi_{m,n}(\mathbf{x}) \in \mathbb{Z}$ between the $m$th and $n$th microphones, with $m \in \{1, \ldots, M-1\}$ and $n \in \{m+1, \ldots, M\}$.

The previous reasoning, along with expression (1), exemplifies the strong dependency of localization techniques on the TDOAs computed for different microphone pairs. Since the ultimate goal of localization techniques is to determine reliable estimates for source locations from the knowledge of the discretized TDOAs, the mapping from spatial points to discrete TDOAs must be "rich enough" to allow one to distinguish among distinct spatial points, even after discretization operations. One of the main challenges when deploying microphone arrays is therefore the definition of some related parameters aimed at achieving such a rich mapping from spatial points to TDOAs. Among such parameters, the array geometry plays a central role. However, the problem of measuring how distinguishable a spatial point is from its neighbors for a chosen geometry has not been individually addressed yet.

This section presents an algorithmic solution to such a problem which does not depend on the particular source signal, environment, or localization technique involved. Rather, the proposal depends only on the array geometry and other relevant parameters, such as sampling rate and neighborhood definition (with their implied spatial and temporal resolutions). The algorithm comprises the following steps: (i) standard mapping from spatial points to discrete TDOAs; (ii) mapping from discrete TDOAs to a single confusion measure among neighbors; and (iii) mapping the confusion measure to a D.

The standard mapping from spatial points to TDOAs is given by

$$\xi_{m,n}(\mathbf{x}) = \mathrm{round}\left\{ \frac{f_s}{c} \left( \|\mathbf{x} - \mathbf{m}_m\| - \|\mathbf{x} - \mathbf{m}_n\| \right) \right\} \in \mathbb{Z}, \qquad (2)$$

for each pair of microphone indexes $(m, n)$. In Eq. (2), $f_s \in \mathbb{R}$ denotes the sampling frequency used to discretize the TDOA,¹ $c \in \mathbb{R}$ stands for the propagation velocity of the acoustic wave, and $\|\cdot\|$ represents the standard 2-norm of vectors. The microphone positions associated with the pair of indexes $(m, n)$ are denoted as $(\mathbf{m}_m, \mathbf{m}_n) \in \mathbb{R}^3 \times \mathbb{R}^3$. It is worthwhile pointing out that $f_s$ directly affects the mapping from spatial points to discrete TDOAs, which hints that it is a key parameter for the proposed algorithm.

The next goal is to quantify the confusion between a spatial point $\mathbf{x}$ and its neighbors. Define the neighborhood $V_{\mathbf{x}}$ of $\mathbf{x}$ as the set of points which are close to $\mathbf{x}$ in some sense to be discussed later on. Intuitively, since close spatial points tend to² yield similar discrete TDOAs, whereas distant positions tend to yield more different TDOAs, a hard-decision criterion may yield a simple countable confusion measure. Given a microphone pair of indexes $(m, n)$, the proposed confusion measure is denoted by $C_{m,n}(\cdot, \cdot) : \mathbb{R}^3 \times V_{(\cdot)} \to \{0, 1\}$, with

$$C_{m,n}(\mathbf{x}, \mathbf{x}_{\mathrm{neig}}) = \begin{cases} 1, & \text{if } \xi_{m,n}(\mathbf{x}) = \xi_{m,n}(\mathbf{x}_{\mathrm{neig}}), \\ 0, & \text{otherwise,} \end{cases} \qquad (3)$$

in which $\mathbf{x} \in \mathbb{R}^3$ and $\mathbf{x}_{\mathrm{neig}} \in V_{\mathbf{x}} \subset \mathbb{R}^3$.

The last step consists of averaging the confusion measure across all points in the neighborhood of $\mathbf{x}$ and across all the $N = M(M-1)/2$ microphone pairs, yielding $D(\cdot) : \mathbb{R}^3 \to [0, 1]$ defined as

$$D(\mathbf{x}) = 1 - \frac{1}{N} \sum_{m,n} \left[ \frac{1}{|V_{\mathbf{x}}|} \sum_{\mathbf{x}_{\mathrm{neig}} \in V_{\mathbf{x}}} C_{m,n}(\mathbf{x}, \mathbf{x}_{\mathrm{neig}}) \right], \qquad (4)$$

where $|V_{\mathbf{x}}|$ stands for the number of elements of the set $V_{\mathbf{x}}$. In summary, given a point in the Euclidean space, the proposed algorithm first finds, for each microphone pair, the proportion of neighbors with the same discrete TDOA, averages the results across all microphone pairs, and then takes the complement value on the interval [0, 1].

The resulting measure $0 \le D(\mathbf{x}) \le 1$ can be thought of as the chance of discriminating the position $\mathbf{x}$ from its neighbors $\mathbf{x}_{\mathrm{neig}} \in V_{\mathbf{x}}$. The value D(x) = 100 % means that, for every microphone pair, each neighboring point is mapped into a discrete TDOA different from that of x, whereas D(x) = 0 % means that, for every microphone pair, x and all its neighboring points are mapped into a single TDOA value. The latter implies that the position estimate of a source placed at that point x may result in any of the $\mathbf{x}_{\mathrm{neig}} \in V_{\mathbf{x}}$, no matter how good the localization algorithm is.

There is a multitude of ways of defining the neighborhood $V_{\mathbf{x}}$ associated with a spatial point x. For each position x, this paper considers a cube in the 3-D Euclidean space whose geometric center is x. In this case, $V_{\mathbf{x}}$ is composed of the six-connected positions at the centers of the faces of this cube. The length of the cube edge (i.e., two times the distance between neighbors) is an important parameter for D defined in equation (4).

¹ This sampling frequency is usually equal to that of the source signals, but can be higher if an interpolation scheme is employed by the SSL method.
² We say "tend to" because there are infinitely many points on a hyperboloid that lead to the same continuous TDOA.

3. VALIDATION OF THE PROPOSED ALGORITHM

The algorithm proposed in the previous section provides a measure of how difficult it is to localize a given point in space when using a specific array geometry, operating at a given sampling rate, and considering a predefined neighbor distance. Intuitively, it is hoped that a source in a position x with a high value of D(x) is easier to localize. One could ask, however, whether the proposed measure is reflected by the performance of a given localization technique. This section describes an experiment which corroborates this expectation.

In order to validate the proposed algorithm, a single localization method (the SRP algorithm, described in equation (1)) is executed for two sources at different positions, one with high and the other with low D. Two very distinct array geometries found in the literature are employed: a uniform linear array (ULA) with five microphones and aperture of 136 cm [2], and a uniform spherical array (USA) with thirteen microphones and aperture of 15.24 cm [10]. The original source signal consists of 3 sentences spoken by a female speaker, sampled at 48 kHz with 24-bit precision. In order to mitigate any unwanted effects, the signals were generated considering an anechoic noise-free room. In addition, a VAD was used to discard speech-free segments of the source signal.

Different neighbor distances were chosen to further differentiate the two configurations: the distance between neighbors was set at 5 cm for the ULA and 10 cm for the USA. In the case of the linear array, the discriminability map was computed only at the array plane and is shown in Fig. 1(a).³

³ Microphones are indicated by small circles or x-marks.
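As an illustration of how one value of such a discriminability map could be computed, the sketch below follows Eqs. (2) to (4) with the six-neighbor cube neighborhood. It is only a minimal sketch under stated assumptions: the speed of sound (343 m/s) is an assumed value, since the text does not state c; all function names are hypothetical; and Python's built-in `round` resolves exact .5 ties to the nearest even integer, which may differ from the rounding rule intended in Eq. (2).

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the text


def discrete_tdoa(x, mic_m, mic_n, fs, c=SPEED_OF_SOUND):
    """Eq. (2): TDOA between microphones m and n, rounded to samples."""
    return round(fs / c * (math.dist(x, mic_m) - math.dist(x, mic_n)))


def six_neighbors(x, d):
    """Face centers of the cube of edge 2*d whose geometric center is x."""
    neighbors = []
    for axis in range(3):
        for step in (-d, d):
            p = list(x)
            p[axis] += step
            neighbors.append(tuple(p))
    return neighbors


def discriminability(x, mics, fs, d, c=SPEED_OF_SOUND):
    """Eq. (4): D(x) in [0, 1] for microphone positions `mics` (in meters)."""
    neighbors = six_neighbors(x, d)
    pairs = list(combinations(mics, 2))  # the N = M(M-1)/2 pairs
    confusion = 0.0
    for mic_m, mic_n in pairs:
        ref = discrete_tdoa(x, mic_m, mic_n, fs, c)
        # Eq. (3): count neighbors sharing the discrete TDOA of x.
        same = sum(discrete_tdoa(p, mic_m, mic_n, fs, c) == ref
                   for p in neighbors)
        confusion += same / len(neighbors)
    return 1.0 - confusion / len(pairs)


# Toy usage: two microphones 1 m apart, source midway between them.
mics = [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)]
d_mid = discriminability((0.0, 0.0, 0.0), mics, fs=48000.0, d=0.05)
```

In this toy case only the two neighbors displaced along the array axis change the discrete TDOA, so four of the six neighbors are confused with x. Evaluating `discriminability` over a grid of points would give a map of the kind discussed in this section.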

For the spherical array, two discriminability maps were computed for the XY plane at two distinct heights, as exhibited in Figs. 1(b) and 1(c). From these discriminability maps, four source positions, shown in Table 1, were selected: one with low and another with high D(x) for each geometry. The corresponding array signals were artificially generated, and the source location was estimated for every frame of 4096 samples, with 50 % overlap, by the SRP method with phase transform (PHAT) [3].

Fig. 1. Discriminability maps obtained in the experiment of Section 3: (a) ULA (height z = 0 cm); (b) USA array (height z = 150 cm); (c) USA array (height z = 50 cm). Axes: x (m) and y (m). Source positions are indicated by small squares.

Table 1. Selected source positions for both array geometries.

Array geometry     Pos. index     D(x) (%)
Linear (ULA)       I              73.26
Linear (ULA)       II             43.67
Spherical (USA)    III            85.33
Spherical (USA)    IV             16.59

With respect to the linear array, Figs. 2(a) and 2(b) show the histograms of location estimation errors⁴ obtained for each frame related to the two source positions described in Table 1. As can be seen, the mode of the histogram for the source position with higher D was much closer to zero than the one obtained for the position with lower discriminability. This is reflected by the median values of the two histograms: 3.4 cm and 101.7 cm for the positions with Ds of 73.26 % and 43.67 %, respectively. A great performance difference was also observed for the spherical array, whose histograms are shown in Figs. 2(c) and 2(d). Again, the median errors found were 18.6 cm and 94.7 cm for the positions with Ds of 85.33 % and 16.59 %, respectively. For example, for the position with low D (Fig. 2(d)), the majority of frames was estimated with a 1-m error. This experiment demonstrates that indeed, for the specific scenarios simulated here, the discriminability of a given position is coherently related to its localizability by a given method.

Fig. 2. Histograms of location estimation errors (SRP-PHAT): (a) ULA for position I; (b) ULA for position II; (c) USA for position III; (d) USA for position IV. Axes: bins (error in meters) and number of occurrences.

⁴ The error is calculated as the Euclidean distance between actual and estimated source positions. For the linear case, only the x and y coordinates were considered. As for the spherical array, all three coordinates were employed.

4. CASE STUDIES

In this section, some case studies based on the D are conducted. The objective is to showcase the proposed algorithm, discussing how it can be employed to gather information regarding the array geometry, as well as how both the sampling rate and neighbor distance affect the measure. To this end, two arrays were employed: a uniform linear array and a harmonically-nested linear array [11] (HNLA). Both were chosen to have 11 microphones and an array aperture of 76.8 cm. The ULA had a microphone spacing of 7.68 cm, whereas the microphones in the HNLA follow a non-uniform distribution, with spacings [8l, 4l, 2l, l, l, l, l, 2l, 4l, 8l] for l = 2.4 cm.

The first study looks into how the discriminability map changes with the array geometry. The discriminability maps for both array geometries, considering a sampling rate of 48 kHz and a neighbor distance of 5 cm, are shown in Figs. 3(a) and 3(b) for the ULA and HNLA, respectively. As can be observed, the ULA achieved a better discriminability than the HNLA for the selected parameters, regardless of the position. This shows that, if one is interested in source localization and the source can be located at any position in space, the ULA might be a better choice than the HNLA as regards the array discriminability.

The discriminability map displays the measure for each individual position. If one is trying to find out the global behavior of a given array and set of parameters regardless of individual positions, then it is possible to compute an overall measure Do as the mean discriminability across all possible source positions, whether they are at the referred height or not. For example, the overall discriminability obtained for the ULA and HNLA for different sampling frequencies and neighbor distances is shown in Fig. 3(c). Similarly to what was observed in the discriminability maps, the ULA has higher overall discriminability than the HNLA for all sampling frequencies and neighbor distances tested. Moreover, from Do it is possible to observe that increasing the sampling frequency and the neighbor distance improves localizability, as expected.

Fig. 3. Discriminability maps and overall discriminability: (a) discriminability map for ULA; (b) discriminability map for HNLA; (c) overall discriminability (%) versus sampling rate (kHz), for d = 5 cm and d = 15 cm (ULA and HNLA). In the discriminability maps, the circles indicate the positions of the microphones.

The proposed method depends only on two parameters besides the array geometry: the sampling frequency and the distance between neighboring spatial points. At this point, the meaning of the neighbor distance and sampling rate for someone studying a given array geometry has to be discussed. Usually, the neighbor distance indicates the desired precision of the SSL algorithm. For instance, for SSL methods employing grid searches (such as the SRP), the neighbor distance might help set the distance between adjacent search-grid positions or some stop criterion for iterative search methods [3]. The sampling rate, on the other hand, indicates that a cross-correlation interpolation scheme may be beneficial [3] and further quantifies how much interpolation is needed for the desired D. For example, for a neighbor distance of 10 cm, an interpolation that increases the cross-correlation sampling rate from 16 to 48 kHz can increase the overall discriminability from 20 % up to 50 % (see Fig. 3(c)). Of course, D only hints at whether the interpolation can improve the results; actual improvement will depend heavily on the quality of the interpolated cross-correlation, which in turn inherently depends on the related acquired signals. In fact, the previous example shows that a spatial resolution of 5 cm working with signals sampled at 16 kHz might not be a reasonable set of parameters for the considered array geometries.

The proposed algorithm can also be used in tandem with a grid search method, in which case the proposal can provide some clues about which spatial regions can be searched more thoroughly.

5. CONCLUDING REMARKS

This paper proposed an algorithmic approach to the problem of measuring how difficult it is for a microphone array with given geometry to locate a sound source at an arbitrary position in space. Experimental results show that the output of the proposed method is related to the localizability of a point in space by a specific sound localization method. Moreover, it was shown how the method can be employed to study a given array geometry and the influence of the two parameters on its spatial resolution.

As regards next steps, the correlation between the output of the proposed method and the error of some SSL algorithms should be quantified, in order to indicate how reliably it could predict the performance of such methods. Also, the effect of different choices of source position neighborhoods should be understood. Moreover, the proposed method considers the use of discrete correlation functions, whereas some SSL methods employ its continuous counterpart. A modified version of the method, then, could be envisaged by substituting a continuous-scale measure for the hard decision-based measure used to compare discrete delays.

6. REFERENCES

[1] J. Benesty, J. Chen, and Y. Huang, Microphone Array Signal Processing, Springer, Berlin Heidelberg, Germany, 2010.
[2] J. H. DiBiase, A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments, Ph.D. thesis, Brown University, Providence, RI, USA, May 2000.
[3] H. Do, H. F. Silverman, and Y. Yu, "A real-time SRP-PHAT source location implementation using stochastic region contraction (SRC) on a large-aperture microphone array," in IEEE Int. Conf. on Acoustics, Speech and Signal Proc., Honolulu, USA, April 2007, vol. 1, pp. 121–124.
[4] P. Aarabi, "Self-localizing dynamic microphone arrays," IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 32, no. 4, pp. 474–484, November 2002.
[5] B. D. Van Veen and K. M. Buckley, "Beamforming: A versatile approach to spatial filtering," IEEE ASSP Magazine, vol. 5, no. 2, pp. 4–24, April 1988.
[6] C. H. Knapp and G. C. Carter, "The generalized correlation method for estimation of time-delay," IEEE Trans. on Acoust., Speech, and Signal Process., vol. ASSP-24, no. 4, pp. 320–327, 1976.
[7] M. S. Brandstein, J. E. Adcock, and H. F. Silverman, "A practical time-delay estimator for localizing speech sources with a microphone array," Computer Speech and Language, vol. 9, no. 2, pp. 153–169, 1995.
[8] Maurizio Omologo, Piergiorgio Svaizer, and Renato De Mori, Spoken Dialogues with Computers, chapter Acoustic Transduction, Academic Press, Inc., Orlando, FL, USA, 1997.
[9] J. Dmochowski, J. Benesty, and S. Affes, "A generalized steered response power method for computationally viable source localization," IEEE Trans. on Audio, Speech, and Language Proc., vol. 15, no. 8, pp. 2510–2526, November 2007.
[10] J. Dmochowski, J. Benesty, and S. Affes, "On spatial aliasing in microphone arrays," IEEE Trans. on Signal Proc., vol. 57, no. 4, pp. 1383–1395, April 2009.
[11] Y. Zheng, R. A. Goubran, and M. El-Tanany, "Experimental evaluation of a nested microphone array with adaptive noise cancellers," IEEE Trans. on Instrumentation and Measurement, vol. 53, no. 3, pp. 777–786, June 2004.