
A Thesis Submitted

Doctor of Philosophy

by

Lalan Kumar

To the

March, 2015

Abstract

With increased computational power and the evolution of compact device technology, microphone arrays are used in everything from handheld devices such as mobile phones to large-scale defense equipment. Source localization is a central problem in microphone array signal processing, and it becomes even more challenging in the presence of noise, reverberation and sensor array ambiguities. In this thesis, novel methods for acoustic source localization are proposed in the spatial and spherical harmonics domains.

In the context of spatial domain signal processing, a high resolution method that utilizes the phase of MUltiple SIgnal Classification (MUSIC) is proposed for far-field source localization over a planar array. This method computes the group delay of MUSIC and is therefore called the MUSIC-Group delay (MGD) method. In contrast to the standard MUSIC method, the MUSIC-Group delay method is able to resolve closely spaced sources with a minimal number of sensors, even in a reverberant environment.

Signal processing in the spherical harmonics domain provides ease of beampattern steering and a unified formulation for a wide range of array configurations. Both far-field and near-field source localization problems are addressed in the spherical harmonics (SH) domain. The MUSIC-Group delay method is formulated in the spherical harmonics domain (called SH-MGD) to resolve the spatial ambiguity of planar arrays. A search-free algorithm, SH-root-MUSIC, is also proposed for azimuth-only estimation of far-field sources.

A new data model is developed for near-field source localization in the spherical harmonics domain. In particular, three methods, namely SH-MUSIC, SH-MGD and SH-MVDR, that jointly estimate the range and bearing of multiple sources are proposed. A near-field MVDR beampattern analysis is also performed to illustrate the significance of the proposed method. The stochastic Cramér-Rao bound for the far-field and near-field data models is formulated in the spherical harmonics domain to evaluate the location estimators. Several experiments on 3-D source localization are conducted in reverberant and noisy environments. Additionally, experiments are performed on real signals acquired over a spherical microphone array in an anechoic chamber. The comparative performance of the proposed methods is presented in terms of root mean square error, probability of resolution and average error distribution.


Dedicated

To

My Spiritual Master,

His Holiness Radhanath Swami


Acknowledgment

I take this opportunity to thank my counselor, Dr. Makarand Upkare, for turning the direction of my life toward research and teaching. I would like to express my deepest gratitude to my advisor, Dr. Rajesh M. Hegde, for tirelessly inspiring, motivating and guiding me. Without his constant support and accommodating nature, this thesis would not have become a reality. His out-of-the-box thinking, passion for research and timeliness made my research path smoother. My four years of association with him have taught me many great values, which I will cherish throughout my life. I am also thankful to Prof. Harish Karnick and Prof. Pradip Sircar for useful discussions on various occasions.

I gratefully acknowledge the financial support from MHRD, Government of India (2010-2011) and Tata Consultancy Services (TCS) under the TCS research scholarship program (2011-2015). The travel support for national and international conferences from the Government of India, TCS and IIT Kanpur gave me opportunities to explore the world around me.

I would also like to extend my appreciation to all my MiPS labmates. In particular, I thank Ardhendu, Kushagra, Waquar, Karan, Sudhir, Sandeep, Sachin and Shreyan, whose presence made this journey full of fun and learning. In addition, I am grateful to Ishtiyaq Husain and Mr. Narendra Singh, who provided all the support needed in my research.

I am indebted to my father S.L. Baranwal and my mother, the late Shanti Devi, for giving me everything. I am equally indebted to all my brothers, Ashok, Nakul and Sunil, for supporting me in every way. I am particularly thankful to my wife, Mrs. Deepshikha, and my little son Madhav, for bearing my late nights in the lab. I am grateful to her for being so tolerant and patient about my situation.

At last, I would like to thank my spiritual master HH Radhanath Maharaj and Lord Krishna, who arranged all of this.


Contents

List of Figures xi

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Problem Statement and Research Objectives . . . . . . . . . . . . . . . . . . 3

1.3 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2 The Spherical Coordinate System . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.3 Acoustic Wave Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.4 Solution to Wave Equation in Cartesian Coordinates . . . . . . . . . . . . . . 10

2.4.1 Plane Wave Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.4.2 Spherical Wave Solution . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.5 Solution to Wave Equation in Spherical Coordinates . . . . . . . . . . . . . 13

2.5.1 Plane Wave Solution for Rigid Sphere . . . . . . . . . . . . . . . . . . 16

2.5.2 Spherical Wave Solution for Rigid Sphere . . . . . . . . . . . . . . . . 18


2.5.3 Range Criterion for Near-ﬁeld and Far-ﬁeld in Source Localization . . 19

2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.2 Geometry of Microphone Array . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2.1 Uniform Linear Array . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2.2 Uniform Circular Array . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2.3 Spherical Microphone Array . . . . . . . . . . . . . . . . . . . . . . . . 23

3.3 Microphone Array Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.3.1 Spatial Aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.3.2 Acoustic Noise and Reverberation . . . . . . . . . . . . . . . . . . . . 29

3.4 Acoustic Source Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.4.1 Correlation-based Source Localization . . . . . . . . . . . . . . . . . . 31

3.4.1.1 Source Localization using Plain Time Correlation . . . . . . 32

3.4.1.2 Source Localization using Generalized Cross-correlation . . . 33

3.4.2 Beamforming-based Source Localization . . . . . . . . . . . . . . . . . 34

3.4.2.1 Delay-and-Sum Beamforming . . . . . . . . . . . . . . . . . . 35

3.4.2.2 Capon Beamforming . . . . . . . . . . . . . . . . . . . . . . . 36

3.4.2.3 Beampattern Analysis . . . . . . . . . . . . . . . . . . . . . . 38

3.4.3 Subspace-based Source Localization . . . . . . . . . . . . . . . . . . . 39

3.4.3.1 The MUSIC Method . . . . . . . . . . . . . . . . . . . . . . . 40

3.4.3.2 Computing MUSIC Spectrum from Sample Covariance Matrix 42

3.4.3.3 The MUSIC-Group Delay Method . . . . . . . . . . . . . . . 44

3.4.3.4 The MUSIC-Group Delay Method using Shrinkage Estimators 45

3.4.3.5 The root-MUSIC Method . . . . . . . . . . . . . . . . . . . . 46

3.5 Wideband Source Localization . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

phone Array 51


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.2 The MUSIC-Group Delay Method for Robust Multi-source Localization . . . 52

4.2.1 Music-Group Delay Method for Source Localization over Planar Array 52

4.2.2 Spectral Analysis of the MUSIC-Group Delay Function under Reverberant Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.2.3 Two-dimensional Additive Property of the MUSIC-Group Delay Spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.3 Localization Error Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

4.3.1 Performance under Sensor Perturbation Error . . . . . . . . . . . . . . 60

4.3.2 Cramér-Rao Bound Analysis . . . . . . . . . . . . . . . . . . . . . . . 62

4.3.3 Source Localization Error Analysis under Reverberant Environments . 64

4.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4.4.1 Experimental Conditions . . . . . . . . . . . . . . . . . . . . . . . . . 65

4.4.2 Experiments on Speech Enhancement in Multi-source Environment . . 66

4.4.3 Experiments on Perceptual Evaluation of Enhanced Speech . . . . . . 68

4.4.4 Experiments on Distant Speech Recognition . . . . . . . . . . . . . . . 69

4.5 Summary and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.2 Fundamentals of Spherical Array Processing . . . . . . . . . . . . . . . . . . . 72

5.2.1 The Spherical Fourier Transform . . . . . . . . . . . . . . . . . . . . . 72

5.2.2 Beampattern Analysis in Spherical Harmonics Domain . . . . . . . . . 74

5.3 Microphone Array Data Model in Spherical Harmonics Domain . . . . . . . . 76

5.3.1 Data Model in Spatial Domain . . . . . . . . . . . . . . . . . . . . . . 76

5.3.2 Data Model in Spherical Harmonics Domain . . . . . . . . . . . . . . 78

5.4 Advantage of Array Data Model Formulation in Spherical Harmonics Domain 79

5.4.1 Reduced Dimensionality . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5.4.2 Frequency Smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5.4.3 Ease of Beamforming . . . . . . . . . . . . . . . . . . . . . . . . . . . 81


5.5 Far-ﬁeld Source Localization using Spherical Microphone Array . . . . . . . . 81

5.5.1 Spherical Harmonics MVDR Method . . . . . . . . . . . . . . . . . . . 82

5.5.2 Spherical Harmonics MUSIC Method . . . . . . . . . . . . . . . . . . 83

5.5.3 Spherical Harmonics MUSIC-Group Delay Method . . . . . . . . . . . 83

5.5.4 Noise Whitening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

5.6 Formulation of Stochastic Cramér-Rao Bound for Far-ﬁeld Sources . . . . . . 85

5.6.1 Existence of the Stochastic CRB in Spherical Harmonics Domain . . . 86

5.6.2 CRB Analysis in Spherical Harmonics Domain . . . . . . . . . . . . . 87

5.7 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

5.7.1 Experiments on Far-ﬁeld Source Localization in Noisy Environments . 89

5.7.2 Experiments on Far-field Source Localization in Reverberant Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5.7.3 Statistical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.7.4 Experiments on Narrowband Source Tracking . . . . . . . . . . . . . . 91

5.8 Summary and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

6.2 Formulation of root-MUSIC in Spherical Harmonics Domain . . . . . . . . . . 96

6.3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

6.3.1 Experiments on Source Localization . . . . . . . . . . . . . . . . . . . 100

6.3.2 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

6.4 Summary and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

7.2 Formulation of Near-ﬁeld Array Data Model in Spherical Harmonics Domain 104

7.2.1 Near-field Data Model in Spatial Domain . . . . . . . . . . . . . . . . 104

7.2.2 Near-field Data Model in Spherical Harmonics Domain . . . . . . . . . 106

7.3 Near-ﬁeld Source Localization in Spherical Harmonics Domain . . . . . . . . 108

7.3.1 Spherical Harmonics MUSIC for Near-ﬁeld Source Localization . . . . 109


7.3.2 Spherical Harmonics MUSIC-Group Delay Method for Near-field Source Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

7.3.3 Spherical Harmonics MVDR Method for Near-ﬁeld Source Localization 110

7.4 The Near-ﬁeld MVDR Beampattern Analysis . . . . . . . . . . . . . . . . . . 111

7.5 Cramér-Rao Bound Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

7.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

7.6.1 Experiments on Near-ﬁeld Source Localization . . . . . . . . . . . . . 115

7.6.1.1 RMSE Analysis of Range Estimation . . . . . . . . . . . . . 115

7.6.1.2 Statistical Analysis of Range Estimation . . . . . . . . . . . 116

7.6.2 Experiments on Joint Range and Bearing Estimation . . . . . . . . . . 117

7.6.2.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 117

7.6.2.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 118

7.6.3 Experiments on Interference Suppression using Near-field MVDR Beamforming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

7.7 Summary and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

8.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

8.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

Appendices 126

A.1 Formulation of Fisher Information Matrix . . . . . . . . . . . . . . . . . . . . 128

A.2 Computing the Derivative of Spherical Harmonics Function Ynm . . . . . . . . 130

References 133


List of Figures

2.2 Diagram illustrating general time delay estimation from a traveling plane wave. 11

2.3 Illustration of a traveling spherical wave and associated time delay estimation. 13

2.4 Spherical harmonics plot, Y_0^0, Y_1^0, Y_1^1 . . . . . . . . . . . . . . . . . 15

2.5 Variation of mode strength bn in dB as a function of kr and n for an open

sphere. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.6 Plot showing the nature of far-field and near-field mode strength for the Eigenmike system. The near-field source is at rl = 1 m and the order is varied from n = 0 to n = 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.2 Front back ambiguity in ULA. . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.3 Uniform circular array. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.4 Photograph of a spherical microphone array: the Eigenmike system. . . . . . 24

3.5 Illustration of various regions in a typical room impulse response (RIR). . . . 30

3.6 Voiced frame of a speech signal of length 512 samples, original signal (top) and

signal delayed by 40 samples (bottom). . . . . . . . . . . . . . . . . . . . . . . 31

3.7 Plain time correlation plots. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.8 Generalized cross-correlation (GCC), GCC-Roth and GCC-PHAT plots (top

to bottom) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.9 Beamformer block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.10 Delay-and-sum beamformer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35


3.11 DOA estimation using (a) DSB and (b) MVDR method. A ULA with I = 10

microphones was used for sources located at 20◦ and 60◦ . . . . . . . . . . . . 37

3.12 Delay-and-sum beampattern for ULA with no spatial aliasing for I = 10,

φs = 90◦ and d = 0.5λ (a) in Cartesian coordinates and (b) in polar coordinates. 38

3.13 Delay-and-sum beampattern for ULA under aliasing for I = 10, φs = 90◦ and

d = 2λ (a) in Cartesian coordinates and (b) in polar coordinates. . . . . . . . 39

3.14 Illustration of Delay-and-sum beampattern for UCA with I = 10, Ψs =

(45◦ , 90◦ ), under no spatial aliasing (a) in spherical coordinate system (b)

in rectangular coordinate system . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.15 MUSIC-Magnitude spectrum for DOA 60◦ and 65◦ using 5 sensors (top) and

for 15 sensors (bottom). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.16 MUSIC, Unwrapped phase (of MUSIC) and MUSIC-Group delay spectra for

two sources with azimuth (a) 60◦ and 65◦ , (b) 50◦ and 60◦ . . . . . . . . . . . 45

3.17 Eigenvalue estimation using sample covariance and shrinkage estimator using

10 Sensors for 3 Sources, located at 20◦ , 35◦ and 50◦ . . . . . . . . . . . . . . . 46

3.18 The MUSIC-Magnitude spectrum (Top), the MUSIC-GD spectrum (Middle),

and the MUSIC-GD spectrum with shrinkage estimation (bottom) using 6

sensors for closely spaced sources located at 20◦ and 25◦ , at DRR=20dB. . . 47

3.19 Z-Plane representation of all the roots of root-MUSIC polynomial using 8

sensors for 2 sources with locations 40◦ and 50◦ . . . . . . . . . . . . . . . . . 48

4.1 Spectral magnitude of MUSIC for UCA (top) and ULA (bottom). Sources at

(15◦ ,50◦ ) and (20◦ ,60◦ ) for UCA. Sources at 50◦ and 60◦ for ULA. . . . . . . 53

4.2 Spectral phase of MUSIC for UCA (top) and ULA (bottom). Sources at

(15◦ ,50◦ ) and (20◦ ,60◦ ) for UCA. Sources at 50◦ and 60◦ for ULA. . . . . . . 54

4.3 Illustration of standard group delay of MUSIC and the MUSIC-Group delay

as proposed in this work. (a) Standard group delay spectrum of MUSIC for

UCA (top) and ULA (bottom) (b) MUSIC-Group delay spectrum for UCA

(top) and ULA (bottom). Sources are at (15◦ ,50◦ ) and (20◦ ,60◦ ) for UCA, at

50◦ and 60◦ for ULA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55


4.4 Plots illustrating azimuth and elevation angle as estimated by (a) MUSIC-Magnitude and (b) MUSIC-Group delay spectrum for sources at (15◦,100◦) and (17◦,105◦), reverberation time 400 ms. MM estimates a single peak at (18◦,105◦). MGD estimates two peaks at (19◦,100◦) and (17◦,108◦). . . . . . . 56

4.5 Two-dimensional spectral plots for the cascade of two individual DOAs (resonators): (a) source with DOA (15◦,60◦), (b) source with DOA (18◦,55◦), (c) MUSIC-Magnitude spectrum, (d) MUSIC-Group delay spectrum. . . . . . . 59

4.6 Contour plots of (a) MUSIC-Magnitude spectrum (b) MUSIC-Group delay

spectrum, under sensor perturbation errors. . . . . . . . . . . . . . . . . . . . 62

4.7 Two-dimensional scatter plot of localization for the sources at (10◦,20◦) and (5◦,10◦) using (a) the MUSIC-Magnitude method and (b) the MUSIC-Group delay method. Reverberation time is 150 ms, SNR is 40 dB, and the number of iterations is 500. The red dot indicates the actual DOA. . . . . . . . . . . 64

4.8 Experimental setup in a meeting room with two speakers (S1 and S2) and two interferers (stationary noise source SN and nonstationary noise source NS). The sources are located at (17◦,35◦), (19◦,40◦), (15◦,30◦) and (21◦,45◦) respectively. The radius of the circular array is 10 cm. . . . . . . . . . . . . . 65

4.9 Flow diagram illustrating the methodology followed in performance evaluation

for distant speech signal acquired over circular array. . . . . . . . . . . . . . . 66

5.2 Illustration of the spherical harmonics beampatterns: (a) regular beampattern for order N = 3, (b) regular beampattern for order N = 4, (c) DSB beampattern for order N = 3 and (d) DSB beampattern for order N = 4. . . 75

5.3 SH-MVDR spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB . . . 82

5.4 SH-MUSIC spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB . . 83

5.5 SH-MGD spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB . . . . 84

5.6 Variation of CRB for elevation (θ) and azimuth (φ) estimation (a) at various

SNR with 300 snapshots, (b) with varying snapshots at SNR 20dB. Source is

located at (20◦ , 50◦ ). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88


5.7 Cumulative RMSE in source angle estimation at various SNRs for two hundred

iterations. The sources are located at (30◦ , 35◦ ) and (50◦ , 60◦ ). . . . . . . . . 90

5.8 Trajectory of elevation angle (θ) followed by the moving source with time for

a ﬁxed azimuth φ = 45◦ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.9 Tracking result for elevation using (a) SH-MUSIC and (b) SH-MGD. The azimuth is fixed at 45◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.10 Average error distribution plot for tracking error using SH-MUSIC and SH-

MGD Method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

6.1 Plot of SH-MUSIC illustrating DOA estimation using fourth order Eigenmike

system. Sources are located at (20◦ ,40◦ ) and (20◦ ,70◦ ) with SNR 15dB. . . . 97

6.2 Plot of SH-root-MUSIC illustrating the actual DOA estimates (red stars) and

noisy DOA estimates (blue triangles). A fourth order Eigenmike system is

used. Sources are located at (20◦ ,40◦ ) and (20◦ ,70◦ ) with SNR 15dB. . . . . . 99

6.3 Probability of resolution plot for two sources with azimuth (40◦, 80◦) and co-elevation 20◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

array. The ith microphone is positioned at ri and lth source at rl . . . . . . . . 105

7.2 Illustration of range and elevation estimation by (a) SH-MUSIC method (b)

SH-MGD method (c) SH-MVDR method for ﬁxed azimuth. Illustration of

elevation and azimuth estimation using (d) SH-MUSIC method (e) SH-MGD

method (f) SH-MVDR method for ﬁxed range. The sources are at (0.06m,60◦ ,30◦ )

and (0.08m,55◦ ,40◦ ) at an SNR of 10dB. . . . . . . . . . . . . . . . . . . . . . 110

7.3 Cramér-Rao bound analysis at various SNR, (a) for random signal (b) for

sinusoidal signal. The source location is (0.08m, 40◦ , 50◦ ). . . . . . . . . . . . 114

7.4 Range estimation performance of SH-MGD, SH-MUSIC and SH-MVDR in

terms of probability of resolution. . . . . . . . . . . . . . . . . . . . . . . . . . 116

7.5 The Eigenmike setup in an anechoic chamber at IIT Kanpur for acquiring

near-ﬁeld sources. A near-ﬁeld source is placed at (0.3m, 90◦ , 90◦ ). . . . . . . 117


7.6 Four-dimensional scatter plots using (a) SH-MUSIC for the simulated signal, (b) SH-MGD for the simulated signal, (c) SH-MUSIC for the signal acquired over the SMA and (d) SH-MGD for the signal acquired over the SMA. A narrowband source with frequency 600 Hz, located at (0.3m, 90◦, 90◦), is considered. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

7.7 Illustration of near-ﬁeld MVDR beampattern. The desired source is at (0.1m, 50◦ , 30◦ ),

and interfering source at (0.3m, 55◦ , 40◦ ). . . . . . . . . . . . . . . . . . . . . 119

7.8 Radial filtering analysis of the proposed near-field MVDR method over a spherical microphone array. (a) Array gain for fixed r = 0.1m. (b) Array gain for fixed r = 0.3m. (c) Array gain for fixed θ = 30◦. (d) Array gain for fixed θ = 40◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120


List of Tables

4.1 Comparison of average RMSE of various methods with the CRB (illustrated in the first row) for an azimuth range of 10◦-150◦ and elevation range of 10◦-80◦, at a T60 of 200 ms and SNR of 10 dB. . . . . . . . . . . . . . . . . . 63

4.2 Enhancement in SIR (dB), compared for various methods at different reverberation times. S_1^s is the desired speaker, S_2^s is the competing speaker, S^ns is the non-stationary noise source and S^sn is the stationary noise source. . . . . . 67

4.3 Comparison of perceptual evaluation results using various methods. The results are compared based on objective measures. . . . . . . . . . . . . . . . 68

4.4 Comparison of distant speech recognition performance in terms of WER (in percentage) at various reverberation times, T60. . . . . . . . . . . . . . . . . 69

5.2 Probability of resolution at various SNRs for 200 iterations. Sources are taken

at (30◦ , 35◦ ) and (50◦ , 60◦ ). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

6.1 Comparison of RMSE of various source localization methods at different SNRs 100

7.1 Cumulative RMSE in range r, at various SNRs for 100 iterations. Sources are

at (0.1m, 30◦ , 45◦ ) and (0.8m, 30◦ , 45◦ ). . . . . . . . . . . . . . . . . . . . . . . 116


List of Symbols

(·)^T Transpose of a vector or matrix

(·)^H Conjugate transpose of a matrix or vector

a Steering vector in spatial domain

anm Steering vector in spherical harmonics domain

A Steering matrix in spatial domain

Anm Steering matrix in spherical harmonics domain

B Mode strength matrix

bn (k, r) nth order far-ﬁeld mode strength

bn (k, r, rl ) nth order near-ﬁeld mode strength

∗ Convolution

c Speed of sound

d Distance between two consecutive microphones in ULA

h_n^(1)(kr), h_n(kr) Spherical Hankel function of the first kind

h_n^(2)(kr) Spherical Hankel function of the second kind

i Microphone index

I Number of Microphones

j Unit imaginary number

jn Spherical Bessel functions of the ﬁrst kind

Jn Bessel function of the ﬁrst kind

k Wavenumber

k Wavevector

l Source index

L Number of sources

m Degree (of spherical harmonics)

n Order (of spherical harmonics or mode strength)

N Order of spherical microphone array

Ns Number of snapshots

p(r, θ, φ, t) Pressure in space-time domain

P (r, θ, φ, ω) Pressure in frequency domain

P_n^m Associated Legendre functions

p_nm Spherical Fourier transform of P

q Noise eigenvector

Qn Noise subspace in spatial domain

Qnm Noise subspace in spherical harmonics domain

Rp Array Covariance matrix

RDnm /RPnm Modal covariance matrix

r Position vector

ra Radius of circular/spherical microphone array

s Source signal amplitude in time domain

S Source signal amplitude in frequency domain

T60 Reverberation time

v Sensor noise

Ts Sampling period

yn Spherical Bessel functions of the second kind

Yn Bessel function of the second kind

Ynm Spherical harmonics of order n and degree m

C The set of all complex numbers

∇ Gradient

τ Time delay

Ψ Angular location of a source

Φ Angular location of a microphone

θ Elevation angle

φ Azimuth angle

λ Wavelength

ζ Conﬁdence interval


List of Abbreviations

2-D Two-dimensional

3-D Three-dimensional

AED Average Error Distribution

BSM Beamspace MUSIC

CRB Cramér-Rao Bound

CTM Close Talk Microphone

DFT Discrete Fourier Transform

DOA Direction of Arrival

DRR Direct to Reverberant energy Ratio

DSB Delay-and-Sum Beamformer

DSR Distant Speech Recognition

ESPRIT Estimation of Signal Parameters using Rotational Invariance Techniques

FFT Fast Fourier Transform

FIM Fisher Information Matrix

FSB Filter Sum Beamformer

GCC Generalized Cross Correlation

IDFT Inverse Discrete Fourier Transform

LCMV Linearly Constrained Minimum Variance

MM MUSIC Magnitude

MUSIC MUltiple SIgnal Classiﬁcation

MGD MUSIC-Group Delay

MVDR Minimum Variance Distortionless Response

PDF Probability Density Function

PHAT Phase Transform

RIR Room Impulse Response

RMSE Root Mean Square Error

SH Spherical Harmonics

SIR Signal to Interference Ratio

SINR Signal to Interference plus Noise Ratio

SMA Spherical Microphone Array

SMGD MUSIC-Group Delay using Shrinkage Estimator

SNR Signal to Noise Ratio

TDOA Time Delay Of Arrival

STFT Short Time Fourier Transform

UCA Uniform Circular Array

ULA Uniform Linear Array

WER Word Error Rate


Chapter 1

Introduction

A microphone array samples the spatial information of a sound source. The spatial-temporal information available at the output of the array can be used to estimate various source parameters or to extract

the intended source signal. This has many everyday applications, such as localization and tracking of multiple sources, estimation of the number of sources, noise reduction, echo cancellation and dereverberation. Array signal processing techniques are utilized in all such applications, and they can provide promising solutions to these day-to-day problems for the following reasons [1].

• It can enhance the signal to noise ratio (SNR) of a noise-corrupted signal.

• The array can act as a spatial filter that selectively passes the signal from one direction while rejecting signals from all other directions, a capability known as beamforming.

• The beam can be steered electronically by applying an appropriate delay to each channel signal, without having to point the array physically.

With the above three features, microphone arrays are widely used for direction of arrival (DOA) estimation and for speech enhancement in telecommunication and robotics. Source localization refers to the estimation of azimuth and elevation for far-field sources; for near-field sources, it refers to the estimation of azimuth, elevation and range.
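The third capability, electronic steering, can be sketched for a narrowband uniform linear array. This is an illustrative sketch with an assumed geometry (10 microphones at half-wavelength spacing, steered to 60 degrees), not an implementation from this thesis:

```python
import numpy as np

def delay_and_sum_weights(n_mics, d, wavelength, steer_deg):
    """Narrowband delay-and-sum weights for a ULA. The weights equal the
    steering vector toward steer_deg, so applying w^H phase-aligns a plane
    wave from that direction: the beam is steered electronically."""
    k = 2 * np.pi / wavelength                        # wavenumber
    pos = d * np.arange(n_mics)                       # microphone positions
    return np.exp(1j * k * pos * np.cos(np.deg2rad(steer_deg))) / n_mics

def array_response(weights, d, wavelength, look_deg):
    """Magnitude response |w^H a(phi)| to plane waves arriving from look_deg."""
    k = 2 * np.pi / wavelength
    pos = d * np.arange(len(weights))
    steering = np.exp(1j * k * np.outer(np.cos(np.deg2rad(look_deg)), pos))
    return np.abs(steering @ np.conj(weights))

# Steer a 10-microphone, half-wavelength-spaced ULA toward 60 degrees
w = delay_and_sum_weights(n_mics=10, d=0.5, wavelength=1.0, steer_deg=60)
angles = np.linspace(0, 180, 181)
response = array_response(w, d=0.5, wavelength=1.0, look_deg=angles)
print(angles[np.argmax(response)])   # beam peaks at the steered direction: 60.0
```

Changing `steer_deg` moves the beam with no physical movement of the array, which is the point of the third feature above.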


1.1 Motivation

Source localization has applications in acoustic scene analysis, speech separation, distant speech recognition, surveillance, assisted living environments and hands-free communication. Hence, it has been an active area of research, and various algorithms for source localization have been proposed. Minimum Variance Distortionless Response (MVDR) [2] and MUltiple SIgnal Classification (MUSIC) [3] are the most popular nonparametric and parametric methods, respectively. The MUSIC method is widely studied due to its high resolution and computational efficiency. However, the MUSIC method utilizes the magnitude spectrum, which yields a large number of spurious peaks, or a single merged peak for closely spaced sources, when the number of sensors is limited. The group delay spectrum has been used widely in temporal frequency processing for its high resolution properties [4, 5]. However, group delay has hitherto not been utilized for spatial spectrum analysis. Investigating the MUSIC-Group delay (MGD) spectrum for high resolution DOA estimation over various array configurations provided the initial motivation for this work.
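The magnitude-spectrum behavior described above, a MUSIC scan of candidate directions against the noise subspace, can be sketched for a ULA as follows. The geometry, source angles and noise level are assumptions chosen for illustration, not the thesis's experimental setup:

```python
import numpy as np

def music_spectrum(X, n_sources, d, wavelength, scan_deg):
    """MUSIC pseudo-spectrum from a snapshot matrix X (mics x snapshots):
    eigendecompose the sample covariance, keep the noise subspace, and
    scan candidate DOAs with the corresponding steering vectors."""
    R = X @ X.conj().T / X.shape[1]                   # sample covariance
    _, eigvecs = np.linalg.eigh(R)                    # eigenvalues ascending
    Qn = eigvecs[:, :X.shape[0] - n_sources]          # noise subspace
    k = 2 * np.pi / wavelength
    pos = d * np.arange(X.shape[0])
    A = np.exp(1j * k * np.outer(pos, np.cos(np.deg2rad(scan_deg))))
    # 1 / ||Qn^H a(phi)||^2 for each candidate steering vector a(phi)
    return 1.0 / np.sum(np.abs(Qn.conj().T @ A) ** 2, axis=0)

# Two closely spaced sources at 60 and 65 degrees, 10 mics, high SNR
rng = np.random.default_rng(0)
pos = 0.5 * np.arange(10)
A_true = np.exp(1j * 2 * np.pi * np.outer(pos, np.cos(np.deg2rad([60, 65]))))
S = rng.standard_normal((2, 400)) + 1j * rng.standard_normal((2, 400))
noise = 0.01 * (rng.standard_normal((10, 400)) + 1j * rng.standard_normal((10, 400)))
X = A_true @ S + noise

scan = np.linspace(30, 90, 601)
p = music_spectrum(X, n_sources=2, d=0.5, wavelength=1.0, scan_deg=scan)
# The pseudo-spectrum shows sharp peaks near 60 and 65 degrees
```

With ten half-wavelength sensors and high SNR the two peaks are resolved; with fewer sensors or lower SNR the magnitude spectrum merges them, which is the limitation the MUSIC-Group delay method targets.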

In the literature on source localization, various microphone array configurations have been used, including linear, planar and three-dimensional (3-D) arrays. The source localization problem formulation using a uniform linear array (ULA) is simple. However, a ULA can only locate sources in a plane and exhibits front-back ambiguity. Complexity increases with planar arrays, such as the uniform circular array (UCA), but a planar array can localize sources anywhere in the space above the plane of the array. A spherical microphone array (SMA), on the other hand, can localize sources anywhere in space with no spatial ambiguity.

Initial work on spherical arrays can be found in [6], where an approach using a spherical harmonic expansion to design beampatterns for continuous spherical arrays is discussed. Other work on discrete spherical arrays can be found in [7, 8, 9]. However, these works relate to antenna arrays, and the techniques used are carried over from linear arrays.

A general approach using spherical harmonics (SH) signal processing was proposed in [10, 11] for spherical microphone arrays utilizing pressure sensors. After the introduction of spherical harmonics signal processing, spherical microphone array research attracted considerable attention. In the spherical harmonics domain, the formulation of the source localization problem and of beamforming becomes simple, with reduced complexity. The ability of spherical microphone arrays to measure and analyze 3-D sound fields effectively, together with the ease of signal processing in the spherical harmonics domain, has motivated researchers. Most of the source localization algorithms utilizing spherical microphone arrays were proposed in the last five years [12, 13, 14, 15, 16, 17]. Spherical microphone array signal processing has thus become an active area of research. It has extensive applications in three-dimensional sound reception, sound field analysis, teleconferencing, direction of arrival estimation and noise control. Source localization forms an integral part of these applications. Hence, a significant part of this thesis focuses on source localization using spherical microphone arrays.

In the conventional MUSIC method for source localization, the MUSIC magnitude spectrum is used. The phase spectrum of MUSIC was utilized for source localization over a ULA in [18], and was found to be more robust and of higher resolution. However, the negative differential of the unwrapped phase spectrum of MUSIC (the group delay of MUSIC) remains unexplored for source localization. It is to be noted that the group delay function has been widely used in temporal frequency processing for its high resolution and additive properties [4, 5]. The additive property of the group delay function for spatial spectrum analysis was first discussed in [19]. In this thesis, the MUSIC-Group delay spectrum is utilized for source localization over planar and spherical microphone arrays.

Numerous algorithms have been proposed in the literature for accurate and search-free DOA estimation using ULAs and UCAs. root-MUSIC [20] and Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [21] fall under this category. Estimating DOA from the roots of the MUSIC polynomial is possible because of the Vandermonde structure of the array manifold. This is made possible in a UCA, for azimuth-only estimation, using a beamspace transformation [22]. Utilizing manifold separation to obtain a Vandermonde structure in the array manifold, and hence azimuth estimation using root-MUSIC in the spherical harmonics domain, needs to be investigated.

All of the source localization algorithms developed in the spherical harmonics domain deal with far-field sources only. The far-field assumption greatly simplifies the source localization and beamforming approach. However, applications involving microphone arrays generally require the assumption of near-field sources. In applications like Close Talk Microphone (CTM), teleconferencing, hands-free telephony and voice-only data entry, the signal source is well within the near-field range of the array. Using the far-field assumption in the near-field of an array can result in severe degradation in array performance [23]. Also, important spatial information is lost. A near-field criterion for spherical microphone arrays was formally formulated in [24] in terms of the range of near-field sources. However, a formal data model for simultaneous estimation of range and bearing in the spherical harmonics domain is not available in the literature. Hence, there is a need for the development of a near-field data model and source localization algorithms in the spherical harmonics domain.

The performance of any localization estimator is evaluated against the Cramér-Rao bound (CRB), which places a lower bound on the variance of any unbiased estimator. Hence, it is also of considerable interest to develop an expression for the Cramér-Rao bound in the spherical harmonics domain.

Based on the above discussion, the thesis aims to develop novel methods for source localization

over planar and spherical microphone arrays. Additionally, novel methods for near-ﬁeld and

far-ﬁeld source localization over spherical microphone array are proposed. Cramér-Rao bound

formulation and analysis is also presented.

The thesis comprises eight chapters and is divided into three parts. Chapters 2 and 3 give background on wave propagation and spatial array signal processing respectively. Chapter 4 proposes novel source localization algorithms in the spatial domain. The latter part of the thesis focuses on signal processing in the spherical harmonics domain. The rest of the thesis is organized as follows.

• Chapter 2 : The spherical coordinate system utilized throughout the thesis and the solution to the wave equation are provided in this chapter. The solution to the wave equation is given in the Cartesian and spherical coordinate systems for both the plane wave and spherical wave cases. Scattering from a rigid sphere, spherical harmonics, and the near-field and far-field mode strengths are introduced herein.

• Chapter 3 : The fundamentals of array signal processing are discussed here. The

spatio-temporal array data model is derived from ﬁrst principles of physics. Various

source localization methods are described for linear, planar and spherical microphone arrays.

• Chapter 4 : A robust and high resolution method based on group delay of MUSIC,

called MUSIC-Group delay, using a planar array is proposed here. The additive property of the group delay spectrum of MUSIC is proved. A discussion on the high resolution of the method is presented. Various experiments are conducted to illustrate its significance.

• Chapter 5 : Far-field source localization algorithms are discussed in the spherical harmonics domain. In particular, MUSIC-Group delay is proposed in the spherical harmonics domain, called SH-MGD. The SH-MGD spectrum is utilized for source localization and tracking. Cramér-Rao bound formulation and analysis is also presented for the proposed methods.

• Chapter 6 : proposes SH-root-MUSIC for azimuth estimation of far-field sources. Manifold separation is utilized herein to show the Vandermonde structure in the array manifold. SH-root-MUSIC is a search-free algorithm which estimates DOA by computing the roots of the SH-MUSIC polynomial.

• Chapter 7 : proposes new methods for near-ﬁeld source localization using spherical

microphone array. A new data model for near-ﬁeld source localization is presented.

Three methods that jointly estimate the range and bearing of multiple sources in the spherical array framework, namely SH-MUSIC, SH-MGD and SH-MVDR, are proposed.

Cramér-Rao bound formulation and analysis in spherical harmonics domain for near-

ﬁeld sources is also presented.

• Chapter 8 : draws conclusions from the methods proposed in the thesis. Future directions are also detailed herein.

Appendix A.2 computes the derivative of spherical harmonics. Appendix B gives the derivative of the near-field steering matrix in the spherical harmonics domain.

1.4 Summary of Contributions

This thesis contributes to the body of knowledge on array signal processing in both the spatial and spherical harmonics domains. The specific contributions are:

1. Far-ﬁeld Source Localization in Spatial Domain [25]: The negative diﬀerential

of the unwrapped phase spectrum (group delay) of MUSIC for DOA estimation over

planar arrays is proposed. In particular, MUSIC-Group delay spectrum is utilized

for robust source localization using uniform circular array. Although the group delay

function has been used widely in temporal frequency processing for its high resolution

properties [4], the additive property of the group delay function has hitherto not been

utilized in spatial spectrum analysis. The signiﬁcance of additive property in the context

of DOA estimation is thoroughly studied.

2. Far-ﬁeld Source Localization in Spherical Harmonics Domain [14]: Two

methods for far-ﬁeld source localization are proposed in spherical harmonics domain.

SH-MGD utilizes the advantages of the MUSIC-Group delay spectrum in a spheri-

cal harmonics framework. A search free algorithm, SH-root-MUSIC, which estimates

DOA by computing the roots of SH-MUSIC polynomial, is also proposed for azimuth

estimation of far-ﬁeld sources.

3. Near-ﬁeld Source Localization in Spherical Harmonics Domain [13] : A new

data model for near-ﬁeld source localization is formulated in the spherical harmonics

domain. SH-MUSIC, SH-MVDR and SH-MGD are proposed for joint estimation of

range and bearing of the sources.

4. Formulation and Analysis of Cramér-Rao Bound in Spherical Harmonics

Domain [26]: Expressions for the stochastic Cramér-Rao bound are formulated in the spherical harmonics domain for both far-field and near-field sources. The existence of the CRB for

the spherical harmonics data model is ﬁrst veriﬁed. Subsequently, an expression for

stochastic CRB is derived in spherical harmonics domain.

Chapter 2

Propagation for Source Localization

2.1 Introduction

The focus of this thesis is the localization of sound sources in space utilizing microphone arrays. A microphone array is used to measure the acoustic wavefield and extract spatial information about the sources. Array signal processing algorithms depend on accurate characterization of the wave through the solution to the wave equation. A brief discussion of the acoustic wave equation and its solution is provided in this chapter. The solution to the wave equation governs the propagation of acoustic waves in a medium. The planar and spherical wave propagation models are extensively utilized in this thesis. This chapter starts with the coordinate system that will be utilized for defining the location of an acoustic source and the point of observation.

In this thesis, the spherical coordinate system is utilized. The coordinate system is illustrated in Figure 2.1. A position vector is denoted by r = (r, θ, φ)^T, where (.)^T denotes the transpose operator. The range r of a source can vary from 0 to ∞ and is measured from the origin. The angle θ is referred to as the elevation angle and is measured down from the positive z axis. The azimuthal angle φ is measured counterclockwise from the positive x axis. The ranges of θ and φ are [0, π] and [0, 2π] respectively. In this thesis, the location of a source is represented as rl = (rl, Ψl), with Ψl = (θl, φl). The location of a receiver is denoted as ri = (ri, Φi), where Φi = (θi, φi).

The spherical coordinates of a point (r, θ, φ)^T in space are related to the right-handed Cartesian coordinates (x, y, z)^T by simple trigonometric formulae as

r = x î + y ĵ + z k̂ = [r sin θ cos φ   r sin θ sin φ   r cos θ]^T,  (2.4)

where î, ĵ and k̂ are unit vectors in the directions of the x-axis, y-axis and z-axis. In the rest of the thesis, a vector will be represented by a column matrix. Although there are other coordinate systems, the spherical coordinate system will be used prominently in this thesis.
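The mapping in Equation 2.4 and its inverse are easy to sanity-check numerically. The following Python sketch is an added illustration (not part of the original derivation) using the conventions above, with θ measured from the positive z axis and φ counterclockwise from the positive x axis:

```python
import numpy as np

def sph_to_cart(r, theta, phi):
    """Spherical (range, elevation from +z, azimuth from +x) to Cartesian, as in Equation 2.4."""
    return np.array([r * np.sin(theta) * np.cos(phi),
                     r * np.sin(theta) * np.sin(phi),
                     r * np.cos(theta)])

def cart_to_sph(x, y, z):
    """Inverse mapping; theta lies in [0, pi], phi in [0, 2*pi)."""
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arccos(z / r)
    phi = np.arctan2(y, x) % (2 * np.pi)
    return r, theta, phi

# A point on the positive x axis: r = 1, theta = pi/2, phi = 0.
p = sph_to_cart(1.0, np.pi / 2, 0.0)
print(p)                 # close to [1, 0, 0]
print(cart_to_sph(*p))   # recovers (1.0, pi/2, 0.0) up to rounding
```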

A propagating wave in the space-time domain will be represented by p(r, θ, φ, t). For acoustic wave propagation, p(r, θ, φ, t) ≡ p(r, t) represents the infinitesimal variation of acoustic pressure at position r and time t. The pressure is governed by the acoustic wave equation, discussed in the ensuing Section.

2.3 Acoustic Wave Equation

In array signal processing, sound information is propagated through the medium to reach the array. The pressure field is generated by the propagation of compressions and rarefactions of the sound waves. For a traveling wave, the pressure is a function of space and time. Let the infinitesimal variation of acoustic pressure from its equilibrium be represented by p(r, t). In a source-free

homogeneous ﬂuid with no viscosity, the pressure satisﬁes the following wave equation [27]

∇²p(r, t) − (1/c²) ∂²p(r, t)/∂t² = 0,  (2.5)

where ∇² represents the Laplacian operator and c is the speed of sound propagation in the medium. At 20°C, the value of c is 343 m/s in air and 1481 m/s in water.

The derivation of the wave equation in Equation 2.5 follows from basic laws of physics and conservation of mass. In general, p(r, t) represents a scalar field. The same equation is valid for the electromagnetic field derived from Maxwell's equations; p(r, t) in that case would represent the electric field E.

The wave equation can also be re-written in the frequency domain by applying the Fourier transform to Equation 2.5. Hence, the wave equation in the frequency domain can be expressed as [27]

∇²P(r, ω) + (ω/c)² P(r, ω) = 0,  (2.6)

where ω = 2πf is the temporal frequency and k = ω/c is termed the wavenumber. The wave

equation in 2.6 is also known as the Helmholtz equation (or reduced wave equation). The wavenumber k can thus be expressed as

k = ω/c = 2π/λ = 2πf/c,  (2.7)

where λ is the wavelength of the arriving wave. In the ensuing Section, a solution to the wave equation is described.

2.4 Solution to Wave Equation in Cartesian Coordinates

In this Section, the solution to the wave equation in Cartesian coordinates is presented. Equation 2.5 can be written in Cartesian coordinates as

∂²p/∂x² + ∂²p/∂y² + ∂²p/∂z² = (1/c²) ∂²p/∂t².  (2.8)

Perpendicular to the direction of propagation, the wave spreads out to form a wavefront. The initial shape of the wavefront is generally considered arbitrary and evolves according to the solution of the wave equation. In particular, planar and spherical wavefronts are studied. In the following Sections, both the planar and spherical wave solutions are discussed.

The mathematical model of plane wave propagation is given by the solution to the partial differential Equation 2.8. For a plane wave, the value of p(r, t0) at a given instant t0 is constant over all points on a plane perpendicular to the direction of propagation, as shown in Figure 2.2. Let us consider a source situated in the far-field region with direction denoted by (θl, φl). Hence, the plane wave travels in the direction given by (π − θl, π + φl). The expression for the wavevector is given as

k = [k sin θ cos φ   k sin θ sin φ   k cos θ]^T.  (2.9)

Replacing (θ, φ) in Equation 2.9 with (π − θl, π + φl), the expression for the wavevector becomes

kl = −[k sin θl cos φl   k sin θl sin φl   k cos θl]^T.  (2.10)

The opposite sign indicates that the wavevector in this new definition corresponds to the direction of arrival and not to the direction of wave propagation, as shown in Figure 2.2. Assuming the propagation delay at a reference point to be zero, the delay at receiver ri is denoted by τi(Ψl). The delay τi(Ψl), from the geometry shown in Figure 2.2, can be calculated as

τi(Ψl) = dil/c = k̂l · ri/c = kl · ri/ω,  (2.11)



Figure 2.2: Diagram illustrating general time delay estimation from a traveling plane wave.

where the position vector ri and wave vector kl are given by Equations 2.4 and 2.10 respectively.

In order to discuss the plane wave solution to wave equation, an arbitrary ﬁeld of the

form below is considered.

p(r, t) = f(t − k̂l · r/c)  (2.13)

Equation 2.13 satisfies the wave equation; the term k̂l/c is also known as the slowness vector. It is to be noted that a plane wave has meaning only for a single frequency. Hence, the monochromatic plane wave solution to the wave equation at frequency ω can be written as [28, 29]

p(r, t) = A e^{j(ωt − kx x − ky y − kz z)}.  (2.14)

The wave vector kl = [kx  ky  kz]^T consists of spatial frequencies. Each spatial frequency denotes 2π times the number of cycles per meter of the monochromatic plane wave


in the corresponding direction. The spatial frequencies are constrained by

kx² + ky² + kz² = ω²/c².  (2.15)

This constraint has to be satisfied by all monochromatic wave solutions. The plane wave solution in Equation 2.14 can also be written compactly as

p(r, t) = A e^{j(ωt − kl^T r)}.  (2.16)

Taking the Fourier transform of Equation 2.13, the solution can be written in the frequency domain as

P(r, ω) = e^{−j kl^T r} F(ω),  (2.17)

where F(ω) is the temporal spectrum (the frequency dependent part) [30]. This is also called the solution to the Helmholtz equation.
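To make Equations 2.10, 2.11 and 2.17 concrete, the following sketch (with illustrative values assumed here, not taken from the thesis) builds the wavevector for a given arrival direction, computes the delay at a receiver, and checks that the frequency-domain phase factor e^{−j kl^T ri} equals e^{−jωτi}:

```python
import numpy as np

c = 343.0                                       # speed of sound, m/s
f = 1000.0
omega = 2 * np.pi * f
k = omega / c                                   # wavenumber, rad/m

theta_l, phi_l = np.pi / 3, np.pi / 4           # direction of arrival of the source
k_l = -k * np.array([np.sin(theta_l) * np.cos(phi_l),
                     np.sin(theta_l) * np.sin(phi_l),
                     np.cos(theta_l)])          # wavevector, Equation 2.10

r_i = np.array([0.05, 0.02, 0.0])               # receiver position, m (assumed)
tau_i = k_l @ r_i / omega                       # delay at the receiver, Equation 2.11

phase_from_delay = np.exp(-1j * omega * tau_i)
phase_from_wavevector = np.exp(-1j * (k_l @ r_i))  # phase term of Equation 2.17
print(abs(phase_from_delay - phase_from_wavevector))   # essentially zero
```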

Spherical waves are generated when sound is emitted equally in all directions (spherical symmetry) from the center of a sphere, or when the sound source is highly localized. A traveling spherical wave is illustrated in Figure 2.3 for a near-field source located at rl. Assuming spherical symmetry of the wave, the acoustic pressure p will be a function of the radial distance and time, but not of the angular coordinates. In this case, the Laplacian operator reduces to [31, p. 20-12]

∇² = ∂²/∂r² + (2/r) ∂/∂r.  (2.18)

Hence, the wave equation in 2.5 takes the form

∂²p/∂r² + (2/r) ∂p/∂r = (1/c²) ∂²p/∂t².  (2.19)

This can be rewritten as

(1/r) ∂²(rp)/∂r² = (1/c²) ∂²p/∂t²,  (2.20)

or equivalently,

∂²(rp)/∂r² = (1/c²) ∂²(rp)/∂t².  (2.21)



Figure 2.3: Illustration of a traveling spherical wave and associated time delay estimation.

If the product rp is considered as a single term, the solution can be written similarly to the plane wave solution in Equation 2.13. Hence, the spherical wave model is represented by

p(r, t) = f(t − k̂l · r/c) / r.  (2.22)

Utilizing Equation 2.11 and the geometry of the near-field shown in Figure 2.3, the time delay can be computed as

τ = k̂l · r/c = |ri − rl| / c.  (2.23)

Utilizing Equations 2.22 and 2.23, the acoustic pressure at a point ri due to a source at rl is given by

p(ri, t) = f(t − |ri − rl|/c) / |ri − rl|.  (2.24)

Taking the temporal Fourier transform of Equation 2.24, the final solution to the wave equation can be written as

P(r, ω) = (e^{−jk|ri − rl|} / |ri − rl|) F(ω).  (2.25)

Similar to Equation 2.16, the monochromatic solution for a spherical wave is given by

p(r, t) = (A/r) e^{j(ωt − kl^T r)}.  (2.26)

2.5 Solution to Wave Equation in Spherical Coordinates

In this Section, the solution to the wave equation in the spherical coordinate system is described. The time dependent wave equation in Equation 2.5 can be written in spherical coordinates as

(1/r²) ∂/∂r (r² ∂p/∂r) + (1/(r² sin θ)) ∂/∂θ (sin θ ∂p/∂θ) + (1/(r² sin²θ)) ∂²p/∂φ² = (1/c²) ∂²p/∂t².  (2.27)

The solution to Equation 2.27 is obtained using separation of variables, writing p(r, θ, φ, t) = R(r)Θ(θ)Φ(φ)T(t), which leads to

(1/Φ(φ)) d²Φ(φ)/dφ² = −m²  (2.29)

sin θ d/dθ (sin θ dΘ(θ)/dθ) + [n(n + 1) sin²θ − m²] Θ(θ) = 0  (2.30)

d/dr (r² dR(r)/dr) + [k²r² − n(n + 1)] R(r) = 0  (2.31)

(1/T(t)) (1/c²) d²T(t)/dt² = −k².  (2.32)

The solution to Equation 2.32 is

T(t) = T1 e^{jωt} + T2 e^{−jωt}.  (2.33)

The first term is taken for the time dependence, with T2 = 0, since e^{−jωt} represents a wave propagating backward in time and hence has no significance [29].

Solutions to Equations 2.29 and 2.30 are combined into a single function, called the spherical harmonics [27]. The spherical harmonics Ynm are defined by

Ynm(θ, φ) = √[ (2n + 1)/(4π) · (n − m)!/(n + m)! ] Pnm(cos θ) e^{jmφ}.  (2.34)

Here n is a non-negative integer and is called the order of the spherical harmonics; m is termed the degree of the spherical harmonics and takes values in −n ≤ m ≤ n. Pnm is the associated Legendre function of the first kind. The constant term makes the spherical harmonics orthonormal, and arises from the orthogonality properties of the Legendre functions Pnm(cos θ) and the exponential functions e^{jmφ} [32, p. 38]. For negative m, the spherical harmonics take the form

Yn^{−m}(θ, φ) = (−1)^m [Ynm(θ, φ)]*.  (2.35)

The spherical harmonics satisfy the orthonormality property

∫₀^{2π} ∫₀^{π} Ynm(θ, φ) [Yn′m′(θ, φ)]* sin θ dθ dφ = δnn′ δmm′,  (2.36)

where the Kronecker delta is defined as

δmn = 1 for m = n, and 0 otherwise.  (2.37)

The spherical harmonics act as basis functions for the spherical harmonics decomposition of a square integrable function, similar to the complex exponentials e^{jωt} acting as a basis for the decomposition of real periodic functions [33]. Figure 2.4 shows the plot of three spherical harmonics: the radius shows the magnitude and the color indicates the phase. It is to be noted that Y00 is isotropic, while Y10 and Y11 have directional characteristics.
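These properties can be checked numerically. The sketch below is an added illustration using SciPy's `sph_harm`, whose argument order is (degree m, order n, azimuth, polar angle), so Ynm(θ, φ) in the notation above corresponds to `sph_harm(m, n, phi, theta)`. It verifies the isotropy of Y00 and the negative-degree symmetry Yn^{−m} = (−1)^m [Ynm]*:

```python
import numpy as np
from scipy.special import sph_harm

def Y(n, m, theta, phi):
    """Y_n^m(theta, phi) with theta the polar (elevation) angle and phi the azimuth."""
    return sph_harm(m, n, phi, theta)

theta, phi = 0.7, 1.3                    # an arbitrary test direction

# Y_0^0 is isotropic and equals 1 / (2 * sqrt(pi)) everywhere.
print(Y(0, 0, theta, phi))               # 0.28209479... + 0j

# Negative-degree symmetry: Y_n^{-m} = (-1)^m * conj(Y_n^m).
lhs = Y(1, -1, theta, phi)
rhs = (-1) ** 1 * np.conj(Y(1, 1, theta, phi))
print(abs(lhs - rhs))                    # essentially zero
```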

The radial Equation 2.31 is solved by transforming it into the spherical Bessel equation [27, p. 193]. Hence, the solutions are given as

R(r) = R1 jn(kr) + R2 yn(kr),  (2.38)

where jn(kr) and yn(kr) are spherical Bessel functions of the first kind and second kind, respectively. Alternatively, the solution can also be written using spherical Hankel functions as

R(r) = R3 h1n(kr) + R4 h2n(kr),  (2.39)

where h1n(kr) and h2n(kr) are spherical Hankel functions of the first and second kind respectively.

It may be noted that h1n(kr) represents an outgoing wave, while h2n(kr) represents an incoming wave. In the rest of the thesis, hn(kr) will be used for the spherical Hankel function of the first kind, h1n(kr). The spherical Bessel and Hankel functions are related to the Bessel

function of ﬁrst kind, h1n (kr). The spherical Bessel and Hankel functions are related to Bessel

and Hankel functions as

π 1/2

jn (x) ≡ ( ) Jn+1/2 (x) (2.40)

2x

π

yn (x) ≡ ( )1/2 Yn+1/2 (x) (2.41)

2x

π 1/2

h1n (x) ≡ jn (x) + jyn (x) = ( ) [Jn+1/2 (x) + jYn+1/2 (x)] (2.42)

2x

π

h2n (x) ≡ jn (x) − jyn (x) = ( )1/2 [Jn+1/2 (x) − jYn+1/2 (x)] (2.43)

2x

where j is unit imaginary number. Jn+1/2 (.) is the half odd integer order Bessel function of

the ﬁrst kind and Yn+1/2 (.) is half odd integer order Bessel function of the second kind (also

known as Neumann function).
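Relations 2.40-2.43 can be verified directly against SciPy's implementations (an added check, not part of the thesis):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, jv, yv

x = 2.5
for n in range(5):
    # Equations 2.40 and 2.41: spherical Bessel functions from half odd integer order ones.
    assert abs(spherical_jn(n, x) - np.sqrt(np.pi / (2 * x)) * jv(n + 0.5, x)) < 1e-12
    assert abs(spherical_yn(n, x) - np.sqrt(np.pi / (2 * x)) * yv(n + 0.5, x)) < 1e-12

# Equations 2.42 and 2.43: the two Hankel kinds are complex conjugates for real x.
h1 = spherical_jn(3, x) + 1j * spherical_yn(3, x)
h2 = spherical_jn(3, x) - 1j * spherical_yn(3, x)
print(abs(h1 - np.conj(h2)))    # essentially zero
```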

Finally, a general solution to Equation 2.27, with e^{jωt} implicit, can be written as

P(r, θ, φ, ω) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} [Amn jn(kr) + Bmn yn(kr)] Ynm(θ, φ)  (2.44)

for the standing wave solution, and

P(r, θ, φ, ω) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} [Cmn h1n(kr) + Dmn h2n(kr)] Ynm(θ, φ)  (2.45)

for the traveling wave solution, where the coefficients Amn, Bmn, Cmn and Dmn are generally complex valued. Note that the solutions given by Equations 2.44 and 2.45 are frequency domain representations of p(r, θ, φ, t), where the temporal dependency is implicit in the frequency dependence of the coefficients [27, p. 186].

In this Section, plane wave propagation is studied in the presence of scattering. Scattering problems concern the propagation of waves that collide with some object. In particular, we consider scattering of plane waves from a rigid sphere, because most of the thesis is centered around spherical microphone array processing, where acoustic sensors (microphones) are embedded on a rigid sphere.

A rigid spherical microphone array with radius ra and I microphones is taken into consideration. For plane wave propagation (the far-field scenario), finding the pressure becomes an interior problem, in which case only the spherical Bessel function is included from the general solution in Equation 2.44. Let us consider a far-field source incident from direction (θl, φl) at a point ri = (r, θi, φi), with r ≥ ra. As discussed in Section 2.4.1, the wave propagates in the opposite direction, denoted by (θp, φp) = (π − θl, π + φl). Hence, the pressure on an open sphere (imaginary) due to a unit amplitude plane wave can be written in terms of spherical harmonics as [27, p. 227]

e^{j k^T ri} = 4π Σ_{n=0}^{∞} Σ_{m=−n}^{n} j^n jn(kr) [Ynm(θp, φp)]* Ynm(θi, φi).  (2.46)

Equation 2.46 can be derived as in [34], and is called the Jacobi-Anger expansion. The relation between spherical harmonics at points in opposite directions is [35]

Ynm(π − θ, π + φ) = (−1)^n Ynm(θ, φ).  (2.47)

From Equations 2.9 and 2.10, it is clear that the sign of the wavenumber changes when we use the direction of arrival in place of the direction of propagation. Hence, utilizing Equations 2.9, 2.10, 2.47 and the reflection formula of the spherical Bessel function, jn(−z) = (−1)^n jn(z), the pressure in Equation 2.46 can be re-written as

e^{−j kl^T ri} = 4π Σ_{n=0}^{∞} Σ_{m=−n}^{n} j^n jn(kr) [Ynm(θl, φl)]* Ynm(θi, φi).  (2.48)
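The expansion converges quickly once the summation order exceeds kr. The following numerical check (with illustrative values assumed here) evaluates both sides of Equation 2.48 for an open sphere:

```python
import numpy as np
from scipy.special import spherical_jn, sph_harm

k, r = 10.0, 0.3                       # wavenumber (rad/m) and observation radius (m)
theta_l, phi_l = 1.1, 0.4              # source direction of arrival
theta_i, phi_i = 2.0, 2.7              # receiver direction

def unit(theta, phi):
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

k_l = -k * unit(theta_l, phi_l)        # wavevector, Equation 2.10
r_i = r * unit(theta_i, phi_i)

lhs = np.exp(-1j * (k_l @ r_i))        # left-hand side of Equation 2.48

rhs = 0j
N = 30                                 # truncation order, well above kr = 3
for n in range(N + 1):
    for m in range(-n, n + 1):
        rhs += (4 * np.pi * 1j**n * spherical_jn(n, k * r)
                * np.conj(sph_harm(m, n, phi_l, theta_l))
                * sph_harm(m, n, phi_i, theta_i))

print(abs(lhs - rhs))                  # essentially zero
```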

The pressure in Equation 2.48 represents the pressure without any scatterer in place. In this case, the microphone array is called an open sphere, where just the microphones are placed at I locations. However, for the case of a rigid sphere, the resultant of the incident and scattered pressure should be taken into consideration. The pressure due to scattering is an exterior problem and hence its solution will include h1n(kr) from the general solution in Equation 2.45. Utilizing the boundary condition of zero radial velocity on the rigid sphere, the pressure in Equation 2.48 can be re-written as [27]

e^{−j kl^T ri} = 4π Σ_{n=0}^{∞} Σ_{m=−n}^{n} j^n [ jn(kr) − (j′n(kra)/h′n(kra)) hn(kr) ] [Ynm(θl, φl)]* Ynm(θi, φi).  (2.49)

Combining Equations 2.48 and 2.49, the plane wave model in spherical coordinates can be written as

e^{−j kl^T ri} = Σ_{n=0}^{∞} Σ_{m=−n}^{n} bn(k, r) [Ynm(θl, φl)]* Ynm(θi, φi),  (2.50)

where the far-field mode strength is

bn(k, r) = 4π j^n jn(kr),  open sphere
bn(k, r) = 4π j^n [ jn(kr) − (j′n(kra)/h′n(kra)) hn(kr) ],  rigid sphere  (2.51)

where r ≥ ra. Figure 2.5 illustrates the mode strength bn as a function of kr and n for an open sphere. For kr = 0.1, the zeroth order mode amplitude is 22 dB, while the first order mode has amplitude −8 dB. Hence, for orders greater than kr, the mode strength bn decreases significantly. Therefore, the summation in Equation 2.50 can be truncated to some finite N ≥ kr, called the array order.
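The numbers quoted above can be reproduced for the open-sphere mode strength of Equation 2.51 (an added sketch; the rigid-sphere case would additionally need the Hankel-function term):

```python
import numpy as np
from scipy.special import spherical_jn

def bn_open(n, kr):
    """Open-sphere far-field mode strength, b_n = 4*pi*j^n*j_n(kr) (Equation 2.51)."""
    return 4 * np.pi * (1j ** n) * spherical_jn(n, kr)

kr = 0.1
for n in range(5):
    print(n, 20 * np.log10(abs(bn_open(n, kr))))
# n = 0 gives about 22 dB and n = 1 about -8 dB, matching the text: the mode
# strength collapses for n > kr, which justifies truncating Equation 2.50 at
# a finite array order N >= kr.
```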


Figure 2.5: Variation of mode strength bn in dB as a function of kr and n for an open sphere.

Similar to a plane wave, a spherical wave can also be expanded in terms of spherical harmonics using the Jacobi-Anger expansion [34]. Looking at Equations 2.25 and 2.50, the pressure at the ith microphone due to the lth unit amplitude source located at rl is given in terms of spherical harmonics as [24, 36]

e^{−jk|ri − rl|} / |ri − rl| = Σ_{n=0}^{N} Σ_{m=−n}^{n} bn(k, ra, rl) [Ynm(θl, φl)]* Ynm(θi, φi),  (2.52)

where bn(k, ra, rl) is the nth order near-field mode strength. It is related to the far-field mode strength bn(k, ra) as given in [24].


The far-ﬁeld mode strength bn (k, r) is given in Equation 2.51. These ﬁnal expressions for

plane wave and spherical wave in terms of spherical harmonics will be used later to derive

some useful results.

In this Section, the criterion for near-field and far-field source localization based on range is discussed. Spherical wavefronts are assumed when sources are in the near-field region; plane waves are assumed when sources are in the far-field. The near-field and far-field criterion, in general, is determined by the Fresnel and Fraunhofer distances [37]. The near-field Fresnel region is defined by

0.62 √(D³/λ) < rl < 2D²/λ,  (2.54)

where D is the array aperture and rl is the distance of the source from the array. The region defined by rl > 2D²/λ corresponds to the far-field Fraunhofer region. However, these parameters do not indicate the near-field range of a spherical microphone array.
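For intuition, the Fresnel and Fraunhofer boundaries of Equation 2.54 can be evaluated for an example aperture (the values below are assumed purely for illustration):

```python
import numpy as np

c = 343.0
f = 1000.0
lam = c / f            # wavelength, m
D = 0.2                # array aperture, m (assumed example value)

fresnel_lower = 0.62 * np.sqrt(D**3 / lam)   # inner edge of the Fresnel region
fraunhofer = 2 * D**2 / lam                  # start of the Fraunhofer (far-field) region

print(fresnel_lower, fraunhofer)
# Sources with fresnel_lower < r_l < fraunhofer lie in the near-field Fresnel
# region; sources beyond fraunhofer may be treated as far-field.
```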


Figure 2.6: Plot showing the nature of far-ﬁeld and near-ﬁeld mode strength for the Eigenmike

system. Near-ﬁeld source is at rl = 1m and order is varied from n = 0 to n = 4.

The near-field criterion for a spherical array is presented in [38], based on the similarity of the near-field mode strength (|bn(k, ra, rl)|) and the far-field mode strength (|bn(k, ra)|). The two functions start behaving in a similar manner at krl ≈ N, for an array of order N. This is illustrated in Figure 2.6 for the rigid sphere Eigenmike system [39] with rl = 1 m and the order varying from n = 0 to n = 4. The near-field range limit is therefore

rNF ≈ N/k.  (2.55)

But rNF ≥ ra; hence, for a source to be in the near-field, the range of the source should satisfy

ra ≤ rl ≤ (kmax/k) ra,  (2.56)

with kmax = N/ra.
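A quick numerical reading of Equations 2.55 and 2.56 for the Eigenmike (N = 4, ra = 4.2 cm) shows how the near-field range shrinks with frequency (an added sketch):

```python
import numpy as np

c = 343.0
N = 4               # array order of the Eigenmike
ra = 0.042          # radius of the rigid sphere, m

def r_nf(f):
    """Near-field upper range limit r_NF = N / k (Equation 2.55)."""
    k = 2 * np.pi * f / c
    return N / k

for f in (500.0, 1000.0, 2000.0):
    print(f, max(r_nf(f), ra))   # the near field never extends below r_a
# At 1 kHz the Eigenmike near field reaches roughly 0.22 m from the array center.
```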

2.6 Summary

In this chapter, the spherical coordinate system and the wave propagation model are described. Solutions to the wave equation for plane and spherical waves are provided in the Cartesian and spherical coordinate systems. Spherical harmonics and mode strength are introduced, along with their significance in the context of this thesis. These concepts play a significant role in the solution to the source localization problem, and are also utilized in the formulation of the steering vector. The concepts of the open sphere and the rigid sphere are also discussed, along with the definition of the near-field criterion for spherical microphone arrays.

Chapter 3

Processing Techniques

3.1 Introduction

Microphone arrays utilize a large number of microphones to exploit the additional spatial information available. The signal acquired over a microphone array is spatially sampled by the microphones, generating diversity in the space domain in terms of time delays. This is similar to traditional digital signal processing, where diversity is present in the time domain: the signal is sampled at different time instants, and this time sampling allows the design of an FIR filter to select particular frequencies. Analogously, spatial sampling allows the design of a spatial filter to pass sources from certain directions while rejecting sources from other directions. This spatial filtering technique is also called beamforming.

In the previous Chapter, the propagation of sound waves and a solution to the wave equation were discussed. In this Chapter, we express the pressure due to far-field sources, received at a microphone array, in the form of a data model. The data model is derived from first principles of physics. A brief discussion of commonly used source localization methods is also provided using uniform linear arrays. These methods can be broadly divided into covariance-based, beamforming-based and subspace-based source localization. In the following Section, various array geometries are introduced first.

3.2 Geometry of Microphone Array

Microphones can be arranged in various geometries to acquire the acoustic signal. Linear, planar and spherical microphone arrays are widely studied. Although a linear microphone array is simple in structure and processing, it is limited by front-back ambiguity. Planar arrays overcome the front-back ambiguity; however, they suffer from up-down ambiguity. The spherical microphone array can localize sources anywhere in space with no spatial ambiguity. The usefulness and the limitations of the uniform linear array, the planar array (in particular, the circular array) and the spherical microphone array are detailed in this Section.


Figure 3.1: Uniform Linear Array geometry. Figure 3.2: Front back ambiguity in ULA.

The simplest array configuration is that of a uniform linear array (ULA). Figure 3.1 shows a ULA with four microphones placed uniformly on the x-axis. The distance between two consecutive microphones is d. A far-field source is incident on the array at an azimuthal angle φ, and the extra distance traveled by the wavefront between two consecutive microphones is d cos(φ). It is to be noted that the configuration, and hence the localization problem formulation, for a ULA is simple. However, it suffers from front-back ambiguity, also called north-south ambiguity, which is illustrated in Figure 3.2. It can be noticed from Figure 3.2 that a ULA can localize sources only in its own plane, with azimuth ranging in [0, π]. Also, it cannot differentiate between the two positions S1 and S2: a ULA is capable of estimating the incident angle with respect to the x-axis, but it is unable to resolve on which side of the x-axis the source lies. This is called front-back ambiguity.
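The front-back ambiguity is easy to demonstrate numerically: mirror-image directions about the array axis yield identical ULA steering vectors, so no processing can tell them apart. The sketch below assumes example values for the spacing and frequency:

```python
import numpy as np

c = 343.0
f = 1000.0
omega = 2 * np.pi * f
d = 0.05                       # inter-microphone spacing, m (assumed)
I = 4                          # number of microphones

def steering(phi):
    """ULA steering vector: microphone i sees an extra delay of i*d*cos(phi)/c."""
    delays = np.arange(I) * d * np.cos(phi) / c
    return np.exp(-1j * omega * delays)

# cos(phi) is even, so directions phi and -phi are indistinguishable.
print(np.allclose(steering(np.pi / 3), steering(-np.pi / 3)))   # True
```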

In a uniform circular array (UCA), microphones are placed uniformly in a circular fashion, as shown in Figure 3.3. A UCA can localize sources with any azimuth, i.e. φ ∈ [0, 2π], and elevation ranging from 0 to π/2. Although the circular array does not suffer from front-back ambiguity, it is limited by up-down ambiguity [40]. Another advantage is that a UCA is much more compact than a ULA for the same number of microphones and the same spatial aliasing condition.


The spherical microphone array can localize sources anywhere in space. Hence, it is capable of measuring and analyzing the three-dimensional sound field in an effective manner, and it is more compact than the UCA. The Eigenmike system [39] is a spherical microphone array with 32 microphones embedded on a rigid sphere of radius 4.2 cm. An Eigenmike system from mh-acoustics is shown in Figure 3.4.

3.3 Microphone Array Data Model

In this Section, the data model for signals acquired over a microphone array is discussed. A few assumptions are made to make the formulation analytically tractable. The sources are assumed to be in the far field of the array. The transmission medium is assumed to be isotropic and non-dispersive. These assumptions allow a straight-line propagation model. The sources are assumed to be narrowband. The narrowband signal assumption is discussed first, prior to the development of the array data model.

The complex envelope representation of a narrowband signal is given as [41]

s(t) = x(t)e^{j(ω_c t + y(t))} (3.1)

where x(t)e^{jy(t)} represents the complex envelope of the signal. Here, x(t) and y(t) are slowly varying functions of time that define the amplitude and phase of s(t), and ω_c is the known center frequency. The narrowband assumption implies

x(t − τ) ≈ x(t), y(t − τ) ≈ y(t) (3.2)

for all possible propagation delays τ through the array elements. It is reasonable to assume that the envelope does not change significantly as it traverses from the reference point through the array. In the frequency domain, applying the Fourier shift theorem to Equation 3.2 gives

X(ω)e^{−jωτ} ≈ X(ω), Y(ω)e^{−jωτ} ≈ Y(ω). (3.3)


It can be noted from Equation 3.3 that, for the narrowband condition to hold, the product of frequency and group delay for the amplitude envelope has to be negligible. Similarly, the product of frequency and phase delay for the phase envelope should be negligible. Mathematically, the following condition should be satisfied [42] for the narrowband assumption:

ωτ ≪ 1. (3.4)

Assuming the slowly varying nature of the amplitude and phase as suggested in Equation 3.2, the delayed signal in Equation 3.1 can be written as

s(t − τ) ≈ s(t)e^{−jω_c τ}. (3.5)

From Equation 3.5, it may be noted that the effect of a time delay on the received waveform is simply a phase shift.
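As a quick numerical check of this approximation, the following sketch (with arbitrarily chosen carrier, envelope, and delay values) compares a delayed narrowband signal with its phase-shifted version:

```python
import numpy as np

def envelope(u, fm=50.0):
    """A slowly varying amplitude envelope (fm << carrier frequency)."""
    return 1 + 0.5 * np.cos(2 * np.pi * fm * u)

fc, fs = 8000.0, 96000.0            # carrier and sampling frequencies (Hz)
t = np.arange(0, 0.05, 1 / fs)
tau = 3 / fs                        # a delay of a few samples

s = envelope(t) * np.exp(1j * 2 * np.pi * fc * t)       # Eq. 3.1 with y(t) = 0
delayed = envelope(t - tau) * np.exp(1j * 2 * np.pi * fc * (t - tau))
approx = s * np.exp(-1j * 2 * np.pi * fc * tau)         # Eq. 3.5: pure phase shift

# The envelope barely moves over tau, so the two agree closely.
print(np.max(np.abs(delayed - approx)) < 1e-2)          # True
```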

Now we consider an arbitrary microphone array with I identical and omnidirectional microphones. The position vector of the ith microphone is given by r_i = (r_i, Φ_i)^T, where Φ_i = (θ_i, φ_i) is the angular location and (.)^T denotes the transpose of (.). A narrowband sound field of L plane waves is incident on the array. The direction of arrival of the lth source is denoted by Ψ_l = (θ_l, φ_l). For a planar wavefront (far-field case), the instantaneous pressure amplitude at the ith microphone due to the lth source is s_l(t − τ_i(Ψ_l)), where τ_i(Ψ_l) is the delay of arrival at the ith microphone w.r.t. some reference point, for the lth source s_l(t). Note that the reference point can be any point in space or one of the microphones. However, in general practice, the reference point is taken to be the array centroid. The total pressure at the ith microphone is given by [21, 43]

p_i(Ψ; t) = Σ_{l=1}^{L} α_i(Ψ_l) s_l(t − τ_i(Ψ_l)) + v_i(t) (3.6)

where α_i(Ψ_l) is the temporal Green's function of the ith sensor for the lth source and v_i is the uncorrelated sensor noise component. The noise is assumed to be baseband additive white Gaussian noise of power σ². The data model in Equation 3.6 is known as the anechoic data model [44].

Utilizing the narrowband approximation result in Equation 3.5, the pressure at the ith microphone can be re-written as

p_i(Ψ; t) = Σ_{l=1}^{L} α_i(Ψ_l) s_l(t) e^{−jω_c τ_i(Ψ_l)} + v_i(t) (3.7)


Since the microphones are assumed to be identical and omnidirectional, the gains α_i(Ψ_l) can be set to unity. Writing ω_c τ_i(Ψ_l) = k_l^T r_i in terms of the wavevector k_l, the pressure at the ith microphone in Equation 3.7 can be written as

p_i(Ψ; t) = Σ_{l=1}^{L} s_l(t) e^{−j k_l^T r_i} + v_i(t). (3.8)

Collecting the microphone outputs into a vector gives

p(t) = A(Ψ, k)s(t) + v(t) (3.9)

where p(t) = [p_1(t), p_2(t), . . . , p_I(t)]^T, A(Ψ, k) is the I × L steering matrix (also called the array manifold), s(t) = [s_1(t), s_2(t), · · · , s_L(t)]^T is the vector of signal amplitudes at the reference point, and v(t) is the baseband additive white Gaussian sensor noise. The steering matrix A(Ψ, k) is expressed as

A(Ψ, k) = [a(Ψ_1, k) a(Ψ_2, k) . . . a(Ψ_L, k)] (3.10)

a(Ψ_l, k) = [e^{−j k_l^T r_1} e^{−j k_l^T r_2} . . . e^{−j k_l^T r_I}]^T. (3.11)

The steering vector is also called the array manifold vector in the literature; both terms are used interchangeably in this thesis. Utilizing Equation 2.12, the steering vector can also be written as a collection of phase shifts, given by

a(Ψ_l, k) = [e^{−jω_c τ_1(Ψ_l)} e^{−jω_c τ_2(Ψ_l)} · · · e^{−jω_c τ_I(Ψ_l)}]^T. (3.12)

Equation 3.9 is referred to as the spatio-temporal narrowband data model in its most general form, and it can be utilized for any array configuration.
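The data model of Equation 3.9 can be simulated directly. The sketch below assumes unit sensor gains (α_i = 1) and numpy; the geometry, frequency, and noise values are arbitrary choices for illustration:

```python
import numpy as np

def steering_matrix(mic_pos, doas, k):
    """Build the I x L steering matrix A of Eqs. 3.10/3.11.
    mic_pos: (I, 3) microphone positions in metres.
    doas:    list of (theta, phi) source directions in radians.
    k:       wavenumber omega_c / c."""
    cols = []
    for theta, phi in doas:
        # Wavevector pointing from the source towards the array.
        kl = -k * np.array([np.sin(theta) * np.cos(phi),
                            np.sin(theta) * np.sin(phi),
                            np.cos(theta)])
        cols.append(np.exp(-1j * mic_pos @ kl))   # e^{-j k_l^T r_i}
    return np.stack(cols, axis=1)

# A 4-microphone ULA on the x-axis with spacing d (Eq. 3.13).
d, k = 0.05, 2 * np.pi * 1000 / 343.0             # 1 kHz tone, c = 343 m/s
mic_pos = np.array([[i * d, 0.0, 0.0] for i in range(4)])
doas = [(np.pi / 2, np.deg2rad(60))]              # one far-field source

A = steering_matrix(mic_pos, doas, k)

rng = np.random.default_rng(0)
Ns = 200
s = rng.standard_normal((1, Ns)) + 1j * rng.standard_normal((1, Ns))
v = 0.01 * (rng.standard_normal((4, Ns)) + 1j * rng.standard_normal((4, Ns)))
p = A @ s + v                                     # Eq. 3.9: p(t) = A s(t) + v(t)
print(p.shape)                                    # (4, 200)
```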

For the ULA shown in Figure 3.1, the position vector of the ith microphone can be given as

r_i = [(i − 1)d 0 0]^T. (3.13)

It is to be noted that the elevation angle θ is 90° for a ULA. Therefore, the wavevector from Equation 2.10 can be written as

k_l = −k[cos φ_l sin φ_l 0]^T. (3.14)


Hence, utilizing Equation 2.12, the propagation delay at the ith microphone can now be written as

τ_i(Ψ_l) = −(i − 1)d cos φ_l / c, i = 1, 2, · · · , I. (3.15)

Utilizing this in Equation 3.12, the steering vector for the ULA takes the form

a(φ_l, k) = [1 e^{jkd cos φ_l} e^{j2kd cos φ_l} · · · e^{j(I−1)kd cos φ_l}]^T (3.16)

where k = ω_c/c. It can be observed that the steering vector for a ULA exhibits a Vandermonde structure.
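A minimal sketch of the ULA steering vector of Equation 3.16, verifying its Vandermonde structure (each entry is a power of the same ratio z = e^{jkd cos φ}); the sensor count and angle here are arbitrary:

```python
import numpy as np

def ula_steering(I, kd, phi):
    """ULA steering vector of Eq. 3.16: [1, e^{jkd cos(phi)}, ..., e^{j(I-1)kd cos(phi)}]^T."""
    return np.exp(1j * kd * np.cos(phi) * np.arange(I))

a = ula_steering(6, np.pi, np.deg2rad(60))   # d = lambda/2 gives kd = pi
# Vandermonde check: the vector equals successive powers of a single ratio z.
z = np.exp(1j * np.pi * np.cos(np.deg2rad(60)))
print(np.allclose(a, z ** np.arange(6)))     # True
```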

For the UCA shown in Figure 3.3, the wavevector remains the same as in Equation 2.10. Microphones are uniformly distributed to form a circular array with radius r_a. Noting that the elevation angle θ_i is 90° for all the microphones of the UCA, the position vector of the ith microphone is given as

r_i = [r_a cos φ_i r_a sin φ_i 0]^T. (3.17)

Hence, the steering vector for the UCA is given by Equation 3.12, with propagation delay

τ_i(Ψ_l) = −r_a sin θ_l cos(φ_l − φ_i) / c. (3.18)

The steering matrix for the spherical microphone array is discussed in Chapter 5. A detailed derivation of the steering matrix in the spherical harmonics domain is provided in Section 5.3.

The Nyquist sampling theorem specifies the minimum sampling rate for the time-sampled signal [45]. For microphone arrays, the signal is spatially sampled using microphones, and a similar condition on the spatial sampling frequency exists for sensor arrays. The Nyquist sampling theorem suggests

f_s = 1/T_s ≥ 2f_max (3.19)

where f_s is the temporal sampling frequency, T_s is the sampling period, and f_max is the highest frequency component in the frequency spectrum of the signal. Similarly, for spatial sampling using a ULA, we have the requirement [46]

f_xs = 1/d ≥ 2f_xmax (3.20)


where f_xs is the spatial sampling frequency in samples per meter, d is the spatial sampling period, and f_xmax is the highest spatial frequency component present in the spatial spectrum of the signal. The spatial frequency (in cycles per meter) along the x-axis is given by

f_x = sin θ cos φ / λ. (3.21)

Its maximum value is

f_xmax = 1/λ. (3.22)

Utilizing Equations 3.20 and 3.22, the Nyquist condition for alias-free spatial sampling is given by

d ≤ λ/2. (3.23)

This can be interpreted as the Nyquist sampling theorem in the spatial domain.
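For example, the limit of Equation 3.23 can be evaluated for an arbitrarily chosen narrowband frequency:

```python
# Maximum alias-free ULA spacing (Eq. 3.23) for a narrowband signal.
c = 343.0            # speed of sound in m/s
f = 4000.0           # narrowband frequency in Hz (arbitrary choice)
lam = c / f          # wavelength
d_max = lam / 2      # spatial Nyquist limit
print(d_max)         # about 0.043 m
```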

For a UCA, a similar condition is found for the spacing between two consecutive array elements. However, the Nyquist sampling is elucidated using phase mode excitation of the UCA [47, 48]. For the highest phase mode M = kr_a that can be excited in a UCA with I elements, the requirement is

I ≥ 2M + 1. (3.24)

This reduces the aliasing in the beampattern. The condition in Equation 3.24 can be simplified to [48]

d_cir ≤ λ/2 (3.25)

where d_cir = 2πr_a/I is the circumferential spacing between adjacent array elements.

For a spherical microphone array of order N and radius r_a, there is no significant spatial aliasing when working in the range N ≥ kr [49]. The terms are elaborated in Section 2.5.1. It is to be noted that the spatial sampling theorem is formulated with respect to a narrowband signal. The wideband nature of speech allows the microphone spacing to be increased beyond the Nyquist limit without suffering aliasing artifacts [50]. Additionally, the effect of spatial aliasing is observed as false peaks in the spatial spectrum. In beampattern plots, the effect is seen as the introduction of grating lobes in the visible range of the array. This is illustrated in Figure 3.13.


The intelligibility of a signal acquired over a microphone array is affected by noise and reverberation. Additive acoustic noise consists of undesired external disturbances, present as sound events, and can be observed during the silence periods of the source signal. The multi-path propagation phenomenon of sound waves is called reverberation. The effect of reverberation can be seen as smearing of the speech in the spectrogram and the time domain waveform [51].

Acoustic noise does not have fixed statistical, frequency, spatial, or propagation characteristics. Hence, the noise could be either stationary or nonstationary. Also, the noise could be directional (an interfering speech source) or non-directional (background noise). The noise may also be thermal noise generated by the sensor circuitry. The sensor noise and background noise are considered to be spatially white. The noise at different microphones is assumed to be uncorrelated. The noise is also considered to be uncorrelated with the desired sources. The noise used in this thesis will be sensor noise with noise variance σ².

Reverberation arises because of multi-path propagation. The data model in Equation 3.6 takes only the direct path into account. However, in practice, signal propagation follows multiple paths in a reverberant environment. Hence, the recorded signal consists of contributions from the direct path and the multi-paths. Due to multi-path propagation, the sound persists in space even after the original sound from the source has vanished. The duration for which the sound remains audible is called the reverberation time. In particular, it is measured as T60, defined as the time required for a sound in a room to decay by 60 dB [52].

The data model presented in Equation 3.6 includes only the direct path and is not valid in the case of reverberation. The received signal at the ith microphone under reverberant environments is given by [44]

p_i(t) = Σ_{l=1}^{L} h_il ∗ s_l(t) + w_i(t), t = 1, 2, · · · , N_s (3.26)

where h_il is the room impulse response (RIR) between the ith microphone and the lth source, and ∗ denotes convolution. The impulse response in a room consists of the direct sound, early reflections, and late reflections [51], as illustrated in Figure 3.5. The initial region of the room impulse response, with nearly zero amplitude, is followed by a peak; this region corresponds to direct-path propagation. The amplitude of the peak due



Figure 3.5: Illustration of various regions in a typical room impulse response (RIR).

to direct-path propagation may be greater or less than the amplitude of the late reflections, depending on the distance of the source from the microphone. A strong direct path means the source is close to the microphones. The early reflections are often taken as the first 50 ms of the impulse response. They mainly originate from first-order reflections, have directionality, and are highly correlated with the direct signal. The remaining part is referred to as the late reflections, with much smaller magnitude. The late reflections can also be conceived of as spatially white noise.

Reverberation is measured using the reverberation time T60 or the direct-to-reverberant energy ratio (DRR). The DRR is defined as

DRR = 10 log_{10} ( Σ_{t=0}^{n_d} h²(t) / Σ_{t=n_d+1}^{∞} h²(t) ) dB (3.27)

where the samples of h(t) up to n_d represent direct-path propagation, while the samples with indices greater than n_d represent only the reverberation due to reflected paths. With an increase in DRR or a decrease in the reverberation time, the room impulse response h approaches a delta function, improving the accuracy of DOA estimation.
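Equation 3.27 translates directly into code; the RIR below is a synthetic toy example (a unit direct-path spike plus a decaying noise tail), not a measured response:

```python
import numpy as np

def drr_db(h, nd):
    """Direct-to-reverberant ratio (Eq. 3.27) of an RIR h, with samples
    up to index nd treated as the direct path."""
    direct = np.sum(h[:nd + 1] ** 2)
    reverb = np.sum(h[nd + 1:] ** 2)
    return 10 * np.log10(direct / reverb)

# Toy RIR: a strong direct spike followed by an exponentially decaying tail.
rng = np.random.default_rng(1)
tail = rng.standard_normal(2000) * np.exp(-np.arange(2000) / 300.0) * 0.05
h = np.concatenate(([1.0], tail))
print(round(drr_db(h, nd=0), 1))   # positive: the direct path dominates
```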

3.4 Acoustic Source Localization

Microphone array processing is utilized in a wide range of applications, and the central problem in all such applications is source localization. In this Section, a brief background on various methods of source localization is provided. The acoustic pressure described by Equation 3.9 is utilized herein to provide a background to source localization using a ULA. Different methods of source localization using a ULA are discussed in the ensuing Sections.

In correlation-based source localization, the time delay of arrival (TDOA) is computed between a pair of microphones. The time delay corresponds to the lag at which the cross-correlation is maximum [44]. The DOA is estimated from the time delay using the relation in Equation 3.15. A ULA consisting of I microphones, with distance d between two consecutive microphones, is considered. The total number of microphone pairs is I(I − 1)/2, the number of combinations of I taken 2 at a time. Several variants of correlation-based source localization are discussed herein.


Figure 3.6: Voiced frame of a speech signal of length 512 samples, original signal (top) and

signal delayed by 40 samples (bottom).


The cross-correlation between two observed signals p_1(t) and p_2(t) is defined as

r_{p_1 p_2}(l_g) = E[p_1(t) p_2^∗(t − l_g)] (3.28)

where l_g is the lag and (.)^∗ denotes the complex conjugate. In practice, the cross-correlation is estimated for any two finite signals as

r̂^{PTC}_{p_1 p_2}(l_g) = Σ_{t=−N_s}^{N_s} p_1(t) p_2^∗(t − l_g). (3.29)

The cross-correlation r̂^{PTC}_{p_1 p_2} attains its maximum when l_g equals the actual delay τ; the proof can be seen in [44]. Hence, the TDOA can be estimated as

τ̂^{PTC} = (1/f_s) argmax_{l_g} r̂^{PTC}_{p_1 p_2}(l_g) (3.30)

where f_s is the sampling rate. The DOA is now estimated using Equation 3.15. The concept is illustrated using Figures 3.6 and 3.7. Two observed signals with a time lag of 40 samples are shown in Figure 3.6, and their cross-correlation is plotted in Figure 3.7. The peak in the cross-correlation can be observed at a lag equal to 40 in Figure 3.7. The plain time correlation method is simple to implement. However, its performance is limited by factors like signal self-correlation and reverberation.
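The plain time correlation estimate of Equations 3.29 and 3.30 can be sketched as follows, using a synthetic white-noise frame and an integer delay of 40 samples (mirroring Figure 3.6):

```python
import numpy as np

# TDOA between two microphone signals via the peak of the cross-correlation
# (Eqs. 3.29 and 3.30); the delay and signal here are synthetic.
rng = np.random.default_rng(2)
s = rng.standard_normal(512)
true_delay = 40                                   # samples
p1 = s
p2 = np.concatenate((np.zeros(true_delay), s[:-true_delay]))  # delayed copy

r = np.correlate(p1, p2, mode="full")             # lags -511 ... 511
lags = np.arange(-len(p1) + 1, len(p1))
est_delay = lags[np.argmax(r)]
print(est_delay)                                  # -40: p2 lags p1 by 40 samples
```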

Figure 3.7: Cross-correlation of the signals in Figure 3.6, with the peak at a lag of 40 samples.



Figure 3.8: Generalized cross-correlation (GCC), GCC-Roth and GCC-PHAT plots (top to

bottom)

Generalized cross-correlation (GCC) was introduced to overcome the limitations of plain time correlation [53]. It implements a frequency domain cross-spectrum with a weighting function. Let the discrete Fourier transforms (DFT) of the signals from the two microphones be represented by p_1(k) and p_2(k). The general expression for GCC is given by

r^{GCC}_{p_1 p_2}(l_g) = F^{−1}{w(k) p_1(k) p_2^∗(k)} (3.31)

where F^{−1} stands for the inverse discrete-time Fourier transform and w(k) is the weighting function. The term w(k)p_1(k)p_2^∗(k) is called the generalized cross-spectrum. The TDOA estimate is obtained from the lag that maximizes the generalized cross-correlation, as

τ̂^{GCC} = (1/f_s) argmax_{l_g} r^{GCC}_{p_1 p_2}(l_g). (3.32)

For w(k) = 1, GCC degenerates to cross-correlation implemented through the DFT and inverse DFT (IDFT). The Roth weighting normalizes the cross-spectrum by the power spectrum of one of the signals. Hence, the Roth filter is given by

w^{ROTH}(k) = 1 / (p_1(k) p_1^∗(k)). (3.33)

For reverberant environments, the phase transform (PHAT) [53] weighting function is used for TDOA estimation using GCC. The PHAT weighting function is given by

w^{PHAT}(k) = 1 / |p_1(k) p_2^∗(k)|. (3.34)

The PHAT filter normalizes the amplitude of the cross-spectral density of the two signals and utilizes only the phase information for computing the cross-correlation. Figure 3.8 plots all variants of GCC for the signals shown in Figure 3.6.
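A minimal sketch of GCC-PHAT (Equations 3.31 and 3.34) via the FFT; the zero-padding and the small constant added to the denominator are implementation choices, not part of the original formulation:

```python
import numpy as np

def gcc_phat(p1, p2):
    """GCC with PHAT weighting (Eqs. 3.31 and 3.34): only the phase of the
    cross-spectrum is retained before the inverse DFT."""
    n = len(p1) * 2                       # zero-pad to avoid circular wrap-around
    P1, P2 = np.fft.rfft(p1, n), np.fft.rfft(p2, n)
    cs = P1 * np.conj(P2)
    r = np.fft.irfft(cs / (np.abs(cs) + 1e-12), n)
    return np.fft.fftshift(r), np.arange(-n // 2, n // 2)

rng = np.random.default_rng(3)
s = rng.standard_normal(512)
p2 = np.concatenate((np.zeros(40), s[:-40]))   # p2 delayed by 40 samples
r, lags = gcc_phat(s, p2)
print(lags[np.argmax(r)])                      # -40, matching Figure 3.8
```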

Beamforming is a spatial filtering technique in which the signal from a given direction is passed undistorted, while signals from all other directions are attenuated. It is equivalent to forming a beam in the look direction, which is done by weighting and summing the array outputs. This is illustrated in Figure 3.9. The beamformed array output is given by

y(t) = w^H p(t) (3.35)

where w = [w_1 w_2 · · · w_I]^T is the beamforming weight vector and (.)^H denotes the conjugate transpose of (.). The power spectrum of the spatially filtered signal,

P(φ) = E[|y(t)|²] = w^H R_p w, (3.36)

should exhibit peaks at the DOAs of sources located in the field of view of the array. This technique is used in beamforming-based source localization. Different choices of weights lead to different beamforming techniques. Two prominent beamforming techniques are presented in the ensuing Sections.

A signal arriving at the array suffers different delays at different microphones. The array outputs are delayed so that the signal from the desired direction is aligned. The aligned signals are summed to realize a delay-and-sum beamformer (DSB). This is illustrated in Figure 3.10.



The DSB weight vector is given by

w = a(φ, k) / I, (3.38)

where a(φ, k) is the steering vector defined in Equation 3.16. The solution does not depend upon the input signal and only takes into consideration the steering vector of the signal of interest. Hence, the delay-and-sum beamformer is not adaptive. The spatial power spectrum for the DSB, from Equation 3.36, can now be written as

P_DSB(φ) = a^H(φ) R_p a(φ). (3.39)

It is to be noted that the 1/I² term is removed from the power spectrum, which does not affect the DOA estimation in any way. The delay-and-sum DOA estimates are given by the locations of the L highest peaks, corresponding to the L sources, in the DSB spatial power spectrum. DSB-based source localization is inconsistent when multiple sources are present. The bias of the estimates also becomes significant for closely spaced and correlated sources.

The minimum variance distortionless response (MVDR) beamformer or Capon beamformer [2] is adaptive in the sense that it takes into account the signal characteristics along with the steering vector of the signal of interest. The Capon spatial filter design problem is based on maximizing the signal to interference plus noise ratio (SINR), defined as

SINR = E|w^H a(φ)s(t)|² / E|w^H v(t)|² = σ_s² |w^H a(φ)|² / (w^H R_v w) (3.40)

where σ_s² is the signal power of an individual source and R_v = E[v(t)v^H(t)]. Maximizing the SINR amounts to minimizing w^H R_v w, i.e., minimizing the variance of w^H v. Also, the distortionless response constraint gives w^H a(φ) = 1. Hence, the minimum variance distortionless response formulation of the Capon beamformer is given by

min_w w^H R_v w subject to w^H a(φ) = 1. (3.41)

Solving this constrained problem gives the weight vector

w = R_v^{−1} a(φ) / (a^H(φ) R_v^{−1} a(φ)). (3.42)

In practice, R_v is not available, and it is replaced by the array covariance matrix R_p, resulting in the final form of the weight vector as

w = R_p^{−1} a(φ) / (a^H(φ) R_p^{−1} a(φ)). (3.43)

Utilizing the expression for the MVDR weights in Equation 3.36, the spatial power spectrum for MVDR can be written as

P_MVDR(φ) = 1 / (a^H(φ) R_p^{−1} a(φ)). (3.44)

The MVDR DOA estimates are given by the L largest peaks in the MVDR power spectrum, corresponding to the L sources. The MVDR filter steered to a certain direction φ attenuates any signal impinging on the array from a DOA ≠ φ, whereas the DSB filter pays uniform attention to all DOAs ≠ φ. DOA estimation using the DSB and MVDR power spectra is illustrated in Figure 3.11. A ULA with 10 microphones was used, and the sources are assumed to be at 20° and 60°.


Figure 3.11: DOA estimation using (a) DSB and (b) MVDR method. A ULA with I = 10

microphones was used for sources located at 20◦ and 60◦ .
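The DSB and MVDR spectra of Equations 3.39 and 3.44 can be reproduced for the scenario of Figure 3.11; the SNR and snapshot count below are arbitrary choices:

```python
import numpy as np

# DSB (Eq. 3.39) and MVDR (Eq. 3.44) spatial spectra for the scenario of
# Figure 3.11: a 10-microphone ULA (d = lambda/2) and sources at 20 and 60 deg.
rng = np.random.default_rng(4)
I, kd, Ns = 10, np.pi, 500
doas = np.deg2rad([20, 60])

def a_vec(phi):
    return np.exp(1j * kd * np.cos(phi) * np.arange(I))   # Eq. 3.16

A = np.stack([a_vec(p) for p in doas], axis=1)
s = rng.standard_normal((2, Ns)) + 1j * rng.standard_normal((2, Ns))
v = 0.1 * (rng.standard_normal((I, Ns)) + 1j * rng.standard_normal((I, Ns)))
p = A @ s + v
Rp = p @ p.conj().T / Ns                       # sample covariance (Eq. 3.58)

grid = np.deg2rad(np.arange(0, 181))           # 1-degree azimuth grid
P_dsb = np.array([np.real(a_vec(g).conj() @ Rp @ a_vec(g)) for g in grid])
Ri = np.linalg.inv(Rp)
P_mvdr = np.array([1 / np.real(a_vec(g).conj() @ Ri @ a_vec(g)) for g in grid])

# The two largest MVDR peaks should sit near the true DOAs.
print(int(np.argmax(P_mvdr[:40])), int(np.argmax(P_mvdr[40:])) + 40)
```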



Figure 3.12: Delay-and-sum beampattern for ULA with no spatial aliasing for I = 10, φs =

90◦ and d = 0.5λ (a) in Cartesian coordinates and (b) in polar coordinates.

The beampattern is also called the directivity pattern, array pattern, or spatial pattern. Beampattern analysis gives insight into the design of spatial filters. For a given weight vector w of a beamformer, the beampattern specifies the response of the beamformer to a source arriving from an arbitrary direction in the field of view of the array. The beampattern is typically measured as the array response to a single plane wave [48]. Hence, the beamformed output can be written as

y(t) = w^H a(φ, k)s(t). (3.45)

The formulation of the delay-and-sum beampattern for a ULA is now presented. A ULA aperture steered to direction φ_s is considered. The beampattern for such a ULA can be written as

G(φ, φ_s) = |w^H(φ_s) a(φ, k)|. (3.46)

Utilizing the DSB weights from Equation 3.38, the beampattern for the ULA is given by

G(φ, φ_s) = (1/I) |a^H(φ_s, k) a(φ, k)| (3.47)


where |(.)| is the absolute value of (.). Substituting the expression for the steering vector from Equation 3.16, the beampattern for a ULA can be written as

G(φ, φ_s) = (1/I) |Σ_{i=1}^{I} e^{j(i−1)kd(cos(φ)−cos(φ_s))}| = | sin( (Ikd/2)(cos(φ) − cos(φ_s)) ) / ( I sin( (kd/2)(cos(φ) − cos(φ_s)) ) ) |. (3.48)

Beampatterns for different beamformers and different array apertures can be formulated along similar lines. The narrowband beampattern of a delay-and-sum beamformer is illustrated in Figure 3.12 without spatial aliasing and in Figure 3.13 under aliasing. A ULA with 10 microphones is used, with steering angle φ_s = 90°. The delay-and-sum beampattern for a UCA is plotted in Figure 3.14; a 10-element UCA was used, with steering direction Ψ_s = (45°, 90°). The beampatterns for the spherical microphone array will be presented in Chapters 5 and 7, in the spherical harmonics domain. Additional details on other parameters of microphone arrays can be found in [56].
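The equivalence of Equations 3.47 and 3.48 can be verified numerically for the configuration of Figure 3.12:

```python
import numpy as np

# Delay-and-sum beampattern of a ULA (Eq. 3.47) checked against the closed
# form of Eq. 3.48, for I = 10, d = lambda/2, steering angle 90 deg.
I, kd = 10, np.pi
phi_s = np.pi / 2

def pattern_sum(phi):
    """Direct evaluation of Eq. 3.47 from the steering vectors."""
    a_s = np.exp(1j * kd * np.cos(phi_s) * np.arange(I))
    a = np.exp(1j * kd * np.cos(phi) * np.arange(I))
    return abs(a_s.conj() @ a) / I

def pattern_closed(phi):
    """Closed-form expression of Eq. 3.48."""
    x = kd * (np.cos(phi) - np.cos(phi_s)) / 2
    if abs(np.sin(x)) < 1e-12:
        return 1.0                      # limit at the look direction
    return abs(np.sin(I * x) / (I * np.sin(x)))

phis = np.deg2rad(np.arange(0, 181))
g1 = np.array([pattern_sum(p) for p in phis])
g2 = np.array([pattern_closed(p) for p in phis])
print(np.allclose(g1, g2), round(g1[90], 3))   # True 1.0
```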


Figure 3.13: Delay-and-sum beampattern for ULA under aliasing for I = 10, φs = 90◦ and

d = 2λ (a) in Cartesian coordinates and (b) in polar coordinates.

Figure 3.14: Delay-and-sum beampattern for a 10-element UCA steered to Ψ_s = (45°, 90°), under no spatial aliasing, (a) in the spherical coordinate system and (b) in the rectangular coordinate system.

Although beamforming-based methods are used in many applications, they are often limited in resolution. These methods fail in multi-source environments when the sources are closely spaced. The limitation arises

because these methods do not exploit the sensor array data efficiently. Schmidt proposed the MUltiple SIgnal Classification (MUSIC) algorithm [3], based on the decomposition of the array covariance matrix into noise and signal subspaces; a geometrical interpretation of the MUSIC algorithm was also given in [3]. The MUSIC algorithm forms the basis of various other subspace-based methods like root-MUSIC [20], Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [21], minimum-norm [57], and MUSIC-Group delay [19]. In the following Sections, we present MUSIC, root-MUSIC, and MUSIC-Group delay for the ULA.

The MUSIC algorithm is a high-resolution source localization method that utilizes the eigenstructure of the input covariance matrix. However, it requires very precise and accurate array calibration. The narrowband data model for the ULA from Equation 3.9 can be re-written as

[p_1(t) p_2(t) · · · p_I(t)]^T = [a_1(φ_1, k) a_2(φ_2, k) · · · a_L(φ_L, k)] [s_1(t) s_2(t) · · · s_L(t)]^T + v(t). (3.49)

Geometrically, the received data p and the steering vectors a_l can be seen as vectors in I-dimensional space, and p is a linear combination of the steering vectors with the s_l as coefficients.

The array covariance matrix can be written as

R_p = E[p(t)p^H(t)] (3.50)
= A R_s A^H + σ² I = R_i + σ² I (3.51)

where R_i = A R_s A^H, the source covariance matrix is

R_s = E[ss^H] = diag(E[|s_1|²], E[|s_2|²], · · · , E[|s_L|²]), (3.52)

and I is the identity matrix. It can be noted that R_s is an L × L diagonal matrix with all eigenvalues (diagonal elements) positive, making R_s a positive definite matrix. The steering matrix A comprises linearly independent steering vectors; hence, A has full column rank. The full column rank of A and the positive definiteness of R_s guarantee that, when the number of sources L is less than the number of sensors I, the I × I matrix R_i is positive semidefinite with rank L. It implies that I − L eigenvalues of R_i will be zero. Hence, taking q_u to be the uth eigenvector corresponding to a zero eigenvalue, we have

R_i q_u = A R_s A^H q_u = 0 (3.53)

q_u^H A R_s A^H q_u = 0 (3.54)

A^H q_u = 0 (3.55)

a_l^H(φ_l) q_u = 0, ∀ l = 1, 2, · · · , L and ∀ u = 1, 2, · · · , I − L. (3.56)

Equation 3.56 implies that all the (I − L) noise eigenvectors q_u are orthogonal to the L steering vectors. All such noise eigenvectors are collected in the I × (I − L) matrix Q_n, called the noise subspace. Now, the MUSIC spectrum is formulated as

P_MUSIC(φ) = 1 / Σ_{u=1}^{I−L} |a^H(φ)q_u|² = 1 / ||Q_n^H a(φ)||² = 1 / (a^H(φ) Q_n Q_n^H a(φ)). (3.57)

As the noise eigenvectors are orthogonal to the steering vectors, the denominator becomes zero when φ equals a DOA. Hence, the DOAs are estimated from the L largest peaks in the MUSIC spectrum corresponding to the L incident sources.

It is to be noted that, in practice, the array covariance matrix R_p is available for processing and not R_i. Additionally, R_p is estimated as the sample covariance matrix over N_s snapshots, given by

R̂_p = (1/N_s) Σ_{t=1}^{N_s} p(t) p^H(t). (3.58)

When the data is Gaussian, the sample covariance matrix converges to the true covariance matrix. Now, Q_n has to be estimated from R̂_p. Let q_i be any eigenvector of R̂_i with eigenvalue λ_i; then

R̂_i q_i = λ_i q_i, R̂_p q_i = (R̂_i + σ²I) q_i = (λ_i + σ²) q_i.


It means that any eigenvector of R̂_i is also an eigenvector of R̂_p, with eigenvalue (λ_i + σ²). So, if R̂_i = QΛQ^H, then

R̂_p = Q diag(λ_1 + σ², λ_2 + σ², · · · , λ_L + σ², σ², · · · , σ²) Q^H. (3.59)

The eigenvector matrix Q is decomposed into the signal subspace Q_s and the noise subspace Q_n as follows. The eigenvectors corresponding to the largest L eigenvalues form the signal subspace matrix of order I × L. The other I − L columns of Q (the noise eigenvectors), with eigenvalues σ², form the noise subspace Q_n. It is to be noted that the noise eigenvalues are negligible when compared to the signal eigenvalues. Now the MUSIC spatial spectrum can be computed as in Equation 3.57.
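A minimal sketch of the MUSIC estimator built from Equations 3.57 and 3.58; the scenario (10 sensors, sources at 60° and 65°, arbitrary SNR and snapshot count) loosely mirrors Figure 3.15:

```python
import numpy as np

# MUSIC pseudo-spectrum (Eq. 3.57) from the sample covariance matrix,
# for a 10-element half-wavelength ULA and sources at 60 and 65 deg.
rng = np.random.default_rng(5)
I, L, kd, Ns = 10, 2, np.pi, 1000
doas = np.deg2rad([60, 65])

def a_vec(phi):
    return np.exp(1j * kd * np.cos(phi) * np.arange(I))   # Eq. 3.16

A = np.stack([a_vec(p) for p in doas], axis=1)
s = rng.standard_normal((L, Ns)) + 1j * rng.standard_normal((L, Ns))
v = 0.1 * (rng.standard_normal((I, Ns)) + 1j * rng.standard_normal((I, Ns)))
p = A @ s + v
Rp = p @ p.conj().T / Ns                     # Eq. 3.58

# Eigen-decomposition: the I - L smallest eigenvalues span the noise subspace.
w, Q = np.linalg.eigh(Rp)                    # eigenvalues in ascending order
Qn = Q[:, : I - L]
grid = np.deg2rad(np.arange(0, 91))
P = np.array([1 / np.real(a_vec(g).conj() @ Qn @ Qn.conj().T @ a_vec(g))
              for g in grid])
print(int(np.argmax(P)))                     # a peak in the 60-65 deg region
```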


Figure 3.15: MUSIC-Magnitude spectrum for DOA 60◦ and 65◦ using 5 sensors (top) and for

15 sensors (bottom).

The MUSIC spectrum is plotted for two closely spaced sources at 60° and 65°, using 5 and 15 sensors, in Figure 3.15. The MUSIC spectrum is also called the MUSIC-Magnitude spectrum since it utilizes the magnitude spectrum of MUSIC, as can be seen from Equation 3.57.

MUSIC-Group delay, as proposed in [19], utilizes the phase spectrum of MUSIC for robust source localization. The MUSIC-Magnitude spectrum requires a large number of sensors to resolve closely spaced sources, as shown in Figure 3.15. In reverberant environments, it requires a comprehensive search algorithm for deciding candidate peaks for the DOAs, due to a large number of spurious peaks [58].

The MUSIC-Group delay spectrum is defined as [19]

P_MGD(φ) = ( Σ_{u=1}^{I−L} |∇ arg(a^H(φ)q_u)|² ) P_MUSIC(φ) (3.60)

where ∇ arg indicates the gradient of the unwrapped phase spectrum of a^H(φ)q_u, taken with respect to the spatial variable φ. A sharp transition at the DOAs is observed in the unwrapped phase spectrum of MUSIC. Hence, the gradient of the unwrapped phase spectrum (the group delay of MUSIC) yields sharp peaks at the locations of the DOAs. In practice, abrupt changes can occur in the phase due to small variations in the signal caused by microphone calibration errors, which leads to spurious peaks in the group delay spectrum. However, the product of the MUSIC and group delay spectra, called MUSIC-Group delay [19], removes such spurious peaks and gives high resolution estimation.
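The MUSIC-Group delay spectrum of Equation 3.60 can be sketched as follows; the use of numpy's unwrap and gradient over a dense azimuth grid, and all scenario values, are implementation choices for illustration:

```python
import numpy as np

# MUSIC-Group delay (Eq. 3.60): squared gradient of the unwrapped phase of
# a^H(phi) q_u, summed over noise eigenvectors, times the MUSIC spectrum.
rng = np.random.default_rng(6)
I, L, kd, Ns = 6, 2, np.pi, 1000
doas = np.deg2rad([60, 65])

def a_vec(phi):
    return np.exp(1j * kd * np.cos(phi) * np.arange(I))

A = np.stack([a_vec(p) for p in doas], axis=1)
s = rng.standard_normal((L, Ns)) + 1j * rng.standard_normal((L, Ns))
v = 0.05 * (rng.standard_normal((I, Ns)) + 1j * rng.standard_normal((I, Ns)))
p = A @ s + v
w, Q = np.linalg.eigh(p @ p.conj().T / Ns)
Qn = Q[:, : I - L]                                        # noise subspace

grid = np.deg2rad(np.linspace(0, 90, 451))                # 0.2-degree grid
proj = np.stack([Qn.conj().T @ a_vec(g) for g in grid])   # (451, I-L)
P_music = 1 / np.sum(np.abs(proj) ** 2, axis=1)
phase = np.unwrap(np.angle(proj), axis=0)                 # unwrap along phi
gd = np.gradient(phase, axis=0)                           # group delay per u
P_mgd = np.sum(gd ** 2, axis=1) * P_music
print(np.rad2deg(grid[np.argmax(P_mgd)]))                 # near 60 or 65
```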

Figure 3.16 illustrates the MUSIC, unwrapped phase, and MUSIC-Group delay spectra for two sources with azimuths (60°, 65°) and (50°, 60°). The high resolving capability of MUSIC-Group delay can be seen using a limited number of sensors. The MUSIC-Group delay spectrum is able to preserve the peaks corresponding to the DOAs due to the additive property of group delay spectra. A mathematical proof of the additive property using a ULA is given in [19]. A detailed comparison between the corresponding spectra obtained for the ULA and UCA is provided in Chapter 4.



Figure 3.16: MUSIC, Unwrapped phase (of MUSIC) and MUSIC-Group delay spectra for

two sources with azimuth (a) 60◦ and 65◦ , (b) 50◦ and 60◦ .

This work proposes to utilize the covariance matrix estimated using shrinkage estimators in the computation of MUSIC-Group delay [59]. Shrinkage estimators are a widely used class of estimators that regularize the covariance matrix by shrinking it toward some target structure R_t [60]. The estimate is formulated as a linear combination of an unbiased estimate (the sample covariance matrix) R_u and a biased (target) estimate R_t. Hence, the covariance matrix is now estimated using the shrinkage estimator as

R̂_p = βR_t + (1 − β)R_u (3.61)

where β ∈ [0, 1] denotes the shrinkage intensity. The value of β is chosen so that the average likelihood of omitted samples is maximized, as suggested in [61].
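A sketch of the shrinkage estimate of Equation 3.61 with a scaled-identity target R_t; here β is fixed arbitrarily rather than chosen by the likelihood rule of [61]:

```python
import numpy as np

def shrink_cov(Ru, beta):
    """Shrinkage estimate of Eq. 3.61 with a scaled-identity target
    R_t = (tr(R_u)/I) * Id, a common choice of target structure."""
    I = Ru.shape[0]
    Rt = (np.trace(Ru).real / I) * np.eye(I)
    return beta * Rt + (1 - beta) * Ru

rng = np.random.default_rng(7)
X = rng.standard_normal((10, 30)) + 1j * rng.standard_normal((10, 30))
Ru = X @ X.conj().T / 30                 # sample covariance, few snapshots
Rs = shrink_cov(Ru, beta=0.3)

# Shrinkage compresses the eigenvalue spread (cf. Figure 3.17).
eu, es = np.linalg.eigvalsh(Ru), np.linalg.eigvalsh(Rs)
print(es.max() / es.min() < eu.max() / eu.min())   # True
```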

Subspace-based methods are prone to errors at low signal to noise ratio (SNR) and high reverberation. Under such conditions, the noise eigenvalues are no longer negligible and may become comparable to the signal eigenvalues, leading to erroneous results [62]. However, the shrinkage method of estimating the covariance matrix indeed suppresses the noise eigenvalues, as can be seen from Figure 3.17.

The high resolution of MUSIC-Group delay with the shrinkage estimator (SMGD), as compared to MUSIC-Group delay (MGD) and MUSIC-Magnitude computed from the sample covariance matrix, is illustrated in Figure 3.18.


Figure 3.17: Eigenvalue estimation using the sample covariance and the shrinkage estimator, using 10 sensors for 3 sources located at 20°, 35°, and 50°.

The subspace-based methods described so far have a significant limitation: the accuracy is limited by the discretization at which the spectrum (P_MVDR(φ), P_MUSIC(φ), or P_MGD(φ)) is estimated. Moreover, they require a comprehensive search algorithm for deciding the candidate peak corresponding to the DOA of a source. root-MUSIC, proposed in [20], is a search-free algorithm that estimates the DOAs as roots of the MUSIC polynomial. Hence, the solution is exact and not limited by the discretization.

The MUSIC spectrum in Equation 3.57 can also be written as

P^{−1}_MUSIC(φ) = a^H(φ) Q_n Q_n^H a(φ) = a^H(φ) C a(φ), where C = Q_n Q_n^H. (3.62)

Substituting z = e^{jkd cos(φ)} in Equation 3.16, the steering vector for the ULA can be expressed as

a(φ) = [1 z z² · · · z^{I−1}]^T. (3.63)

Utilizing Equation 3.63 in Equation 3.62, the MUSIC spectrum can now be written in a

3.4 Acoustic Source Localization 47

1

MUSIC

0.5

0

0 10 20 30 40 50 60

1

MGD

0.5

0

0 10 20 30 40 50 60

1

SMGD

0.5

0

0 10 20 30 40 50 60

Azimuth (φ)

Figure 3.18: The MUSIC-Magnitude spectrum (top), the MUSIC-GD spectrum (middle), and the MUSIC-GD spectrum with shrinkage estimation (bottom), using 6 sensors for closely spaced sources located at 20◦ and 25◦, at DRR = 20 dB.

P_MUSIC^{−1}(z) = Σ_{m=0}^{I−1} Σ_{n=0}^{I−1} z^n C_{mn} z^{−m}    (3.64)

P(z) = Σ_{m=0}^{I−1} Σ_{n=0}^{I−1} C_{mn} z^{n−m}.    (3.65)

The double summation in Equation 3.65 can be written as a single summation by substituting n − m = r, which gives

P(z) = Σ_{r=−(I−1)}^{(I−1)} C_r z^r    (3.66)

where C_r = Σ_{n−m=r} C_{mn}.

It can be observed that the root-MUSIC polynomial is of degree (2I − 2), with (2I − 2) roots. Additionally, if z is a root of the polynomial, 1/z∗ is also a root. This can be seen from the notational definition of z. Since z and 1/z∗ have the same phase and reciprocal magnitudes, one root lies within the unit circle while the other lies outside. Hence, (I − 1) roots are within the unit circle and the remaining (I − 1) roots are outside. As the DOA information is present in the phase,


Figure 3.19: Z-plane representation of all the roots of the root-MUSIC polynomial, using 8 sensors for 2 sources located at 40◦ and 50◦.

which is the same for both sets, either set of roots can be utilized for DOA estimation. Also, in the absence of noise, all the roots should fall on the unit circle; because of noise, however, the roots move away from it. Hence, out of the (I − 1) roots within the unit circle, the L roots closest to the unit circle can be used for DOA estimation. The azimuth is estimated using

φ = cos^{−1}{ ℑ[ln(z)] / (kd) }    (3.67)

where ℑ denotes the imaginary part. Figure 3.19 plots the roots of the root-MUSIC polynomial using 8 sensors for two sources at 40◦ and 50◦.
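A compact root-MUSIC sketch following Equations 3.62-3.67 is given below. The simulation parameters (half-wavelength spacing, SNR, snapshot count) are assumed for illustration only.

```python
import numpy as np

def root_music(R, L, d_over_lambda=0.5):
    """Search-free DOA estimation via root-MUSIC (Eqs. 3.62-3.67).

    R: I x I array covariance matrix; L: number of sources;
    d_over_lambda: inter-sensor spacing in wavelengths, so kd = 2*pi*d/lambda.
    """
    I = R.shape[0]
    Qn = np.linalg.eigh(R)[1][:, : I - L]            # noise subspace (smallest eigenvalues)
    C = Qn @ Qn.conj().T                             # Eq. 3.62
    # C_r sums the entries with n - m = r, i.e. the r-th diagonal of C (Eq. 3.66)
    coeffs = [np.trace(C, offset=r) for r in range(I - 1, -I, -1)]
    roots = np.roots(coeffs)
    roots = roots[np.abs(roots) < 1.0]               # the (I-1) roots inside the unit circle
    roots = roots[np.argsort(np.abs(np.abs(roots) - 1.0))][:L]  # L closest to the circle
    kd = 2.0 * np.pi * d_over_lambda
    return np.rad2deg(np.arccos(np.clip(np.angle(roots) / kd, -1.0, 1.0)))  # Eq. 3.67

# Example: 8 sensors at half-wavelength spacing, sources at 40 and 50 degrees
rng = np.random.default_rng(1)
I, Ns, kd = 8, 200, np.pi
a = lambda deg: np.exp(1j * kd * np.arange(I) * np.cos(np.deg2rad(deg)))
A = np.column_stack([a(40.0), a(50.0)])
X = A @ (rng.standard_normal((2, Ns)) + 1j * rng.standard_normal((2, Ns)))
X += 0.05 * (rng.standard_normal((I, Ns)) + 1j * rng.standard_normal((I, Ns)))
R = X @ X.conj().T / Ns
print(np.sort(root_music(R, L=2)))                   # close to [40., 50.]
```

Because the polynomial coefficients come directly from the diagonals of C, no angular grid search is needed and the estimates are not quantized by any discretization.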

3.5 Wideband Source Localization

The algorithms discussed in the previous Section are limited to narrowband source localization. These methods cannot be directly applied to speech signals, which are wideband in nature. For narrowband sources, a time delay directly translates to a phase shift in the frequency domain, and the phase shift is approximately constant over the signal bandwidth. As can be seen from Equation 3.68, the phase shift is a function of the delay only, which in turn depends on the array structure and the source location. Hence, DOA estimation can be performed using classical narrowband source localization algorithms.


When the signal is wideband, the phase shift is a function of frequency as well, along with the source location and array geometry, and it is no longer constant over the frequencies of interest. Also, the number of significant eigenvalues of the array covariance matrix becomes larger than the number of sources L, due to the mixing of different frequency components [63]. Hence, as the bandwidth of the source increases, decomposing the covariance matrix into a signal subspace and a noise subspace becomes difficult.

In order to deal with the localization of wideband sources, the array output is decomposed into multiple narrowband frequency components using the fast Fourier transform (FFT). The output of each sensor is segmented into Ns snapshots, and the temporal FFT is applied to each snapshot to determine K frequency components. Representing the array output for the tth snapshot (where t = 1, 2, · · · , Ns) at the κth frequency component as P_{t,κ}, the sample covariance matrix is computed as

R̂_{Pκ} = (1/N_s) Σ_{t=0}^{N_s−1} P_{t,κ} P_{t,κ}^H.    (3.69)

Wideband source localization methods are divided into two categories: incoherent and coherent. Incoherent methods process each frequency bin independently: narrowband source localization is applied over each frequency bin, and an average DOA estimate is obtained over all the frequency bins. The incoherent MUSIC-Magnitude spectrum can thus be written as

P_MUSIC(φ) = 1 / [ Σ_{κ=0}^{K−1} a^H(f_κ, φ) Q_n Q_n^H a(f_κ, φ) ].    (3.70)

Incoherent methods provide reliable DOA estimates at high SNR for well separated sources. The coherent approach to wideband source localization is presented in [64].
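The incoherent approach can be sketched as follows: a per-bin sample covariance (Equation 3.69) feeds a per-bin noise subspace, and the MUSIC denominators are summed over the bins (Equation 3.70). The ULA geometry, bin frequencies and noise level below are illustrative assumptions.

```python
import numpy as np

def incoherent_music_spectrum(P, L, freqs, d, c=343.0, grid=None):
    """Incoherent wideband MUSIC over a ULA (Eqs. 3.69-3.70), a sketch.

    P: (Ns, I, K) FFT coefficients of Ns snapshots at I sensors for the
    K frequency bins in freqs (Hz); d: inter-sensor spacing (m).
    """
    if grid is None:
        grid = np.arange(0.0, 180.0, 0.25)
    Ns, I, K = P.shape
    denom = np.zeros(grid.size)
    for kappa in range(K):
        Pk = P[:, :, kappa]
        R = Pk.T @ Pk.conj() / Ns                              # Eq. 3.69 per bin
        Qn = np.linalg.eigh(R)[1][:, : I - L]                  # noise subspace
        k = 2.0 * np.pi * freqs[kappa] / c
        A = np.exp(-1j * k * d * np.outer(np.arange(I), np.cos(np.deg2rad(grid))))
        denom += (np.abs(Qn.conj().T @ A) ** 2).sum(axis=0)    # a^H Qn Qn^H a per angle
    return grid, 1.0 / denom                                   # Eq. 3.70

# One wideband source at 60 degrees, simulated independently in three bins
rng = np.random.default_rng(2)
I, Ns, d = 8, 100, 0.04
freqs = [1000.0, 1500.0, 2000.0]
P = np.empty((Ns, I, len(freqs)), dtype=complex)
for kappa, f in enumerate(freqs):
    a = np.exp(-1j * 2 * np.pi * f * d * np.arange(I) * np.cos(np.deg2rad(60.0)) / 343.0)
    s = rng.standard_normal(Ns) + 1j * rng.standard_normal(Ns)
    P[:, :, kappa] = np.outer(s, a) + 0.05 * (rng.standard_normal((Ns, I))
                                              + 1j * rng.standard_normal((Ns, I)))
grid, spec = incoherent_music_spectrum(P, L=1, freqs=freqs, d=d)
print(grid[np.argmax(spec)])                                   # near 60
```

Summing the denominators rather than the spectra keeps a single spurious bin from dominating the combined estimate.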


3.6 Summary

The array data model used in source localization is discussed in this chapter. Various approaches to source localization over a uniform linear array are also discussed. Among the correlation-based, beamforming-based, and subspace-based methods, the subspace-based methods exhibit the highest resolution. A robust method for source localization based on the phase information of MUSIC, called MUSIC-Group delay, is also described. The MUSIC-Group delay method using shrinkage estimators is introduced for robust source localization over a uniform linear array. Effects of noise and reverberation are discussed in the context of source localization. Methods for wideband source localization are also briefly discussed.

Chapter 4

Source Localization over Planar Microphone Array

4.1 Introduction

Planar microphone arrays can localize sources anywhere in the azimuthal plane, with elevation in the range of 0◦ to 90◦. They are also more compact than linear arrays for the same number of microphones. Hence, various planar arrays have been used for source localization, including rectangular, circular and V-shaped geometries [65, 66, 67, 68].

As discussed in the previous Chapter, correlation-based and beamforming-based source localization methods provide inconsistent results when multiple sources are present. The bias of the estimates may become significant when the sources are closely spaced, correlated, or in reverberant environments. Among subspace-based methods, MUltiple SIgnal Classification (MUSIC) is widely studied due to its computational efficiency. However, it requires a large number of sensors to resolve closely spaced sources. In reverberant environments, it requires a comprehensive search algorithm for deciding candidate peaks for direction of arrival (DOA) estimation, due to a large number of spurious peaks [58].

The MUSIC algorithm for source localization using a uniform circular array (UCA) can be found in [48]. UCA-RB (Real-Beamspace) MUSIC, proposed in [47], utilizes a transformation based on phase mode excitation. Conventionally, the spectral magnitude of MUSIC is utilized for computing the DOAs of multiple sources incident on the array of sensors. The phase information of the MUSIC spectrum has been studied in [18] for DOA estimation over a uniform linear array (ULA).

In this Chapter, the negative differential of the unwrapped phase spectrum (group delay) of MUSIC is proposed for DOA estimation over planar arrays. Although the group delay function has been used widely in temporal frequency processing for its high resolution properties [4], the additive property of the group delay function has hitherto not been utilized in spatial spectrum analysis. In the following Section, the MUSIC-Group delay (MGD) spectrum is discussed for robust source localization using a uniform circular array.

4.2 The MUSIC-Group Delay Method for Robust Multi-source Localization

Subspace-based methods for DOA estimation based on the spectral magnitude of MUSIC require a large number of sensors for resolving spatially close sources and are prone to errors under reverberant conditions. In [19], a method for high resolution source localization based on the MUSIC-Group delay spectrum over a ULA has been proposed, which is able to resolve closely spaced sources with a limited number of sensors. In the following Section, a MUSIC-Group delay based method for two-dimensional source localization over planar arrays is proposed.

4.2.1 MUSIC-Group Delay Method for Source Localization over Planar Array

As shown in Section 3.3, the received pressure over a planar array with I microphones, due to L (L < I) narrowband sources, can be written as

p(t) = A(Ψ, k) s(t) + v(t)    (4.1)

where Ψ = (θ, φ) is the angular location of a source, with θ being the elevation and φ the azimuth, as defined in Section 2.2. A(Ψ, k) is the I × L steering matrix, expressed as

A(Ψ, k) = [ a(Ψ_1, k)  a(Ψ_2, k)  . . .  a(Ψ_L, k) ].    (4.2)

s(t) = [s_1(t), s_2(t), · · · , s_L(t)]^T is the vector of signal amplitudes at the reference point, and (·)^T denotes the transpose. A particular steering vector a(Ψ, k), consisting of time delays, can be expressed as

a(Ψ, k) = [ e^{−jω_c τ_1(Ψ)}  e^{−jω_c τ_2(Ψ)}  · · ·  e^{−jω_c τ_I(Ψ)} ]^T    (4.3)

where τ_i is the time delay at the ith microphone with respect to the reference microphone, and ω_c is the narrowband signal frequency. The noise v is assumed to be a stationary, zero-mean, uncorrelated random process. From Equation 3.18, the delay τ_i(Ψ) is related to the azimuth and elevation angles as

τ_i(Ψ_l) = −r_a sin θ_l cos(φ_l − φ_i) / c    (4.4)

where r_a is the radius of the circular array, φ_i is the azimuth angle of the ith microphone with the center of the circular array as the reference, and c is the speed of sound.
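The delay model of Equation 4.4 and the steering vector of Equation 4.3 can be sketched as follows; the microphone layout, radius and narrowband frequency are assumed for illustration.

```python
import numpy as np

def uca_steering(theta, phi, mic_phi, ra, fc, c=343.0):
    """UCA steering vector built from Eqs. 4.3-4.4.

    theta, phi: source elevation and azimuth (degrees); mic_phi: microphone
    azimuths (degrees); ra: array radius (m); fc: narrowband frequency (Hz).
    """
    mic_phi = np.deg2rad(np.asarray(mic_phi, dtype=float))
    tau = -ra * np.sin(np.deg2rad(theta)) * np.cos(np.deg2rad(phi) - mic_phi) / c  # Eq. 4.4
    return np.exp(-1j * 2.0 * np.pi * fc * tau)                                    # Eq. 4.3

# Eight microphones uniformly spaced on a 10 cm circle, source at (theta, phi) = (20, 45)
mic_phi = np.arange(8) * 45.0
a = uca_steering(20.0, 45.0, mic_phi, ra=0.10, fc=2000.0)
print(np.abs(a))   # every entry has unit modulus: the delays only rotate the phase
```

The microphone whose azimuth coincides with the source azimuth sees the largest (most negative) delay magnitude, since cos(φ − φ_i) = 1 there.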


Figure 4.1: Spectral magnitude of MUSIC for the UCA (top) and the ULA (bottom). Sources are at (15◦,50◦) and (20◦,60◦) for the UCA, and at 50◦ and 60◦ for the ULA.

The MUSIC-Magnitude spectrum is given by

P_MUSIC(Ψ) = 1 / [ a^H(Ψ) Q_n Q_n^H a(Ψ) ] = 1 / ||a^H(Ψ) Q_n||² = 1 / [ Σ_{u=1}^{I−L} |a^H(Ψ) q_u|² ]    (4.5)

where Q_n is the matrix of noise eigenvectors of R_p = E[p(t) p(t)^H], and q_u ∈ Q_n is the uth noise eigenvector. The denominator takes a null value when Ψ corresponds to a signal direction. Hence, the MUSIC-Magnitude spectrum P_MUSIC(Ψ) has a peak at the DOA represented by the elevation and azimuth angles (θ, φ). However, when the sources are closely spaced, the MUSIC-Magnitude spectrum is unable to resolve them clearly, giving many spurious peaks, or a single peak when a limited number of sensors is used. This is illustrated in Figures 4.1 and 4.4(a) respectively.

The experimental setup for Figures 4.1-4.3 utilizes a UCA of twelve sensors placed on two concentric circles, with four sensors on the inner circle and eight sensors on the outer circle. The sources are placed at (15◦,50◦) and (20◦,60◦). Additionally, a corresponding figure for a ULA is also illustrated; the ULA, consisting of eight sensors, is used to illustrate only the azimuth of the sources.


Figure 4.2: Spectral phase of MUSIC for UCA (top) and ULA (bottom). Sources at (15◦ ,50◦ )

and (20◦ ,60◦ ) for UCA. Sources at 50◦ and 60◦ for ULA.

To overcome this limitation of MUSIC, the group delay function of the MUSIC spectrum is presented herein for resolving closely spaced sources with a limited number of sensors. The

proposed MUSIC-Group delay spectrum for two-dimensional DOA (azimuth and elevation) estimation over planar arrays is defined as

P_MGD(Ψ) = [ Σ_{u=1}^{I−L} |∇ arg(a^H(Ψ) q_u)|² ] P_MUSIC(Ψ)    (4.6)

where ∇ arg indicates the gradient of the unwrapped phase spectrum of a^H(Ψ) q_u, taken with respect to the spatial variables θ and φ.

Phase spectra of MUSIC for the UCA and the ULA are shown in Figure 4.2. It can be noted from the figure that in the neighborhood of a DOA, there is a sharp change in the unwrapped phase spectrum for both the UCA and the ULA. Differentiating this unwrapped phase spectrum results in very sharp peaks at the locations of the DOAs. In practice, abrupt changes in phase can also occur due to microphone calibration errors; hence, the differential phase can show a sharp peak at an angle even if it is not a DOA. This differential phase (group delay) spectrum is illustrated in Figure 4.3(a) for the UCA (top) and the ULA (bottom). The MUSIC-Group delay spectrum, being the product of the MUSIC-Magnitude and group delay spectra, is able to remove the spurious peaks and retain only the peaks corresponding to the DOAs, as illustrated in Figure 4.3(b).
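A sketch of Equation 4.6 on a discrete (θ, φ) grid is given below, using numerical phase unwrapping and gradients. The UCA geometry, SNR, snapshot count and grid resolution are illustrative assumptions, not the experimental settings of this Chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed UCA geometry and narrowband parameters
I, L, ra, fc, c = 12, 2, 0.10, 2000.0, 343.0
mic_phi = np.deg2rad(np.arange(I) * 360.0 / I)

def steer(theta, phi):
    """UCA steering vector for elevation theta and azimuth phi (degrees)."""
    tau = -ra * np.sin(np.deg2rad(theta)) * np.cos(np.deg2rad(phi) - mic_phi) / c
    return np.exp(-1j * 2 * np.pi * fc * tau)

# Two closely spaced sources; sample covariance from noisy snapshots
A = np.column_stack([steer(15.0, 50.0), steer(20.0, 60.0)])
S = rng.standard_normal((L, 300)) + 1j * rng.standard_normal((L, 300))
X = A @ S + 0.05 * (rng.standard_normal((I, 300)) + 1j * rng.standard_normal((I, 300)))
R = X @ X.conj().T / 300
Qn = np.linalg.eigh(R)[1][:, : I - L]                    # noise subspace

# Evaluate a^H(Psi) q_u over a (theta, phi) grid
tg, pg = np.arange(5.0, 35.0, 0.5), np.arange(40.0, 75.0, 0.5)
proj = np.array([[steer(t, p).conj() @ Qn for p in pg] for t in tg])  # (T, F, I-L)
music = 1.0 / (np.abs(proj) ** 2).sum(axis=2)            # MUSIC-Magnitude spectrum

# Squared gradient of the unwrapped phase of each a^H q_u (Eq. 4.6)
gd_sq = np.zeros_like(music)
for u in range(I - L):
    ph = np.unwrap(np.unwrap(np.angle(proj[:, :, u]), axis=0), axis=1)
    gt, gp = np.gradient(ph, np.deg2rad(tg), np.deg2rad(pg))
    gd_sq += gt ** 2 + gp ** 2
P_mgd = gd_sq * music                                    # MUSIC-Group delay spectrum

i, j = np.unravel_index(np.argmax(P_mgd), P_mgd.shape)
print(tg[i], pg[j])                                      # near one of the two sources
```

Multiplying the squared phase gradient by the magnitude spectrum suppresses spurious phase jumps away from the true DOAs, which is exactly the role of the P_MUSIC(Ψ) factor in Equation 4.6.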


Figure 4.3: Illustration of the standard group delay of MUSIC and the MUSIC-Group delay as proposed in this work. (a) Standard group delay spectrum of MUSIC for UCA (top) and ULA (bottom). (b) MUSIC-Group delay spectrum for UCA (top) and ULA (bottom). Sources are at (15◦,50◦) and (20◦,60◦) for the UCA, and at 50◦ and 60◦ for the ULA.

4.2.2 Source Localization under Reverberant Conditions

In this Section, the performance of the proposed method in reverberant environments is presented. The performance of subspace-based methods degrades due to multi-path

effects. In subspace-based methods like MUSIC, the signal eigenvalues of the received-signal correlation matrix are significant compared to the noise eigenvalues. However, because of multi-path effects under reverberation, extraneous eigenvalues become significant. This affects the performance of subspace-based methods, especially MUSIC [69]. A detailed discussion on reverberation is presented in Section 3.3.2.

The MUSIC-Magnitude and MUSIC-Group delay spectra are shown in Figure 4.4 for two sources at (15◦,100◦) and (17◦,105◦) at a reverberation time T60 of 400 ms. The room impulse response (RIR) is simulated by the image method [70], as implemented in [71]. It can be seen that the MUSIC-Group delay spectrum is able to resolve the sources, whereas the MUSIC-Magnitude spectrum gives a single peak. In the following Section, the resolving power of the MUSIC-Group delay spectrum for azimuth and elevation estimation is justified by proving the 2-D additive property of the group delay spectrum.


Figure 4.4: Plots illustrating the azimuth and elevation angles as estimated by (a) the MUSIC-Magnitude and (b) the MUSIC-Group delay spectrum for sources at (15◦,100◦) and (17◦,105◦), at a reverberation time of 400 ms. MM estimates a single peak at (18◦,105◦), while MGD estimates two peaks at (19◦,100◦) and (17◦,108◦).

4.2.3 Additive Property of the MUSIC-Group Delay Spectrum

The high resolution of the proposed MUSIC-Group delay method is due to the additive property of the MUSIC-Group delay spectrum. For closely spaced sources under reverberation, the peaks corresponding to the DOAs merge together, giving a single peak in the MUSIC spectrum. However, as described in Section 4.2.1, closely spaced sources can be resolved by the MUSIC-Group delay spectrum using a limited number of sensors. This high resolution property follows from the additive property, since a product in the MUSIC-Magnitude domain is equivalent to an addition in the MUSIC-Group delay domain [19]. The mathematical proof of the additive property of the MUSIC-Group delay spectrum for a ULA has already been dealt with in [19]. In the case of a ULA, the steering vector exhibits a Vandermonde structure, and hence the root-MUSIC polynomial approach is used for showing the additive property. This is not the case for a UCA, as is clear from Equations 4.3 and 4.4.

The UCA can be divided into a number of cross sections, where each cross section represents a ULA. A single ULA can estimate only the azimuth angle of arrival; therefore, in general, two ULAs are sufficient to obtain an estimate of both the azimuth and elevation angles, and having more than two ULAs improves the robustness of the estimates. For multiple incident signals, pairing of the corresponding estimates from the various ULAs can be carried out as in [72]. Other pairing methods for eigenvalue association can be found in [73]. Generalizing the pairing methods of eigenvalue association for a UCA [72, 73], the steering vector a(Ψ) can be expressed as a vector of exponentials

a(Ψ) = [ e^{−jω_c τ_1^{(1)}}  e^{−jω_c τ_2^{(1)}}  ..  e^{−jω_c τ_{n_1}^{(1)}}  e^{−jω_c τ_1^{(2)}}  ..  e^{−jω_c τ_{n_2}^{(2)}}  .. ]^T    (4.7)

where n_r is the number of sensors in the rth cross section of the UCA and Σ_r n_r = I. Note that τ_i^{(r)} is the delay at the ith microphone in the rth cross section of the UCA. The steering vector can now be expressed to have a Vandermonde structure as follows:

a(Ψ) = [ z  z²  ..  z^{n_1}  y  y²  ..  y^{n_2}  .. ]^T    (4.8)

where

z = e^{−jω_c τ_1^{(1)}} ;  y = e^{−jω_c τ_1^{(2)}}.    (4.9)


From Equation 4.5, constructing the root-MUSIC polynomial for the UCA, we have

P_POLY(Ψ) = Σ_{u=1}^{I−L} |a^H(Ψ) q_u|².    (4.10)

Utilizing Equation 4.8, the root-MUSIC polynomial can be re-written as a sum of polynomials in z and y, denoted by F(z) and G(y) respectively:

P_POLY(Ψ) = F(z) + G(y) + · · ·    (4.11)

For an actual DOA Ψ_l, the polynomial P_POLY(Ψ_l), and hence each polynomial corresponding to a cross section of the UCA (e.g. F(z)), becomes zero.

It is to be noted that F(z) is a polynomial in z having (n_1 − 1) roots. Among these (n_1 − 1) roots, there can be a maximum of L roots corresponding to the L sources. It is also possible for two or more different incident signals to lie on the cone of confusion of a particular ULA, in which case there will be more than (n_1 − 1 − L) roots lying very close to the origin of the Z-plane. In either case, the (n_1 − 1 − L) roots with magnitude close to zero can be ignored. Constructing a polynomial Y(z) from the L roots corresponding to the L sources, we have

Y(z) = Π_{l=1}^{L} (1 − z_l z^{−1}) = 1 + Σ_{l=1}^{L} b_l z^{−l}    (4.12)

where z_l is the lth root of F(z). It is assumed herein, for mathematical simplicity, that all the sources fall in the field of view of the first cross section. Without loss of generality, and to maintain consistency with the definition of the MUSIC method, one can invert Y(z) and express it as a combined resonator H(z), where

H(z) = 1 / [ 1 + Σ_{l=1}^{L} b_l z^{−l} ] = 1 / [ Π_{l=1}^{L} (1 − z_l z^{−1}) ].    (4.13)

This complies with the approach wherein a DOA is viewed as a pole rather than a zero. As we are interested in the group delay spectrum of the combined resonator, H(z) can also be re-written as a product of poles, as shown below:

H(α) = Π_{l=1}^{L} r_l e^{jγ_l(α)} = [ Π_{l=1}^{L} r_l ] · exp( j Σ_{l=1}^{L} γ_l(α) )    (4.14)


where r_l is the magnitude and γ_l is the phase of the resonator pole z_l. As per the definition in Equation 4.9, γ is a function of α, the spatial variable. It may be noted from Equation 4.14 that the combined resonator exhibits a product of the magnitude spectra of the individual resonators; on the other hand, it exhibits a sum of their phase spectra. Taking the negative derivative of the unwrapped phase spectrum of the combined resonator, we finally have

τ_H(α) = −(∂/∂α) arg[H(α)] = τ_{H1}(α) + τ_{H2}(α) + . . . + τ_{HL}(α).    (4.15)

It is clear from Equations 4.14 and 4.15 that the MUSIC-Magnitude is a product spectrum, while the MUSIC-Group delay spectrum exhibits an additive property. Due to this additive property, the peaks are preserved in the MUSIC-Group delay spectrum even for closely spaced sources, whereas the MUSIC-Magnitude spectrum fails to preserve them. This is illustrated in Figure 4.5.

Figure 4.5: Two dimensional spectral plots for the cascade of two individual DOAs (res-

onators), (a) Source with DOA (15◦ ,60◦ ) (b) Source with DOA (18◦ ,55◦ ) (c) MUSIC-

Magnitude spectrum (d) MUSIC-Group delay spectrum.


Two individual resonators at DOAs (15◦,60◦) and (18◦,55◦) are considered, as shown in Figures 4.5(a) and 4.5(b) respectively. The MUSIC-Magnitude and the MUSIC-Group delay spectra for the cascade of these two resonators are plotted. It can be noted from Figure 4.5(c) that the magnitude spectrum is unable to resolve the two sources, as the two peaks merge due to the multiplicative property of the magnitude spectrum. On the contrary, the MUSIC-Group delay spectrum is able to resolve the two sources owing to its 2-D additive property, as can be seen in Figure 4.5(d).
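The multiplicative/additive duality of Equations 4.14-4.15 can be verified numerically with two single-pole resonators; the pole radius and angles below are assumed for illustration.

```python
import numpy as np

# Numerical check of Eqs. 4.14-4.15: for a cascade of resonators, the
# magnitude spectra multiply while the group delay spectra add.
alpha = np.linspace(0.0, np.pi, 4000)                 # spatial variable (radians)
poles = [0.95 * np.exp(1j * np.deg2rad(a0)) for a0 in (55.0, 60.0)]
H = [1.0 / (1.0 - p * np.exp(-1j * alpha)) for p in poles]   # individual resonators
cascade = H[0] * H[1]

def group_delay(h):
    """Negative derivative of the unwrapped phase spectrum."""
    return -np.gradient(np.unwrap(np.angle(h)), alpha)

# Additive property of the group delay (Eq. 4.15)
print(np.allclose(group_delay(cascade), group_delay(H[0]) + group_delay(H[1])))  # True
# Multiplicative property of the magnitude (Eq. 4.14)
print(np.allclose(np.abs(cascade), np.abs(H[0]) * np.abs(H[1])))                 # True
# The summed group delay retains a sharp peak near each pole angle
peak = np.rad2deg(alpha[np.argmax(group_delay(cascade))])
print(peak)   # lies near 55 or 60 degrees
```

Because the group delay of the cascade is a plain sum of narrow individual peaks, closely spaced poles stay distinguishable, whereas the product of two broad magnitude humps tends to merge into one.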

4.3 Localization Error Analysis

Subspace-based methods like MUSIC and MUSIC-Group delay are sensitive to finite sample effects, an imprecisely known noise covariance, a perturbed array manifold, and reverberation. Finite sample effects occur since it is not possible to obtain a perfect covariance matrix R of the data received over an array; in practice, estimation of the sample covariance R̂ requires averaging over several snapshots of the received data. The finite sample effects can be neglected by taking a high SNR or a large number of snapshots. The error due to the imprecisely known noise covariance is also neglected here, in order to analyze the effect of sensor position errors and reverberation on the proposed method. In the ensuing Section, the performance of MUSIC and MUSIC-Group delay under sensor perturbation errors is presented. Performance evaluation is also conducted in a reverberant environment, and a numerical analysis is presented comparing the root mean square error (RMSE) of the various methods under reverberation with the Cramér-Rao bound (CRB).

Let r_i be the nominal position of the ith sensor. The position matrix R is formed from the nominal sensor positions as

R = [ r_1  r_2  . . .  r_I ].

The lth steering vector can then be written in terms of the sensor positions as

a(Ψ_l, k) = [ e^{−j k_l^T r_1}, e^{−j k_l^T r_2}, . . . , e^{−j k_l^T r_I} ]^T.    (4.16)


The displacement μ_i of the ith sensor from its nominal position is modeled as

μ_i ∼ N(0, σ² I_2)

where I_2 is the 2 × 2 identity matrix.

These position perturbations are assumed to be i.i.d. Gaussian random variables, independent of the signals and of any additive noise at the sensor outputs. In any DOA estimation process, the sensor perturbations are assumed to be time-invariant, i.e., the same perturbation is used for t = 1, 2, . . . , Ns. The position error matrix μ is formed similarly to the position matrix R as

μ = [ μ_1  μ_2  . . .  μ_I ].

Hence, the perturbed sensor positions are given by R̃ = R + μ. The perturbation of the lth steering vector can now be expressed through the diagonal matrix [74]

Γ_l = diag( e^{−j k_l^T μ_1}, e^{−j k_l^T μ_2}, · · · , e^{−j k_l^T μ_I} ).    (4.17)

Under sensor perturbation errors, the steering matrix in the signal model of Equation 4.1 becomes

Ã(Ψ, k) = [ ã(Ψ_1, k), ã(Ψ_2, k), . . . , ã(Ψ_L, k) ]    (4.18)

where ã(Ψ_l, k) = Γ_l a(Ψ_l, k).
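The perturbation model can be sketched by drawing one set of Gaussian displacements and applying the diagonal phase factors to a nominal steering vector; the wavelength, DOA and jitter level below are assumed for illustration.

```python
import numpy as np

def perturbed_steering(a_nominal, k_l, mu):
    """Apply the diagonal perturbation matrix Gamma_l to a nominal steering
    vector: a_tilde_i = exp(-j k_l^T mu_i) * a_i.

    a_nominal: length-I steering vector; k_l: source wavevector, shape (2,);
    mu: (I, 2) matrix of sensor displacements (the position error matrix).
    """
    return np.exp(-1j * mu @ k_l) * a_nominal

rng = np.random.default_rng(3)
I, sigma, lam, phi = 8, 0.005, 0.17, 40.0            # 5 mm jitter (assumed values)
a = np.exp(-1j * np.pi * np.arange(I) * np.cos(np.deg2rad(phi)))   # nominal ULA vector
k_l = (2 * np.pi / lam) * np.array([np.cos(np.deg2rad(phi)), np.sin(np.deg2rad(phi))])
mu = sigma * rng.standard_normal((I, 2))             # fixed across all Ns snapshots
a_tilde = perturbed_steering(a, k_l, mu)
print(np.abs(a_tilde))                               # still unit modulus: only the phases change
```

Since the perturbation only rotates the phase at each sensor, the array covariance keeps its structure but the signal subspace no longer aligns exactly with the nominal manifold, which is what degrades the spectra in Figure 4.6.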

The effect of the sensor perturbation on the array autocorrelation matrix is simulated as described in [74], and the analysis is carried out. The resolution of the MUSIC-Magnitude and MUSIC-Group delay methods under perturbation errors is illustrated in Figures 4.6(a) and 4.6(b) respectively, which show contour plots of the respective spectra. Note that the MUSIC-Magnitude spectrum shows a single peak with contours around it, while the MUSIC-Group delay spectrum shows two distinct peaks with separate contours.

4.3 Localization Error Analysis 62

35 Source 1 :

35 Source 1 :

(θ,φ)=(20,45) (θ,φ)=(20,45)

30 Source 2 : 30 Source 2 :

(θ,φ)=(15,50) (θ,φ)=(15,50)

Elevation(θ)

Elevation(θ)

25 25

20 20

15 15

10 10

5 5

35 40 45 50 55 60 35 40 45 50 55 60

Azimuth(φ) Azimuth(φ)

(a) (b)

Figure 4.6: Contour plots of (a) MUSIC-Magnitude spectrum (b) MUSIC-Group delay spec-

trum, under sensor perturbation errors.

The Cramér-Rao bound provides a lower bound on the mean square error (MSE) of an unknown parameter estimate. The average RMSE of the DOA estimates has been compared with the CRB for the various methods. The circular array geometry being uncoupled, the statistical coupling effect between the azimuth and elevation estimates is ignored. The Cramér-Rao inequality for estimating a parameter α_r is given as

var(α̂_r) ≥ [F^{−1}]_{rr}    (4.19)

where the (r, s)th element of the Fisher information matrix F is given by [75, 68]

F_{rs} = N_s tr{ R_p^{−1} (∂R_p/∂α_r) R_p^{−1} (∂R_p/∂α_s) }.    (4.20)


For 2-D DOA estimation, the unknown parameter vector is α = [θ, φ]. The elements of the Fisher information matrix are obtained from Equation 4.20 using the derivatives of the array correlation matrix with respect to θ and φ, where

P⊥_A = I − A(A^H A)^{−1} A^H,    A_θ = Σ_{l=1}^{L} ∂A/∂θ_l,

A_φ is defined similarly, and A is the steering matrix, as defined in Section 4.2.1.

The azimuth and elevation angles are varied from 10◦-150◦ and 10◦-80◦ respectively, at a reverberation time T60 = 200 ms and SNR = 10 dB. The DOA estimation is performed using MVDR and Beamspace MUSIC (BSM) [76, 47], apart from MUSIC-Magnitude and MUSIC-Group delay. For this simulation, a 15-channel UCA with r = λ (the wavelength) is considered, and the maximum phase mode excited for BSM is taken to be 7. Two closely spaced, uncorrelated sources, with 2◦ separation in azimuth and elevation, are considered in this analysis. The average RMSE of the azimuth and elevation estimates obtained by the four methods is compared with the average Cramér-Rao bound in Table 4.1. It can be seen that the average RMSE for MUSIC-Group delay is the lowest.

Table 4.1: Comparison of average RMSE of various methods with the CRB (illustrated in the first row) for an azimuth range of 10◦-150◦ and elevation range of 10◦-80◦, at T60 of 200 ms and SNR of 10 dB.

Cramér-Rao bound   Ele: 1.482×10^−6   Azi: 1.4977×10^−6
MGD                Azi: 5.2723        Ele: 5.2329
MM                 Azi: 5.2985        Ele: 5.3316
BSM                Azi: 12.3479       Ele: 9.0577
MVDR               Azi: 12.2562       Ele: 11.1128

Reverberation strongly affects the performance of subspace-based methods. In this Section, performance evaluation of the proposed method is conducted in an indoor environment, where the effect of reverberation is prominent. Hence, we consider the small meeting room setup shown in Figure 4.8, which has four participants around the table. The error analysis in DOA estimation is presented herein through scatter plots. The reverberation is simulated as discussed in Section 4.4.1, and the noise is generated using a zero-mean, unit-variance Gaussian distribution.

The experiment is conducted under reverberation with a T60 of 150 ms, which typically corresponds to a small meeting room. DOA estimation trials are conducted for two closely spaced sources at (10◦,20◦) and (5◦,10◦). An SNR of 40 dB is considered, in order to isolate the effect of reverberation. The azimuth and elevation estimates for 500 independent trials are plotted in Figure 4.7. In the case of MUSIC-Magnitude, there were several cases where the estimates overlapped each other, leading to poor localization of the sources; the estimates are also unevenly distributed around the actual DOA, as illustrated in Figure 4.7(a). Figure 4.7(b) shows the distribution of the estimates of the proposed method. It can be seen that the average estimate is closer to the actual DOA in the case of the proposed method.


Figure 4.7: Two-dimensional scatter plots of the localization estimates for the sources at (10◦,20◦) and (5◦,10◦) using (a) the MUSIC-Magnitude method and (b) the MUSIC-Group delay method. The reverberation time is 150 ms, the SNR is 40 dB, and the number of trials is 500. The red dot indicates the actual DOA.

4.4 Performance Evaluation

The proposed method is further evaluated through experiments on speech enhancement, perceptual evaluation, and distant speech recognition. In the following section, the speech enhancement experiment is presented as an improvement in signal-to-interference ratio (SIR) [77]. Experiments on perceptual evaluation are also conducted for the various methods and quantified using objective measures. Distant speech recognition experiment results are presented as word error rate (WER). The proposed MUSIC-Group delay method is compared with MUSIC-Magnitude (MM), Beamspace MUSIC (BSM) [76, 47], linearly constrained minimum variance (LCMV) and minimum variance distortionless response (MVDR) methods.

The proposed algorithm was tested in a typical meeting room environment. A room with dimensions 730 cm × 620 cm × 340 cm was used in the experiments. The experimental setup consists of a uniform circular, 15-channel microphone array with a radius of 10 cm. It has one desired speaker, one competing speaker and two interfering sources, as shown in Figure 4.8.

Figure 4.8: Experimental setup in the meeting room with two speakers (S1 and S2) and two interfering sources (stationary noise source SN and nonstationary noise source NS). The sources are located at (17◦,35◦), (19◦,40◦), (15◦,30◦) and (21◦,45◦) respectively. The radius of the circular array is 10 cm.


White noise and babble noise from the NOISEX-92 database [78] were used as the stationary and nonstationary interfering sources respectively. The signals are acquired over the array of microphones; under reverberation, the signal is convolved with the room impulse response. In real-life experimental conditions, a room impulse response is generated in two ways: a microphone is used to record a short sounding pulse, giving the room impulse response directly, or the maximum length sequence (MLS) is used. Here, the RIR is simulated using the image method [70], as implemented in [71].

DOAs are estimated over the acquired signals using the various algorithms. A filter sum beamformer (FSB) is trained using the DOA estimates obtained, and the signals are reconstructed using the beamformer. Distant speech recognition (DSR) and speech enhancement experiments are then conducted on the reconstructed speech signal. The complete procedure is depicted in Figure 4.9.

(DOA → TDOA → FSB → Experiments)

Figure 4.9: Flow diagram illustrating the methodology followed in performance evaluation

for distant speech signal acquired over circular array.

The performance of the proposed method is presented herein as the improvement in SIR. The input SIR of the lth speaker relative to the stationary (sn) or nonstationary (ns) interfering source is given as

Table 4.2: Enhancement in SIR (dB), compared for various methods at different reverberation times. S1s is the desired speaker, S2s is the competing speaker, S^ns is the non-stationary noise source and S^sn is the stationary noise source. The input SIR is 10 dB against S^sn and 5 dB against S^ns in every case.

Method  Source | Output SIR (150 ms) | Output SIR (200 ms) | Output SIR (250 ms)
               |  S^sn      S^ns     |  S^sn      S^ns     |  S^sn      S^ns
MGD     S2s    | 46.0928   43.001    | 41.8349   35.358    | 40.3478   21.8217
        S1s    | 42.5781   31.2571   | 36.575    30.5659   | 35.0554   30.2473
MM      S2s    | 45.5462   29.2422   | 42.003    25.3432   | 38.7453   21.7951
        S1s    | 39.332    27.2701   | 38.9946   25.8978   | 38.821    24.5881
BSM     S2s    | 39.8571   28.7702   | 38.2909   27.0717   | 37.9901   25.6131
        S1s    | 33.0964   27.365    | 30.8716   25.2634   | 30.1894   23.0318
LCMV    S2s    | 34.0859   26.7355   | 32.0898   25.1449   | 28.0      23.6274
        S1s    | 34.7763   23.0224   | 26.2894   22.61     | 25.0518   22.1845
MVDR    S2s    | 33.0054   24.6961   | 31.058    23.5055   | 27.7594   19.3616

SIR^x_{in,l} [dB] = 10 log_10 [ Σ_ν Σ_{ξ=0}^{N_DFT−1} ( s^s_l(ν, ξ) h^s_{l m_0}(ν, ξ) )² / Σ_ν Σ_{ξ=0}^{N_DFT−1} ( s^x(ν, ξ) h^x_{m_0}(ν, ξ) )² ]    (4.21)

with l ∈ {1, 2} and x ∈ {ns, sn}, where s^s_l(ν, ξ) is the lth speech signal in the short-time Fourier transform (STFT) domain with a rectangular window of length N_DFT, h^s_{l m_0} is the impulse response for the lth speaker and microphone m_0 pair, ν is the frame number and ξ is the frequency index.

The output SIR is similarly given as

SIR^x_{out,l} [dB] = 10 log_10 [ Σ_ν Σ_{ξ=0}^{N_DFT−1} ( y^s_l(ν, ξ) )² / Σ_ν Σ_{ξ=0}^{N_DFT−1} ( y^x(ν, ξ) )² ]    (4.22)

with l ∈ {1, 2} and x ∈ {ns, sn}.

signal herein was LCMV. The results on SIR improvement are presented in Table 4.2. It can be seen that the proposed method performs better than all the conventional methods.
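In the spirit of Equations 4.21-4.22, an energy-ratio SIR over rectangular-window STFT frames can be sketched as below; the white-noise signals are stand-ins for the actual speech and interference recordings, and the frame length is an assumed value.

```python
import numpy as np

def sir_db(target, interference, nfft=512):
    """Energy-ratio SIR over rectangular-window STFT frames, a simplified
    sketch in the spirit of Eqs. 4.21-4.22."""
    def stft_energy(x):
        # Drop the trailing partial frame, then sum |STFT|^2 over all frames/bins
        n_frames = len(x) // nfft
        frames = np.reshape(x[: n_frames * nfft], (n_frames, nfft))
        return float((np.abs(np.fft.fft(frames, axis=1)) ** 2).sum())
    return 10.0 * np.log10(stft_energy(target) / stft_energy(interference))

rng = np.random.default_rng(4)
speech_like = rng.standard_normal(16000)          # stand-in for the beamformed speaker
interference = 0.1 * rng.standard_normal(16000)   # stand-in for residual interference
print(sir_db(speech_like, interference))          # ~20 dB for a 10x amplitude ratio
```

By Parseval's theorem the rectangular-window STFT energies are proportional to the time-domain energies, so the ratio is unaffected by the transform itself.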

Table 4.3: Comparison of perceptual evaluation results using various methods, based on objective measures.

Method  T60 (ms)  LLR      SegSNR    WSS       PESQ
MGD     250       1.6193   -4.8298   36.7986   2.2229
        150       1.62     -3.1847   35.6      2.2815
MM      250       1.6487   -4.9995   37.73     2.2215
        150       1.6657   -3.2      35.5639   2.28
BSM     250       1.6878   -5.108    38.5765   2.2
        150       1.668    -3.22     36.2      2.2826
LCMV    250       1.6994   -5.095    40.0321   2.1746
        150       1.67     -3.4      36.4      2.2815
MVDR    250       1.7379   -5.0356   40.0647   2.1753

In this Section, we evaluate the proposed method by computing objective measures of perceptual evaluation on the enhanced speech. Here, a desired speaker and stationary noise source pair is considered for evaluation. Six hundred sentences from the TIMIT database [79] were selected and randomized to perform the experiments. The objective measures used herein for evaluating speech quality are the Log-Likelihood Ratio measure (LLR) [80], segmental SNR


(segSNR) [80], Weighted-Slope Spectral (WSS) distance [81] and Perceptual Evaluation of

Speech Quality, PESQ [82]. The results are presented in Table 4.3 at two reverberation levels,

T60 = 150 ms and 250 ms. PESQ and segSNR scores are high while LLR and WSS scores

are low for the proposed method, indicating better reconstruction of the signal.

Table 4.4: Comparison of distant speech recognition performance in terms of WER (in percentage) at various reverberation times, T60.

                               ss1                        ss2
Database  Method  CTM   T60 = 150 ms  T60 = 250 ms  T60 = 150 ms  T60 = 250 ms
MONC      MM      9.2   14.21         26.01         13.78         25.56
          BSM           15.02         27.99         15.22         27.32
          LCMV          16.59         29.04         16.3          28.39
          MVDR          17.04         30.16         16.96         29.86
TIMIT     MGD     6.73  8.81          15.79         9.16          16.02
          MM            10.15         18.06         10.92         18.68
          BSM           10.98         19.16         12.1          20.12
          LCMV          12.18         20.44         15.25         21.67
          MVDR          14.08         22.47         17.41         24.37

Speaker independent large vocabulary speech recognition experiments are conducted for

speech acquired over circular microphone arrays [83, 84] in a meeting room scenario. The

experimental results are presented as word error rate (WER). The WER is calculated as

WER = 100 − ((W_n − (W_s + W_d + W_i)) / W_n) · 100

where Wn is the total number of words, Ws the total number of substitutions, Wd the total

number of deletions, and Wi the total number of insertions.
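The WER expression can be exercised with a small helper (a hypothetical function, not part of any recognition toolkit used in the thesis):

```python
def word_error_rate(w_n, w_s, w_d, w_i):
    """WER in percent from total words, substitutions, deletions and insertions,
    as in the expression above."""
    return 100.0 - (w_n - (w_s + w_d + w_i)) / w_n * 100.0

# 100 reference words with 5 substitutions, 3 deletions and 2 insertions:
print(word_error_rate(100, 5, 3, 2))   # -> 10.0
```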


To ensure conformity with standard databases, sentences from the TIMIT database [79] were selected. Continuous digit recognition experiments were conducted on the MONC [85] database.

Separate sets of sentences were used for training and testing. For TIMIT, the complete test set of 1344 sentences from 112 male and 56 female speakers was used. The rest were used for training the

speech models. For MONC, the speech models were trained with 8400 isolated and continuous

digit sentences. For testing, 650 continuous digit sentences were used. Three-state, eight-mixture HMMs (Hidden Markov Models) were used in the experiments on the TIMIT database. For the experiments on the MONC database, three-state, sixteen-mixture HMMs were used.

Table 4.4 lists WER for various methods along with close talking microphone (CTM) as the

benchmark. The MUSIC-Group delay method indicates a reasonable reduction in WER when compared to the other methods.

4.5 Summary and Contributions

In this Chapter, a novel high resolution source localization method based on the MUSIC-Group delay spectrum is discussed. The method provides robust azimuth and elevation estimates of closely spaced sources, as indicated by source localization experiments, when compared to conventional source localization methods. The significance of the MUSIC-Group delay method for speech enhancement and distant speech recognition is also illustrated by improvements in signal to interference ratios and lower word error rates.

Chapter 5

Spherical Microphone Array

5.1 Introduction

After the introduction of higher order spherical microphone array (SMA) and associated

signal processing in [10, 11], the spherical microphone array has been widely used for direction

of arrival (DOA) estimation [12, 13, 14, 15, 16, 17, 86], tracking of acoustic sources [33]

and sound field decomposition [87]. The growing research interest in spherical microphone arrays can be attributed to their ability to measure and analyze three-dimensional

sound ﬁelds in an eﬀective manner. In other words, SMA can localize sound sources anywhere

in space. Additionally, the beampattern can be steered to any direction in three-dimensional

(3-D) space without changing the shape of the pattern. Hence, a spherical microphone array

allows full 3-D control of the beampattern. Another advantage of such an array is the ease of array processing in the spherical harmonics (SH) domain.

In this chapter, novel far-ﬁeld source localization methods over a spherical microphone

array are presented. The chapter starts with a discussion on fundamentals of spherical array

processing. Spherical Fourier transform and beampattern analysis in spherical harmonics do-

main are introduced. Development of far-ﬁeld array data model from spatio-temporal domain

to the spherical harmonics domain follows. Thereafter, formulations for the existing conventional spherical harmonics MVDR (SH-MVDR) [16] and spherical harmonics MUSIC (SH-MUSIC) [16, 15] methods are presented.

array is proposed using spherical harmonics MUSIC-Group delay (SH-MGD) spectrum. Sev-

eral experiments are conducted for 3-D source localization in noisy and reverberant envi-

ronments. Additional experiments on source tracking are also conducted. The performance

of the SH-MGD method is compared to other conventional methods in performance evalu-

ation section. Root mean square error (RMSE), probability of resolution and average error

distribution (AED) are utilized for evaluating the proposed method.

5.2 Fundamentals of Spherical Array Processing

The spherical Fourier transform (SFT) is an essential component of spherical array signal processing in the spherical harmonics domain. To start with, two assumptions are made. It is assumed that the

sound pressure on the entire sphere is known. This assumption is not true in practice, and the

pressure is sampled spatially using microphones. Sampling weights based on certain sampling

criteria are introduced to take this into account [88]. It is also assumed that the sound ﬁeld is

composed of plane waves. This is approximately true when the sound has traveled suﬃcient

distance from the source.

Let us consider a spherical microphone array with I identical and omnidirectional micro-

phones, mounted on the surface of a sphere with radius ra . The position vector of ith

microphone is given by

r_i = [ r_a sin θ_i cos φ_i,  r_a sin θ_i sin φ_i,  r_a cos θ_i ]^T    (5.1)

where θ is the elevation angle, φ is the azimuth angle and (.)^T denotes the transpose of (.). The spherical

microphone array is assumed to be of order N. The order of the array is defined in Section 2.5.1. The spherical Fourier transform, under the aforementioned definition of the spherical microphone array, is now detailed.
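Equation 5.1 is a plain spherical-to-Cartesian conversion; a small sketch (the function name is illustrative):

```python
import numpy as np

def mic_position(r_a, theta, phi):
    """Cartesian position of a microphone on a sphere of radius r_a (Eq. 5.1);
    theta is the elevation measured from the z-axis, phi the azimuth."""
    return np.array([r_a * np.sin(theta) * np.cos(phi),
                     r_a * np.sin(theta) * np.sin(phi),
                     r_a * np.cos(theta)])

# A microphone on the equator of a 4.2 cm sphere (the Eigenmike radius):
print(np.round(mic_position(0.042, np.pi / 2, 0.0), 6))   # ≈ [0.042, 0, 0]
```

Every such position has norm r_a, since the vector is a radius of the sphere.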


Figure 5.1: Computation of spherical Fourier transform over sphere with radius r = 1

Let the pressure received at (r, Φ) = (r, θ, φ) be denoted by p(t, r, Φ) ↔ P(k, r, Φ), with r ≥ r_a and k the wavenumber. The spherical Fourier transform (SFT) [89] or spherical harmonics decomposition [15] of the received pressure is

P_nm(k, r) = ∫_{Ω∈S²} P(k, r, Φ) [Y_n^m(Φ)]* dΩ    (5.2)

where Y_n^m is the spherical harmonic of order n and degree m, dΩ = sin θ dθ dφ is the elemental area over the sphere of unit radius as shown in Figure 5.1, and (.)* denotes the complex conjugate of (.).

The spherical harmonic Y_n^m can be written from Section 2.5 as

Y_n^m(Φ) = √[ (2n + 1)(n − m)! / (4π(n + m)!) ] P_n^m(cos θ) e^{jmφ},    ∀ 0 ≤ n ≤ N, −n ≤ m ≤ n    (5.3)

with P_n^m being the associated Legendre functions. Substituting for dΩ, the SFT can be expressed as

P_nm(k, r) = ∫₀^{2π} ∫₀^{π} P(k, r, Φ) [Y_n^m(Φ)]* sin θ dθ dφ.    (5.4)

In practice, the pressure received is not continuous. It is spatially sampled at the microphone

locations. Hence, the SFT of the pressure is approximated by the summation

P_nm(k, r) ≅ Σ_{i=1}^{I} a_i P_i(k, r, Φ_i) [Y_n^m(Φ_i)]*    (5.5)

where a_i are the sampling weights.


In matrix form, for all n ∈ [0, N], m ∈ [−n, n] and all I microphones, the SFT becomes

P_nm(k, r) ≅ Y^H(Φ) Γ P(k, r, Φ)    (5.6)

where P_nm = [ P_00  P_1(−1)  P_10  P_11 ⋯ P_NN ]^T is an (N + 1)² × 1 vector and Y(Φ) is an I × (N + 1)² matrix whose ith row is given as

y(Φ_i) = [ Y_0^0(Φ_i)  Y_1^{−1}(Φ_i)  Y_1^0(Φ_i)  Y_1^1(Φ_i) … Y_N^N(Φ_i) ].    (5.7)

Γ = diag(a_1, …, a_I) is the diagonal matrix of sampling weights, P(k, r, Φ) is the I × 1 vector of pressure at the I microphones and (.)^H denotes the conjugate transpose of (.).

The inverse spherical Fourier transform relation is given by

P(k, r, Φ) ≅ Σ_{n=0}^{N} Σ_{m=−n}^{n} P_nm(k, r) Y_n^m(Φ).    (5.8)

Observing Equations 5.7 and 5.8, P(k, r, Φ) of highest order N on the surface of a sphere has (N + 1)² independent harmonic components. Hence, we can sample a sound field of order N with at least (N + 1)² points on the sphere without losing information. In other words, the number of microphones must satisfy [11]

I ≥ (N + 1)².    (5.9)
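A numerical sketch of Equations 5.3-5.9 (all function names are illustrative; the quadrature weights a_i of Equation 5.5 are replaced by a least-squares fit, a common practical substitute when exact sampling weights are unavailable):

```python
import numpy as np
from math import factorial, pi
from scipy.special import lpmv

def sph_harm_nm(n, m, theta, phi):
    """Y_n^m of Eq. 5.3 (theta: elevation, phi: azimuth)."""
    am = abs(m)
    norm = np.sqrt((2 * n + 1) * factorial(n - am) / (4 * pi * factorial(n + am)))
    y = norm * lpmv(am, n, np.cos(theta)) * np.exp(1j * am * phi)
    return (-1) ** am * np.conj(y) if m < 0 else y   # Y_n^{-m} = (-1)^m [Y_n^m]*

def sh_matrix(N, thetas, phis):
    """Y(Phi) of Eq. 5.7: an I x (N+1)^2 matrix, one row per microphone."""
    return np.stack([sph_harm_nm(n, m, thetas, phis)
                     for n in range(N + 1) for m in range(-n, n + 1)], axis=1)

rng = np.random.default_rng(0)
N, I = 3, 32                                    # I >= (N+1)^2 = 16, as required by Eq. 5.9
thetas = np.arccos(rng.uniform(-1.0, 1.0, I))   # directions spread over the sphere
phis = rng.uniform(0.0, 2 * pi, I)
Y = sh_matrix(N, thetas, phis)

c_true = rng.standard_normal((N + 1) ** 2) + 1j * rng.standard_normal((N + 1) ** 2)
p = Y @ c_true                                  # sampled pressure via inverse SFT (Eq. 5.8)
c_est = np.linalg.lstsq(Y, p, rcond=None)[0]    # discrete SFT, least-squares form of Eq. 5.6
print(np.allclose(c_est, c_true))               # -> True
```

The round trip is exact because an order-3 field is fully determined by 16 coefficients and the 32 sampling points make Y full column rank.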

Beampatterns for the uniform linear array (ULA) and uniform circular array (UCA) are given in Section 3.4.2.3. As discussed therein, the beampattern is typically measured as the array response to a single plane wave. Hence, consider a sound field composed of a single plane wave with unit amplitude, incident from direction Ψ_l = (θ_l, φ_l). In this case, utilizing Equation 2.50 and the definition of the inverse SFT, P_nm can be written as

P_nm(k, r) = b_n(k, r) [Y_n^m(Ψ_l)]*    (5.10)

where b_n(k, r) is called the mode strength. The expression and significance of the mode strength

for the open sphere and rigid sphere are detailed in Section 2.5.1. By definition, the expression in Equation 5.10 can be regarded as a steering vector component in the spherical harmonics domain. This is shown mathematically in Equation 5.34. Hence, similar to Equation 3.46, the beampattern can be written as



Figure 5.2: Illustration of the spherical harmonics beampatterns: (a) regular beampattern for order N = 3, (b) regular beampattern for order N = 4, (c) DSB beampattern for order N = 3 and (d) DSB beampattern for order N = 4.

G = | Σ_{n=0}^{N} Σ_{m=−n}^{n} W_nm*(k) P_nm(k, r) |    (5.11)

where W_nm*(k) is the complex conjugate of the SFT of the beamforming weights and |(.)| is the absolute value of (.). In matrix form,

G = | W_nm^H(k) P_nm(k, r) |.    (5.12)

For beampatterns that are rotationally symmetric around array look direction, the beam-

forming weight is given by [90]

W_nm*(k) = (d_n / b_n(k, r)) Y_n^m(Ψ_s)    (5.13)

where dn controls the beampattern and Ψs is array look direction (also called steering direc-

tion) [10]. Utilizing Equations 5.13 and 5.10 in 5.11, the beampattern expression becomes

G(Ψ_l, Ψ_s) = | Σ_{n=0}^{N} Σ_{m=−n}^{n} d_n Y_n^m(Ψ_s) [Y_n^m(Ψ_l)]* |    (5.14)


where Ψl is varied in the ﬁeld of view of the array to get the array response for beampattern.

The spherical harmonics addition theorem [91] gives

G(Θ) = | Σ_{n=0}^{N} d_n ((2n + 1) / 4π) P_n(cos Θ) |    (5.15)

where P_n(.) is the Legendre polynomial, and Θ is the angle between the source direction and the array look direction.

Various choices of d_n lead to different beampatterns. The beampattern achieved using d_n = 1 is called the regular beampattern [92]. For the delay-and-sum beampattern, the controlling parameter takes the values d_n = |b_n(k, r)|² [50]. Regular and delay-and-sum beampatterns are shown in Figure 5.2.
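Equation 5.15 is cheap to evaluate numerically. The following sketch (illustrative names) checks that the regular beampattern (d_n = 1) attains its maximum value (N + 1)²/4π in the look direction Θ = 0, since P_n(1) = 1 for all n:

```python
import numpy as np
from scipy.special import eval_legendre

def regular_beampattern(N, Theta):
    """|G(Theta)| of Eq. 5.15 with d_n = 1 (regular beampattern)."""
    n = np.arange(N + 1)[:, None]
    terms = (2 * n + 1) / (4 * np.pi) * eval_legendre(n, np.cos(np.atleast_1d(Theta)))
    return np.abs(terms.sum(axis=0))

N = 3
g = regular_beampattern(N, np.deg2rad([0.0, 30.0, 90.0]))
print(np.isclose(g[0], (N + 1) ** 2 / (4 * np.pi)))   # peak at Theta = 0 -> True
print(bool(g[0] > g[1] > g[2]))                       # main lobe decays -> True
```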

5.3 Microphone Array Data Model in Spherical Harmonics Domain

In this Section, the data model for the received pressure is derived in the spherical harmonics domain.

The spatio-temporal data model was derived in Section 3.3. The spatio-temporal data model

is used herein to derive spatio-frequency and subsequently the spherical harmonics data

model.

Let us consider L narrowband, far-ﬁeld sources incident over a spherical microphone array

with I microphones. The microphones are mounted on the surface of a sphere with radius ra .

The amplitude of the lth source is given by s_l(t). The time delay of arrival at the center of the sphere

is taken to be zero. For far-ﬁeld source and omnidirectional sensor assumptions, the pressure

at ith microphone due to lth source will be sl (t − τi (Ψl )), where τi (Ψl ) is the propagation

delay between the reference point and the ith microphone for the lth source impinging from

direction Ψl . Hence, the total pressure at the ith microphone, can be expressed as

p_i(Ψ; t) = Σ_{l=1}^{L} s_l(t − τ_i(Ψ_l)) + v_i(t)    (5.16)


where v_i(t) is the sensor noise at the ith sensor and t = 1, 2, ⋯, N_s, with N_s being the number of snapshots. The data model in Equation 5.16 is referred to as the spatio-temporal data model.

Suppose that the microphone output pi (Ψ; t) is sampled with sampling frequency of 1/Ts

Hz. In general, if s(t) is band-limited to the interval [fl , fu ], then fu ≤ 1/2Ts . Computing

the discrete Fourier transform (DFT) of Equation 5.16, the spatio-frequency data model can

be written as

P_i(Ψ; f_ν) = Σ_{l=1}^{L} e^{−j2πf_ν τ_i(Ψ_l)} S_l(f_ν) + V_i(f_ν),    ν = 1, ⋯, N_s    (5.17)

with

f_ν = ξ_ν / (T_s N_s).    (5.18)

Utilizing ωτ_i(Ψ_l) = k_l^T r_i from Equation 2.12 and dropping ν for notational simplicity, Equation 5.17 can be re-written in the wavenumber (hence frequency) domain as

P_i(Ψ; k) = Σ_{l=1}^{L} e^{−j k_l^T r_i} S_l(k) + V_i(k).    (5.19)

Rearranging Equation 5.19 in matrix form, the ﬁnal data model in spatial domain can be

written as

P(Ψ; k) = A(Ψ; k) S(k) + V(k),    (5.20)

where A(Ψ; k) is the I × L steering matrix, S(k) is the L × 1 vector of source amplitudes and V(k) is the I × 1 vector of uncorrelated sensor noise. The noise components are assumed to be white, circularly Gaussian distributed with zero mean and covariance matrix σ²I, I being the identity matrix. A particular steering vector (the lth column of the steering matrix) can be expanded as

a_l = [ e^{−j k_l^T r_1}, e^{−j k_l^T r_2}, …, e^{−j k_l^T r_I} ]^T.    (5.21)

It is to be noted that the spatio-frequency data model in Equation 5.20 is similar to the spatio-temporal data model derived in Equation 3.9.


Motivation to work in the spherical harmonics domain comes from its reduced dimensionality and ease of array processing. Recollect from Section 2.5.1 that each term e^{−j k_l^T r_i} represents the plane wave model in the spherical coordinate system. Hence, the steering vector component for a spherical microphone array in the spatial domain can be written from Equation 2.50 as

a_il = e^{−j k_l^T r_i} = Σ_{n=0}^{N} Σ_{m=−n}^{n} b_n(k, r) [Y_n^m(θ_l, φ_l)]* Y_n^m(θ_i, φ_i).    (5.22)
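Equation 5.22 rests on the plane-wave expansion into spherical harmonics. After the addition theorem collapses the m-sum, the expansion can be checked numerically; the e^{−jk·r} sign convention and the open-sphere mode strength b_n(k, r) = 4π(−j)^n j_n(kr) are assumptions made in this sketch:

```python
import numpy as np
from scipy.special import spherical_jn, eval_legendre

kr = 2.0          # wavenumber times radius, |k_l| * |r_i|
cos_gamma = 0.3   # cosine of the angle between the wave vector and the position
n = np.arange(40)
# After the addition theorem, the double sum of Eq. 5.22 collapses to a single
# sum over n with Legendre polynomials:
series = np.sum((-1j) ** n * (2 * n + 1) * spherical_jn(n, kr)
                * eval_legendre(n, cos_gamma))
direct = np.exp(-1j * kr * cos_gamma)
print(abs(series - direct) < 1e-10)   # -> True
```

The series converges rapidly because j_n(kr) decays super-exponentially for n ≫ kr, which is also why a finite array order N suffices in practice.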

From Equations 5.22 and 5.21, the spatial steering matrix can be written in terms of spherical

harmonics as

A(Ψ; k) = Y(Φ) B(k, r) Y^H(Ψ)    (5.23)

where Y(Φ) is an I × (N + 1)² matrix whose ith row, defined in Equation 5.7, is the collection of all the spherical harmonics. The L × (N + 1)² matrix Y(Ψ) can be expanded on similar lines by replacing Φ_i with Ψ_l in Equation 5.7. The (N + 1)² × (N + 1)² matrix B(k, r) is given by

B(k, r) = diag( b_0(k, r), b_1(k, r), b_1(k, r), b_1(k, r), …, b_N(k, r) ).    (5.24)

A particular mode strength bn (k, r) of order n is deﬁned for open sphere and rigid sphere in

Equation 2.51.

Substituting the expression for the steering matrix from Equation 5.23 in Equation 5.20, multiplying both sides by Y^H(Φ)Γ and utilizing Equation 5.6, the data model becomes

P_nm(k) = Y^H(Φ) Γ Y(Φ) B(k, r) Y^H(Ψ) S(k) + Y^H(Φ) Γ V(k).    (5.25)

The orthonormality of the spherical harmonics over the sampled sphere gives

Y^H(Φ) Γ Y(Φ) ≅ I,    (5.26)

so that

P_nm(k) ≅ B(k, r) Y^H(Ψ) S(k) + V_nm(k),    (5.27)

where V_nm(k) = Y^H(Φ) Γ V(k).

It is to be noted that B(k, r) is a constant for a given array geometry and frequency of

operation. It is invertible for array geometries such as the rigid and dual sphere [16]. Hence, multiplying


both sides of Equation 5.27 by B^{−1}(k, r), we have the final data model in the spherical harmonics domain as

[D_nm]_{(N+1)²×N_s} = [Y^H]_{(N+1)²×L} [S]_{L×N_s} + [Z_nm]_{(N+1)²×N_s}    (5.28)

where

D_nm(k) = B^{−1}(k, r) P_nm(k)    (5.29)

Z_nm(k) = B^{−1}(k, r) Y^H(Φ) Γ V(k).    (5.30)

It must be noted that η(k) is known for a given array geometry and frequency of operation.

Comparing the spatio-frequency data model in Equation 5.20 and spherical harmonics

data model in Equation 5.28, the steering matrix in spherical harmonics domain turns out

to be Anm (Ψ) = YH (Ψ). Hence, a particular steering vector can be written as

a_nm(Ψ_l) = y^H(Ψ_l) = [ Y_0^{0*}(Ψ_l), Y_1^{−1*}(Ψ_l), Y_1^{0*}(Ψ_l), Y_1^{1*}(Ψ_l), …, Y_N^{N*}(Ψ_l) ]^T,    (5.32)

and a particular steering vector component is

a_nm = Y_n^{m*}(Ψ_l).    (5.33)

For data model in Equation 5.27, the steering vector component will be

a_nm = b_n(k, r) Y_n^{m*}(Ψ_l)    (5.34)

5.4 Advantage of Array Data Model Formulation in Spherical Harmonics Domain

Formulation of various problems in spatial domain and spherical harmonics domain is similar

[50]. Hence, the results of the spatial domain can directly be applied in the spherical har-

monics domain. Additionally, array processing in the spherical harmonics domain has some exclusive advantages over the spatial domain.


Observing the spatio-frequency data model in Equation 5.20 and the spherical harmonics data model in Equation 5.28, we conclude that the dimensionality of the data is reduced from I to (N + 1)², as indicated by the relation in Equation 5.9. This is achieved simply by multiplying the spatio-frequency data model by B^{−1}(k, r) Y^H(Φ) Γ. Hence, the spherical harmonics formulation is computationally more efficient.

The steering matrix in the spherical harmonics domain assumes the form Y^H(Ψ), which is frequency independent. Due to this frequency-independent nature, frequency smoothing can be performed, which restores the rank of the signal covariance matrix [16]. The subspace-based MUSIC method requires the signal covariance matrix to be full rank, which is not possible when sources are correlated. MVDR requires the array covariance matrix to be full rank, as it involves inverting the covariance matrix.

Utilizing the spherical harmonics data model in Equation 5.28, the model array covariance matrix can be written as

R_Dnm(k) = E[ D_nm(k) D_nm^H(k) ] = Y^H(Ψ) R_S(k) Y(Ψ) + R_Znm(k)    (5.35)

where R_S(k) = E[S(k) S^H(k)] is the signal covariance matrix. Utilizing Equation 5.30, the model noise covariance matrix is given as R_Znm(k) = σ² η(k) η^H(k). It can be noted from

Equation 5.35 that frequency smoothing of the model array covariance matrix can be performed by averaging R_Dnm over frequency, which smooths the signal covariance matrix since the steering matrix is frequency independent. Such rank restoration is not possible in the spatial data model, as its steering matrix is frequency dependent. The frequency-smoothed covariance matrix can be written as

dependent. Frequency smoothed covariance matrix can be written as

Ns

1 �

R̃Dnm = RDnm (kν )

Ns

ν=1


where

Σ = (1 / N_s) Σ_{ν=1}^{N_s} η(k_ν) η^H(k_ν)    (5.37)

R̃_S = (1 / N_s) Σ_{ν=1}^{N_s} R_S(k_ν)    (5.38)

Frequency smoothing restores the rank of the signal covariance matrix, which is L. Therefore, averaging across L frequencies may be sufficient. However, in practice, averaging is done over a larger number of frequencies to improve the estimation.
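The rank-restoration argument can be checked with a toy simulation (dimensions and values are illustrative): with a frequency-independent steering matrix, averaging the rank-1 per-frequency covariances of two fully coherent sources yields a rank-2 smoothed covariance.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 16                                    # (N+1)^2 for an order N = 3 array
# Stand-in for the frequency-independent steering matrix Y^H(Psi), two sources:
A = rng.standard_normal((M, 2)) + 1j * rng.standard_normal((M, 2))

R_single = None
R_avg = np.zeros((M, M), dtype=complex)
n_freq = 50
for nu in range(n_freq):
    s = np.array([1.0, np.exp(1j * 0.4 * nu)])  # coherent pair; relative phase varies with frequency
    d = A @ s
    R_nu = np.outer(d, d.conj())                # per-frequency covariance: rank 1
    if R_single is None:
        R_single = R_nu
    R_avg += R_nu / n_freq

print(np.linalg.matrix_rank(R_single), np.linalg.matrix_rank(R_avg))   # -> 1 2
```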

Processing in the spherical harmonics domain provides ease of beamforming, due to the reduced dimensionality of the array covariance matrix and the simple structure of the steering vector component. Here, we present the weights for MVDR beamforming. The problem formulation for the beamforming weights is similar to that of the spatial domain presented in Section 3.4.2.2. The MVDR beamforming problem is given by

min_{W_nm} W_nm^H R_Dnm W_nm    subject to    W_nm^H a_nm = 1,    (5.39)

whose solution gives the weights

W_nm = R_Dnm^{−1} a_nm / ( a_nm^H R_Dnm^{−1} a_nm ).    (5.40)

It is to be noted that W_nm and R_Dnm are of lower dimension when compared to their spatial domain counterparts. Also, comparing Equations 5.33 and 5.22, the steering vector component has a simple form in the spherical harmonics domain, while it involves a double summation in the spatial domain.
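The weights of Equation 5.40 can be sanity-checked on random illustrative data: the distortionless constraint W_nm^H a_nm = 1 must hold by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 16                                        # (N+1)^2 modal dimension for N = 3
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)           # steering vector a_nm
X = rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200))
R = X @ X.conj().T / 200                      # full-rank sample covariance, stand-in for R_Dnm

g = np.linalg.solve(R, a)                     # R^{-1} a
w = g / (a.conj() @ g)                        # W_nm of Eq. 5.40
print(round(abs(w.conj() @ a), 6))            # distortionless response -> 1.0
```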

5.5 Far-field Source Localization using Spherical Microphone Array

The data model in Equation 5.28 corresponds to the spherical harmonics data model for sources in the far-field, and is utilized herein for source localization. Formulations for spherical harmonics MVDR (SH-MVDR) and spherical harmonics MUSIC (SH-MUSIC) are presented first. Subsequently, a high resolution source localization method, spherical harmonics MUSIC-Group delay (SH-MGD), is proposed.

Utilizing the weights as deﬁned in Equation 5.40, the power spectrum of MVDR in spherical

harmonics domain can be written as [16]

P_SH−MVDR(Ψ) = 1 / ( a_nm^H(Ψ) R_Dnm^{−1} a_nm(Ψ) ).    (5.41)

The DOA estimates are given as the L largest peaks in the SH-MVDR power spectrum, corresponding to the L sources. As a spatial filter, SH-MVDR steered to a certain DOA Ψ_s attenuates any other signal impinging on the array from a DOA ≠ Ψ_s. The performance of SH-MVDR is limited when the sources are closely spaced. This is illustrated in Figure 5.3: the SH-MVDR spectrum is unable to resolve two closely spaced sources located at (20°, 50°) and (15°, 60°) at an SNR of 10 dB. An open sphere is taken for the simulation.


Figure 5.3: SH-MVDR spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB


The SH-MUSIC spectrum can be written as [16]

P_SH−MUSIC(Ψ) = 1 / ( a_nm^H(Ψ) Q_nm Q_nm^H a_nm(Ψ) )    (5.42)

where a_nm(Ψ) is the steering vector defined in Equation 5.32, and Q_nm is the noise subspace obtained from the eigenvalue decomposition of the modal covariance matrix R_Dnm computed in Equation 5.35. The denominator approaches zero when Ψ corresponds to a DOA, owing to the orthogonality between the noise eigenvectors and the steering vector. Hence, we get a peak in the SH-MUSIC spectrum. However, when sources are closely spaced, the SH-MUSIC spectrum is unable to resolve them accurately, giving many spurious peaks. Figure 5.4 illustrates the SH-MUSIC spectrum for an Eigenmike system [39]. The simulation considers an open sphere with the sources at (20°, 50°) and (15°, 60°) and SNR = 10 dB.
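A compact end-to-end simulation of SH-MUSIC under the data model of Equation 5.28 is sketched below (a single assumed source, self-implemented spherical harmonics, a coarse search grid; all names are illustrative):

```python
import numpy as np
from math import factorial, pi
from scipy.special import lpmv

def sph_harm_nm(n, m, theta, phi):
    """Y_n^m of Eq. 5.3 (theta: elevation, phi: azimuth)."""
    am = abs(m)
    norm = np.sqrt((2 * n + 1) * factorial(n - am) / (4 * pi * factorial(n + am)))
    y = norm * lpmv(am, n, np.cos(theta)) * np.exp(1j * am * phi)
    return (-1) ** am * np.conj(y) if m < 0 else y

def a_nm(N, theta, phi):
    """Steering vector of Eq. 5.32: conjugated harmonics up to order N."""
    return np.array([np.conj(sph_harm_nm(n, m, theta, phi))
                     for n in range(N + 1) for m in range(-n, n + 1)])

rng = np.random.default_rng(1)
N, L, Ns = 3, 1, 400
M = (N + 1) ** 2
th0, ph0 = np.deg2rad(40.0), np.deg2rad(120.0)           # true DOA
A = a_nm(N, th0, ph0)[:, None]                           # Y^H(Psi) for one source
S = rng.standard_normal((L, Ns)) + 1j * rng.standard_normal((L, Ns))
Z = 0.05 * (rng.standard_normal((M, Ns)) + 1j * rng.standard_normal((M, Ns)))
D = A @ S + Z                                            # Eq. 5.28
R = D @ D.conj().T / Ns
_, V = np.linalg.eigh(R)                                 # ascending eigenvalues
Q = V[:, : M - L]                                        # noise subspace Q_nm

def p_sh_music(theta, phi):                              # Eq. 5.42
    a = a_nm(N, theta, phi)
    x = Q.conj().T @ a
    return 1.0 / np.real(x.conj() @ x)

grid = [(t, p) for t in range(10, 81, 5) for p in range(0, 181, 5)]
best = max(grid, key=lambda g: p_sh_music(np.deg2rad(g[0]), np.deg2rad(g[1])))
print(best)   # -> (40, 120)
```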


Figure 5.4: SH-MUSIC spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB

The SH-MUSIC spectrum gives many spurious peaks for closely spaced sources and hence

determining the candidate peaks becomes challenging. It is to be noted from the SH-MUSIC expression that it utilizes the magnitude spectrum of a_nm^H(Ψ)Q_nm. To overcome this limitation

of SH-MUSIC, MUSIC-Group delay is formulated in spherical harmonics domain. This is


called spherical harmonics MUSIC-Group delay. It utilizes the diﬀerential of phase spectrum

(group delay) of SH-MUSIC. The spherical harmonics formulation of MUSIC-Group delay is

given by

P_SH−MGD(Ψ) = ( Σ_{u=1}^{U} |∇ arg( a_nm^H(Ψ) q_u )|² ) P_SH−MUSIC(Ψ)    (5.43)

qu represents the uth eigenvector of the noise subspace, Qnm . The ﬁrst term within (.) is the

group delay spectrum. The gradient is taken with respect to the spatial variable Ψ = (θ, φ).

The SH-MGD being the product spectrum removes the spurious peaks. The prominent peaks

corresponding to DOAs are retained as illustrated in Figure 5.5. In addition, the group delay

of MUSIC follows an additive property, which enables the group delay spectrum to preserve the peaks better than the magnitude spectrum, which follows a multiplicative property. A mathematical

proof of the additive property of the spatial domain MUSIC-Group delay spectrum is provided in

Section 4.2.3. All the spatial domain results are also valid in spherical harmonics domain [50].

Hence, the additive property of group delay spectrum also holds in the spherical harmonics

domain.


Figure 5.5: SH-MGD spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB


The formulation of MUSIC requires the noise to be spatially white. However, even for spatially white sensor noise, the modal noise covariance matrix R_Znm(k) is not white. Hence, whitening is required before applying SH-MUSIC or SH-MGD. Whitening the smoothed model array covariance matrix results in [16]

R̃_Dnm^w = Σ^{−1/2} R̃_Dnm Σ^{−1/2},    (5.44)

where Σ is computed from Equation 5.37. The whitened model array covariance matrix and the correspondingly transformed steering vector are used in the computation of SH-MUSIC and SH-MGD.
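The effect of the whitening in Equation 5.44 can be illustrated on a toy diagonal Σ (values arbitrary): the colored modal noise covariance σ²Σ is mapped to σ²I.

```python
import numpy as np

sigma2 = 0.3
Sigma = np.diag([4.0, 1.0, 0.25])             # illustrative stand-in for Eq. 5.37's Sigma
Sigma_inv_half = np.diag(1.0 / np.sqrt(np.diag(Sigma)))

R_noise = sigma2 * Sigma                      # colored modal noise covariance
R_white = Sigma_inv_half @ R_noise @ Sigma_inv_half   # Eq. 5.44 applied to the noise term
print(np.allclose(R_white, sigma2 * np.eye(3)))       # -> True
```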

5.6 Formulation of Stochastic Cramér-Rao Bound for Far-field Sources

Cramér-Rao bound (CRB) places a lower bound on the variance of an unbiased estimator.

It provides a benchmark against which any estimator is evaluated. Although various source

localization algorithms have been proposed in spherical harmonics domain [13, 14, 15, 16,

17, 86], literature on Cramér-Rao bound in spherical harmonics domain is rare. Hence, it is

of suﬃcient interest to develop an expression for Cramér-Rao bound in spherical harmonics

domain.

In [75], a CRB expression was derived for the case of a ULA, but without using the theory of CRB. This is addressed in [94], which provides a textbook derivation of the stochastic CRB. Explicit CRBs for azimuth and elevation are developed in [95, 68] for planar arrays. CRB analysis is presented for near-field source localization in [96, 97] using a ULA and UCA respectively. In [98], closed-form CRB expressions have been derived for a 3-D array made from ULA branches. The formulations developed in these works make use of the standard spatial data model.

We make use of the transformed data model in Equation 5.28 as our observation. Under the stochastic assumption, the unknown signal S(k) is taken to be circularly Gaussian distributed with zero mean. The parameter vector includes the DOAs, the signal covariances and the noise variance. However, the DOAs are usually the parameters of interest in array signal processing. A closed-form expression for the stochastic CRB of the DOAs is presented herein. Hence, the unknown direction parameter vector taken here is

α = [ θ^T  φ^T ]^T.    (5.46)

The existence of the stochastic CRB is first validated for the spherical harmonics data model. In this context, the probability density function (PDF) of the observed data model is shown to satisfy the regularity condition. From Equations 5.28 and 5.30, the observation under the stochastic signal assumption has zero mean and covariance

R_D = Y^H(Ψ) R_S(k) Y(Ψ) + σ² C,    (5.47)

where C = η η^H. R_Dnm is replaced with R_D for notational simplicity. Hence, for the observation D_nm(k) ∈ ℂ^{(N+1)²} with D_nm ∼ N(0, R_D), the probability density function (likelihood function) can be written as [99, p. 502]

p(D_nm(k); α) = ( 1 / (π^{(N+1)²} |R_D|) ) exp( −D_nm^H R_D^{−1} D_nm ).    (5.48)

Utilizing D_nm^H R_D^{−1} D_nm = tr{ D_nm D_nm^H R_D^{−1} }, the log-likelihood function can be written as

ln p(D_nm(k); α) = K_0 − ln|R_D| − tr{ D_nm D_nm^H R_D^{−1} }    (5.49)

where K_0 is a constant and tr{.} denotes the trace of a matrix. According to the CRB theorem

[100], if the likelihood function satisfies the regularity condition

E[ ∂ ln p(D_nm(k); α) / ∂α ] = 0,    (5.50)

then the variance of any unbiased estimator of the rth parameter α_r follows the inequality

var(α̂_r) ≥ [ F^{−1}(α) ]_{rr},    (5.51)

where the Fisher information matrix has elements

F(α)_rs = −E[ ∂² ln p(D_nm(k); α) / (∂α_r ∂α_s) ].    (5.52)

Using the identities ∂ ln|R_D| / ∂α = tr{ R_D^{−1} ∂R_D/∂α } and ∂R_D^{−1}/∂α = −R_D^{−1} (∂R_D/∂α) R_D^{−1}, and knowing that the expectation and trace operations commute, it can be shown that the given likelihood function satisfies the regularity conditions.

For developing the CRB expression, the Fisher information matrix is obtained first. The steps involved in obtaining the Fisher information matrix from Equation 5.28 are detailed in Appendix A.1. The final expressions for the Fisher information matrix blocks are:

F_θφ = 2Re{ (R_S Y R_D^{−1} Y^H R_S)^T ⊙ (Ẏ_θ R_D^{−1} Ẏ_φ^H) + (R_S Y R_D^{−1} Ẏ_θ^H)^T ⊙ (R_S Y R_D^{−1} Ẏ_φ^H) }

F_θθ = 2Re{ (R_S Y R_D^{−1} Y^H R_S)^T ⊙ (Ẏ_θ R_D^{−1} Ẏ_θ^H) + (R_S Y R_D^{−1} Ẏ_θ^H)^T ⊙ (R_S Y R_D^{−1} Ẏ_θ^H) }

where ⊙ is the Hadamard product, Y represents Y(Ψ), and the derivative of the steering matrix Y^H(Ψ) is defined as

Ẏ_θ^H = Σ_{r=1}^{L} Ẏ_{θ_r}^H,    Ẏ_{θ_r}^H = ∂Y^H / ∂θ_r.

The steps involved in the computation of ∂Y^H/∂θ_r and ∂Y^H/∂φ_r are detailed in Appendix A.2. F_φφ and

F_φθ can be expressed in a similar manner. The Fisher information matrix is finally given by

F = [ F_θθ  F_θφ
      F_φθ  F_φφ ].
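Once the blocks are assembled, the CRBs are read off the diagonal of F⁻¹ (Equation 5.51). A toy numerical sketch for a single source follows; the block values are illustrative placeholders, not computed from the expressions above:

```python
import numpy as np

# Illustrative single-source Fisher blocks (scalars for L = 1); in practice they
# come from the block expressions above evaluated at the true DOA.
F_tt, F_tp, F_pp = 250.0, 30.0, 400.0
F = np.array([[F_tt, F_tp],
              [F_tp, F_pp]])                    # F_phi,theta = F_theta,phi here
crb_theta, crb_phi = np.linalg.inv(F).diagonal()
print(round(crb_theta, 6), round(crb_phi, 6))   # -> 0.004036 0.002523
# Azimuth-elevation coupling (F_tp != 0) inflates both bounds above the
# decoupled values 1/F_tt and 1/F_pp:
print(bool(crb_theta > 1 / F_tt and crb_phi > 1 / F_pp))   # -> True
```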

A behavioral study of the stochastic CRB at various SNRs and numbers of snapshots is presented for the Eigenmike microphone array [39]. The order of the array was taken to be N = 3. The signal and noise are taken to be Gaussian distributed with zero mean. A source with DOA (20°, 50°) is considered. Two sets of simulations are conducted with 500 independent trials. In the first set, the simulation is conducted for 300 snapshots at various SNRs. In the second set, the CRB is computed for various numbers of snapshots at an SNR of 20 dB. The CRB for azimuth and elevation is plotted in Figure 5.6. It can be noted that the CRB decreases at higher SNR. A similar observation is made when a larger number of snapshots is used.


Figure 5.6: Variation of CRB for elevation (θ) and azimuth (φ) estimation (a) at various SNR

with 300 snapshots, (b) with varying snapshots at SNR 20dB. Source is located at (20◦ , 50◦ ).

5.7 Performance Evaluation

Experiments on source localization and source tracking [101] are performed to evaluate the proposed SH-MGD method. Results on source localization are presented as cumulative root mean square error (RMSE) for noisy and reverberant environments. A statistical analysis of the proposed method is presented as the probability of resolution, a measure of the ability to resolve the sources within a confidence interval. Additionally, narrowband source tracking results are also discussed. Tracking results

are presented as the estimated two dimensional trajectory of the elevation angle for a ﬁxed

azimuth. The proposed method is compared with SH-MUSIC and SH-MVDR. An Eigenmike

microphone array [39] is utilized in the experiments. It consists of 32 microphones embedded

in a rigid sphere of radius 4.2 cm.

Two far-ﬁeld sources at locations (30◦ , 35◦ ) and (50◦ , 60◦ ) are considered. A fourth order

Eigenmike system is utilized for localization experiments. The azimuth and elevation of the

sources are estimated using SH-MGD, SH-MUSIC and SH-MVDR at various SNRs. Two

hundred independent trials are performed and the locations of the sources are estimated.

The results are presented as cumulative root mean square error. The cumulative RMSE is

deﬁned as

RMSE = √[ (1 / 4T) Σ_{t=1}^{T} Σ_{l=1}^{2} ( (θ_l − θ̂_l^{(t)})² + (φ_l − φ̂_l^{(t)})² ) ],    (5.53)

where t indicates the trial number, T is the total number of trials and l denotes the source number. (θ_l, φ_l) is the actual source location while (θ̂_l^{(t)}, φ̂_l^{(t)}) are the corresponding estimates.
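Equation 5.53 in code form (the array layout (trial, source, angle) is an assumption of this sketch):

```python
import numpy as np

def cumulative_rmse(true_ang, est_ang):
    """Cumulative RMSE of Eq. 5.53; inputs have shape (T, 2, 2):
    (trial, source, [theta, phi]) in degrees."""
    T = true_ang.shape[0]
    return np.sqrt(((true_ang - est_ang) ** 2).sum() / (4 * T))

true_ang = np.tile([[30.0, 35.0], [50.0, 60.0]], (100, 1, 1))
est_ang = true_ang + 1.0                     # every angle estimate off by exactly 1 degree
print(cumulative_rmse(true_ang, est_ang))    # -> 1.0
```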

The cumulative RMSE is presented using the bar plot in Figure 5.7 for various SNRs. It is to be noted that the proposed SH-MGD performs reasonably better than conventional methods like SH-MUSIC and SH-MVDR at low SNRs. The high error at low SNR arises since SH-MVDR is unable to resolve the two sources.

Source Localization in Reverberant Environment

To evaluate the proposed method for robustness under reverberation, source localization

experiments are conducted at various reverberation times T60 . A detailed discussion on

reverberation can be found in Section 3.3.2. A room with dimensions 7.3 m × 6.2 m × 3.4 m



Figure 5.7: Cumulative RMSE in source angle estimation at various SNRs for two hundred

iterations. The sources are located at (30◦ , 35◦ ) and (50◦ , 60◦ ).

is utilized in the experiments. The room impulse response for spherical microphone array is

generated as in [102].

The experiments are performed at various reverberation times (T60 ). Two far-ﬁeld sources

with locations (30°, 60°) and (35°, 50°) are considered. The order of the array is assumed to

be N = 3. Localization experiments are conducted for 300 iterations at three diﬀerent

reverberation times, 150 ms, 200 ms and 250 ms. The experiment is repeated for three

methods, SH-MGD, SH-MUSIC and SH-MVDR. Results are presented as RMSE values in

Table 5.1. SH-MGD has reasonably lower RMSE than other conventional methods.

Table 5.1: Comparison of RMSE of various methods at different reverberation times (T60).

Angle  Method     T60 = 150 ms  T60 = 200 ms  T60 = 250 ms
θ      SH-MGD     0.6403        0.6419        0.6475
       SH-MUSIC   0.6688        0.8144        0.7989
       SH-MVDR    1.1034        1.1579        1.1738
φ      SH-MGD     1.4387        1.4665        1.4866
       SH-MUSIC   1.7866        1.9127        1.6484
       SH-MVDR    2.276         2.3481        2.4927


In this Section, statistical evaluation of the source localization methods is illustrated using the probability of resolution at various SNRs. The probability of resolution is given by

Pr = (1/4T) Σ_{t=1}^{T} Σ_{l=1}^{2} [ Pr(|θl − θ̂l^(t)| ≤ ζ) + Pr(|φl − φ̂l^(t)| ≤ ζ) ]
   = (1/4T) Σ_{t=1}^{T} Σ_{l=1}^{2} [ sgn(ζ − |θl − θ̂l^(t)|) + sgn(ζ − |φl − φ̂l^(t)|) ],   (5.54)

where ζ is the confidence interval, Pr(.) denotes the probability of an event, and sgn(x) is defined as

sgn(x) = 1 if x ≥ 0,  0 if x < 0.   (5.55)
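The indicator form of Equation 5.54 maps directly to code. The sketch below is a minimal illustration, not the thesis implementation; the array names, shapes and the toy trial values are assumptions, with angles in degrees:

```python
import numpy as np

def probability_of_resolution(theta_true, phi_true, theta_est, phi_est, zeta=3.0):
    """Estimate Pr via Equation 5.54: average the sgn(zeta - |error|)
    indicator over T trials and both angles of L = 2 sources.

    theta_est, phi_est: (T, 2) arrays of per-trial estimates;
    theta_true, phi_true: length-2 arrays of true angles (degrees)."""
    theta_hit = np.abs(theta_true - theta_est) <= zeta  # sgn(zeta - |e|) as boolean
    phi_hit = np.abs(phi_true - phi_est) <= zeta
    T = theta_est.shape[0]
    return (theta_hit.sum() + phi_hit.sum()) / (4.0 * T)

# toy usage: 3 trials, sources at (30, 35) and (50, 60) degrees
theta_true = np.array([30.0, 50.0])
phi_true = np.array([35.0, 60.0])
theta_est = np.array([[30.5, 49.0], [31.0, 50.2], [40.0, 50.1]])
phi_est = np.array([[35.2, 60.1], [34.1, 59.0], [35.0, 61.0]])
print(probability_of_resolution(theta_true, phi_true, theta_est, phi_est))
```

Here one elevation estimate misses by 10°, so 11 of the 12 indicators fire and the estimate is 11/12 ≈ 0.9167.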

Two sources with locations (30°, 35°) and (50°, 60°) are considered. The confidence interval is taken to be ζ = 3°. A 4th order spherical microphone array is used in the experiments. The probability is calculated over two hundred independent trials. Results on the probability of resolution are listed in Table 5.2 for various SNRs. At low SNR, both SH-MGD and SH-MUSIC outperform SH-MVDR. The zero probability of resolution of SH-MVDR is because of its inability to resolve the sources at low SNR within the given confidence interval. At high SNRs, all the methods provide reasonably similar performance.

Table 5.2: Probability of resolution at various SNRs for 200 iterations. Sources are taken at (30°, 35°) and (50°, 60°).

Methods     SNR (5dB)   SNR (10dB)   SNR (15dB)   SNR (20dB)
SH-MGD      0.9167      0.9971       1            1
SH-MUSIC    0.9444      0.9829       0.9987       1
SH-MVDR     0           0            0.4179       1

Source tracking is one of the major applications of acoustic source localization in audio surveillance. In this section, the elevation angle of a moving source is tracked. The source continuously emits a narrowband signal impinging on the spherical array. The azimuthal angle (φ) of the source is fixed at 45° and the elevation angle is varied along the trajectory given in Figure 5.8.


Figure 5.8: Trajectory of elevation angle (θ) followed by the moving source with time for a

ﬁxed azimuth φ = 45◦ .

The elevation angle is tracked at the fixed azimuth using the SH-MGD and SH-MUSIC methods. Figure 5.9(a) illustrates the trajectory tracked by the SH-MUSIC method. It can be noted that the trajectory is not estimated well due to spurious peaks present in the SH-MUSIC spectrum. The trajectory obtained from SH-MGD is shown in Figure 5.9(b). The estimated trajectory is close to the actual trajectory, indicating efficient tracking.


Figure 5.9: Tracking result for elevation using (a) SH-MUSIC and (b) SH-MGD. The azimuth is fixed at 45°.


The tracking experiment is repeated for 25 different trajectories. The elevation angle is fixed at 90°. The azimuth angle is varied as different sinusoids, similar to the one shown in Figure 5.8. An average error distribution (AED) of the tracking error is obtained. AED plots for the tracking error obtained from SH-MUSIC and SH-MGD are illustrated in Figure 5.10. It may be noted that the error variance for SH-MGD is smaller than that of SH-MUSIC.


Figure 5.10: Average error distribution plots for the tracking error using the SH-MUSIC and SH-MGD methods.

5.8 Summary and Contributions

In this chapter, a far-field data model in the spherical harmonics domain is formulated. The advantages of array processing in the spherical harmonics domain are also detailed. A high resolution

source localization method, called the spherical harmonics MUSIC-Group delay (SH-MGD), is proposed for the spherical microphone array. The formulation and analysis of the Cramér-Rao bound for far-field sources are presented in the spherical harmonics domain. Experimental results on multi-source localization in noisy and reverberant environments indicate the robustness of the method. RMSE and statistical analyses are presented to evaluate the performance of the source localization methods. The experiments on tracking a single source are motivating enough to extend this approach to the real-time tracking of multiple closely spaced sources in a Kalman filter framework.

Chapter 6

Spherical Harmonics root-MUSIC

6.1 Introduction

Accurate and search-free algorithms for direction of arrival (DOA) estimation have been a very active area of research in source localization. Root-MUSIC (MUltiple SIgnal Classification) [20] and Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [21] fall under this category. As discussed in Section 3.4.3.5, root-MUSIC estimates DOAs as the roots of the MUSIC polynomial, owing to the Vandermonde structure of the array manifold (steering vector) in the case of a uniform linear array (ULA). Such a structure is not observed in the array manifold of a uniform circular array (UCA) [48]. Zoltowski proposed a beamspace transformation based on phase mode excitation to obtain a Vandermonde structure in the array manifold with respect to the azimuth angle [22]. This enables root-MUSIC to be applied for azimuth estimation at a given elevation. The technique was further extended to sparse UCA root-MUSIC, which utilizes a modified beamspace transformation [103]. Another approach to extending ULA root-MUSIC to a planar array is presented in [104], using manifold separation. The idea of manifold separation is to write the planar array steering vector as the product of a characteristic matrix of the array and a vector with Vandermonde structure depending on the azimuth angle. The manifold separation utilizing spherical harmonics (SH)


After the introduction of the higher order spherical microphone array and the associated signal processing in [10, 11], various existing DOA estimation techniques have been reformulated in the spherical harmonics domain. The element space MUSIC is implemented in terms of spherical harmonics, called SH-MUSIC, in [15, 16]. The minimum variance distortionless response (MVDR) spectrum in terms of spherical harmonics, SH-MVDR, is utilized for DOA estimation in [16].

In this Chapter, SH-root-MUSIC (SH-RM), a polynomial rooting technique for DOA estimation using a spherical microphone array, is proposed. The root-MUSIC technique, in general, has low computational complexity because of the direct polynomial solution [106]. It also provides an exact solution and is not limited by the discretization issues associated with the SH-MUSIC and SH-MVDR methods for DOA estimation. However, as in earlier work on root-MUSIC for planar arrays [22, 103, 104], the proposed SH-root-MUSIC can estimate azimuth only at a fixed elevation, because all approaches to root-MUSIC induce a Vandermonde structure in azimuth. In the following Section, the formulation of root-MUSIC and the proof of the Vandermonde structure in azimuth are presented.

6.2 Formulation of root-MUSIC in Spherical Harmonics Domain

The order of the spherical microphone array is defined in Section 2.5.1. A sound field of L plane waves with wavenumber k is incident on the array. The lth source location is denoted by Ψl = (θl, φl), where θ is the elevation angle and φ is the azimuthal angle. Similarly, the ith sensor location is given by Φi = (θi, φi).

The spherical harmonics data model, from Equation 5.28, can be written as

Dnm(k) = YH(Ψ)S(k) + Znm(k),   (6.1)

where YH(Ψ) is the (N + 1)² × L steering matrix in the spherical harmonics domain and (.)^H denotes the Hermitian transpose. Its lth column is

anm(Ψ) = yH(Ψ) = [ Y0^0∗(Ψ), Y1^−1∗(Ψ), Y1^0∗(Ψ), Y1^1∗(Ψ), . . . , YN^N∗(Ψ) ]^T,   (6.2)

with

Yn^m(Ψ) = √[ (2n + 1)(n − m)! / (4π(n + m)!) ] Pn^m(cos θ) e^{jmφ},   ∀ 0 ≤ n ≤ N, −n ≤ m ≤ n.   (6.3)

Yn^m are solutions to the Helmholtz equation, Pn^m are the associated Legendre functions, and (.)∗ denotes the complex conjugate.
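Equation 6.3 can be evaluated numerically. The sketch below is an illustrative implementation (the function names are hypothetical), assuming the standard associated Legendre functions with the Condon-Shortley phase and using only NumPy:

```python
import numpy as np
from math import factorial, pi

def assoc_legendre(n, m, x):
    """P_n^m(x) for m >= 0 (Condon-Shortley phase included),
    via the standard upward recurrence in n."""
    # seed: P_m^m(x) = (-1)^m (2m-1)!! (1 - x^2)^{m/2}
    pmm = (-1) ** m * float(np.prod(np.arange(1, 2 * m, 2))) * (1 - x * x) ** (m / 2)
    if n == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm                       # P_{m+1}^m
    if n == m + 1:
        return pm1
    for k in range(m + 2, n + 1):                     # (k-m) P_k^m = (2k-1) x P_{k-1}^m - (k+m-1) P_{k-2}^m
        pmm, pm1 = pm1, ((2 * k - 1) * x * pm1 - (k + m - 1) * pmm) / (k - m)
    return pm1

def sph_harm_nm(n, m, theta, phi):
    """Y_n^m of Equation 6.3; theta is elevation, phi is azimuth."""
    if m < 0:
        # symmetry Y_n^{-m} = (-1)^m conj(Y_n^m)
        return (-1) ** m * np.conj(sph_harm_nm(n, -m, theta, phi))
    norm = np.sqrt((2 * n + 1) * factorial(n - m) / (4 * pi * factorial(n + m)))
    return norm * assoc_legendre(n, m, np.cos(theta)) * np.exp(1j * m * phi)

print(sph_harm_nm(0, 0, 0.3, 1.2))   # Y_0^0 = 1/sqrt(4*pi) ≈ 0.2821 + 0j
```

A quick sanity check is that Y_0^0 is the constant 1/√(4π) regardless of direction.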

SH-root-MUSIC estimates the DOAs as roots of the SH-MUSIC polynomial. Hence, rewriting the expression for the SH-MUSIC spectrum from Equation 5.42,

P_SH−MUSIC(Ψ) = 1 / [ anm^H(Ψ) Qnm Qnm^H anm(Ψ) ],   (6.4)

where Qnm is the noise subspace obtained from the eigenvalue decomposition of the modal array covariance matrix RDnm. The modal array covariance matrix is written as

RDnm = E[ Dnm(k) Dnm^H(k) ].   (6.5)

Figure 6.1: Plot of SH-MUSIC illustrating DOA estimation using fourth order Eigenmike

system. Sources are located at (20◦ ,40◦ ) and (20◦ ,70◦ ) with SNR 15dB.


The SH-MUSIC plot is shown in Figure 6.1 for two sources at azimuths (40°, 70°) and co-elevation 20°. The two peaks correspond to the two sources. It is to be noted that the SH-MUSIC spectrum needs human intervention or a comprehensive search algorithm to estimate the DOA of the desired source. The resolution is also limited by the discretization at which the spectrum is evaluated. SH-root-MUSIC overcomes these limitations in estimating the DOA. However, for root-MUSIC to be applicable in the spherical harmonics domain, the Vandermonde structure of the spherical harmonics steering vector needs to be shown.

The Vandermonde structure of the spherical harmonics steering vector is illustrated herein using the manifold separation technique. Utilizing Equations 6.2 and 6.3, the steering vector for co-elevation θ0 can be written in a more compact form as

yH(Ψ) = yH(θ0, φ) = F(θ0) d(φ),   (6.6)

where F(θ0) is built from the elevation-dependent terms

fnm = √[ (2n + 1)(n − m)! / (4π(n + m)!) ] Pn^m(cos θ0),   (6.7)

and d(φ) collects the azimuth-dependent exponentials.

The matrix d(φ) consists of only the exponential terms containing the azimuth angle. Each submatrix of d(φ) corresponding to a particular order follows a Vandermonde structure. For example, the submatrix of d(φ) corresponding to the first order is [e^{jφ}, 1, e^{−jφ}], which exhibits a Vandermonde structure.

Utilizing Equations 6.2 and 6.8 in 6.4, the SH-MUSIC cost function can be written as

P_SHM^−1(φ) = d^H(φ) F^H(θ0) Qnm Qnm^H F(θ0) d(φ).   (6.11)

Substituting z = e^{jφ} in Equation 6.11, the SH-MUSIC cost function assumes the form of the polynomial

P_SHM^−1(φ) = Σ_{u=−2N}^{2N} Cu z^u,   (6.12)

where the coefficients Cu are obtained by collecting equal powers of z in the quadratic form of Equation 6.11. The polynomial has 4N roots. However, these are not independent: if z is a root of the polynomial, then 1/z∗ will also be a root. Hence, 2N roots will lie within the unit circle and 2N outside it. Of the 2N roots within the unit circle, the L roots closest to the unit circle correspond to the DOAs. This is illustrated in Figure 6.2 for a spherical microphone array of order N = 4.


Figure 6.2: Plot of SH-root-MUSIC illustrating the actual DOA estimates (red stars) and

noisy DOA estimates (blue triangles). A fourth order Eigenmike system is used. Sources are

located at (20◦ ,40◦ ) and (20◦ ,70◦ ) with SNR 15dB.

The roots are plotted for two sources with co-elevation angle 20° and azimuth angles (40°, 70°) at an SNR of 15dB. All the roots within and near the unit circle are shown in the figure. The DOA can be estimated from a root z by using the relation

φ = arg(z) = ℑ(ln(z)).   (6.13)
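The rooting step described above can be sketched as follows. This is an illustrative implementation, not the thesis code: it takes a matrix M standing in for F^H(θ0) Qnm Qnm^H F(θ0), collects the polynomial coefficients Cu by the difference of the degree indices, and picks the L roots inside and closest to the unit circle (Equation 6.13). The matrix in the usage example is synthetic, built so that the null spectrum is minimized at a known azimuth:

```python
import numpy as np

def sh_root_music_azimuths(M, N, L):
    """Azimuths (radians) from the SH-MUSIC polynomial of Equation 6.12.
    M: (N+1)^2 x (N+1)^2 matrix F^H(theta0) Qnm Qnm^H F(theta0)."""
    # degree index m of each flattened (n, m) entry, n = 0..N, m = -n..n
    ms = [m for n in range(N + 1) for m in range(-n, n + 1)]
    # steering entries are z^{-m}, so the quadratic form has powers m_p - m_q
    coeffs = np.zeros(4 * N + 1, dtype=complex)       # powers -2N..2N, shifted by 2N
    for p, mp in enumerate(ms):
        for q, mq in enumerate(ms):
            coeffs[mp - mq + 2 * N] += M[p, q]
    roots = np.roots(coeffs[::-1])                    # np.roots wants highest power first
    inside = roots[np.abs(roots) < 1.0]
    closest = inside[np.argsort(1.0 - np.abs(inside))[:L]]
    return np.angle(closest)                          # Equation 6.13: phi = arg(z)

# synthetic check (order N = 1): slightly damped projector orthogonal to the
# steering at phi0, so the roots lie strictly off the unit circle
phi0 = 0.7
ms = np.array([0, -1, 0, 1])
d0 = np.exp(-1j * ms * phi0)                          # azimuth-only steering
M = np.eye(4) - 0.99 * np.outer(d0, d0.conj()) / 4.0
print(sh_root_music_azimuths(M, N=1, L=1))            # ≈ [0.7]
```

With real data, M would come from the estimated noise subspace; the selection of the L roots nearest the unit circle is the standard root-MUSIC heuristic.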

6.3 Performance Evaluation

Experiments are conducted herein to evaluate the proposed method for source localization. The first category of experiments provides results on source localization as root mean square error (RMSE) at various SNRs. Additionally, the statistical significance of the method is shown based on the probability of resolution at various SNRs.

An Eigenmike microphone array [39] is used for the simulation. It consists of 32 mi-

crophones embedded in a rigid sphere of radius 4.2 cm. The order of the array is taken to

be N = 4. Two sources with azimuth (40◦ , 80◦ ) and co-elevation 20◦ are considered. The

additive noise is assumed to be zero mean Gaussian distributed. A total of 500 independent

Monte Carlo trials are run for the RMSE and probability of resolution estimation.

The experiments on source localization are presented as cumulative RMSE at various signal to noise ratios (SNRs). The proposed method is compared with the other subspace-based methods, SH-MUSIC and SH-MGD. The results are presented as cumulative RMSE over both sources, calculated as

RMSE = √[ (1/2T) Σ_{t=1}^{T} Σ_{l=1}^{2} (φl − φ̂l^(t))² ],   (6.14)

where t indicates the trial number and l denotes the source number. The RMSE values are given in Table 6.1. The high RMSE of SH-root-MUSIC and SH-MUSIC at low SNR is because of their inability to resolve the sources.
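As a brief illustration, the cumulative RMSE of Equation 6.14 can be computed as below. This is a sketch with assumed array names and shapes, not the thesis implementation; angles are in degrees:

```python
import numpy as np

def cumulative_rmse(phi_true, phi_est):
    """Cumulative RMSE of Equation 6.14 over T trials and 2 sources.
    phi_true: length-2 true azimuths; phi_est: (T, 2) per-trial estimates."""
    T = phi_est.shape[0]
    return np.sqrt(np.sum((phi_true - phi_est) ** 2) / (2.0 * T))

# toy usage: 2 trials, sources at azimuths 40 and 80 degrees
phi_true = np.array([40.0, 80.0])
phi_est = np.array([[41.0, 79.0], [40.0, 82.0]])
print(cumulative_rmse(phi_true, phi_est))   # sqrt(6 / 4) ≈ 1.2247
```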

Table 6.1: Comparison of RMSE of various source localization methods at different SNRs.

Method      SNR (5dB)   SNR (10dB)   SNR (15dB)   SNR (20dB)   SNR (25dB)   SNR (30dB)
SH-MGD      2.999       1.848        1.490        1.366        1.360        1.381
SH-RM       8.283       3.308        0.997        0.873        0.662        0.470
SH-MUSIC    10.273      7.321        0.770        0.722        0.731        0.784


The probability of resolution, defined in Equation 6.15, is computed for various SNRs. A confidence interval of ζ = 10° is used while calculating the probability over five hundred independent trials.


Figure 6.3: Probability of resolution plot for two sources with azimuth (40◦ , 80◦ ) and co-

elevation 20◦ .

Pr = (1/2T) Σ_{t=1}^{T} Σ_{l=1}^{2} Pr(|φl − φ̂l^(t)| ≤ ζ)
   = (1/2T) Σ_{t=1}^{T} Σ_{l=1}^{2} sgn(ζ − |φl − φ̂l^(t)|),   (6.15)

where

sgn(x) = 1 if x ≥ 0,  0 if x < 0.   (6.16)

The result is presented as a probability of resolution plot in Figure 6.3. It is to be noted that SH-root-MUSIC performs better than the SH-MUSIC method. However, SH-MGD performs better than both of these methods.

6.4 Summary and Contributions

In this chapter, a high resolution source localization method called SH-root-MUSIC is proposed in the spherical harmonics domain. SH-root-MUSIC does not require any search for estimating the DOAs. It provides DOA estimates as direct roots of the SH-MUSIC polynomial.

The Vandermonde structure of the array manifold in the spherical harmonics domain is shown using the manifold separation technique. The robustness of the method is illustrated using source localization experiments at various SNRs. RMSE and probability of resolution measures indicate the relevance of the proposed method.

Chapter 7

Near-field Source Localization using Spherical Microphone Array

7.1 Introduction

There has been extensive work on far-field source localization using spherical microphone arrays. The element space MUltiple SIgnal Classification (MUSIC) [3] is implemented in terms of spherical harmonics (SH), called SH-MUSIC, in [15, 16]. The Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [21] algorithm is extended for the spherical array in [17, 86]. The minimum variance distortionless response (MVDR) [2] method in terms of spherical harmonics, SH-MVDR, is utilized for DOA estimation in [16]. MUSIC-Group delay [19, 25] has also been extended for the spherical array in [14]. All these source localization methods deal with the planar wavefronts of far-field sources. However, in applications like Close Talk Microphone (CTM) and video conferencing, the assumption of a planar wavefront is no longer valid.

The principles of near-field source localization using a spherical microphone array were first detailed in [107], where a spatially orthonormal decomposition of the sound field due to a near-field source was used. The work proposed a close-talk spherical microphone array which is orientation-invariant with respect to the attenuation of far-field interferences. A method to estimate the distance of the array to a near-field source, using the ratio of mode energies of a spherical orthonormal expansion of the sound field, was also described [107]. The near-field criterion for the spherical array was formally formulated in [24] in terms of the range of the near-field sources. However, the simultaneous estimation of the range and bearing of multiple near-field sources using a spherical microphone array has hitherto not been investigated. Our work on near-field source localization [13] provides an insight into this. A detailed analysis is required in this context.

In this chapter, a new data model is formulated for near-ﬁeld source localization in spher-

ical harmonics domain. Various methods for simultaneous estimation of the range and the

bearing of near-ﬁeld sources are proposed. Near-ﬁeld beamforming weights are computed for

radial ﬁltering analysis. Cramér-Rao bound is formulated for evaluating the estimators.

7.2 Formulation of Near-field Array Data Model in Spherical Harmonics Domain

In this Section, the formulation of the near-field data model in the spherical harmonics domain is described. The formulation starts with the near-field spatio-temporal data model, which is used herein to derive the spatio-frequency and subsequently the spherical harmonics data model for near-field sources.

A spherical microphone array of order N, radius ra and number of sensors I is considered. The order of a spherical microphone array is defined in Section 2.5.1. A sound field of spherical waves with wavenumber k from L near-field sources is incident on the array. The lth source location is denoted by rl = (rl, Ψl), where Ψl = (θl, φl). The elevation angle θ is measured down from the positive z axis, while the azimuthal angle φ is measured counterclockwise from the positive x axis. Similarly, the ith sensor location is given by ri = (ra, Φi), where Φi = (θi, φi).

The theory of spherical wave propagation, as described in Section 2.4.2, suggests that the pressure at the ith microphone due to the lth near-field source sl(t) can be expressed as

pil(t) = sl(t − τi(Ψl)) / |ri − rl|,   (7.1)



Figure 7.1: Illustration of Near-ﬁeld and far-ﬁeld regions around spherical microphone array.

The ith microphone is positioned at ri and lth source at rl .

Here, the propagation delay is

τi(Ψl) = |ri − rl| / c,   (7.2)

with c being the speed of sound. The total pressure at the ith microphone in the presence of noise can be written as

pi(t) = Σ_{l=1}^{L} sl(t − τi(Ψl)) / |ri − rl| + vi(t),   (7.3)

where vi(t) is the noise at the ith microphone and t = 1, 2, · · ·, Ns, with Ns being the number of snapshots.

Computing the discrete Fourier transform (DFT), Equation 7.3 becomes

Pi(fν) = Σ_{l=1}^{L} [ e^{−j2πfν τi(Ψl)} / |ri − rl| ] Sl(fν) + Vi(fν),   ν = 1, · · ·, Ns,   (7.4)

where j is the unit imaginary number. Utilizing Equation 7.2, and dropping ν for notational simplicity, Equation 7.4 can be re-written in the wavenumber (and hence frequency) domain as

Pi(k) = Σ_{l=1}^{L} [ e^{−jk|ri − rl|} / |ri − rl| ] Sl(k) + Vi(k).   (7.5)


Rearranging Equation 7.5 in matrix form, the near-field data model in the spatial domain can finally be written as

P(k) = A(r, Ψ)S(k) + V(k),   (7.6)

where V(k) is the vector of uncorrelated sensor noise. The noise components are assumed to be white, circularly Gaussian distributed with zero mean and covariance matrix σ²I, where I is an identity matrix. The dependency of A on k is removed for notational simplicity. The steering matrix A(r, Ψ) is

A(r, Ψ) = [ a(r1, Ψ1)  a(r2, Ψ2)  . . .  a(rL, ΨL) ],   (7.7)

where

a(rl, Ψl) = [ e^{−jk|r1−rl|}/|r1−rl|,  e^{−jk|r2−rl|}/|r2−rl|,  . . . ,  e^{−jk|rI−rl|}/|rI−rl| ]^T.   (7.8)
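The spatial-domain steering vector of Equation 7.8 is straightforward to construct from microphone and source coordinates. The sketch below is illustrative only; the Cartesian inputs, names and toy geometry are assumptions:

```python
import numpy as np

def nearfield_steering_vector(k, mic_xyz, src_xyz):
    """Near-field spatial-domain steering vector of Equation 7.8:
    i-th entry is exp(-j k |r_i - r_l|) / |r_i - r_l|.
    mic_xyz: (I, 3) Cartesian microphone positions; src_xyz: (3,) source."""
    d = np.linalg.norm(mic_xyz - src_xyz, axis=1)   # |r_i - r_l| for each mic
    return np.exp(-1j * k * d) / d

# toy usage: 4 mics on a circle of radius 4.2 cm, source 10 cm away on +x
mics = 0.042 * np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]])
k = 2 * np.pi * 1000 / 343.0                        # wavenumber at 1 kHz
a = nearfield_steering_vector(k, mics, np.array([0.1, 0.0, 0.0]))
print(np.abs(a))                                    # 1/distance amplitude taper
```

Unlike the far-field case, both the phase and the amplitude of each entry depend on the source range, which is what makes joint range-bearing estimation possible.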

To utilize the advantages of spherical harmonics signal processing, the spatial domain data model in Equation 7.6 is converted to a data model in the spherical harmonics domain. The ith term in Equation 7.8 refers to the pressure at location ri due to the lth unit amplitude source. This can alternatively be expanded in terms of spherical harmonics as (Section 2.5.2)

e^{−jk|ri − rl|} / |ri − rl| = Σ_{n=0}^{N} Σ_{m=−n}^{n} bn(k, ra, rl) Yn^m(Ψl)∗ Yn^m(Φi),   (7.9)

where bn(k, ra, rl) is the nth order near-field mode strength. It is related to the far-field mode strength bn(k, ra) as in [24]. The far-field mode strength for the open sphere (virtual sphere) and the rigid sphere of radius ra, with microphones at radius r, is given by

bn(k, r) = 4πj^n jn(kr),   open sphere,   (7.11)

bn(k, r) = 4πj^n [ jn(kr) − ( j′n(kra) / h′n(kra) ) hn(kr) ],   rigid sphere.   (7.12)

Here jn is the spherical Bessel function of the first kind, hn is the spherical Hankel function of the first kind, and ′ refers to the first derivative. As discussed in Section 2.5.3, for signal processing in the spherical harmonics domain, the mode strength is the deciding criterion for the near-field extent, rather than the usual Fraunhofer distances. This leads to the near-field range

ra ≤ rl ≤ (kmax / k) ra,   (7.13)

where kmax = N / ra.   (7.14)
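A numerical sketch of the rigid-sphere mode strength (Equation 7.12, evaluated at the array surface r = ra) and of the near-field extent in Equations 7.13-7.14 is given below. It uses SciPy's spherical Bessel routines and assumes h_n = j_n + j·y_n for the spherical Hankel function of the first kind; the array parameters are illustrative (Eigenmike-like):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def mode_strength_rigid(n, k, ra):
    """Far-field mode strength b_n (Equation 7.12) for microphones
    on the surface of a rigid sphere of radius ra."""
    x = k * ra
    jn, jn_p = spherical_jn(n, x), spherical_jn(n, x, derivative=True)
    hn = jn + 1j * spherical_yn(n, x)
    hn_p = jn_p + 1j * spherical_yn(n, x, derivative=True)
    return 4 * np.pi * (1j ** n) * (jn - (jn_p / hn_p) * hn)

# near-field extent of Equations 7.13-7.14 for an order-4, 4.2 cm array
N, ra = 4, 0.042                        # order, radius (m)
k = 2 * np.pi * 1000 / 343.0            # wavenumber at 1 kHz
k_max = N / ra                          # Equation 7.14
print("near-field range: %.3f m to %.3f m" % (ra, (k_max / k) * ra))
print([abs(mode_strength_rigid(n, k, ra)) for n in range(N + 1)])
```

Note how the upper limit of the near-field range simplifies to N/k, so a higher array order (or a lower frequency) extends the region in which a source behaves as near-field.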

Yn^m in Equation 7.9 represents the spherical harmonic of order n and degree m. The spherical harmonics can be written from Section 2.5 as

Yn^m(Φ) = √[ (2n + 1)(n − m)! / (4π(n + m)!) ] Pn^m(cos θ) e^{jmφ},   ∀ 0 ≤ n ≤ N, −n ≤ m ≤ n,   (7.15)

with Pnm being the associated Legendre functions, and (.)∗ denotes the complex conjugate.

Substituting the expression from Equation 7.9 in Equation 7.8, the steering matrix in Equation 7.7 can be written as

A(r, Ψ) = Y(Φ) [ B(k, ra, r1)yH(Ψ1)  B(k, ra, r2)yH(Ψ2)  · · ·  B(k, ra, rL)yH(ΨL) ],   (7.16)

where Y(Φ) is an I × (N + 1)² matrix whose ith row vector is

y(Φi) = [ Y0^0(Φi)  Y1^−1(Φi)  Y1^0(Φi)  Y1^1(Φi)  . . .  YN^N(Φi) ].   (7.17)

y(Ψl) is a 1 × (N + 1)² row vector with a structure similar to Equation 7.17, replacing Φi with Ψl. The (N + 1)² × (N + 1)² matrix B(k, ra, rl) is given by

B(k, ra, rl) = diag( b0(k, ra, rl), b1(k, ra, rl), b1(k, ra, rl), b1(k, ra, rl), . . . , bN(k, ra, rl) ).   (7.18)

For pressure sampled at the I microphones of the spherical array, the spherical Fourier transform from Section 5.2.1 can be written as

Pnm(k) = YH(Φ) Γ P(k),   (7.19)

where Pnm = [ P00  P1(−1)  P10  P11  · · ·  PNN ]^T and Γ = diag(a1, a2, · · ·, aI) is a diagonal matrix with elements ai being the sampling weights [88]. Additionally, the following orthogonality property of spherical harmonics holds:

YH(Φ) Γ Y(Φ) = I.   (7.20)


Substituting Equation 7.16 in 7.6, multiplying both sides by YH(Φ)Γ, and utilizing Equations 7.19 and 7.20, the final spherical harmonics data model for near-field source localization becomes

Pnm(k) = [ B(r1)yH(Ψ1)  · · ·  B(rL)yH(ΨL) ] S(k) + Vnm(k).   (7.21)

The dependency of B(rl) on k and ra is dropped for notational simplicity. The near-field data model can be written more compactly as

Pnm(k) = Anm(r, Ψ) S(k) + Vnm(k),   (7.22)

where Anm(r, Ψ) will be called the near-field steering matrix in the spherical harmonics domain. The new data model is very similar to the spatial domain data model in Equation 7.6. However, the new steering matrix is given by

Anm(r, Ψ) = [ B(r1)yH(Ψ1),  · · · ,  B(rL)yH(ΨL) ].   (7.23)

The data model in Equation 7.22 is utilized in the ensuing Section for near-field source localization.

7.3 Near-field Source Localization in Spherical Harmonics Domain

Following the development of the near-field data model, various methods for joint range and bearing estimation are proposed in this Section. MUSIC, MUSIC-Group delay and MVDR are formulated in the spherical harmonics domain. From the expression of the near-field steering matrix in Equation 7.23, a near-field steering vector in the spherical harmonics domain can be written as

anm(rl, Ψl) = B(rl) yH(Ψl).   (7.24)

The search has to be performed over r as in Equation 7.13 and over Ψ with (0 ≤ θ ≤ π, 0 ≤ φ ≤ 2π).


The MUSIC magnitude spectrum for near-field source localization in the spherical harmonics domain can now be defined as

P_SH−MUSIC(r, Ψ) = 1 / [ anm^H(r, Ψ) Qnm Qnm^H anm(r, Ψ) ],   (7.25)

where anm(r, Ψ) is the near-field steering vector in the SH domain, defined in Equation 7.24, and Qnm is the noise subspace obtained from the eigenvalue decomposition of the modal covariance matrix RPnm, defined as

RPnm = E[ Pnm(k) Pnm^H(k) ].   (7.26)

The denominator of the MUSIC spectrum tends to zero when (r, Ψ) corresponds to a source location, owing to the orthogonality between the noise eigenvectors and the steering vector. Hence, a peak is obtained in the SH-MUSIC spectrum at the location of the source.
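Under the definitions above, evaluating the SH-MUSIC spectrum at one grid point reduces to a few lines. The following is an illustrative sketch, not the thesis code: the near-field mode strengths bn(k, ra, r) are assumed to be precomputed elsewhere (per [24]) and passed in, and the conjugated-harmonics vector is assumed to follow the ordering of Equation 7.17:

```python
import numpy as np

def sh_nearfield_steering(bn, y_conj):
    """Near-field SH-domain steering vector of Equation 7.24:
    a_nm(r, Psi) = B(r) y^H(Psi).

    bn     : length-(N+1) near-field mode strengths b_n(k, ra, r)
    y_conj : length-(N+1)^2 vector [Y_0^0*, Y_1^{-1}*, ...] at the
             look direction."""
    N = len(bn) - 1
    # B(r) = diag(b_0, b_1, b_1, b_1, ..., b_N): repeat b_n over its 2n+1 degrees
    diagB = np.concatenate([[bn[n]] * (2 * n + 1) for n in range(N + 1)])
    return diagB * y_conj

def sh_music_value(anm, Qn):
    """SH-MUSIC spectrum (Equation 7.25) at one (r, Psi) grid point.
    Qn: (N+1)^2 x U matrix of noise-subspace eigenvectors."""
    proj = Qn.conj().T @ anm
    return 1.0 / np.real(proj.conj() @ proj)
```

A full localizer would scan (r, θ, φ) over the search region of Equation 7.13, evaluate `sh_music_value` at each point, and pick the spectral peaks.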

7.3.2 MUSIC-Group delay for Near-field Source Localization

The SH-MUSIC spectrum utilizes only the magnitude of anm^H(r, Ψ)Qnm Qnm^H anm(r, Ψ), as is clear from Equation 7.25. As discussed in Chapter 4, a sharp change in the unwrapped phase is seen at the locations of the sources. Hence, the negative differentiation of the unwrapped phase spectrum (group delay) results in peaks at the source locations. However, the group delay spectrum may sometimes have spurious peaks due to microphone calibration errors. The product of the MUSIC and group delay spectra, called MUSIC-Group delay, removes such spurious peaks and gives high resolution location estimates. The Spherical Harmonics MUSIC-Group delay (SH-MGD) spectrum for near-field source localization is formulated as

P_SH−MGD(r, Ψ) = [ Σ_{u=1}^{U} |∇ arg( anm^H(r, Ψ) qu )|² ] P_SH−MUSIC,   (7.27)

where U = (N + 1)² − L, ∇ is the gradient operator, arg(.) indicates the unwrapped phase, and qu represents the uth eigenvector of the noise subspace Qnm. The first term within [.] is the group delay spectrum. The gradient is taken with respect to (r, Ψ).



Figure 7.2: Illustration of range and elevation estimation by (a) SH-MUSIC method (b) SH-

MGD method (c) SH-MVDR method for ﬁxed azimuth. Illustration of elevation and azimuth

estimation using (d) SH-MUSIC method (e) SH-MGD method (f) SH-MVDR method for

ﬁxed range. The sources are at (0.06m,60◦ ,30◦ ) and (0.08m,55◦ ,40◦ ) at an SNR of 10dB.

7.3.3 MVDR for Near-field Source Localization

The conventional MVDR minimizes the contribution of interference impinging on the array from directions other than the desired DOA, while maintaining unity gain in the look direction. Under such conditions, the SH-MVDR power spectrum for near-field source localization can be written as

P_SH−MVDR(r, Ψ) = 1 / [ anm^H(r, Ψ) RPnm^−1 anm(r, Ψ) ].   (7.28)

Figure 7.2 illustrates the performance of SH-MUSIC, SH-MGD and SH-MVDR for range and bearing estimation using a spherical microphone array. The simulation considered a rigid sphere with two closely spaced sources at (0.06m, 60°, 30°) and (0.08m, 55°, 40°) and an SNR of 10dB. Figures 7.2(a), 7.2(b) and 7.2(c) show plots corresponding to range and elevation estimation with known azimuth. The plots in Figures 7.2(d), 7.2(e) and 7.2(f) show the azimuth and elevation of the sources at the given range. It can be noted that SH-MUSIC and SH-MGD, being subspace-based methods, have higher resolution than SH-MVDR. The high resolution of SH-MGD is due to the additive property of the group delay spectrum. A mathematical proof of the additive property of the spatial domain MUSIC-Group delay spectrum is provided in Section 4.2.3. All the spatial domain results are also valid in the spherical harmonics domain [50]. Hence, the additive property of the group delay spectrum also holds in the spherical harmonics domain.

7.4 The Near-field MVDR Beampattern Analysis

Having developed the data model and localized the sources, the near-field beampattern is presented in the ensuing Section. The beampattern is presented for the near-field MVDR spatial filter, which preserves unity gain in the look direction while minimizing the output power. The estimated location can be used for steering the array in the look direction for spatial filtering.

Various radial compensation filters have been utilized in [50, 24] for the design of near-field beampatterns. However, all these radial filters are designed assuming rotational symmetry around the look direction. The weight vector in this context is given as Wnm = [ W00, W1(−1), W10, W11, · · ·, WNN ]^T, where

Wnm = [ dn / bn(k, r, rs) ] Yn^m∗(Ψs).   (7.29)

Wnm(k) is the spherical Fourier transform of W(k); the dependency on k is dropped for simplicity. Here dn is the design parameter for controlling the beampattern, Ψs is the look direction

and rs is the look distance. A discussion on some optimal beamforming techniques is also given in [50, Chapter 11]. However, those optimal beamforming techniques are limited to far-field sources only. Here, optimal near-field beamforming is presented, in particular the widely used MVDR. As in the case of localization, beamforming techniques can be re-formulated in the spherical harmonics domain in a manner similar to the spatial domain [50]. The MVDR problem in the spherical harmonics domain is formulated as

min_{Wnm}  Wnm^H RPnm Wnm   subject to   Wnm^H anm = 1,   (7.30)

where anm is the steering vector. The solution to the given optimization problem in Equation 7.30 is given as

Wnm = RPnm^−1 anm / ( anm^H RPnm^−1 anm ).   (7.31)

Utilizing Equation 7.24, the MVDR weights for steering to a source at location rs = (rs, Ψs) are given by

Wnm = RPnm^−1 B(rs)yH(Ψs) / [ y(Ψs) BH(rs) RPnm^−1 B(rs) yH(Ψs) ].   (7.32)

The resulting gain of the beamformer towards a source at (rl, Ψl) is then

G = | Wnm^H B(rl) yH(Ψl) |.   (7.33)
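The weight computation of Equation 7.31 is a standard linear solve. The sketch below is illustrative; the diagonal loading term is an addition of this sketch (not part of the thesis formulation) to keep the estimated covariance invertible:

```python
import numpy as np

def sh_mvdr_weights(R, anm, loading=1e-6):
    """SH-domain MVDR weights of Equation 7.31:
    W = R^{-1} a / (a^H R^{-1} a).
    A small diagonal loading term regularizes the covariance."""
    Rl = R + loading * np.trace(R).real / R.shape[0] * np.eye(R.shape[0])
    Ria = np.linalg.solve(Rl, anm)          # R^{-1} a without an explicit inverse
    return Ria / (anm.conj() @ Ria)

# distortionless response: W^H a = 1 in the look direction
R = np.eye(3, dtype=complex)
a = np.array([1.0, 1j, -1.0])
W = sh_mvdr_weights(R, a)
print(W.conj() @ a)   # ≈ (1+0j)
```

Using `np.linalg.solve` instead of inverting R is the usual numerically preferable choice when R is ill-conditioned.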

7.5 Cramér-Rao Bound Analysis

Although spherical microphone arrays are extensively used for source localization [12, 13, 14, 15, 16, 17, 86], CRB analysis in the spherical harmonics domain has been investigated only sparsely. The CRB expression for the far-field data model in the spherical harmonics domain can be found in Section 5.6. The far-field data model for the spherical microphone array, as derived in Chapter 5, is given as

Dnm(Ψ; k) = YH(Ψ)S(k) + Znm(k),   (7.34)

with YH(Ψ) being the far-field steering matrix. Comparing the far-field data model in Equation 7.34 with the near-field data model in Equation 7.22, the expression of the Fisher information matrix (FIM) for the near-field observation model can be obtained as in Section 5.6.2, by replacing YH(Ψ) with Anm and RD with RP. Hence, the FIM elements can be written as

Frθ = 2Re{ (RS Anm^H RP^−1 Anm RS)^T ⊙ (Ȧnmr^H RP^−1 Ȧnmθ) + (RS Anm^H RP^−1 Ȧnmr)^T ⊙ (RS Anm^H RP^−1 Ȧnmθ) },   (7.35)

Fθφ = 2Re{ (RS Anm^H RP^−1 Anm RS)^T ⊙ (Ȧnmθ^H RP^−1 Ȧnmφ) + (RS Anm^H RP^−1 Ȧnmθ)^T ⊙ (RS Anm^H RP^−1 Ȧnmφ) },   (7.36)

where RP = E[Pnm(k)Pnm^H(k)], RS = E[S(k)S^H(k)], ⊙ represents the Hadamard product, and the parameter vector is

α = [ r^T  θ^T  φ^T ]^T.   (7.37)

It may be noted that the parameters (rl, θl, φl) are present only in the lth column of Anm. Hence, the definition of vector and scalar differentiation used herein is

Ȧnmr = Σ_{l=1}^{L} Ȧnmrl,   (7.38)

and

Ȧnmrl = ∂Anm / ∂rl.   (7.39)

The differentiation w.r.t. the other parameters can be written in a similar way. The derivative of Anm involves the differentiation of the spherical Hankel function and the spherical harmonics. The derivative of the near-field steering matrix Anm is computed by utilizing the partial derivatives of the near-field mode strength and the spherical harmonics function. The detailed computation of the derivative of the near-field steering matrix is given in Appendix B.

The other blocks of the FIM can be written in a similar way. The final FIM is given as

       | Frr  Frθ  Frφ |
F  =   | Fθr  Fθθ  Fθφ |.
       | Fφr  Fφθ  Fφφ |

The bound is obtained from the inverse of the Fisher information matrix. The Cramér-Rao bound is plotted in Figure 7.3 for various SNRs. The CRB plots for a random signal and for a sinusoidal signal are illustrated in Figures 7.3(a) and 7.3(b) respectively. It may be noted that the CRB


Figure 7.3: Cramér-Rao bound analysis at various SNR, (a) for random signal (b) for sinu-

soidal signal. The source location is (0.08m, 40◦ , 50◦ ).

for the random signal is lower than the CRB for the sinusoidal signal. Also, a lower CRB is attained at higher SNRs.

7.6 Performance Evaluation

Experiments on source localization are conducted to evaluate the proposed methods. The significance of the proposed methods in near-field beamforming is also presented by conducting an experiment on the suppression of undesired sources in the near-field. Simulation experiments, as well as experiments on real signals acquired over a spherical microphone array, are performed. The experiment

utilizes an Eigenmike system shown in Figure 7.5. It consists of 32 microphones embedded in a rigid sphere of radius 4.2cm. The order of the microphone array is taken to be 4.

Four dimensional scatter plots, root-mean-square error (RMSE) and probability of resolution

measures are used to evaluate the source localization performance of the proposed methods.

First, the ability of the proposed methods to radially discriminate aligned sources is analyzed. Two narrowband sources with locations r1 = (0.1m, 30◦, 45◦) and r2 = (0.8m, 30◦, 45◦) are taken for the analysis. The frequencies of the sources are taken to be 220Hz and 250Hz respectively. It is to be noted that both sources have the same DOA. However, they are well separated radially. The DOA of the sources is assumed to be known, and the range is estimated at various SNRs.

The relative performance of various proposed methods is presented herein using cumulative

RMSE. The cumulative RMSE is computed as

RMSE = √( (1/2T) Σ_{t=1}^{T} Σ_{l=1}^{2} (r_l − r̂_l^{(t)})² ),   (7.40)

where t indicates the trial number, T is the total number of trials and l denotes the source number. The RMSE results are presented in Table 7.1 for 100 independent trials. It can be concluded that for resolving sources in the same direction with different radial distances, a high SNR is required.

SH-MGD method performs reasonably well even at low SNRs. At high SNRs, all the methods

perform equally well, giving low RMSE.
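The cumulative RMSE of Equation 7.40 is straightforward to compute; the sketch below uses synthetic range estimates in place of the actual estimator outputs.

```python
import numpy as np

# Cumulative RMSE of Equation 7.40 over T trials and 2 sources. The estimates
# r_hat are illustrative (true ranges plus small Gaussian error); in practice
# they come from SH-MGD, SH-MUSIC or SH-MVDR.
r_true = np.array([0.1, 0.8])                        # true ranges (m)
rng = np.random.default_rng(1)
T = 100
r_hat = r_true + 0.01 * rng.standard_normal((T, 2))  # per-trial estimates

rmse = np.sqrt(np.sum((r_hat - r_true) ** 2) / (2 * T))
```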


Table 7.1: Cumulative RMSE in range r, at various SNRs for 100 iterations. Sources are at (0.1m, 30◦, 45◦) and (0.8m, 30◦, 45◦).

    Methods     SNR (10dB)   SNR (20dB)   SNR (30dB)   SNR (40dB)
    SH-MGD      0.0847       0.0785       0.0389       0.0217
    SH-MUSIC    0.495        0.495        0.2891       0.0049
    SH-MVDR     0.495        0.495        0.495        0.0562

The experimental conditions are similar to those used in the previous section. The probability of resolution for range is defined as

P_r = (1/2T) Σ_{t=1}^{T} Σ_{l=1}^{2} Pr(|r_l − r̂_l^{(t)}| ≤ ζ)
    = (1/2T) Σ_{t=1}^{T} Σ_{l=1}^{2} sgn(ζ − |r_l − r̂_l^{(t)}|),   (7.41)

where Pr(·) denotes the probability of an event, and sgn(x) is the signum function defined in Equation 5.55. The confidence interval is taken as ζ = 0.08m. The relative performance of the proposed methods is presented using a bar plot in Figure 7.4.
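The probability of resolution of Equation 7.41 reduces to counting the fraction of (trial, source) pairs whose range error falls within ζ; a sketch with synthetic estimates:

```python
import numpy as np

# Probability of resolution (Equation 7.41): fraction of (trial, source)
# pairs with range error within the confidence interval zeta = 0.08 m.
# Estimates are synthetic stand-ins for the estimator outputs.
r_true = np.array([0.1, 0.8])
rng = np.random.default_rng(2)
T = 100
r_hat = r_true + 0.05 * rng.standard_normal((T, 2))

zeta = 0.08
resolved = np.abs(r_hat - r_true) <= zeta   # indicator per trial and source
p_res = resolved.mean()                     # equals (1/2T) * sum of indicators
```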


Figure 7.4: Range estimation performance of SH-MGD, SH-MUSIC and SH-MVDR in terms

of probability of resolution.


It is to be noted that the probability of resolution for SH-MGD is high even at low SNR

when compared to SH-MUSIC and SH-MVDR.

In most of the applications involving near-ﬁeld communication, one of the parameters (range,

azimuth or elevation) can be assumed to be constant. However, in this section, experiments

on joint range and bearing (azimuth and elevation) estimation are given. The experiments are

performed for both simulated and actual signals acquired from a spherical microphone array.

An Eigenmike system is utilized in an anechoic chamber for acquiring the signals. Experimental

results are illustrated using four-dimensional scatter plots.

An experimental set-up for near-field source localization using the Eigenmike system is shown in Figure 7.5. For the real experiments, the signal is recorded in an anechoic chamber using the Eigenmike system. A smartphone speaker is utilized as the acoustic source. The source is fixed at location (0.3m, 90◦, 90◦). A narrowband signal with a frequency of 600Hz is used.

Figure 7.5: The Eigenmike setup in an anechoic chamber at IIT Kanpur for acquiring near-

ﬁeld sources. A near-ﬁeld source is placed at (0.3m, 90◦ , 90◦ ).


Experimental results are presented using four-dimensional scatter plots for both simulated signals and signals acquired from the spherical microphone array. The SH-MUSIC and SH-MGD spatial spectra, as proposed in Equations 7.25 and 7.27, are utilized for simultaneous estimation of the range and bearing of a source. The near-field source localization scatter plots are shown for SH-MUSIC and SH-MGD in Figure 7.6. The magnitude of the SH-MUSIC and SH-MGD spectrum is represented by a color bar.


Figure 7.6: Four dimensional scatter plots using, (a) SH-MUSIC for simulated signal, (b) SH-MGD for simulated signal, (c) SH-MUSIC for signal acquired over SMA, (d) SH-MGD for signal acquired over SMA. A narrowband source with frequency 600Hz, located at (0.3m, 90◦, 90◦) is considered.

Figure 7.6(a) and Figure 7.6(b) correspond to source localization results for the simulated signal. The candidate location corresponding to the highest magnitude of SH-MUSIC and

SH-MGD is represented by a square in both figures. However, in the SH-MUSIC spectrum (Figure 7.6(a)), an additional competing peak can be seen, represented by a brown circle. On the other hand, the SH-MGD spectrum in Figure 7.6(b) shows a single candidate peak. Both methods are able to estimate the source location.

Experimental results corresponding to the signal acquired over the spherical microphone array in an anechoic chamber are shown in Figure 7.6(c) and Figure 7.6(d) for SH-MUSIC and

SH-MGD respectively. It can be noted that SH-MUSIC spectrum has many spurious peaks,

which are greatly reduced in SH-MGD spectrum. In SH-MGD spectrum, peaks can be

observed clearly for elevation and azimuth varying from 85◦ to 90◦ at range 0.36m, which

is close to the location of the source, (0.3m, 90◦ , 90◦ ). It is to be noted that the errors in

source localization are due to reﬂection of sound from the tripods, non-point sound source

and microphone-source physical placement error.

Beamforming

Two near-ﬁeld sources are considered at (0.1m, 50◦ , 30◦ ), and (0.3m, 55◦ , 40◦ ). The source

close to the array is assumed to be the desired source while the other is the interference. The

suppression of the interfering source is illustrated using the near-field MVDR beampattern as in Equation 7.33. The MVDR beampattern in this context is plotted in Figure 7.7 for the desired source at (0.1m, 50◦, 30◦) and the interfering source at (0.3m, 55◦, 40◦). The plot is illustrated for azimuth and range with known elevations. As expected, the array gain is

close to 0 dB (undistorted) for the desired source, while the interfering source suﬀers very

high attenuation.
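The distortionless-response property underlying this observation can be illustrated with a generic narrowband MVDR beamformer. The steering vectors below are random unit-norm surrogates, not the actual near-field spherical-harmonics quantities of Equation 7.33.

```python
import numpy as np

# Generic narrowband MVDR sketch: unit gain toward the desired steering
# vector and deep attenuation toward an interferer. a_d and a_i are random
# surrogates for the near-field steering vectors.
rng = np.random.default_rng(3)
M = 25                                                        # array channels
a_d = rng.standard_normal(M) + 1j * rng.standard_normal(M)    # desired
a_i = rng.standard_normal(M) + 1j * rng.standard_normal(M)    # interferer
a_d /= np.linalg.norm(a_d)
a_i /= np.linalg.norm(a_i)

R = 10 * np.outer(a_i, a_i.conj()) + 0.01 * np.eye(M)         # interference + noise
Ri = np.linalg.inv(R)
w = Ri @ a_d / (a_d.conj() @ Ri @ a_d)                        # MVDR weights

gain_d = 20 * np.log10(abs(w.conj() @ a_d))                   # distortionless, ~0 dB
gain_i = 20 * np.log10(abs(w.conj() @ a_i))                   # strongly attenuated
```

The distortionless constraint forces exactly 0 dB toward the desired steering vector, while the interferer direction is suppressed in proportion to its power in R.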

From the given beampattern, an important observation and a possible use case can be made. Figure 7.7 is plotted again in Figure 7.8, with fixed range and varying azimuth. Additional 2-D figures are also given with varying range and fixed azimuth for the purpose of clarity.


Figure 7.8: Radial ﬁltering analysis of the proposed near-ﬁeld MVDR method over a spherical

microphone array. (a) Array gain for ﬁxed r = 0.1m. (b) Array gain for ﬁxed r = 0.3m. (c)

Array gain for ﬁxed θ = 30◦ . (d) Array gain for ﬁxed θ = 40◦ .

It is to be noted that for the steering distance set at the desired source (r = 0.1m), the array gain is close to 0dB for all azimuths (Figure 7.8(a)). The gain variation within a small range

of azimuth is minimal. On the other hand, the gain deteriorates signiﬁcantly at undesired

radial distance as shown in Figure 7.8(b). It is noted that this holds for all look directions.

Figures 7.8(c) and 7.8(d) indicate the gain variation with source distance at fixed azimuth. In order to study the radial filtering ability of the proposed method, a desired source azimuth angle of 30◦ and radial distance of 0.1m are considered. It may be noted from Figure 7.8(c) that the array gain is close to 0dB at the desired radial distance of 0.1m, and decreases significantly at the undesired radial distance of 0.3m. Subsequently, the array gain is observed for the undesired source azimuth angle of 40◦ in Figure 7.8(d). The array gain at the desired radial distance is still close to 0dB. On the other hand, the array gain significantly decreases at the undesired radial distance of 0.3m.

This implies that for the microphone array steered in the near-field, the MVDR beampattern is robust to minor variations of angle. However, it is very sensitive to distance. This was observed in all simulations. Hence, it can be concluded that the proposed near-field MVDR spatial filter is very sensitive to the radial parameter. The MVDR spatial filter is, however, robust to minor azimuth angle variations. This can be utilized in near-field communication applications where minor variation in the azimuth angle of the desired source is expected. One such application can be visualized when a microphone array is integrated into a cellphone. A speaker in the near-field of such a microphone array may change azimuth angle, while interfering sources are generally not present at the desired radial distance.
7.7 Summary and Contributions

In this chapter, a data model for near-ﬁeld source localization in spherical harmonics domain

is proposed. Three methods namely SH-MUSIC, SH-MGD and SH-MVDR are formulated

for localization of the near-field sources. The methods are verified using simulations and signals acquired over a spherical microphone array in an anechoic chamber. Formulation and analysis

of Cramér-Rao bound for near-ﬁeld sources in spherical harmonics domain is also presented.

Experiments are conducted for radially separated aligned sources. The proposed methods are


evaluated using RMSE and statistical analysis. The significance and practical application of the proposed methods is discussed using an experiment on interference suppression. In this context, the near-field MVDR beampattern analysis promises robust near-field communication, which can be investigated as future work.

Chapter 8

8.1 Conclusions

This thesis addresses the source localization problem in the spatial and spherical harmonics domains. The spatio-temporal array data model is presented starting from first principles in physics. Subsequently, the spatio-frequency and spherical harmonics data models are also discussed. A new spherical harmonics data model is developed for near-field source localization

using spherical microphone array. Novel methods for acoustic source localization are proposed

in spatial and spherical harmonics domain.

In spatial domain, a novel high resolution source localization method based on the MUSIC-

Group delay spectrum is proposed. The method provides robust azimuth and elevation estimates of closely spaced sources when compared to conventional source localization methods, as indicated by source localization experiments. The additive property of the group

delay function in spatial domain is proved mathematically to explain the resolving power

of the proposed method. The signiﬁcance of the MUSIC-Group delay method in speech

enhancement and distant speech recognition is illustrated using improvements in signal to

interference ratios and lower word error rates.

Signal processing in spherical harmonics domain provides ease of beamforming and efficient array processing due to reduced dimensionality. Both far-field and near-field source

localization problems are addressed in the spherical harmonics domain. The MUSIC-Group

delay method is formulated in spherical harmonics domain (called SH-MGD) for far-ﬁeld


source localization. The high resolution capability of SH-MGD makes it more relevant when compared to conventional methods like SH-MUSIC and SH-MVDR. An additional source localization algorithm, called SH-root-MUSIC, is also presented for azimuth-only estimation

of far-ﬁeld sources. It retains all the inherent advantages of root-MUSIC including lower

computational complexity. The Vandermonde structure of array manifold is also illustrated

using manifold separation technique.

Near-ﬁeld source localization over a spherical microphone array is also addressed in this

thesis. A new data model for near-ﬁeld source localization using spherical microphone array

is developed. In particular, three methods namely SH-MUSIC, SH-MGD and SH-MVDR,

that jointly estimate the range and bearing of multiple sources are proposed. Results of the near-field MVDR beampattern analysis promise robust near-field communication applications. Additionally, the stochastic Cramér-Rao bound for the far-field and near-field data models is

formulated in spherical harmonics domain to evaluate the location estimators. The ability of

the proposed methods to radially discriminate aligned sources is also analyzed.
8.2 Future Directions

The near-field data model developed in this thesis can be incorporated in a sparsity based

framework for source localization. As a part of future work, sparsity based methods for

near-ﬁeld source localization can be explored. The near-ﬁeld model can be transformed into

a sparse recovery problem where the signal vector can be assumed to be sparse. The problem can be solved using an l1-regularized least-squares method.

The near-ﬁeld MVDR beampattern analysis result is encouraging. Using the near-ﬁeld

data model developed, a near-ﬁeld MVDR spatial ﬁlter is designed. The near-ﬁeld MVDR

spatial ﬁlter designed in this thesis, exhibits high radial ﬁltering eﬃciency. However, it is

robust to minor azimuth angle variations. This can be utilized in near-ﬁeld communication

applications where minor variation in the azimuth angle of the desired source is expected. One such application can be visualized when a microphone array is integrated into a cellphone. A speaker in the near-field of such a microphone array may change azimuth angle, while

interfering sources are generally not present at the desired radial distance. Utilization of


near-ﬁeld MVDR spatial ﬁlter in real life application needs further investigation.

The techniques developed in this thesis, along with spherical near-field acoustic holography (NAH), can be utilized for simultaneous source localization and separation. Near-field

acoustic holography techniques for source localization assume sources to be in near-ﬁeld.

The NAH techniques are utilized for localization of various automotive noises such as wind noise, tire noise and accessory noise. The near-field localization and beamforming techniques presented in this thesis can be investigated further for automotive noise source identification

and reduction of noise levels.

Appendices

Appendix A

Spherical Harmonics Domain

The array data model formulated in Chapter 5 will be utilized here for the stochastic Cramér-Rao bound (CRB) derivation. The data model and model covariance matrix are re-written from Equations 5.28 and 5.47:

[D_nm(k)]_{(N+1)²×Ns} = [Y^H(Ψ)]_{(N+1)²×L} [S(k)]_{L×Ns} + [Z_nm(k)]_{(N+1)²×Ns}   (A.1)
R_D = Y^H(Ψ) R_S Y(Ψ) + σ² I_{(N+1)²}   (A.2)

YH (Ψ) is the steering matrix. A particular lth steering vector can be written as

y^H(Ψ_l) = [Y_0^0*(Ψ_l), Y_1^{−1}*(Ψ_l), Y_1^0*(Ψ_l), Y_1^1*(Ψ_l), . . . , Y_N^N*(Ψ_l)]^T   (A.3)

where the spherical harmonic of order n and degree m can be written from Section 2.5 as

Y_n^m(Ψ) = √( (2n+1)(n−m)! / (4π(n+m)!) ) P_n^m(cos θ) e^{jmφ}   (A.4)

∀ 0 ≤ n ≤ N, −n ≤ m ≤ n.

A closed-form expression for the stochastic CRB(DOA) is presented herein. Hence, the unknown direction parameter vector taken here is

α = [θ^T φ^T]^T   (A.5)

A.1 Formulation of Fisher Information Matrix

Formulation of the Fisher information matrix (FIM) is presented herein. The CRB is computed from the inverse of the Fisher information matrix. The Fisher information matrix elements given by Equation 5.52 can be further simplified to [55],

F_rs = tr{ R_D^{-1} (∂R_D/∂α_r) R_D^{-1} (∂R_D/∂α_s) }.   (A.6)

It is to be noted from Equation A.1 that for Ns wavenumbers (FFT indices) corresponding to Ns snapshots [26], the Fisher information matrix elements will be Ns times those given in Equation A.6. The parameters (θr, φr) are present in the r-th column of Y^H(Ψ). Hence, the following notational definition for the vector derivative of the steering matrix Y^H(Ψ) is used,

Ẏ_θ^H = Σ_{r=1}^{L} Ẏ_θr^H   (A.7)

with Ẏ_θr^H ≜ ∂Y^H/∂θ_r. The scalar derivative Ẏ_θr^H can be extracted from the vector derivative in Equation A.7 as

Ẏ_θr^H = Ẏ_θ^H e_r e_r^T   (A.8)

where e_r is the r-th column vector of an identity matrix. These vector and scalar derivatives of the steering matrix are used in the ensuing formulation of the CRB. Also, Y(Ψ) is replaced with Y for the equations to be more compact.

Utilizing Equation A.2, the partial derivative of the covariance matrix R_D with respect to the variable θ_r can be written as

∂R_D/∂θ_r = Ẏ_θr^H R_S Y + Y^H R_S Ẏ_θr   (A.9)

Substituting this in Equation A.6 and making use of the distributive property of matrix multiplication, the FIM element can be expressed as

F_θrφs = tr{ R_D^{-1} Ẏ_θr^H R_S Y R_D^{-1} Ẏ_φs^H R_S Y + R_D^{-1} Ẏ_θr^H R_S Y R_D^{-1} Y^H R_S Ẏ_φs
       + R_D^{-1} Y^H R_S Ẏ_θr R_D^{-1} Ẏ_φs^H R_S Y + R_D^{-1} Y^H R_S Ẏ_θr R_D^{-1} Y^H R_S Ẏ_φs }   (A.10)

Utilizing tr(A + B) = tr(A) + tr(B), and rewriting Equation A.10 in short form, the FIM element is given by

With suitable pairing and utilizing the Hermitian positive semi-definiteness of the covariance matrix, x can be rewritten as

Noting the property of trace of a matrix, tr(xH ) = tr(x) , where ∗ denotes the complex

conjugate, and utilizing results of Equation A.14 in Equation A.11, the FIM elements can

now be written as

� �

Fθr ,φs = 2Re tr(x) + tr(y)

� �

= 2Re tr(ẎφHs RS YRD −1 YH RS Ẏθr RD −1 ) + tr(ẎφHs RS YRD −1 ẎθHr RS YRD −1 )

�

Fθr ,φs = 2Re tr(Ẏφ H

es eTs RS YRD −1 YH RS er eTr Ẏθ RD −1 )

�

H

+ tr(Ẏφ es eTs RS YRD −1 Ẏθ H

er eTr RS YRD −1 )

�

= 2Re eTs RS YRD −1 YH RS er eTr Ẏθ RD −1 Ẏφ H

es

�

+ eTs RS YRD −1 Ẏθ H

er eTr RS YRD −1 Ẏφ H

es

� �

Fθφ = 2Re (RS YRD −1 YH RS )T �(Ẏθ RD −1 Ẏφ

H

)+(RS YRD −1 Ẏθ ) �(RS YRD −1 Ẏφ

H T H

)

(A.15)


where ⊙ denotes the Hadamard product. The Hadamard product of two matrices is defined elementwise as [A ⊙ B]_{ij} = [A]_{ij}[B]_{ij}.   (A.16)
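The step from the elementwise trace expressions to the Hadamard-product form can be verified numerically. All matrices below are random surrogates with the right shapes (Y and its derivatives are L × (N+1)², R_S and R_D are Hermitian positive definite), not quantities from the actual data model.

```python
import numpy as np

# Check that the Hadamard form of the FIM block (Equation A.15) matches the
# elementwise expression 2Re{ A[s,r]B[r,s] + C[s,r]D[r,s] }.
rng = np.random.default_rng(4)
L, Q = 3, 9                                       # Q plays the role of (N+1)^2
cplx = lambda *s: rng.standard_normal(s) + 1j * rng.standard_normal(s)
Y, dYt, dYp = cplx(L, Q), cplx(L, Q), cplx(L, Q)  # Y, Y_dot_theta, Y_dot_phi
Ms = cplx(L, L); RS = Ms @ Ms.conj().T + np.eye(L)
Md = cplx(Q, Q); RDi = np.linalg.inv(Md @ Md.conj().T + np.eye(Q))

A = RS @ Y @ RDi @ Y.conj().T @ RS
B = dYt @ RDi @ dYp.conj().T
C = RS @ Y @ RDi @ dYt.conj().T
D = RS @ Y @ RDi @ dYp.conj().T
F_hadamard = 2 * np.real(A.T * B + C.T * D)       # '*' is elementwise in numpy

F_element = np.zeros((L, L))
for r in range(L):
    for s in range(L):
        F_element[r, s] = 2 * np.real(A[s, r] * B[r, s] + C[s, r] * D[r, s])
```

The identity used is simply (A^T ⊙ B)[r, s] = A[s, r] B[r, s], which collapses the double loop over FIM elements into a few matrix products.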

Similar to Equation A.15, the other block of the FIM with only one parameter vector, F_θθ, can be written as

F_θθ = 2Re{ (R_S Y R_D^{-1} Y^H R_S)^T ⊙ (Ẏ_θ R_D^{-1} Ẏ_θ^H) + (R_S Y R_D^{-1} Ẏ_θ^H)^T ⊙ (R_S Y R_D^{-1} Ẏ_θ^H) }.   (A.17)

A.2 Computing the Derivative of Spherical Harmonics Function Ynm

From Equations A.3 and A.4, the vector derivative Ẏ_φ can be found using

∂Y_n^m(Ψ_s)/∂φ_s = jm Y_n^m(Ψ_s).   (A.18)
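Equation A.18 can be checked against a finite difference using SciPy's spherical harmonics (note that `scipy.special.sph_harm` takes the azimuthal angle as its third argument and the polar angle as its fourth):

```python
import numpy as np
from scipy.special import sph_harm

# Numerical check of Equation A.18: the azimuth derivative of a spherical
# harmonic equals j*m times the harmonic itself.
n, m = 3, 2
theta, phi = 1.1, 0.7            # polar and azimuth angles (thesis convention)
h = 1e-6
y = sph_harm(m, n, phi, theta)
dy_num = (sph_harm(m, n, phi + h, theta) - sph_harm(m, n, phi - h, theta)) / (2 * h)
dy_ana = 1j * m * y
```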

Computing Ẏ_θ involves differentiation of the associated Legendre function. The derivative of the associated Legendre polynomial can be expressed using the following recurrence relations [108],

(2n + 1) z P_n^m(z) = (n + m) P_{n−1}^m(z) + (n − m + 1) P_{n+1}^m(z)   (A.19)

∂P_n^m(z)/∂z = 1/(z² − 1) [z n P_n^m(z) − (m + n) P_{n−1}^m(z)]   (A.20)

This leads to the derivative of the associated Legendre polynomial given by

∂P_n^m(z)/∂z = 1/(z² − 1) [(n − m + 1) P_{n+1}^m(z) − (n + 1) z P_n^m(z)].   (A.21)

For z = cos θ, the derivative becomes

∂P_n^m(cos θ)/∂θ = (1/sin θ) [(n − m + 1) P_{n+1}^m(cos θ) − (n + 1) cos θ P_n^m(cos θ)].
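The recurrence-based derivative of Equation A.21 can be validated against a finite difference with `scipy.special.lpmv` (which evaluates P_n^m with the standard Condon-Shortley phase):

```python
import numpy as np
from scipy.special import lpmv

# Numerical check of Equation A.21:
# dP_n^m/dz = [(n-m+1) P_{n+1}^m(z) - (n+1) z P_n^m(z)] / (z^2 - 1).
n, m, z = 4, 2, 0.3
h = 1e-6
d_num = (lpmv(m, n, z + h) - lpmv(m, n, z - h)) / (2 * h)
d_ana = ((n - m + 1) * lpmv(m, n + 1, z) - (n + 1) * z * lpmv(m, n, z)) / (z**2 - 1)
```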

Now, Ẏ_θ can be computed by utilizing the following in Equation A.7:

∂Y_n^m(Ψ_r)/∂θ_r = √( (2n+1)(n−m)! / (4π(n+m)!) ) e^{jmφ_r} (1/sin θ_r) [(n − m + 1) P_{n+1}^m(cos θ_r) − (n + 1) cos θ_r P_n^m(cos θ_r)]   (A.22)

Appendix B

Near-ﬁeld Steering Matrix

In this Appendix, we provide the necessary formulae for finding the derivative of the near-field steering matrix. The near-field steering matrix can be written from Equation 7.23 as

A_nm(r, Ψ) = [B(r_1) y_H(Ψ_1), · · · , B(r_l) y_H(Ψ_l), · · · , B(r_L) y_H(Ψ_L)].   (B.1)

As the parameters (r_l, θ_l, φ_l) are present only in the l-th column of A_nm, we need only the derivative of the l-th column. The rest of the columns produce zero vectors. Hence, the derivative of the steering matrix w.r.t. range (Equation 7.39) turns out to be

Ȧ_nmr_l = ∂A_nm/∂r_l = [0, 0, · · · , (∂B(r_l)/∂r_l) y_H(Ψ_l), · · · , 0, 0]   (B.2)

From Equations 7.18 and 7.10, it is clear that the above partial derivative involves differentiation of the spherical Hankel function. This can be found using the following recurrence relations [27],

((2n + 1)/x) h_n(x) = h_{n−1}(x) + h_{n+1}(x)   (B.3)

h′_n(x) = h_{n−1}(x) − ((n + 1)/x) h_n(x).   (B.4)

The above recurrence relations lead to

h′_n(x) = (n/x) h_n(x) − h_{n+1}(x)   (B.5)

or, ∂h_n(kr_l)/∂r_l = (n/r_l) h_n(kr_l) − k h_{n+1}(kr_l)   (B.6)
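Equation B.5 can be checked numerically. The sketch below uses the spherical Hankel function of the first kind, h_n = j_n + i y_n; the recurrence holds for either kind since it holds for j_n and y_n separately.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

# Check of Equation B.5 for the spherical Hankel function:
# h_n'(x) = (n/x) h_n(x) - h_{n+1}(x).
def h(n, x, derivative=False):
    return spherical_jn(n, x, derivative) + 1j * spherical_yn(n, x, derivative)

n, x = 3, 2.5
lhs = h(n, x, derivative=True)
rhs = (n / x) * h(n, x) - h(n + 1, x)
```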


Similarly, for the derivative of the steering matrix w.r.t. θ, the scalar differentiation of the matrix can be written as

Ȧ_nmθ_l = [0, 0, · · · , B(r_l) ∂y_H(Ψ_l)/∂θ_l, · · · , 0, 0]   (B.7)

Equations A.3 and A.4 reveal that the differentiation of the near-field steering vector w.r.t. θ needs the derivative of the associated Legendre function. The derivative of the associated Legendre polynomial is detailed in Appendix A.2.

Finally, the nonzero (l-th) column of Ȧ_nmφ_l can be written as

B(r_l) ∂y_H(Ψ_l)/∂φ_l   (B.8)

Utilizing Equations A.3 and A.4, the above differentiation can be found using

∂Y_n^m(Ψ_l)/∂φ_l = jm Y_n^m(Ψ_l).   (B.9)

The partial derivatives given by Equations B.2, B.7 and B.8 can be utilized for computing

the derivative of near-ﬁeld steering matrix.
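The zero-padded column structure shared by Equations B.2, B.7 and B.8 can be sketched directly. B(r_l) and y_H are random surrogates here, and `db_dr` stands in for the (diagonal) derivative ∂B(r_l)/∂r_l.

```python
import numpy as np

# Structure of Equation B.2: the derivative of the near-field steering matrix
# w.r.t. r_l is zero except in the l-th column.
rng = np.random.default_rng(5)
Q, L, l = 9, 4, 2                    # Q = (N+1)^2 coefficients, L sources
yH = rng.standard_normal((Q, L)) + 1j * rng.standard_normal((Q, L))
db_dr = np.diag(rng.standard_normal(Q) + 1j * rng.standard_normal(Q))

dA = np.zeros((Q, L), dtype=complex)
dA[:, l] = db_dr @ yH[:, l]          # only the l-th column is nonzero
```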

References

and Application. Wiley-IEEE Press, 2013.

the IEEE, vol. 57, no. 8, pp. 1408 – 1418, aug. 1969.

[3] R. O. Schmidt, “Multiple emitter location and signal parameter estimation,” in Proceed-

ings of RADC Spectrum Estimation Workshop, Griﬃss AFB, NY, 1979, pp. 243–258.

spectrum estimation,” IEEE Trans. on Signal Processing, vol. 40, pp. 2281–2289, Sep.

1992.

Journal of the Acoustical Society of America, vol. 63, no. 5, pp. 1638–1640, 1978.

[6] R. DuHamel, “Pattern synthesis for antenna arrays on circular, elliptical and spherical

surfaces,” Radio Direction Finding Section Elect. Eng. Res. Lab. Rep., Univ. of Illinois,

Urbana, 1952.

[7] M. Hoﬀman, “Conventions for the analysis of spherical arrays,” Antennas and Propa-

gation, IEEE Transactions on, vol. 11, no. 4, pp. 390–393, 1963.

[8] A. K. Chan, A. Ishimaru et al., “Equally spaced spherical array.” DTIC Document,

Tech. Rep., 1966.


[9] B. Preetham Kumar and G. R. Branner, “The far-ﬁeld of a spherical array of point

dipoles,” IEEE transactions on antennas and propagation, vol. 42, no. 4, pp. 473–477,

1994.

[10] J. Meyer and G. Elko, “A highly scalable spherical microphone array based on an

orthonormal decomposition of the soundﬁeld,” in Acoustics, Speech, and Signal Pro-

cessing (ICASSP), 2002 IEEE International Conference on, vol. 2. IEEE, 2002, pp.

II–1781.

[11] T. D. Abhayapala and D. B. Ward, “Theory and design of high order sound ﬁeld micro-

phones using spherical microphone array,” in Acoustics, Speech, and Signal Processing

(ICASSP), 2002 IEEE International Conference on, vol. 2. IEEE, 2002, pp. II–1949.

[12] Q. Huang and T. Wang, “Acoustic source localization in mixed ﬁeld using spherical

microphone arrays,” EURASIP Journal on Advances in Signal Processing, vol. 2014,

no. 1, pp. 1–16, June 2014.

[13] L. Kumar, K. Singhal, and R. M. Hegde, “Near-ﬁeld source localization using spher-

ical microphone array,” in Hands-free Speech Communication and Microphone Arrays

(HSCMA), 2014 4th Joint Workshop on, May 2014, pp. 82–86.

[14] ——, “Robust source localization and tracking using MUSIC-Group delay spectrum

over spherical arrays,” in Computational Advances in Multi-Sensor Adaptive Processing

(CAMSAP), 2013 IEEE 5th International Workshop on, St. Martin, France. IEEE,

2013, pp. 304–307.

[15] X. Li, S. Yan, X. Ma, and C. Hou, “Spherical harmonics MUSIC versus conventional

MUSIC,” Applied Acoustics, vol. 72, no. 9, pp. 646–652, 2011.

[16] D. Khaykin and B. Rafaely, “Acoustic analysis by spherical microphone array processing

of room impulse responses,” The Journal of the Acoustical Society of America, vol. 132,

p. 261, 2012.

sources in reverberant environments using EB-ESPRIT with spherical microphone ar-


rays,” in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International

Conference on. IEEE, 2011, pp. 117–120.

[18] K. Ichige, K. Saito, and H. Arai, “High resolution doa estimation using unwrapped

phase information of music-based noise subspace,” IEICE Trans. Fundam. Electron.

Commun. Comput. Sci., vol. E91-A, pp. 1990–1999, August 2008.

speech acquisition from distant microphones,” in Acoustics Speech and Signal Process-

ing (ICASSP), 2010 IEEE International Conference on. IEEE, 2010, pp. 2738–2741.

ﬁnding algorithms,” in Acoustics, Speech, and Signal Processing, IEEE International

Conference on ICASSP’83., vol. 8. IEEE, 1983, pp. 336–339.

[21] R. Roy and T. Kailath, “ESPRIT-estimation of signal parameters via rotational in-

variance techniques,” Acoustics, Speech and Signal Processing, IEEE Transactions on,

vol. 37, no. 7, pp. 984–995, 1989.

[22] C. P. Mathews and M. D. Zoltowski, “Eigenstructure techniques for 2-d angle estimation

with uniform circular arrays,” Signal Processing, IEEE Transactions on, vol. 42, no. 9,

pp. 2395–2407, 1994.

using a radial beampattern transformation,” Signal Processing, IEEE Transactions on,

vol. 46, no. 8, pp. 2147–2156, 1998.

[24] E. Fisher and B. Rafaely, “Near-ﬁeld spherical microphone array processing with radial

ﬁltering,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 19,

no. 2, pp. 256–265, 2011.

[25] L. Kumar, A. Tripathy, and R. Hegde, “Robust multi-source localization over planar

arrays using music-group delay spectrum,” Signal Processing, IEEE Transactions on,

vol. 62, no. 17, pp. 4627–4636, Sept 2014.


[26] L. Kumar and R. Hegde, “Stochastic cramér-rao bound analysis for doa estimation

in spherical harmonics domain,” Signal Processing Letters, IEEE, vol. 22, no. 8, pp.

1030–1034, Aug 2015.

[27] E. G. Williams, Fourier acoustics: sound radiation and nearﬁeld acoustical holography.

academic press, 1999.

of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark, 2008.

John Wiley & Sons, Inc., 1999.

[30] J. Nahas, “Simulation of array-based sound ﬁeld synthesis methods,” Audio Commu-

nication Group,TU Berlin, Diploma thesis, 2011. [Online]. Available: http://www2.ak.

tu-berlin.de/∼akgroup/ak pub/abschlussarbeiten/2011/NahasJohnny DiplA.pdf

Desktop Edition Volume I. Basic Books, 2013, vol. 1.

ing arrays,” Ph.D. dissertation, The Australian National University, Telecommunica-

tions Engineering Group, http://hdl.handle.net/1885/46049, 2000.

ing with spherical microphone arrays,” in Acoustics, Speech and Signal Processing

(ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 3981–3985.

[34] D. Colton and R. Kress, Inverse acoustic and electromagnetic scattering theory.

Springer Science & Business Media, 2012, vol. 93.

[35] B. Rafaely, “Plane wave decomposition of the sound ﬁeld on a sphereby spherical

convolution,” Institute of Sound and Vibration Research, University of Southampton,

Tech. Rep., May 2003. [Online]. Available: http://eprints.soton.ac.uk/46555/1/

Pub9273.pdf?origin=publication detail


[36] M. C. Chan, “Theory and design of higher order sound ﬁeld recording,”

Department of Engineering, FEIT, ANU, Honours Thesis, 2003. [Online]. Available:

http://users.cecs.anu.edu.au/∼thush/ugstudents/MCTChanThesis.pdf

[37] C. A. Balanis, Antenna theory: analysis and design. John Wiley & Sons, 2012.

[38] E. Fisher and B. Rafaely, “The nearﬁeld spherical microphone array,” in Acoustics,

Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on,

2008, pp. 5272–5275.

[40] J. H. Reed, Software radio: a modern approach to radio engineering. Prentice Hall

Professional, 2002.

[41] E. F. Deprettere, SVD and signal processing: algorithms, applications and architectures.

North-Holland Publishing Co., 1989.

environment,” Ph.D. dissertation, Universitat Politecnica de Catalunya, 2007.

[43] A. Manikas, Diﬀerential geometry in array processing. World Scientiﬁc, 2004, vol. 57.

[44] J. Benesty, J. Chen, and Y. Huang, Microphone array signal processing. Springer

Science & Business Media, 2008, vol. 1.

[45] E. C. Ifeachor and B. W. Jervis, Digital signal processing: a practical approach. Pearson

Education, 2002.

1–38, 2001.

[47] C. P. Mathews and M. D. Zoltowski, “Signal subspace techniques for source localization

with circular sensor arrays,” [Technical Reports], 1994, http://docs.lib.purdue.edu/

ecetr/.


[49] B. Rafaely, B. Weiss, and E. Bachmat, “Spatial aliasing in spherical microphone arrays,”

Signal Processing, IEEE Transactions on, vol. 55, no. 3, pp. 1003–1010, 2007.

[50] I. Cohen and J. Benesty, Speech processing in modern communication: challenges and

perspectives. Springer, 2010, vol. 3.

[51] P. A. Naylor and N. D. Gaubitch, Speech dereverberation. Springer Science & Business

Media, 2010.

[52] P. Zahorik, in Direct-to-reverberant energy ratio sensitivity, vol. 112. Acoustical Society

Of America, November, 2002, pp. 2110–2117.

[53] C. Knapp and G. Carter, “The generalized correlation method for estimation of time

delay,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 24, no. 4,

pp. 320–327, 1976.

[54] P. R. Roth, “Eﬀective measurements using digital signal analysis,” Spectrum, IEEE,

vol. 8, no. 4, pp. 62–70, 1971.

[55] P. Stoica and R. L. Moses, Spectral analysis of signals. Pearson/Prentice Hall Upper

Saddle River, NJ, 2005.

[56] W. Herbordt and W. Kellermann, “Adaptive beamforming for audio signal acquisition,”

in Adaptive Signal Processing. Springer, 2003, pp. 155–194.

[57] R. Kumaresan and D. W. Tufts, “Estimating the angles of arrival of multiple plane

waves,” Aerospace and Electronic Systems, IEEE Transactions on, no. 1, pp. 134–139,

1983.

[58] J. Chen, K. Yao, and R. Hudson, “Source localization and beamforming,” Signal Pro-

cessing Magazine, IEEE, vol. 19, no. 2, pp. 30–39, 2002.

[59] L. Kumar, R. Mandala, and R. M. Hegde, “Music-group delay based methods for robust

doa estimation using shrinkage estimators,” in Sensor Array and Multichannel Signal

Processing Workshop (SAM), 2012 IEEE 7th. IEEE, 2012, pp. 281–284.


[60] M. J. Daniels and R. E. Kass, “Shrinkage estimators for covariance matrices,,” Biomet-

rics, vol. 57, no. 4, pp. 1173–1184, 2001.

with limited training data,,” IEEE Transactions on Pattern Analysis and Machine

Intelligence,, vol. E91–A, no. 8, 2008.

[62] R. Mandala, M. Shukla, and R. Hegde, “Group delay based methods for recognition

of distant talking speech,” in Signals, Systems and Computers (ASILOMAR), 2010

Conference Record of the Forty Fourth Asilomar Conference on, Nov 2010, pp. 1702–

1706.

[63] M. Zatman, “How narrow is narrowband?” IEE Proceedings-Radar, Sonar and Navi-

gation, vol. 145, no. 2, pp. 85–91, 1998.

[65] D. Ying and Y. Yan, “Robust and fast localization of single speech source using a planar

array,” Signal Processing Letters, IEEE, vol. 20, no. 9, pp. 909–912, 2013.

[66] based on sparse signal reconstruction," EURASIP Journal on Advances in Signal Processing, vol. 2015, no. 1, pp. 1–16, 2015.

[67] A. Griffin, D. Pavlidi, M. Puigt, and A. Mouchtaris, "Real-time multiple speaker DOA estimation in a circular microphone array based on matching pursuit," in Signal Processing Conference (EUSIPCO), 2012 Proceedings of the 20th European. IEEE, 2012, pp. 2303–2307.

[68] T. Filik and T. E. Tuncer, “Design and evaluation of V-shaped arrays for 2-D DOA

estimation,” in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE

International Conference on. IEEE, 2008, pp. 2477–2480.

[69] R. Mandala, M. Shukla, and R. Hegde, “Group delay based methods for recognition

of distant talking speech,” in Signals, Systems and Computers (ASILOMAR), 2010


Conference Record of the Forty Fourth Asilomar Conference on, Nov 2010, pp. 1702–

1706.

[70] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943–950, 1979.

[71] tiscali.nl/ehabets/rir_generator.html.

[72] K. T. Wong and M. D. Zoltowski, "Root-MUSIC-based azimuth-elevation angle-of-arrival estimation with uniformly spaced but arbitrarily oriented velocity hydrophones," Signal Processing, IEEE Transactions on, vol. 47, no. 12, pp. 3250–3260, Dec. 1999.

[73] H. Y., "Techniques of eigenvalues estimation and association," Digital Signal Processing, vol. 7, pp. 253–259, October 1997.

[74] V. Cevher and J. H. McClellan, "2-D sensor perturbation analysis: Equivalence to AWGN on array outputs," in SAM 2002, Washington, DC, 4–6 August 2002.

[75] P. Stoica and A. Nehorai, "MUSIC, maximum likelihood, and Cramer-Rao bound," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 5, pp. 720–741, 1989.

[76] P. Stoica and A. Nehorai, "Comparative performance study of element-space and beamspace MUSIC estimators," Circuits, Systems and Signal Processing, vol. 10, no. 3, pp. 285–292, 1991.

[77] reverberant noisy environment with multiple interfering speech signals," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 17, no. 6, pp. 1071–1086, 2009.

[78] A. Varga and H. J. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Communication, vol. 12, no. 3, pp. 247–251, 1993.


[79] Linguistic Data Consortium, 1993.

[80] J. Hansen and B. Pellom, “An eﬀective quality evaluation protocol for speech enhance-

ment algorithms,” in Proc. ICSLP, vol. 7, 1998, pp. 2819–2822.

[81] D. Klatt, “Prediction of perceived phonetic distance from critical-band spectra: A ﬁrst

step,” in Acoustics, Speech, and Signal Processing, IEEE International Conference on

ICASSP’82., vol. 7. IEEE, 1982, pp. 1278–1281.

[82] “Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end

speech quality assessment of narrow-band telephone networks and speech codecs,” ITU-T

Draft Recommendation P.862, 2001.

[83] M. Seltzer, “Bridging the gap: Towards a uniﬁed framework for hands-free speech

recognition using microphone arrays,” in Hands-Free Speech Communication and Mi-

crophone Arrays, 2008. HSCMA 2008, May 2008, pp. 104 –107.

[84] W. Zhang and B. Rao, "Robust broadband beamformer with diagonally loaded constraint matrix and its application to speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2006, pp. 785–788.

[85] CSLU, “Multi channel overlapping numbers corpus distribution,” Linguistic Data Con-

sortium, http://www.cslu.ogi.edu/corpora/corpCurrent.html.

[86] R. Goossens and H. Rogier, “Closed-form 2D angle estimation with a spherical array

via spherical phase mode excitation and ESPRIT,” in Acoustics, Speech and Signal

Processing, 2008. ICASSP 2008. IEEE International Conference on. IEEE, 2008, pp.

2321–2324.

[87] ing spherical microphone arrays," in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on. IEEE, 2008, pp. 277–280.

[88] B. Rafaely, “Analysis and design of spherical microphone arrays,” Speech and Audio

Processing, IEEE Transactions on, vol. 13, no. 1, pp. 135–143, 2005.


[89] J. R. Driscoll and D. M. Healy, "Computing Fourier transforms and convolutions on the 2-sphere," Advances in Applied Mathematics, vol. 15, no. 2, pp. 202–250, 1994.

[90] Signal Processing Letters, IEEE, vol. 12, no. 10, pp. 713–716, 2005.

[91] G. Arfken and H. J. Weber, Mathematical Methods for Physicists, 5th ed. San Diego: Academic Press, 2001.

[92] Z. Li and R. Duraiswami, “Flexible and optimal design of spherical microphone arrays

for beamforming,” Audio, Speech, and Language Processing, IEEE Transactions on,

vol. 15, no. 2, pp. 702–714, 2007.

[93] array beamforming," in Speech Processing in Modern Communication. Springer, 2010, pp. 281–305.

[94] P. Stoica, E. G. Larsson, and A. B. Gershman, “The stochastic CRB for array process-

ing: a textbook derivation,” Signal Processing Letters, IEEE, vol. 8, no. 5, pp. 148–150,

2001.

[95] H. Gazzah and S. Marcos, “Cramer-Rao bounds for antenna array design,” Signal

Processing, IEEE Transactions on, vol. 54, no. 1, pp. 336–345, 2006.

[96] A. Weiss and B. Friedlander, “Range and bearing estimation using polynomial rooting,”

Oceanic Engineering, IEEE Journal of, vol. 18, no. 2, pp. 130–137, 1993.

[97] J.-P. Delmas and H. Gazzah, “CRB analysis of near-ﬁeld source localization using

uniform circular arrays,” in Acoustics, Speech and Signal Processing (ICASSP), 2013

IEEE International Conference on. IEEE, 2013, pp. 3996–4000.

[98] D. T. Vu, A. Renaux, R. Boyer, and S. Marcos, “A Cramér Rao bounds based analysis

of 3D antenna array geometries made from ULA branches,” Multidimensional Systems

and Signal Processing, vol. 24, no. 1, pp. 121–155, 2013.


[99] versity press, 2005.

[100] (v. 1)," 1993.

[101] H.-C. Song and B.-w. Yoon, “Direction ﬁnding of wideband sources in sparse arrays,” in

Sensor Array and Multichannel Signal Processing Workshop Proceedings, 2002. IEEE,

2002, pp. 518–522.

[102] D. P. Jarrett, E. A. P. Habets, M. R. P. Thomas, and P. A. Naylor, "Simulating room impulse responses for spherical microphone arrays," in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. IEEE, 2011, pp. 129–132.

[103] R. Goossens, H. Rogier, and S. Werbrouck, "UCA root-MUSIC with sparse uniform circular arrays," Signal Processing, IEEE Transactions on, vol. 56, no. 8, pp. 4095–4099, 2008.

[104] configurations," in Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, vol. 4. IEEE, 2006.

[105] M. Costa, A. Richter, and V. Koivunen, "Unified array manifold decomposition based on spherical harmonics and 2-D Fourier basis," Signal Processing, IEEE Transactions on, vol. 58, no. 9, pp. 4634–4645, 2010.

[106] in V. K. Madisetti and D. B. Williams, editors, The Digital Signal Processing Handbook, chapter 62, 1999.

[107] processing, vol. 86, no. 6, pp. 1254–1259, 2006.

[108] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Courier Dover Publications, 2012.

Publications Related to Thesis

Work

In Peer Reviewed International Journal

1. Lalan Kumar and Rajesh Hegde, "Stochastic Cramér-Rao Bound Analysis for DOA Estimation in Spherical Harmonics Domain," Signal Processing Letters, IEEE, vol. 22, no. 8, pp. 1030–1034, Aug 2015.

2. Lalan Kumar, Ardhendu Tripathy, and Rajesh Hegde, "Robust Multi-Source Localization over Planar Arrays Using MUSIC-Group Delay Spectrum," Signal Processing, IEEE Transactions on, vol. 62, no. 17, pp. 4627–4636, Sept. 2014.

3. Lalan Kumar and Rajesh Hegde, "Novel Methods for Localization and Reconstruction of Near-field Sources in Spherical Harmonics Domain," Signal Processing, IEEE Transactions on, under review.

In Peer Reviewed International Conferences

1. Lalan Kumar, Kushagra Singhal, and Rajesh Hegde, "Near-field Source Localization Using Spherical Microphone Array," in Hands-free Speech Communication and Microphone Arrays (HSCMA), 2014 4th Joint Workshop on, 12–14 May 2014, pp. 82–86.

2. Lalan Kumar, Kushagra Singhal, and Rajesh Hegde, "Robust Source Localization and Tracking Using MUSIC-Group Delay Spectrum over Spherical Arrays," in Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2013 5th IEEE International Workshop on, Dec 2013, St. Martin, France, pp. 304–307.

3. Lalan Kumar, Rohan Mandala, and Rajesh Hegde, "MUSIC-Group Delay Based Methods for Robust DOA Estimation Using Shrinkage Estimators," in IEEE Sensor Array and Multichannel (SAM 2012) Signal Processing Workshop, June 2012, Hoboken, NJ, pp. 281–284.
