You are on page 1of 6

THIS IS A PREPRINT OF A PAPER PRESENTED AT THE AES INTERNATIONAL CONFERENCE ON SEMANTIC AUDIO, JUNE 2017, ERLANGEN, GERMANY 1

The Phase Retrieval Toolbox


Zdeněk Průša

Abstract—A Matlab/GNU Octave toolbox for phase (signal) II. O FFLINE ALGORITHMS
reconstruction from the short-time Fourier transform (STFT)
magnitude is presented. The toolbox provides an implementation In this section, we will only consider signals of finite length
of various, conceptually different algorithms ranging from the L with periodic boundaries. The STFT of such signals is also
well known Griffin-Lim algorithm and its derivatives to the very called the (finite) discrete Gabor transform (DGT) [3] and
recent ones. The list includes real-time capable algorithms which we will adopt such name in this paper. The DGT coefficient
are also implemented in real-time audio demos running directly matrix c ∈ CM ×N of a signal f ∈ CL with respect to a window
in Matlab/GNU Octave. The toolbox is well-documented, open-
source and it is available under the GPL3 license. In this paper, g ∈ CL can be obtained as [4]
we give an overview of the algorithms contained in the toolbox L−1
and discuss their properties.
X
c(m, n) = f (l + na)g(l)e−i2πml/M (1)
l=0
I. I NTRODUCTION for m = 0, . . . , M −1 and n = 0, . . . , N −1, M is the number
In many STFT-based audio processing applications, the of frequency channels, N = L/a number of time shifts, a is the
direct synthesis from the STFT magnitude (or spectrogram) hop size in samples. The overline denotes complex conjugation.
is unavoidable. Since the missing information is the phase of Index (l+na) is assumed to be evaluated modulo L; effectively
the complex coefficients, the task is also often called phase introducing the aforementioned periodic boundaries. In other
retrieval or phase reconstruction. To date, many algorithms have words, the signal f and the window g are considered to be
been proposed, but they have mostly been published without single periods of L-periodic signals. Note that in order not
reference implementation or means for reproducing the results. to introduce undesirable effects caused by the phase of the
Moreover, the algorithms have often been formulated and/or window itself, we assume the original, nonperiodized window
evaluated for a single STFT setting using a fixed window for to be whole-point symmetric and centered around the origin.
the analysis and synthesis. In this paper, we present the Phase The complex coefficients can be expressed in polar coordinates
Retrieval Toolbox (PHASERET), which tries to rid the user as
of all hindrances mentioned above and others connected to 
implementation and evaluation of the algorithms. The phase s(m, n) = c(m, n) φ(m, n) = arg c(m, n) , (2)
reconstruction algorithms can be divided into two main classes. s denoting magnitude and φ denoting their phase. The log of
Algorithms in the first class require the knowledge of the the magnitude s (m, n) = log(s(m, n)) is usually visualised
log
magnitude component of the entire signal (offline only) while in the form of a spectrogram. Using matrices, we can write
algorithms in the second class are capable of running in the c = F∗ f , where c ∈ CMN denotes vectorized c such that
vec g vec
real-time setting i.e. to process STFT frames in the frame- c (p) = c(m, n) for p = m + nM and F∗ ∈ CM N ×L is
vec g
by-frame manner. Therefore, the paper is divided into two a conjugate transpose of a “fat” (M N > L) matrix F with
g
parts each devoted to one of the classes. The acronyms in the columns in the form of modulated and circularly shifted
the typewriter font used in the titles and in the text refer to versions of g (see Fig. 1)
the actual function names from the toolbox. For a detailed
description of the functions, please refer to the documentation. gm,n (l) = g(l − na)ei2πm(l−na)/M ,
where (l − na) is again evaluated modulo L. Equation (1)
A. Toolbox Organization
can be therefore equivalently expressed using inner products
The toolbox is mostly written in the Matlab scripting lan-

guage compatible with GNU Octave. The main computational c(m, n) = f , gm,n (3)
functions are written in the C language for greater speed L−1
X
and they are accessed trough MEX functions. The toolbox = f (l)g(l − na)e−i2πm(l−na)/M . (4)
depends on the Large Time-Frequency Analysis Toolbox [1, 2] l=0
(LTFAT) and adopts its conventions and the GPLv3 license. The As long as the matrix Fg has full row rank (linearly independent
documentation is available at http://ltfat.github.io/phaseret/doc rows), the signal can be recovered using f = F cvec where
g
e
and the GitHub development page containing GIT reposi- 

−1

tory, release packages and the issue tracker is available at Fge = Fg Fg Fg is the left inverse of Fg such that
∗ ∗
http://github.com/ltfat/phaseret. F g
e F g = F g F g
e = I. The fundamental property of invertible
Gabor systems is that Fge has the same structure as Fg with
Z. Průša* is with the Acoustics Research Institute, Austrian Academy g
e being the synthesis window, defined by
of Sciences, Wohllebengasse 12–14, 1040 Vienna, Austria, email:
zdenek.prusa@oeaw.ac.at (corresponding address),  −1
This work was supported by the Austrian Science Fund (FWF): Y 551–N13 e = Fg F∗g
g g. (5)
2 THIS IS A PREPRINT OF A PAPER PRESENTED AT THE AES INTERNATIONAL CONFERENCE ON SEMANTIC AUDIO, JUNE 2017, ERLANGEN, GERMANY

1 Listing 1: Code example of a typical workflow


0.8
0.6 % Load t h e g l o c k e n s p i e l t e s t s i g n a l ( 2 6 2 1 4 4
0.4 samples )
0.2
% Sampling r a t e f s = 44100
0
0 20 40 60 80 100 120
[ f , fs ] = gspi ;
a = 2 5 6 ; % Time s t e p
15 M = 2 0 4 8 ; % Number o f f r e q u e n c y c h a n n e l s
10 g = ' blackman ' ; % Blackman window
g t i l d e = { ' d u a l ' , g } ; % S y n t h e s i s window
5
% Compute t h e DGT c o e f f i c i e n t s c , Ls=numel ( f )
0 [ c , Ls ] = d g t r e a l ( f , g , a ,M, ' t i m e i n v ' ) ;
0 20 40 60 80 100 120 % Obtain t h e magnitude
s = abs ( c ) ;
Fig. 1: Examples of gm,n using Hann window (32 samples % R e c o n s t r u c t t h e phase
long) and a = 16, M = 32 and L = 128. Circular time c h a t = phase rec func ( s , ...
% Reconstruct the s i g n a l
shifts g0,n for n ∈ {0, 1, 2, 5} in the time domain (upper) and f h a t = i d g t r e a l ( chat , g t i l d e , a ,M, Ls , ' t i m e i n v ' ) ;
modulations gm,0 for m ∈ {0, 1, 2, 9} in the frequency domain
(lower).

and there exists an unique orthogonal projection onto C1 (see


Note that the only requirement is the invertibility of Fg F∗g . In
e.g. [8])
particular, it allows the support of the window g to be as long
as the signal even when the number of frequency channels M ∗


−1
is significantly lower than the signal length L. P C1 c vec = F g F g F g Fg cvec = F∗g Fge cvec = cCvec
1
, (9)
The LTFAT toolbox contain implementation of fast algo-
rithms for computing the equations mentioned in this section, which amounts to synthesis using the window g e obtained by
therefore no matrix is explicitly inverted or created. Depending (5) followed by analysis using the original window g. The
C1
on the length of the support of the window, the computational magnitude of c vec is also often called a valid or consistent
functions employ either the classical Portnoff algorithm [5] or spectrogram [9] since it is guaranteed that these exists a signal
the algorithm based on the Walnut factorization of the matrices with such spectrogram. The set C2 is a set of coefficients with
[6]. In LTFAT, an implementation of (1) is available as functions magnitude equal to the target magnitude svec
dgt and dgtreal, where the latter one accepts only real n o
input signals and returns only half of the coefficient. The C2 = cvec | cvec (p) = svec (p) . (10)
condition number of the matrix inverted in (5) can be obtained
from gabframebounds and the synthesis window itself from Since it is not a convex set, the projection is substituted
gabdual. Functions doing the inverse transform are called simply by forcing the magnitude of the coefficients to the
idgt and idgtreal respectively. Note that it is necessary target magnitude
to pass additional ’timeinv’ flag to all transform functions
in order to follow (1) exactly. See http://ltfat.github.io/doc for (PC2 cvec ) (p) = svec (p)cvec (p)/ cvec (p) . (11)
more details.
Listing 1 shows how to obtain the DGT coefficients and The composition of the projections then gives i-th iteration of
their magnitude from a signal and how to reconstruct it using GLA
functions from LTFAT. The function phase rec func is only
used as a placeholder and it can be replaced by any function cvec,i = PC2 PC1 cvec,i−1 . (12)
doing phase reconstruction mentioned in this paper.
given cvec,0 (p) = svec (p)eiφvec,0 (p) for all p, where svec is the
A. Griffin-Lim Algorithm – gla target magnitude and φvec,0 is the initial phase. A common
The Griffin-Lim algorithm [7] (GLA) is arguably the most approach is to choose zeros or random values, but it has
generic algorithm for phase reconstruction. It proceeds by been reported that the performance of the algorithm can be
projecting the coefficients iteratively onto two subsets of CM N improved significantly by choosing suitable initial phase [10].
denoted as C1 and C2 . The set C1 is identical with the range The algorithm has been also used for other transforms as
(column) space of F∗g i.e. well [11] and, furthermore, the nature of the algorithm allows
n o creating ad-hoc algorithms by imposing additional constraints
C1 = cvec |cvec = F∗g f for all f ∈ CL . (6) in time or time-frequency domains [12, 13, 14].
Perraudin et al. [15] proposed a fast version of GLA (FGLA)
Its orthogonal complement is the null space of Fg defined as
by introducing an acceleration step which combines the current
C1⊥ = cvec |Fg cvec = 0 .

(7) and the previous iterations with weight αk as shown in Alg. 1.
Any cvec ∈ CM N can be written as a sum of components in The optimal sequence αi for the FGLA remains an open
these two spaces as question. Constant αi = 0.99 was suggested by the authors.
Note that αi = 0 reverts the algorithm back to the regular
C1⊥
cvec = cCvec
1
+ cvec (8) GLA.
THIS IS A PREPRINT OF A PAPER PRESENTED AT THE AES INTERNATIONAL CONFERENCE ON SEMANTIC AUDIO, JUNE 2017, ERLANGEN, GERMANY 3

Algorithm 1: (Fast) Griffin-Lim algorithm (the q-th power is performed elementwise) and expressing its
Input: Initial coefficients cvec,0 , initial α0 . gradient explicitly as
Output: Coefficients with new phase cvec,i .
∇G(f ) =
1 tvec,0 = cvec,0 ;  " #
2 for i = 1, 2, . . . do
 q  q2 −1 (15)

2qR Fg Fg f − sqvec · F∗g f · F∗g f 

3 tvec,i = PC2 PC1 cvec,i−1 ;
4 cvec,i = tvec,i + αi−1 (tvec,i − tvec,i−1 );
5 Update αi ; (R denotes the real-part operator) allows usage of generic
6 end
algorithms for unconstrained minimization. The authors suggest
the l-BFGS algorithm as it does not require explicit second
order derivatives. Further, they suggested to pick q = 2/3. Note
B. Le Roux’s Modifications of GLA – legla that the function decolbfgs requires the implementation of
Le Roux et al. [9] proposed several modifications of the lBFGS from the minFunc package [18].
Griffin-Lim algorithm. First, they recognized a structure in the
projection (9) and proposed its approximation using a truncated
kernel. Second, since this way the projection (9) can be done D. Phase Gradient Heap Integration – pghi
coefficient-wise and so can be done (11), the authors proposed Průša et al. [10] proposed a phase reconstruction method
to do on-the-fly updates i.e. to reuse the just updated coefficient based on the relationship between the magnitude and the
for computing others within the same iteration. Third, they phase gradients. Assuming the Gaussian window is used, the
proposed a modified phase update (omitting contribution of the phase gradient can be estimated using the magnitude gradient
coefficient being updated). The projection (9) can be written and the phase is then recovered by employing an adaptive
in a form of (non-commutative) circular twisted convolution integration procedure. The estimate of the phase derivative in
[16] the frequency direction φω (m, n) and in the time direction
(cvec \hvec ) (m + nM ) = φt (m, n) expressed solely from the magnitude (pre-scaled for
M −1,N the integration) can be written as
X −1 (13)
c(j, k)h(m − j, n − k)ei2π(n−k)ja/M φω (m, n) =
j,k=0 γ 
− slog (m, n + 1) − slog (m, n − 1)
where the kernel is obtained as hvec = F∗g g e. Clearly, there 2aM
is only lcm(a, M )/a (least common multiple) unique kernel (16)
φt (m, n) =
modulations which can be precomputed. Typically, the kernel aM 
h is essentially supported around the origin and it can be slog (m + 1, n) − slog (m − 1, n) + 2πam/M

truncated with only a minor loss of precision. Further, the
kernel is conjugate symmetric, which can be exploited to reduce with γ being the “time-frequency” ratio of the Gaussian window.
the number of multiplications. Note that the modified phase Given phase at a single point φ(m, n), the phase of its four
update can be obtained simply by setting h(0, 0) = 0. The neighbors is computed as
algorithm is summarized as Alg. 2. Note that the acceleration 1 
technique from [15] can be used here as well. φ(m + 1, n) ← φ(m, n) + φω (m, n) + φω (m + 1, n) ,
2
1 
Algorithm 2: Le Roux’s version of GLA φ(m, n + 1) ← φ(m, n) + φt (m, n) + φt (m, n + 1) ,
2
Input: Initial coefficients cvec,1 . 1 
φ(m − 1, n) ← φ(m, n) − φω (m, n) + φω (m − 1, n) ,
Output: Coefficients with new phase cvec,i . 2
1 Initialize and order array I of indices p to be processed; 1 
φ(m, n − 1) ← φ(m, n) − φt (m, n) + φt (m, n − 1) .
2 for i = 1, 2, . . . do 2
3 for all p ∈ I do Further phase spreading is guided by the coefficient magnitude
4 cvec,i (p) ← such that the phase of significant coefficients along spectrogram
 
svec (p) cvec,i \hvec (p)/ cvec,i \hvec (p) ;

ridges is computed first. For non-Gaussian windows, the
5 end approximate γ can be obtained from pghi_findgamma.
6 end

III. R EAL - TIME ALGORITHMS

C. Unconstrained Optimization – decolbfgs When dealing with real-time algorithms, the finite length
signal model from the previous section is no longer suitable. In
Decorsiere et al. [17] proposed to take a direct optimization order to model streams of audio data, it is a common practice to
approach. Writing the objective function as work with infinite length discrete time signals. In this section,
2
we will reformulate STFT for such signals and use it in the

∗ q q

G(f ) = Fg f − svec
(14) description of the algorithms. Given a discrete time signal f
4 THIS IS A PREPRINT OF A PAPER PRESENTED AT THE AES INTERNATIONAL CONFERENCE ON SEMANTIC AUDIO, JUNE 2017, ERLANGEN, GERMANY

and a compactly supported window g, the STFT at time index Algorithm 3: SPSI
na denoted as cn is given by Input: Magnitude of n-th frame sn , phase of the previous
X frame φn−1 .
cn (m) = f (l + na)g(l)e−i2πml/M (17) Output: Phase of the current frame φn .
l∈I
1 Identify peaks sn (mpeak ) and their regions of influence

for m = 0, . . . , M − 1 and any n ∈ Z and where I denotes ROImpeak .;


the compact support of the window g. We again assume the 2 for all mpeak do

window is symmetric around the origin. We will further also 3 mtruepeak ←


slog,n (mpeak −1)−slog,n (mpeak +1)
use the following operator notation mpeak + 21 slog,n (mpeak −1)−2slog,n (mpeak )+slog,n (mpeak +1) ;
 4 φn (mpeak ) ← φn−1 (mpeak ) + 2πamtruepeak /M ;
Vg f n = cn (18) 5 φn (m) ← φn (mpeak ) for all m ∈ ROImpeak ;
6 end
and denote the magnitude of the coefficients of the n-th frame
as sn , the logarithm of the coefficients as slog,n and the phase
as φn . An individual time frame fn can be reconstructed using
C. Real-Time Iterative Spectrogram Inversion With Look-Ahead
M
X −1 – rtisila
fn (l) = ge(l) cn (m)ei2πml/M (19) Zhu et al. [21] proposed a real-time iterative algorithm based
m=0
on the idea of GLA. It employs NLA look-ahead frames and
for l ∈ I and zero elsewhere. If a < len(g) ≤ M (len(g) proceeds by doing GLA-type iterations on individual frames
being the length of the window support), the synthesis window in the order from the most recent one to the current one
ge is given by and repeats this procedure for a pre-determined number of
iterations. The fundamental problem of the approach is the
1 g(l)
ge(l) = P , (20) inherent asymmetry introduced by windowing the partially
M n∈Z g(l − na)2 reconstructed signal. In order to compensate this problem, the
authors proposed to use two additional asymmetric analysis
for l ∈ I and is zero elsewhere. Partially reconstructed signal windows g
NLA ,−1 and gNLA ,0 to be used with the most recent
up to frame N is given by look-ahead frame and to use the regular analysis window for
XN other frames. The additional windows are obtained as
feN (l) = fn (l − na). (21) X−1

n=−∞ gNLA ,−1 (l) = M g(na − l)eg (na − l), (22)


n=−NLB
In the following text, the function names will refer 0
X
to the offline implementation of the algorithms. The gNLA ,0 (l) = M g(na − l)e
g (na − l), (23)
true real-time implementation of the algorithms is n=−NLB
available in demo_blockproc_phaseret and  
for l ∈ I and zero elsewhere, where NLB = len(g)/a − 1 is
demo_blockproc_phaseret2.
the number of “look-back” frames. Window gNLA ,−1 is only
used in the very first iteration, while gNLA ,0 is used for all
A. Single Pass Spectrogram Inversion – spsi the following iterations. In the algorithm listing in Alg. 4 the
analysis windows gk are equal to g except for k = NLA for
Beauregard et al. [19] proposed a phase vocoder-like which the above applies.
technique for phase reconstruction. It employs peak picking
followed by quadratic interpolation of the magnitude in order D. Gnann and Spiertz’s RTISI-LA – gsrtisila
to precisely identify the instantaneous frequency. The inst.
Gnann and Spiertz [22] proposed an alternative way of
frequency gives the increment of the peak phase between two
dealing with the asymmetry of the partially reconstructed signal.
frames. The peak phase is also assigned to all coefficients
They proposed to use NLA + 1 additional analysis windows
within its region of influence. The steps of the algorithm are
g0 , . . . , gNLA computed as
summarized in Alg. 3.
g(l)
gk (l) = M (24)
gsum (l + ka)
B. Real-Time Phase Gradient Heap Integration – rtpghi for l ∈ I and zero elsewhere and
Průša and Søndergaard [20] proposed an adaptation of the NLA
X
PGHI algorithm to the real-time setting (RTPGHI). The core gsum (l) = g(l − na)eg (l − na). (25)
of the algorithm was kept, with the exception that the phase n=−NLB
is spread only from the previous to the current frame. From For this version of RTISI-LA, the listing in Alg. 4 applies
(16), it is clear that the algorithm requires access to slog,n+1 without changes.
i.e. it requires one look-ahead frame. Employing a causal A combination of GSRTISI-LA with RTPGHI proved to
differentiation scheme yields even zero look-ahead frames. give excellent results [23].
THIS IS A PREPRINT OF A PAPER PRESENTED AT THE AES INTERNATIONAL CONFERENCE ON SEMANTIC AUDIO, JUNE 2017, ERLANGEN, GERMANY 5

Algorithm 4: (GS)RTISI-LA [3] Søndergaard, P. L., Finite discrete Gabor analysis, Ph.D.
Input: Number of look-ahead frames NLA , magnitude of thesis, Technical University of Denmark, 2007, available
the current and the look-ahead frames from: http://ltfat.github.io/notes/ltfatnote003.pdf.
sn , . . . , sn+NLA [4] Søndergaard, P. L., “Gabor frames by Sampling and
Output: Time frame fn and its coefficients cn . Periodization,” Adv. Comput. Math., 27(4), pp. 355 –373,
1 Initialize fn+NLA to zeros.; 2007, doi:10.1007/s10444-005-9003-y.
2 for i = 1, 2, . . . do [5] Portnoff, M. R., “Implementation of the digital phase
3 for k = NLA , . . . , 0 do vocoder using the fast Fourier transform,” IEEE Transac-
4 Compute fen+NLA using (21).; tions on Acoustics, Speech, and Signal Processing, 24(3),
 pp. 243–248, 1976, ISSN 0096-3518, doi:10.1109/TASSP.
5 t ← Vgk fen+NLA ;
n+k 1976.1162810.
6 cn+k (m) ← sn+k (m) t(m)/ t(m) ; [6] Søndergaard, P. L., “Efficient Algorithms for the Dis-
7 Compute fn+k using (19); crete Gabor Transform with a long FIR window,” J.
8 end Fourier Anal. Appl., 18(3), pp. 456–470, 2012, doi:
9 end 10.1007/s00041-011-9210-5.
[7] Griffin, D. and Lim, J., “Signal estimation from modified
short-time Fourier transform,” Acoustics, Speech and
IV. OTHER ALGORITHMS Signal Processing, IEEE Transactions on, 32(2), pp. 236–
243, 1984, ISSN 0096-3518, doi:10.1109/TASSP.1984.
The toolbox is primarily focused on processing audio signals 1164317.
which consist of tens of thousands of samples per second. We [8] Strang, G., Introduction to Linear Algebra, Wellesley-
have intentionally not included implementation of algorithms Cambridge Press, 5 edition, 2016, ISBN 978-09802327-
which cannot efficiently handle signals at least few seconds 7-6.
long or which have some other serious drawback. In this section, [9] Le Roux, J., Kameoka, H., Ono, N., and Sagayama,
we mention other approaches and give a reason why they are S., “Fast Signal Reconstruction from Magnitude STFT
not practical for processing audio signals. Spectrogram Based on Spectrogram Consistency,” in Proc.
A popular approach is reformulating the problem as a convex, 13th Int. Conf. on Digital Audio Effects (DAFx-10), pp.
matrix rank minimization problem [24, 25, 26]. Doing so 397–403, 2010.
however squares the size of the original problem and only very [10] Průša, Z., Balazs, P., and Søndergaard, P. L., “A Nonitera-
short signals can be handled efficiently. The authors themselves tive Method for Reconstruction of Phase from STFT Mag-
evaluate the algorithms on signals shorter than 100 samples. nitude,” IEEE/ACM Trans. on Audio, Speech, and Lang.
An approach by Bouvrie and Ezzat [27] is based on solving a Process., 25(5), 2017, doi:10.1109/TASLP.2017.2678166.
non-linear system of equations for each time frame. It however [11] Nakamura, T. and Kameoka, H., “Fast signal reconstruc-
relies on using a rectangular window for analysis which is tion from magnitude spectrogram of continuous wavelet
known to have bad frequency selectivity. transform based on spectrogram consistency,” Proc. 17th
Worth mentioning is an algorithm by Eldar et. al. [28] which Int. Conf. Digital Audio Effects DAFx (DAFx-14), 2014,
is only applicable to signals which are sparse in the original pp. 129–135, 2014.
domain. Such assumption is not realistic when considering real [12] Dittmar, C. and Müller, M., “Towards transient restoration
world audio signals. in score-informed audio decomposition,” Proc. 18th Int.
Conf. Digital Audio Effects DAFx (DAFx-15), 2015, pp.
145–152, 2015.
V. ACKNOWLEDGEMENT
[13] Sturmel, N. and Daudet, L., “Iterative phase reconstruction
This work was supported by the Austrian Science Fund of Wiener filtered signals,” in Acoustics, Speech and
(FWF): Y 551–N13. Signal Processing (ICASSP), 2012 IEEE International
Conference on, pp. 101–104, 2012, ISSN 1520-6149, doi:
10.1109/ICASSP.2012.6287827.
R EFERENCES
[14] Sturmel, N. and Daudet, L., “Informed Source Separation
[1] Søndergaard, P. L., Torrésani, B., and Balazs, P., “The Using Iterative Reconstruction,” IEEE Transactions on
Linear Time Frequency Analysis Toolbox,” Interna- Audio, Speech, and Language Processing, 21(1), pp. 178–
tional Journal of Wavelets, Multiresolution Analysis and 185, 2013, ISSN 1558-7916, doi:10.1109/TASL.2012.
Information Processing, 10(4), pp. 1–27, 2012, doi: 2215597.
10.1142/S0219691312500324. [15] Perraudin, N., Balazs, P., and Søndergaard, P. L., “A
[2] Průša, Z., Søndergaard, P. L., Holighaus, N., Wiesmeyr, fast Griffin-Lim algorithm,” in Applications of Signal
C., and Balazs, P., “The Large Time-Frequency Analysis Processing to Audio and Acoustics (WASPAA), IEEE
Toolbox 2.0,” in Sound, Music, and Motion, Lecture Workshop on, pp. 1–4, 2013, ISSN 1931-1168, doi:
Notes in Computer Science, pp. 419–442, Springer 10.1109/WASPAA.2013.6701851.
International Publishing, 2014, ISBN 978-3-319-12975-4, [16] Werther, T., Matusiak, E., Eldar, Y. C., and Subbana, N. K.,
doi:{10.1007/978-3-319-12976-1\ 25}. “A Unified Approach to Dual Gabor Windows,” IEEE
6 THIS IS A PREPRINT OF A PAPER PRESENTED AT THE AES INTERNATIONAL CONFERENCE ON SEMANTIC AUDIO, JUNE 2017, ERLANGEN, GERMANY

Transactions on Signal Processing, 55(5), pp. 1758–1768,


2007, ISSN 1053-587X, doi:10.1109/TSP.2006.890908.
[17] Decorsiere, R., Søndergaard, P. L., MacDonald, E. N.,
and Dau, T., “Inversion of Auditory Spectrograms, Tradi-
tional Spectrograms, and Other Envelope Representations,”
Audio, Speech, and Language Processing, IEEE/ACM
Transactions on, 23(1), pp. 46–56, 2015, ISSN 2329-
9290, doi:10.1109/TASLP.2014.2367821.
[18] Schmidt, M., “minFunc: Unconstrained differentiable mul-
tivariate optimization in Matlab,” 2005, available at: http:
//www.cs.ubc.ca/∼schmidtm/Software/minFunc.html.
[19] Beauregard, G. T., Harish, M., and Wyse, L., “Single
Pass Spectrogram Inversion,” in Digital Signal Processing
(DSP), IEEE International Conference on, pp. 427–431,
2015, doi:10.1109/ICDSP.2015.7251907.
[20] Průša, Z. and Søndergaard, P. L., “Real-Time Spectrogram
Inversion Using Phase Gradient Heap Integration,” in Proc.
Int. Conf. Digital Audio Effects (DAFx-16), pp. 17–21,
2016.
[21] Zhu, X., Beauregard, G. T., and Wyse, L., “Real-Time Sig-
nal Estimation From Modified Short-Time Fourier Trans-
form Magnitude Spectra,” Audio, Speech, and Language
Processing, IEEE Transactions on, 15(5), pp. 1645–1653,
2007, ISSN 1558-7916, doi:10.1109/TASL.2007.899236.
[22] Gnann, V. and Spiertz, M., “Improving RTISI Phase
Estimation with Energy Order and Phase Unwrapping,”
in Proc. 13th Int. Conf. Digital Audio Effects (DAFx-10),
pp. 1–5, 2010.
[23] Průša, Z. and Rajmic, P., “Towards High Quality Real-
Time Signal Reconstruction from STFT Magnitude,” IEEE
Signal Processing Letters, 2017, doi:10.1109/LSP.2017.
2696970.
[24] Balan, R., “On signal reconstruction from its spectrogram,”
in Information Sciences and Systems (CISS), 44th Annual
Conference on, pp. 1–4, 2010, doi:10.1109/CISS.2010.
5464828.
[25] Sun, D. L. and Smith, J. O., III, “Estimating a Signal
from a Magnitude Spectrogram via Convex Optimization,”
in Audio Engineering Society Convention 133, 2012.
[26] Jaganathan, K., Eldar, Y. C., and Hassibi, B., “Recov-
ering signals from the Short-Time Fourier Transform
magnitude,” in 2015 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp.
3277–3281, 2015, ISSN 1520-6149, doi:10.1109/ICASSP.
2015.7178577.
[27] Bouvrie, J. V. and Ezzat, T., “An incremental algorithm for
signal reconstruction from short-time Fourier transform
magnitude.” in Spoken Language Processing (INTER-
SPEECH), 9th International Conference on, ISCA, 2006.
[28] Eldar, Y., Sidorenko, P., Mixon, D., Barel, S., and Cohen,
O., “Sparse Phase Retrieval from Short-Time Fourier
Measurements,” IEEE Signal Processing Letters, 22(5),
pp. 638–642, 2015, ISSN 1070-9908, doi:10.1109/LSP.
2014.2364225.

You might also like