Professional Documents
Culture Documents
AT
PHASERET is a Matlab/GNU Octave toolbox for phase (signal) reconstruction from the short-time Fourier transform (STFT) magnitude. The toolbox provides
an implementation of various, conceptually different algorithms including real-time capable algorithms which are also implemented in real-time audio demos running
directly in Matlab/GNU Octave. The toolbox depents on the Large Time-Frequency Analysis Toolbox – LTFAT (http://ltfat.github.io), it is well-documented,
open-source and it is available under the GPL3 license at
http://ltfat.github.io/
The discrete STFT (also known as the discrete Gabor Listing 1: Code example of a typical workflow
transform – DGT) is defined as
Phase retrieval algorithms reconstruct phase and φb
% Load t h e g l o c k e n s p i e l t e s t s i g n a l
L−1
signal bf from (possibly inconsistent) magnitude s. % S a m p l i n g r a t e f s = 44100 , 262144 s a m p l e s
[ f , fs ] = gspi ;
c(m + nM) = f(l + na)g(l )e−i2πml /M
X
5
a = 2 5 6 ; % Time s t e p
l =0 10
M = 2 0 4 8 ; % Number o f f r e q u e n c y c h a n n e l s
where n = 0, . . . , L/a − 1, m = 0, . . . , M − 1 and where 4.5
0 g = ' blackman ' ; % Blackman window
4
c ∈ CMN is the DGT coefficient vector, g t i l d e = { ' d u a l ' , g } ; % S y n t h e s i s window
Frequency (kHz)
−10
3.5
% Compute t h e DGT c o e f f i c i e n t s c , Ls=numel ( f )
f ∈ RL is the input signal, 3 −20
[ c , Ls ] = d g t r e a l ( f , g , a ,M, ' t i m e i n v ' ) ;
g ∈ RL is the analysis window, 2.5 −30 % O b t a i n t h e magnitude
a is the time step size, 2
−40 s = abs ( c ) ;
M is the number of frequency channels (FFT size).
1.5 % Reconstruct the phase
−50
1 c h a t = p h a s e r e c f u n c ( s , ...
0.5 −60 % Reconstruct the s i g n a l
Alternatively using matrix notation (analysis) 0 f h a t = i d g t r e a l ( chat , g t i l d e , a ,M, Ls , ' t i m e i n v ' ) ;
0 0.5 1 1.5 2 2.5 3 3.5 4
c= F∗gf, Time (s)
where columns of Fg ∈ CL×MN are in a form of Figure: A spectrogram of a piano and a solo singing voice. An Real-time Implementation
i2πm(l −na)/M excerpt from the song Gate 21 performed by Serj Tankian.
gm+nM (l ) = g(l − na)e Exploits the block processing framework for real-time audio
for l = 0, . . . , L − 1. The signal can be recovered using
5 π processing from LTFAT:
4.5
† −1 Listing 2: Minimal working example
f= F∗g c= FgF∗g Fgc = Fge c 4
% C r e a t e t h e GUI p a n e l
Frequency (kHz)
3.5
where 3
p = b l o c k p a n e l ({ 'GdB ' , ' Gain ' , −20 ,20 ,0 ,21}) ;
−1 % I n i t i a l i z e the audio device
e = FgF∗g
g g. 2.5
block ( ' playrec ' , ' loadind ' ,p) ;
2
% Start audio processing loop
1.5
The magnitude and phase components of the coefficients while p . flag
1
are obtained as g a i n = b l o c k p a n e l g e t ( p , 'GdB ' ) ;
0.5 f = blockread () ;
s(p) = |c(p)| , φ(p) = arg c(p) ,
0
0 0.5 1 1.5 2 2.5 3 3.5 4
0 b l o c k p l a y ( f ∗10ˆ( g a i n /20) ) ;
Time (s) end
for p = 0, . . . , MN − 1, where s is also referred to as the % C l ea n u p
consistent STFT magnitude (spectrogram) i.e. there Figure: Phase difference
pattern achieved by pghi blockdone (p) ;
exists some signal f with such STFT magnitude.
min φ − φb , 2π − φ − φb
Figure: JAVA control panel
Offline algorithms
Griffin & Lim algorithm (GLA) – gla [1] proceeds by projecting the coefficients iteratively onto two subsets of CMN denoted
as C1 and C2. The set C1 is identical with the range (column) space of F∗g i.e. demo blockproc phaseret – comparison of RTPGHI,
SPSI and RTISI-LA (from [7])
n o
C1 = c|c = F∗gf for all f ∈ C L
and its orthogonal complement is C1⊥ = c|Fgc = 0 .
demo blockproc phaseret2 – comparison of RTPGHI,
C1⊥
Any c ∈ C MN
can be written as a sum of components in these two spaces as c = c + c C1
and the unique orthogonal projection RTISI-LA, GSRTISI-LA and RTPGHI + GSRTISI-LA
onto C1 is −1 (from [10])
PC1 c = F∗g FgF∗g Fgc = F∗gFge c = cC1 , demo blockproc phaseretmix – comparison of
The set C2 is a set of coefficients with magnitude equal to the target magnitude s algorithms doing comb-filter free channel mixing (from [7])
C2 = c| |c(p)| = s(p) for all p .
Toolbox Organization and Documentation
Since it is not a convex set, the projection is substituted simply by forcing the magnitude of the coefficients to the target magnitude
Public GitHub repository
PC2 c (p) = s(p)c(p)/ |c(p)| for all p.
https://github.com/ltfat/phaseret
The composition of the projections then gives i-th iteration of GLA C backend library with MEX interfaces
ci = PC2 PC1 ci−1. https://github.com/ltfat/libphaseret
Web doc. pages generated from the m-file headers using
given c0(p) = s(p)eiφ0(p) for all p, where s is the target magnitude and φ0 is the initial phase. An acceleration technique proposed
home-brewed system mat2doc
in [2].
https://github.com/ltfat/mat2doc
Le Roux’s Modifications of GLA (leGLA) – legla [3] Modification 1) exploiting a structure in the projection PC1 and
introducing its approximation using a truncated kernel. Modification 2), since this way the projection PC1 can be done coefficient-
wise and so can be done PC2 , the authors proposed to do on-the-fly updates i.e. to reuse the just updated coefficient for computing References
others within the same iteration. Modification 3) Introduction of a modified phase update (omitting contribution of the coefficient [1] D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,”
IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 32, no. 2, pp. 236–243,
being updated). Apr 1984.
Decorsiere’s Unconstrained Optimization – decolbfgs [4] uses the lBFGS algorithm for minimizing a cost function [2] N. Perraudin, P. Balazs, and P. L. Søndergaard, “A fast Griffin-Lim algorithm,” in
2
ISF-INSTITUT FÜR SCHALLFORSCHUNG
metry of the partially reconstructed signal. It uses NLA + 1 additional analysis windows. A combination with RTPGHI proved to This work was supported by the Austrian Science Fund (FWF):
give excellent results [10]. Y 551–N13.