Complex Wavelet Based Image Analysis and Synthesis
Peter de Rivaz
Trinity College
Department of Engineering, University of Cambridge
October 2000

This dissertation is submitted for the degree of Doctor of Philosophy.
de Rivaz, Peter F. C. Complex Wavelet Based Image Analysis and Synthesis. PhD thesis, University of Cambridge, October 2000.

Key words: complex wavelets, multiscale, texture segmentation, texture synthesis, interpolation, deconvolution.
Copyright © P.F.C. de Rivaz, 2000. All rights reserved. No part of this work may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior permission.
All statements in this work are believed to be true and accurate at the time of its production but neither the author nor the University of Cambridge offer any warranties or representations, nor can they accept any legal liability for errors or omissions.
P.F.C. de Rivaz Signal Processing and Communications Laboratory Department of Engineering Trumpington Street Cambridge, CB2 1PZ, U.K.
To Jenny
Summary
This dissertation investigates the use of complex wavelets in image processing. The limitations of standard real wavelet methods are explained with emphasis on the problem of shift dependence. Complex wavelets can be used for both Bayesian and non-Bayesian processing.

The complex wavelets are first used to perform some non-Bayesian processing. We describe how to extract features to characterise textured images and test this characterisation by resynthesizing textures with matching features. We use these features for image segmentation and show how it is possible to extend the feature set to model longer-range correlations in images for better texture synthesis.

Second, we describe a number of image models from within a common Bayesian framework. This framework reveals the theoretical relations between wavelet and alternative methods. We place complex wavelets into this framework and use the model to address the problems of interpolation and approximation. Finally, we show how the model can be extended to cover blurred images and thus perform Bayesian wavelet based image deconvolution.

Theoretical results are developed that justify the methods used and show the connections between these methods and alternative techniques. Numerical experiments on the test problems demonstrate the usefulness of the proposed methods, and give examples of the superiority of complex wavelets over the standard forms of both decimated and nondecimated real wavelets.
Declaration

The research described in this dissertation was carried out between October 1997 and September 2000. Except where indicated in the text, this dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. No part of this dissertation has been submitted to any other university. The dissertation does not exceed 65000 words and does not contain more than 150 figures.

Acknowledgements

I would like to thank my supervisor, Dr. Nick Kingsbury, for suggesting this topic and for his guidance during the research. Thanks to my parents for encouraging my curiosity and to my wife for keeping me calm. This work was made possible by an EPSRC grant.
Contents

1 Introduction
2 Wavelet transforms
3 Previous applications
4 Complex wavelet texture features
5 Texture segmentation
6 Correlation modelling
7 Bayesian modelling in the wavelet domain
8 Interpolation and Approximation
9 Deconvolution
10 Discussion and Conclusions
11 Future work
A Balance and noise gain
B Useful results
C Review of deconvolution techniques
List of Figures

1.1 Guide to the dissertation
2.1 Building block for wavelet transform
2.2 Building block for inverting the wavelet transform
2.3 Subband decomposition tree for a 4 level wavelet transform
2.4 Alternative structure for a subband decomposition tree
2.5 The complex wavelet dual tree structure. This figure was provided by Dr N. Kingsbury.
2.6 The Q-shift dual tree structure. This figure was provided by Dr N. Kingsbury.
2.7 Contours of half-peak magnitude of filters at scales 3 and 4
2.8 Contours of 70% peak magnitude of filters at scales 3 and 4
2.9 Model of wavelet domain processing
2.10 Comparison of noise gain for different transforms
3.1 PSNRs in dB of images denoised with the HMT acting on different wavelet transforms. The results in normal type are published in the literature [24] while the results in bold come from our replication of the same experiment.
3.2 Top left: Original image. Top right: Noisy image. Bottom left: DWT results. Bottom middle: DTCWT results. Bottom right: NDWT results.
4.1 Results of using histogram/energy synthesis
4.2 Energy before rescaling for different subbands during the histogram/energy synthesis algorithm. Horizontal lines represent the target energy values.
4.3 Results of using histogram/energy synthesis on a wood grain texture
4.4 Results of using energy synthesis
4.5 Results of using different methods on a strongly diagonal texture
5.1 Mosaics tested
5.2 Sine wave input
5.3 Rectified decimated scale 4 wavelet coefficients
5.4 Nondecimated scale 4 wavelet coefficients
5.5 Rectified nondecimated scale 4 wavelet coefficients
5.6 Performance measure for different methods
5.7 Comparison of segmentation results for different transforms
5.8 Comparison of segmentation results for altered DTCWT
5.9 Percentage errors for (DWT, NDWT, DTCWT)
5.10 Percentage errors for (HalfCWT, RealCWT, DTCWT)
5.11 Segmentation results for mosaic "f" using the DTCWT
5.12 Comparison of segmentation results for multiscale methods
5.13 Percentage errors for single scale DTCWT, multiscale DWT, multiscale DTCWT
5.14 Segmentation results for mosaic "f" using the multiscale DTCWT
6.1 Results of original energy matching synthesis
6.2 Results of matching 3 by 3 raw autocorrelation values
6.3 Results of matching 5 by 5 raw autocorrelation values
6.4 Results of matching 5 by 5 magnitude autocorrelation values
6.5 2D frequency responses for the four subbands derived from the level 2 45° subband. A dashed contour at the 25% peak level for the original 45° scale 2 subband is also shown in each plot.
6.6 Results of matching 5 by 5 raw autocorrelation values
6.7 Comparison of different synthesis methods
7.1 Sequence of operations to reconstruct using a Gaussian Pyramid
7.2 One dimensional approximation results for different origin positions. Crosses show location and values of measured data points.
7.3 Shift dependence for different scales /dB
7.4 Covariance structure for an orthogonal real wavelet. Contours are plotted at 90%, 75%, 50%, 25% of the peak amplitude.
7.5 Covariance structure for a nondecimated real wavelet
7.6 Covariance structure for the W transform
7.7 Covariance structure for the Gaussian pyramid
7.8 Covariance structure for a translated orthogonal real wavelet
7.9 Covariance structure for the DTCWT
7.10 Summary of properties for different transforms
8.1 Shift dependence for different scales
8.2 Aesthetic quality for DWT(o) and DTCWT(x) /dB
8.3 Relative statistical quality for DWT(o) and DTCWT(x) /dB
8.4 Computation time versus SNR (128 measurements)
8.5 Computation time versus SNR (256 measurements)
8.6 Count of important coefficients for different transforms
8.7 Prior cost function f(x) expressions for standard deconvolution techniques
9.1 Flow diagram for the proposed wavelet deconvolution method
9.2 Block diagram of deconvolution estimation process
9.3 Performance of different search directions using the steepest descent (x) or the conjugate gradient algorithm (o)
9.4 Performance of different search directions using the steepest descent (x) or the conjugate gradient algorithm (o) starting from a WaRD initialisation
9.5 Performance of different search directions over 100 iterations
9.6 Value of the energy function over 100 iterations
9.7 Test images used in the experiments
9.8 Alternative PSF used in experiments
9.10 Comparison of ISNR for different algorithms and images /dB
9.11 Comparison of different published ISNR results for a 9 by 9 uniform blur applied to the Cameraman image with 40dB BSNR
9.12 Deconvolution results for a 9 by 9 uniform blur applied to the Cameraman image with 40dB BSNR using the PRECGDTCWT method with WaRD initialisation
9.13 Comparison of different published ISNR results for a Gaussian blur applied to the Mandrill image with 30dB BSNR
9.14 Deconvolution results for a Gaussian blur applied to the Mandrill image with 30dB BSNR using the PRECGDTCWT method with WaRD initialisation
9.15 Comparison of the PRECGDTCWT and Wiener filtering with published results of Modified Hopfield Neural Network algorithms for a 3 × 3 uniform blur applied to the Lenna image with 40dB BSNR
10.1 Summary of useful properties for different applications
C.1 Effective assumption about SNR levels for Van Cittert restoration (K = 3, α = 1)
C.2 Effective assumption about SNR levels for Landweber restoration (K = 3, α = 1)
C.3 The mirror wavelet tree structure
C.4 2D frequency responses of the mirror wavelet subbands shown as contours at 75% peak energy amplitude
Abbreviations and notation

BSNR    Blurred Signal to Noise Ratio
BW      Bandwidth
dB      decibels
DTCWT   Q-shift Dual Tree Complex Wavelet Transform
DTFT    Discrete Time Fourier Transform
DWT     Discrete Wavelet Transform
FFT     Fast Fourier Transform
FIR     Finite Impulse Response
GPT     Gaussian Pyramid Transform
IIR     Infinite Impulse Response
ISNR    Improved Signal to Noise Ratio
NDWT    Nondecimated Discrete Wavelet Transform
pdf     probability density function
PR      Perfect Reconstruction
PSD     Power Spectral Density
SNR     Signal to Noise Ratio
WWT     W (Wavelet) Transform
∀                   for all
↓k                  downsampling by a factor of k
↑k                  upsampling by a factor of k
a ∈ S               a belongs to a set S
⌊a⌋                 the largest integer not greater than a
⌈a⌉                 the smallest integer not less than a
|a|                 absolute value of a scalar a
‖a‖                 Frobenius norm of a
[a]_i               the ith element of vector a
argmax_{a∈S} f(a)   the value for a within set S that maximises f(a)
argmin_{a∈S} f(a)   the value for a within set S that minimises f(a)
A                   matrix
A^T                 transpose of A
A^*                 conjugate of A
A^H                 Hermitian transpose of A
A^{-1}              inverse of A
|A|                 determinant of a matrix A
diag{a}             diagonal matrix containing the elements of a along the diagonal
tr(A)               the trace of matrix A
D                   a diagonal matrix normally containing weights for wavelet coefficients
e_i                 a vector containing zeros everywhere except for a 1 in position i
E{X}                the expected value of X
G0(z)               Z-transform of a wavelet lowpass synthesis filter
G1(z)               Z-transform of a wavelet highpass synthesis filter
H0(z)               Z-transform of a wavelet lowpass analysis filter
H1(z)               Z-transform of a wavelet highpass analysis filter
ℑ{a}                the imaginary part of a
ℜ{a}                the real part of a
I_k                 k-dimensional identity matrix
j                   √−1
M                   number of wavelet and scaling coefficients produced by a transform
N                   number of input samples in a data set
N(µ, C)             multivariate Gaussian distribution with mean µ and covariance C
P                   N × M matrix representing an inverse wavelet transform
P(z)                product filter
p(θ)                joint pdf for the elements of θ
p(θ|φ)              pdf for θ, conditioned on φ
R^N                 vector space of N × 1 real vectors
sup S               the supremum of set S
W                   M × N matrix representing a wavelet transform
w                   M × 1 vector containing wavelet coefficients
x                   N × 1 vector containing all the input samples
Z                   a vector of random variables
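As a concrete illustration of the transform notation above (not code from the dissertation; the one-level Haar transform and the N = 4 input are chosen purely for simplicity), the analysis matrix W and synthesis matrix P act as w = Wx and x = Pw:

```python
import numpy as np

# One-level Haar wavelet transform on N = 4 samples, written in the
# matrix notation above: w = W x (analysis), x = P w (synthesis).
s = 1.0 / np.sqrt(2.0)
W = np.array([
    [s,  s, 0,  0],   # scaling (lowpass) coefficients
    [0,  0, s,  s],
    [s, -s, 0,  0],   # wavelet (highpass) coefficients
    [0,  0, s, -s],
])
P = W.T  # for this orthogonal transform the synthesis matrix is simply W^T

x = np.array([4.0, 2.0, 5.0, 7.0])  # N x 1 input vector
w = W @ x                           # M x 1 wavelet/scaling coefficients
x_rec = P @ w                       # perfect reconstruction, since P W = I

print(np.allclose(P @ W, np.eye(4)))  # True
print(np.allclose(x_rec, x))          # True
```

For a fully decimated transform such as this one M = N; redundant transforms like the DTCWT have M > N, in which case P is a left inverse of the tall matrix W rather than a square inverse.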
The aim of this dissertation is to discover when the DTCWT is a useful tool by developing complex wavelet methods to address a number of image processing problems and performing experiments comparing methods based on the DTCWT to alternative methods. Traditional formulations of complex wavelets are seldom used because they generally suﬀer from either lack of speed or poor inversion properties. Complex wavelets can be used for both Bayesian and nonBayesian image processing and we have applied complex wavelet methods to a wide range of problems including image database retrieval [33]. and edge detection. fast adaptive contour segmentation [34]. two Bayesian and two nonBayesian. A recently developed dualtree complex wavelet transform (DTCWT) has solved these two fundamental problems while retaining the properties of shift invariance and additional directionality that complex wavelets provide. In particular. We also consider images for which the simple model is inadequate.1 Overview This dissertation investigates the use of complex wavelets in image processing.Chapter 1 Introduction 1. we aim to compare complex wavelets with standard decimated and nondecimated real wavelet transforms. We show how 3 . The features are then experimentally tested by addressing the problem of image segmentation to show the performance relative to many other feature sets. In this dissertation we restrict our attention to four main examples of particular interest. A qualitative feel for the power of the description is given by texture synthesis experiments. We ﬁrst consider nonBayesian applications and explain how to use the DTCWT to generate texture features.
We use this Bayesian model to deal with irregularly sampled data points. give an introduction to the diﬀerence between Bayesian and nonBayesian approaches. it provides information about the kind of images that are too simple for the complex wavelet model – in the sense that although complex wavelets provide a good answer there is a more basic technique that is signiﬁcantly faster. In this case we assume that the data is a realisation of a stationary Gaussian process. splines. Second. and the kind that are too complicated. it demonstrates how a wavelet method can be much more eﬃcient than standard methods. This is of interest as it provides information about the kind of images that are well modelled by complex wavelets. The usual solution to the aliasing is to eliminate all subsampling from the transform (to give the nondecimated wavelet transform) but this greatly increases the computation and amount of storage required (especially in multiple dimensions). Second we demonstrate how complex wavelets can be used for Bayesian image processing. INTRODUCTION the simple model can be enhanced to handle longerrange correlations in images for better texture synthesis. The following sections explain the justiﬁcation for the research. It has been claimed that a recently developed complex wavelet transform gives a useful compromise solution that removes almost all of the aliasing problems for only a slight increase in computational cost. Previous complex wavelet transforms have generally suﬀered from problems with eﬃciency and reconstruction. First.4 CHAPTER 1. The ﬁnal extension is concerned with image deconvolution. and explain the organisation of the work. Third.2 Justiﬁcation for the research The wavelet transform has become a widely used technique but the fastest and most popular formulation (the fully decimated wavelet transform) suﬀers from aliasing problems. 
The wavelets are used to define a prior probability distribution for images and solution methods are developed based on these models. This is of interest for a number of different reasons. In particular, we can develop theory that relates the wavelet methods to a variety of alternative techniques such as Kriging and radial basis functions. In this case we develop an enhanced nonstationary wavelet model. This is of interest because it demonstrates how wavelet methods give better results than standard approaches such as Wiener filtering and how a Bayesian approach to inference can further increase the accuracy.

It has also been claimed that the new transform solves both these problems. However, currently the new transform is not generally used: this may be due to doubts about the importance of the differences, concerns about the complexity of methods based on complex wavelets, or simply because the new transform is relatively unknown. By testing these claims on a range of practical tasks we discover the significance of the differences and demonstrate the simplicity of designing and implementing methods based on complex wavelets. The remaining sections describe the original contribution of the dissertation.

1.3 Bayesian and Non-Bayesian Approaches

The Bayesian approach to a problem involves constructing a model and then performing inference using the model. The model consists of specifying firstly the prior probability distributions for the model parameters and secondly the conditional probability distribution for the data given the parameters (also called the likelihood). Inference is performed by the application of Bayes' theorem in order to find the posterior distribution for the parameters of the model conditional on the observed data. (Chapter 7 explains the terminology used in Bayesian signal processing.) If the model correctly describes the data then the best possible answers can be calculated from the posterior distribution. In this context "best" is defined in terms of a cost function that measures the penalty for errors in the estimate; the best answer is the one that minimises the expected value of the cost function. This is a very powerful way of performing inference even for complicated models.

There are two main problems with this approach. The first problem is that it is often hard to construct an appropriate model for the data. The second problem is that the inference is usually impossible to perform analytically. Instead numerical techniques such as MCMC (Markov Chain Monte Carlo) must be used. It is usually easy to find a slow technique that will solve the problem but much more difficult to find a fast technique.

In a trivial sense non-Bayesian approaches are all other approaches. To sharpen the discussion consider restoring an image that has been corrupted in some way. The Bayesian approach is to try and construct an accurate prior probability distribution for images that are likely to occur plus a model for the type of image degradation that is believed to have occurred. Armed with this model an estimate for the original image can be calculated from an inferred posterior distribution. A typical non-Bayesian approach might be to apply a median filter (a median filter replaces every pixel by the median of a set of pixels from the local neighbourhood). We now list a number of features of the median filter that are often characteristic of non-Bayesian approaches:

1. It is very fast.
2. It requires no knowledge about the type of degradation.
3. In practice it is often effective.
4. For images with certain structures, such as line drawings, it performs very badly (if the size of the median filter is large compared to the width of the line then every pixel will be set to the background colour).
5. It is not obvious what assumptions are implicitly made about the expected structure of images or the degradation.

The fact that the approach works without needing the problem to be accurately specified is the source of both strength and weakness. The strength is that the method can work reasonably even for the very complicated images that are seen in the real world. The weakness is that it is also possible that the method will be totally inappropriate (for instance, the degradation might be that the image is upside down). It is therefore crucial that the effectiveness of the method is experimentally determined.

Currently Bayesian methods are not often used in real-world applications of image processing because of their lack of speed and the problem of modelling images. A Bayesian approach to segmentation is made difficult by the need to specify a prior for the shapes of segments. This is a large research topic by itself and there are encouraging results based on complex wavelets in the literature [59]. For these reasons, we select a non-Bayesian approach for segmentation (chapter 5). However, for interpolation and image deconvolution we attempt a Bayesian approach. This requires simplifying assumptions to be made about the problem but the benefits of the Bayesian approach can be seen. The benefits consist not only of improved experimental results but additionally the mathematical framework permits a theoretical comparison of many Bayesian and non-Bayesian techniques.
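Point 4 in the list above can be reproduced in a few lines. The sketch below is not from the dissertation: it is a minimal NumPy median filter, with the image size, line width and window sizes chosen arbitrarily for illustration. A window narrower than the line leaves it intact; a much wider window sets every pixel to the background colour.

```python
import numpy as np

def median_filter(img, size):
    """Replace every pixel by the median of a size-by-size neighbourhood
    (edges handled here by reflective padding)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

# A simple "line drawing": a three-pixel-wide white line on a black background.
img = np.zeros((16, 16))
img[7:10, :] = 1.0

small = median_filter(img, 3)  # window narrower than the line: line survives
large = median_filter(img, 7)  # window much wider than the line: line erased
```

Here `small` still contains the line (its centre row remains at intensity 1) while `large` is identically zero, exactly the failure mode described above.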
1.4 Original contributions

This section describes the contributions to learning made by this dissertation from most to least important. This classification is, of course, subjective and merely represents the author's current opinion. The references are to the corresponding sections in the dissertation.

1.4.1 Most important contributions

These are the results of the most general interest.

1. We explain why it is impossible to have a useful single-tree complex wavelet or a shift-invariant nonredundant wavelet transform based on short support filters (2.3).
2. We derive an expression for the noise gain of a transform in terms of the unbalance between analysis and reconstruction (2.5).
3. We develop theoretical links between a number of interpolation and approximation techniques for irregularly sampled data (8.2).
4. We calculate theoretical predictions of aesthetic and statistical quality of solutions to interpolation problems and measure the accuracy of these predictions (8.5).
5. We describe how the DTCWT can be used for image deconvolution (9) and provide experimental comparisons with other techniques (9.5).

1.4.2 Medium importance contributions

These are the results of interest principally to researchers in a specific area of image processing.

1. We describe shift invariant wavelet models in terms of a Gaussian random process (7.4) and identify the autocorrelation of the associated process.
2. We describe how Bayesian interpolation can be implemented with wavelet methods (8.3).
3. We experimentally compare feature sets for segmentation (5.4).
4. We characterise minimum smoothness norm interpolations and prove that shift dependence is always expected to decrease the quality (8.7).
5. We show how the speed of wavelet interpolation can be significantly increased by allowing a small amount of error (8.8).
6. We explain a method for fast conditional simulation for problems of interpolation and approximation (8.9).
7. We review the main deconvolution techniques from a Bayesian perspective (appendix C).
8. We propose and compare a number of methods for calculating search directions within the deconvolution method (9.4).

1.4.3 Least important contributions

These are specific experimental or theoretical results of less general interest but which give supporting evidence for the thesis.

1. We derive connections between the singular values of a wavelet transform matrix, the frame bounds for the transform, and the noise gain in reconstruction (2.5).
2. We perform experiments to compare the noise gain performance for certain complex Daubechies wavelets and the DTCWT (2.5).
3. We experimentally compare the shift dependence of interpolation methods based on the DTCWT and alternative wavelet transforms (8.7).
4. We perform segmentation experiments to measure the effect of some additional autocorrelation based features (6.3).
5. We use Fourier transforms to explain why Van Cittert and Landweber methods (without a positivity constraint) will always perform worse than oracle Wiener filtering (C.3).
1.4.4 Contributions based largely on previous work

These are straightforward extensions of existing work to use the complex wavelet transform.

1. We show how the DTCWT can be used for texture synthesis (4.3) and display some experimental results (4.4).
2. We develop pixel by pixel (5.3) and multiscale segmentation methods (5.5) based on complex wavelets.
3. We explain how the autocorrelation of DTCWT subbands can be used to improve the quality of texture synthesis (6.2) and display some experimental results (6.3).
4. We perform some experiments using the Hidden Markov Tree for image denoising (3.8).

A technical report [35] has been published containing the results of a collaboration with an expert in seismic surveying. During the collaboration we applied the results of chapter 8 to the problem of using seismic measurements to determine subsurface structure. The report itself contains contributions from the expert but all the material and results of chapter 8 are the work of the sole author. Nevertheless, all the original research presented here is the work of the sole author. The author has chosen to use the pronoun "we" to represent himself in order to avoid both the jarring effect of "I" and the awkwardness of the passive tense.

1.5 Organisation of the dissertation

Figure 1.1 illustrates the organisation.

[Figure 1.1: Guide to the dissertation. Background: 2 Wavelet transforms; 3 Previous applications. Non-Bayesian processing: 4 DTCWT texture features, with example applications 5 Segmentation and 6 Correlation modelling. Bayesian processing: 7 Bayesian modelling, with example applications 8 Interpolation and approximation and 9 Image deconvolution. Final remarks: 10 Conclusions; 11 Further possibilities.]

The purpose of the first two chapters is mainly to provide background information that will be useful in the subsequent chapters. Chapter 2 reviews the principles of standard wavelet transforms from a filter design viewpoint and describes the properties of complex wavelet transforms. Chapter 3 reviews some relevant work done in motion estimation, image classification, and image denoising that uses complex wavelets.

The next three chapters propose and test a number of non-Bayesian complex wavelet image processing methods. Chapter 4 proposes texture features based on complex wavelets and examines their properties by means of texture synthesis experiments. Chapter 5 compares a DTCWT segmentation method with the results from alternative schemes for a variety of image mosaics. Chapter 6 extends the texture set to include longer-range correlations.

The following three chapters propose and test Bayesian approaches to image processing. Chapter 7 introduces a Bayesian framework for image modelling. Chapter 8 describes the application of Bayesian methods to approximation and interpolation. Chapter 9 uses a similar Bayesian method to address the problem of deconvolution.

The final two chapters summarise the findings and discuss future possibilities. Chapter 10 discusses the impact of the research and summarises the main conclusions of the dissertation. Chapter 11 suggests directions for future research. At the end of the dissertation are the references and appendices.
Chapter 2

Wavelet transforms

2.1 Summary

The purpose of this chapter is to introduce and motivate the use of a complex wavelet transform. The chapter first gives a short introduction to the terminology and construction of real wavelet transforms and then a review of a number of complex wavelet transforms. We explain why useful single-tree complex wavelets will suffer from poor balance (we define balance as a measure of similarity between the filters in the forward and inverse transforms) and why most nonredundant transforms will suffer from aliasing. The main original contribution of the chapter is an equation relating the balance of a transform to the amount of noise amplification during reconstruction which shows why a balanced complex wavelet transform is preferred. The importance of this result is demonstrated by an experiment comparing a single-tree complex wavelet with a dual-tree complex wavelet.

2.2 Introduction

The concepts behind wavelets have been independently discovered in many fields including engineering, physics, and mathematics. This dissertation is concerned with the application of wavelets to image processing problems, making the engineering perspective the most useful. The description of wavelets and complex wavelet systems is based on the material referenced, but the equation 2.15 mentioned above is original. The principal sources for this chapter are books by Daubechies [30], Mallat [75], Strang and Nguyen [113], and Vetterli and Kovacevic [122]. We follow the notation of Vetterli [121]. We assume familiarity with the Z-transform.

2.3 The Wavelet Transform

We will first describe the one dimensional dyadic discrete time wavelet transform. This is a transform similar to the discrete Fourier transform in that the input is a signal containing N numbers, and the output is a series of M numbers that describe the time-frequency content of the signal. The Fourier transform uses each output number to describe the content of the signal at one particular frequency, averaged over all time. In contrast, the outputs of the wavelet transform are localised in both time and frequency.

The wavelet transform is based upon the building block shown in figure 2.1. This block is crucial for both understanding and implementing the wavelet transform.

[Figure 2.1: Building block for the wavelet transform. The input x is filtered by H0(z) and downsampled by 2 to give y0, and filtered by H1(z) and downsampled by 2 to give y1.]

This diagram is to be understood as representing the following sequence of operations:

1. Filter an input signal (whose value at time n is x(n)) with the filter whose Z-transform is H0(z).
2. Downsample the filter output by 2 to give output coefficients y0(n).
3. Filter the input signal x(n) with the filter whose Z-transform is H1(z).
4. Downsample the filter output by 2 to give output coefficients y1(n).

For wavelet transforms H0(z) will be a lowpass filter and H1(z) will be a highpass filter.

Downsampling by k is a common operation in subband filtering. It is represented by the notation "down-arrow k". This operation converts a sequence of kM coefficients a(0), a(1), ..., a(kM-1) to a sequence of M coefficients b(0), b(1), ..., b(M-1) by retaining only one of every k coefficients:

b(n) = a(kn)    (2.1)
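The analysis building block above can be sketched in a few lines. This is not from the dissertation: it is a minimal NumPy illustration that uses the Haar pair (discussed later in this chapter) as a concrete lowpass/highpass filter pair; the choice of downsampling phase (keeping the odd-indexed convolution outputs) is an implementation detail, not something fixed by the diagram.

```python
import numpy as np

def analysis_block(x, h0, h1):
    """One analysis step: filter x with H0 and H1, then downsample by 2.

    Keeping the odd-indexed convolution outputs means that, for the
    length-2 Haar filters below, each coefficient depends on one
    non-overlapping pair of input samples."""
    y0 = np.convolve(x, h0)[1::2]  # scaling (lowpass) coefficients
    y1 = np.convolve(x, h1)[1::2]  # wavelet (highpass) coefficients
    return y0, y1

# Haar analysis filters: H0(z) = (1 + z^-1)/sqrt(2), H1(z) = (1 - z^-1)/sqrt(2).
h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)

x = np.array([4.0, 6.0, 10.0, 12.0])
y0, y1 = analysis_block(x, h0, h1)

# For an orthogonal pair like Haar the split preserves energy.
print(y0, y1, np.isclose(y0 @ y0 + y1 @ y1, x @ x))
```

The energy check at the end illustrates why the downsampling prevents redundancy: four input samples produce exactly four output coefficients carrying the same total energy.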
A full wavelet transform is constructed by repeating this operation a few times. The idea is to split the original signal into two parts: the general trends given by y0, and the fine details given by y1. The downsampling is a way of preventing redundancy in the outputs. The lowpass filtered coefficients in y0 are known as scaling coefficients, and the detail coefficients (highpass filtered) in y1 are known as wavelet coefficients. Each time the basic splitting operation is applied to the scaling coefficients just generated. Figure 2.2 shows an example of a 4-level subband decomposition tree. This represents the forward wavelet transform.

[Figure 2.2: Subband decomposition tree for a 4 level wavelet transform. The input x is split by H0/H1 into y0 and y1 (level 1); y0 is split into y00 and y01 (level 2); y00 into y000 and y001 (level 3); and finally y000 into y0000 and y0001 (level 4).]

The filters are carefully designed in order that the wavelet transform can be inverted. Figure 2.3 shows the building block for the reconstruction.

[Figure 2.3: Building block for inverting the wavelet transform. y0 and y1 are each upsampled by 2, filtered by G0 and G1 respectively, and added to give z.]

This block represents the following operations:

1. Upsample the lowpass coefficients y0 by 2.
2. Filter the upsampled signal with G0.
3. Upsample the highpass coefficients y1 by 2.
4. Filter the upsampled signal with G1.
5. Add the two filtered signals together.

Upsampling by k is represented by the notation "up-arrow k". This operation converts a sequence of M coefficients b(0), b(1), ..., b(M-1) to a sequence of kM coefficients a(0), a(1), ..., a(kM-1) by inserting k-1 zeros after every coefficient. More precisely: if n is a multiple of k then a(n) = b(n/k), otherwise a(n) = 0.

We can use this block repeatedly in order to recover the original sequence from the wavelet transform coefficients:

1. Use the block to reconstruct y000 from y0000 and y0001.
2. Reconstruct y00 from y000 and y001.
3. Reconstruct y0 from y00 and y01.
4. Reconstruct x from y0 and y1.

The filters are usually designed to ensure that when this reconstruction block is applied to the outputs of the analysis block, the output sequence z(n) is identical to the input sequence x(n). This is known as perfect reconstruction (PR).

The structure described above is an efficient way of implementing a wavelet transform. We now describe an alternative structure that is less efficient but often more convenient for theoretical results. The alternative relies on the equivalence of the following two transforms:

1. Filter by H(z^k) then downsample by k.
2. Downsample by k then filter by H(z).

This equivalence allows us to move the downsampling steps in figure 2.2 past the filters to produce the equivalent structure shown in figure 2.4 in which all the downsampling operations have been moved to the right. Each subband is now produced by a sequence of filters followed by a single downsampling step. For example, the y001 coefficients are produced by filtering with W001(z) = H0(z)H0(z^2)H1(z^4) followed by downsampling by 8.
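The forward and inverse building blocks, and the perfect reconstruction property itself, can be checked directly. This sketch is not from the dissertation: it uses the Haar pair, for which the reconstruction filters happen to be the time reverses of the analysis filters (consistent with the "balanced" definition given later in this chapter).

```python
import numpy as np

h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # analysis lowpass
h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)  # analysis highpass
g0 = h0[::-1]                              # reconstruction lowpass
g1 = h1[::-1]                              # reconstruction highpass

def analyse(x):
    """Forward building block: filter with H0/H1 and downsample by 2."""
    return np.convolve(x, h0)[1::2], np.convolve(x, h1)[1::2]

def upsample2(y):
    """Insert a zero after every coefficient (upsampling by 2)."""
    u = np.zeros(2 * len(y))
    u[::2] = y
    return u

def reconstruct(y0, y1, n):
    """Inverse building block: upsample, filter with G0/G1, and add."""
    z = np.convolve(upsample2(y0), g0) + np.convolve(upsample2(y1), g1)
    return z[:n]

x = np.array([4.0, 6.0, 10.0, 12.0, 2.0, 8.0])
z = reconstruct(*analyse(x), len(x))
print(np.allclose(z, x))  # perfect reconstruction
```

For longer filters the same structure applies but the reconstruction is only exact up to a delay and requires care at the signal edges, as discussed later in this section.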
[Figure 2.4: Alternative structure for a subband decomposition tree. Each subband is produced by a cascade of filters followed by a single downsampler: y1 by H1(z) and downsampling by 2; y01 by H0(z)H1(z^2) and downsampling by 4; y001 by H0(z)H0(z^2)H1(z^4) and downsampling by 8; y0001 by H0(z)H0(z^2)H0(z^4)H1(z^8) and downsampling by 16; y0000 by H0(z)H0(z^2)H0(z^4)H0(z^8) and downsampling by 16.]

The impulse response of W001(z) is called the analysis wavelet for scale 3, or simply the scale 3 wavelet. The impulse responses of these combined filters are called analysis wavelets (for any combination including a highpass filter) or scaling functions (for a combination of only lowpass filters). In a similar way the reconstruction can be represented as an upsampling step followed by a single filtering step. The impulse response in this case is called the reconstruction or synthesis wavelet.

2.3.1 Filter design and the product filter

Vetterli showed that only FIR (Finite Impulse Response) analysis and synthesis filters lead to perfect reconstruction without implicit pole/zero cancellation [121]. Suppose that we want to design analysis and reconstruction FIR filters that can perfectly reconstruct the original sequence. It can be shown [121] that a necessary and sufficient condition for perfect reconstruction is that

2 = H0(z)G0(z) + H1(z)G1(z)    (2.2)
0 = H0(-z)G0(z) + H1(-z)G1(z)    (2.3)

(Sometimes the perfect reconstruction condition is relaxed to mean that the reconstructed signal is identical to a shifted version of the original. In this case the LHS of the first equation is of the form 2z^-k for some integer k.) It can be shown that solutions are given by any FIR filters that satisfy the following equations

H1(z) = z^-1 G0(-z)    (2.4)
G1(z) = z H0(-z)    (2.5)
P(z) + P(-z) = 2    (2.6)

where P(z) is known as the product filter and is defined as P(z) = H0(z)G0(z).

If we have a solution then we can produce additional solutions by either multiplying all the coefficients in an analysis filter by r and all the coefficients in the corresponding reconstruction filter by 1/r, or adding a time delay to the analysis filters and a time advance to the reconstruction filters (or the other way around). These simple changes do not change the wavelet transform in any significant way. If we ignore such trivial changes then it can also be shown that any FIR filters that achieve perfect reconstruction must also satisfy the design equations given above (footnote 1).

Footnote 1: If equation 2.2 is true then it is clear that H0(z) and H1(z) cannot share any nontrivial zeros (we call zeros at the origin trivial). If equation 2.3 is true then any nontrivial zeros of H0(-z) must therefore belong to G1(z). A similar argument shows that any nontrivial zeros of G1(z) belong to H0(-z) and hence G1(z) = r z^k H0(-z) where k is an integer delay and r is a scaling factor. Finally, substituting this into equation 2.3 shows that H1(z) = -(1/r)(-z)^-k G0(-z).
2.3.2 Terminology

This section defines a number of terms that will be used in the following discussion. For convenience the important definitions from the previous sections are repeated here.

Analysis filters: The filters used in the forward wavelet transform (H0(z) and H1(z)).

Reconstruction filters: The filters used in the reverse wavelet transform (G0(z) and G1(z)).

Perfect reconstruction (PR): A system has the perfect reconstruction property if the combination of a forward and reverse wavelet transform leaves any signal unchanged.

Product filter: The product filter P(z) is defined as the product of the lowpass analysis and reconstruction filters, P(z) = H0(z)G0(z).

Balanced: For real filters we define the system to be balanced if G0(z) = H0(z^-1) and G1(z) = H1(z^-1). (For complex filters balance requires equivalence of the reconstruction filters with the conjugate time reverse of the analysis filters.) Balanced filters will therefore have equal magnitude frequency responses |Ga(e^(j theta))| = |Ha(e^(j theta))|. This definition is not normally used in wavelet analysis because for critically sampled systems it is equivalent to orthogonality. In fact, the term balance has been used for other purposes within the wavelet literature (Lebrun and Vetterli use it to measure the preservation of a polynomial signal in the scaling coefficients during reconstruction [70]) and the reader should be aware that our usage is not standard. Nevertheless, the concept is crucial and within this dissertation we will exclusively use the definition given here.

Near balanced: The system is near balanced if the analysis filters are close to the conjugate time-reverse of the reconstruction filters.

Redundancy: The redundancy of the transform is the ratio of the number of outputs to the number of inputs. A complex coefficient is counted as two outputs.

Nonredundant: A transform is nonredundant if the redundancy is 1.

Orthogonal: A PR system is orthogonal if the transform is nonredundant and balanced.

BiOrthogonal: A PR system is biorthogonal if the transform is nonredundant but not balanced.

Ideal filter: We say a filter is ideal if its frequency response H(f) takes the value 1 on a set of frequencies, and 0 on all other frequencies.

Symmetric: We say that an odd length filter with Z-transform H(z) is symmetric with even symmetry if H(z) = H(z^-1), or symmetric with odd symmetry if H(z) = -H(z^-1). For even length filters we also allow a time delay: even symmetry means H(z) = z^-1 H(z^-1), and odd symmetry means H(z) = -z^-1 H(z^-1). Note that these definitions are for both real and complex signals, and in particular that there is no conjugation. We will use antisymmetric as another way of saying symmetric with odd symmetry.

Shift invariant: We call a method shift invariant if the results of the method are not affected by the absolute location of data within an image. In other words, a method that gives the answer b when applied to data a is called shift invariant if it gives a translated version of b when applied to a translated version of a. We will call a transform shift invariant if it produces subbands such that the total energy of the coefficients in any subband is unaffected by translations applied to the original image.

Shift dependent: We call a method shift dependent if the results of the method are affected by the absolute location of data within an image. We call a transform shift dependent if it produces a subband such that translations of an image can alter the total energy in the subband.
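The magnitude consequence of balance can be checked numerically. In this sketch (not from the dissertation; the filter values are standard published ones used purely for illustration) the Haar pair is balanced, so the lowpass analysis and reconstruction magnitude responses coincide everywhere, while the biorthogonal LeGall 5/3 pair is unbalanced and the responses differ markedly.

```python
import numpy as np

def mag(h, n=256):
    """Sample |H(e^(j theta))| at n frequencies via a zero-padded FFT."""
    return np.abs(np.fft.fft(h, n))

# Haar: orthogonal, hence balanced -- G0(z) = H0(z^-1).
h0_haar = np.array([1.0, 1.0]) / np.sqrt(2.0)
g0_haar = h0_haar[::-1]

# LeGall 5/3: biorthogonal -- nonredundant but not balanced.
h0_53 = np.array([-1.0, 2.0, 6.0, 2.0, -1.0]) / 8.0
g0_53 = np.array([1.0, 2.0, 1.0]) / 2.0

gap_haar = np.max(np.abs(mag(h0_haar) - mag(g0_haar)))  # zero: equal magnitudes
gap_53 = np.max(np.abs(mag(h0_53) - mag(g0_53)))        # clearly nonzero

print(gap_haar, gap_53)
```

At zero frequency, for instance, the 5/3 analysis lowpass has gain 1 while its reconstruction partner has gain 2, so the gap is at least 1.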
It is often useful in developing theoretical results to use vector and matrix notation to describe the transform. Let N be the number of input samples and M the number of output coefficients. We will use x to denote an N x 1 column vector containing all the input signal values. Let w denote an M x 1 column vector containing the wavelet coefficients and let W denote an M x N matrix representing the wavelet transform such that w = Wx. We also define an N x M matrix P to represent the inverse wavelet transform such that (for perfect reconstruction wavelet systems) x = Pw.

For complex wavelet transforms we will sometimes use complex coefficients but other times it is more useful to consider the real and imaginary parts separately. When confusion is possible we will use the subscripts R and C to denote the separate and complex forms. For example, treating the output as complex coefficients we can write wC = WC x and then use these complex coefficients to define the separated form

wR = [ Re{wC} ; Im{wC} ]

(the real parts of the coefficients stacked above the imaginary parts), or equivalently we can calculate the separated form directly by wR = WR x.

Matrix multiplication is a very inefficient way of calculating wavelet coefficients and such an operation should always be implemented using the filter bank form described earlier. As it is often convenient to express algorithms using this matrix notation we adopt the convention that whenever such a multiplication has to be numerically evaluated it is tacitly assumed that a fast filterbank implementation is used.

The explanation has so far been restricted to the wavelet transform of one dimensional signals but we will use the same definitions and notation for two dimensional wavelet transforms of images. In other words, it is convenient to retain the same matrix and vector notation so that an N x 1 vector x represents an image containing N pixels and Wx represents computing the two dimensional wavelet transform of x. The efficient implementation of such a two dimensional wavelet transform requires the alternation of row and column filtering steps [113].

In actual wavelet implementations the data sets are finite and care must be taken when processing coefficients near the edges of the data set. The problems occur when a filter requires samples outside the defined range. A natural approach is to assume that such values are zero (known as zero extension) but this will not produce a perfect reconstruction system (except for filters with very short support such as the Haar filters). The easiest way to treat the edges and preserve the PR property is to assume that the signal is periodic (known as periodic extension). In other words, a sample from just before the beginning of the data set is assumed to have the same value as a sample just before the end. This has the drawback that discontinuities are normally created at the edges of the dataset. A third method (known as symmetric extension) that avoids discontinuities at the edge is based on reflections of the original data. A sample from just before the beginning is assumed to have the same value as a sample just after the beginning. Methods differ in whether the edge samples are doubled up. With a careful design symmetric extension can also result in a perfect reconstruction system. Any of these edge treatments still results in an overall linear transform. It is this complete transform (including edge effects) that is represented by the matrices W and P.
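The three edge treatments correspond directly to standard padding modes; a small sketch (not from the dissertation; the signal values are arbitrary) shows the difference, including whether the edge samples are doubled up:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

periodic = np.pad(x, 2, mode="wrap")       # periodic extension
sym_dup = np.pad(x, 2, mode="symmetric")   # symmetric, edge samples doubled
sym_nodup = np.pad(x, 2, mode="reflect")   # symmetric, edge samples not doubled

print(periodic)   # [3 4 1 2 3 4 1 2]
print(sym_dup)    # [2 1 1 2 3 4 4 3]
print(sym_nodup)  # [3 2 1 2 3 4 3 2]
```

Note how the periodic extension creates a jump (from 4 back to 1) at the boundary, while both symmetric extensions continue the signal smoothly.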
2.3.3 Single tree complex wavelets

Complex wavelets can be produced by using filters with complex coefficients. This allows greater freedom in the design. In particular, if we want to construct an orthogonal wavelet transform with symmetric wavelets then the only possible choice with real coefficients is the Haar wavelet (the Haar wavelet has very simple analysis filters H0(z) = (1/sqrt(2))(1 + z^-1), H1(z) = (1/sqrt(2))(1 - z^-1)). However, if complex coefficients are allowed then many possible solutions are allowed, such as certain cases of the complex Daubechies wavelets [73].

It is of particular interest to construct a (necessarily complex) wavelet transform that is able to distinguish positive and negative frequencies. There are two main reasons for this:

1. When images are analysed complex filters can separate the information in the first and second quadrants (of 2D frequency space). This permits methods to distinguish features near 45 degrees from those near -45 degrees.
2. Wavelet methods are often shift dependent due to aliasing caused by the downsampling. Real filters have both negative and positive frequency passbands and usually an aliased version of the positive passband will have a significant overlap with the negative passband. By removing the negative passband the aliasing can be greatly reduced [63, 64].

The first reason only applies to multidimensional data sets, but the second reason is always important.

We now explain why it is impossible to get a useful single tree complex wavelet transform (with either orthogonal or biorthogonal filters) that will both be able to distinguish positive and negative frequencies and have good reconstruction properties. We define the passband as the frequencies for which the magnitude of the frequency response is above 1. Any symmetric wavelet (with either even or odd symmetry) will have an equal magnitude response to positive and negative frequencies. Now consider the asymmetric case. From the equation P(z) + P(-z) = 2 we see that any frequency must be contained in the passband of either P(z) or P(-z) and that therefore the passband of P(z) must cover at least half the spectrum. For a useful transform we want P(z) to be within the low frequency half of the spectrum (more on this assumption later) and hence the passband of P(z) must cover all of the low frequencies, both positive and negative. Therefore if H(z) is biased towards positive frequencies then G(z) must be biased towards negative frequencies. Unfortunately this leads to very bad noise amplification: small changes made to the wavelet coefficients result in large changes in the reconstructed signal. Section 2.5 contains theoretical results linking noise amplification to the balance between analysis and reconstruction filters. We will show that the frequency responses of H0(z) and G0(z) must be close for the wavelet transform to achieve low noise amplification.

We conclude that a complex wavelet transform based on a single dyadic tree cannot simultaneously possess the four following properties:

1. Perfect reconstruction.
2. The ability to distinguish positive and negative frequencies.
3. Balanced filters leading to low noise amplification during reconstruction.
4. A lowpass product filter.

The last property merits a little further discussion. By, for example, applying a phase rotation to the filter coefficients of an orthogonal real wavelet transform, hk -> hk exp{j theta k} (where theta is a real number but not a multiple of pi), it is certainly possible to construct a complex wavelet transform with balanced filters and perfect reconstruction (this operation corresponds to frequency shifting all the filter frequency responses and hence the product filter is no longer lowpass). However, such an operation will have the scaling coefficients tuned to some nonzero frequency and while this may be appropriate for some specialised application (for example, the wavelet transform in section 2.6) it is not generally useful. More importantly, the system will still possess the same aliasing problems of the original real wavelet. Complex wavelets only reduce shift dependence problems when the overlap between aliased passbands is reduced [63]. The fundamental conflict is that we require narrow passbands in order to reduce shift dependence, but the product filter passband for a single tree complex wavelet transform must necessarily cover half the spectrum.
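The effect of the phase rotation hk -> hk exp{j theta k} can be checked numerically. This sketch is not from the dissertation (the Haar lowpass filter and theta = pi/2 are arbitrary illustrative choices): it confirms that the rotated filter's passband peak moves from 0 to theta, so the scaling coefficients become tuned to a nonzero frequency.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2.0)  # a real lowpass filter (Haar)
theta = np.pi / 2.0
k = np.arange(len(h))
h_rot = h * np.exp(1j * theta * k)       # phase rotation h_k -> h_k exp(j theta k)

n = 256
w = 2.0 * np.pi * np.arange(n) / n       # frequency grid on [0, 2 pi)
peak = w[np.argmax(np.abs(np.fft.fft(h, n)))]          # lowpass: peak at 0
peak_rot = w[np.argmax(np.abs(np.fft.fft(h_rot, n)))]  # peak shifted to theta

print(peak, peak_rot)
```

The rotated filter still has unit energy (the rotation only changes phases), so balance and perfect reconstruction can be preserved, but the product filter is clearly no longer lowpass.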
2.3.4 Directionality and Ridgelets

The previous section attempted to motivate the use of complex wavelets by their ability to give shift invariance and better directionality properties. Real filters must respond equally to positive and negative frequencies and hence transforms based on separable filtering with real filters will not differentiate between adjacent quadrants in the 2D frequency plane. Separable filtering means that we can apply a 2D filter by first filtering all the rows with one 1D filter, and then filtering all the columns with a second 1D filter. This leads to efficient computation of wavelet transforms but it is not the only possibility.

Bamberger and Smith proposed a directional filter bank [7] that generalises the notion of separability by modulating, rotating, and skewing the frequency content of the image. This results in a transform that splits the spectrum into a number of fan shaped portions but is shift dependent and does not differentiate between different scales.

Another important alternative is known as a Ridgelet transform [19, 20]. The ridgelet transform acting on images can be implemented by a Radon transform followed by the application of a one-dimensional wavelet transform to slices of the Radon output. A Radon transform computes the projection of the image intensity along a radial line oriented at a specific angle. For the ridgelet transform, angles of the form 2 pi l 2^-j are used where j and l are integers. A particular basis function of the ridgelet transform has a constant profile (equal to a 1D wavelet) in some specific direction depending on the associated angle in the Radon transform. The large range of angles used means that the ridgelet transform has good directionality properties and is much better suited than the standard real wavelet for analysing straight lines and edges of arbitrary orientations. Candes and Donoho have developed a mathematical framework for a rigorous treatment of a continuous ridgelet transform [19] and notions of smoothness associated with this transform [20] but we shall only discuss the discrete version. However, as the ridgelet transform is based upon a 1D DWT, the transform naturally inherits the aliasing of the 1D DWT. It would be interesting to construct a complex ridgelet transform by replacing the DWT as this might add shift invariance to the other properties of the ridgelet. However, in this dissertation we have chosen to extensively test a single representative complex wavelet transform rather than exploring the range of construction options. Later in this chapter we discuss examples of useful complex wavelets that do reduce aliasing.

2.3.5 Shift invariance and the Harmonic wavelet

The tree structure described for the DWT is nonredundant as it produces the same number of output coefficients as input coefficients. This has a number of advantages including low storage requirements and fast computation. However, there is one important disadvantage: any nonredundant wavelet transform based on FIR filters will produce shift dependent results.
The downsampling introduces aliasing and so the results of processing will depend upon the precise location of the origin. It may be thought that by constructing some new tree system with carefully chosen filters and degrees of downsampling that produce oversampling in some subbands and undersampling in others, it may be possible to get round this problem and produce a linear transform with a negligible amount of aliasing while still using short support filters. We now consider the performance of such a system.

Suppose we have some linear PR nonredundant transform represented by the matrices W for the forward transform and P for the inverse transform. As the transform is nonredundant, both W and P are square matrices. PR means that P W = I_N. Therefore P = W^{−1} and W P = I_N. This means that an inverse transform followed by the forward transform will give identical wavelet coefficients.

Now consider the elementary processing step that reconstructs from just the coefficients in one subband. This is elementary both in the sense that it is simple and in the sense that more complicated operations can often be viewed as a combination of such steps. Let T be a diagonal matrix whose diagonal entries select out a chosen subband. In other words, if w_i is an output coefficient in the chosen subband then T_ii = 1, while all the other entries in T are zero. The filtering can therefore be represented as

z = P T W x    (2.7)

If the transform is to be shift invariant then this operation must represent a stationary filtering operation. Any filter localised in both space and frequency should continue to change a signal when repeated. However, if we repeat the filtering the output is

P T W P T W x = P T T W x = P T W x    (2.8)

and we conclude that repeating the filtering does not change the output. Essentially the problem is that the subbands are critically sampled and hence there will always be aliasing unless the filters have an ideal band pass response. FIR filters always have a nonideal response, and we conclude that the transform either possesses ideal bandpass filters or that it results in shift dependent processing. This shows that no matter what games are played with sampling structures and filters, it is always impossible to avoid shift dependence (for a linear nonredundant PR transform) without constructing ideal band pass filters.
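The two properties in equations 2.7 and 2.8 are easy to check numerically. The sketch below uses a one-level orthonormal Haar DWT as a stand-in for the nonredundant PR transform W (any decimated FIR wavelet would behave the same way); it verifies that repeating the subband reconstruction changes nothing, while a one-sample shift of the input is not simply a one-sample shift of the output.

```python
import numpy as np

# One-level orthonormal Haar DWT as an explicit matrix W, its PR inverse P,
# and a diagonal subband selector T, following z = P T W x (equation 2.7).
N = 8

def haar_matrix(n):
    """Rows 0..n/2-1 hold the scaling (lowpass) filter, the rest the wavelet filter."""
    W = np.zeros((n, n))
    for k in range(n // 2):
        W[k, 2 * k:2 * k + 2] = [1, 1]
        W[n // 2 + k, 2 * k:2 * k + 2] = [1, -1]
    return W / np.sqrt(2)

W = haar_matrix(N)
P = np.linalg.inv(W)                                 # PR inverse: P W = I_N
T = np.diag([1.0] * (N // 2) + [0.0] * (N // 2))     # keep only the lowpass subband

x = np.arange(N, dtype=float)
z = P @ T @ W @ x            # reconstruct from one subband (equation 2.7)
z_repeat = P @ T @ W @ z     # repeating the filtering changes nothing (equation 2.8)

# Shift dependence: shifting the input by one sample does not shift the
# output by one sample, because the decimated subband aliases.
z_of_shifted = P @ T @ W @ np.roll(x, 1)
shift_dependent = not np.allclose(z_of_shifted, np.roll(z, 1))
```

Here `shift_dependent` comes out true, illustrating the argument above for a concrete short-support filter.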
In fact, this argument suggests the stronger result that the amount of shift dependence is directly related to the amount the filters differ from an ideal bandpass response. However, it is not true that significant aliasing necessarily leads to worse results. For example, suppose we wish to implement a very simple lossy image coder by just transmitting a few of the largest coefficients. The reconstructed image will be the z of equation 2.7 (where T is now defined to preserve the transmitted coefficients). If the reconstructed (degraded) image is now coded again with the same lossy image coder then the result of equation 2.8 holds and proves that no additional errors are introduced by the repeated coding. Of course, in practice there will be quantisation errors and a more sophisticated choice of which coefficients to keep, but it is certainly feasible that the aliasing is beneficial. Therefore one of the aims of the dissertation is to experimentally test the importance of aliasing in different applications.

In the discussion above we have always needed to exclude filters with ideal responses. It is impossible to strengthen the results because of three important counter examples. The first example is a transform consisting of the filter H(z) = 1 that produces a single subband (containing the original data). This results in shift invariant processing in a trivial sense, by means of doing nothing. Less trivially, the second example is the well-known Fourier transform, which also results in a shift invariant system. Thirdly, and most interestingly, is the example of orthogonal harmonic wavelets.

Harmonic wavelets were proposed by Newland [85] and are particularly suitable for vibration and acoustic analysis [86]. In particular, orthogonal harmonic wavelets provide a complete set of complex exponential functions whose spectrum is confined to adjacent nonoverlapping bands of frequency. The easiest way to describe orthogonal harmonic wavelets is to give an account of an efficient algorithm for their construction [86]:

1. Compute the N point FFT (Fast Fourier Transform) of the data x.
2. Subband k is formed from the m_k point inverse FFT of m_k consecutive Fourier coefficients.

There is no restriction on the number m_k of coefficients for each subband except that together each coefficient must be associated with exactly one subband. From this construction it is easy to construct a perfect reconstruction inverse by inverting each step. It is also clear that the filters corresponding to each subband have an ideal bandpass response and hence result in a shift invariant system.
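The two-step construction can be sketched in a few lines of numpy; the band sizes below are arbitrary illustrative choices satisfying the "exactly one subband per coefficient" restriction.

```python
import numpy as np

# Orthogonal harmonic wavelets: N-point FFT, then each subband k is the
# m_k-point inverse FFT of m_k consecutive Fourier coefficients.
def harmonic_analysis(x, band_sizes):
    X = np.fft.fft(x)
    subbands, start = [], 0
    for m in band_sizes:
        subbands.append(np.fft.ifft(X[start:start + m]))
        start += m
    assert start == len(x)   # every coefficient belongs to exactly one subband
    return subbands

def harmonic_synthesis(subbands):
    """Perfect reconstruction by inverting each step of the analysis."""
    X = np.concatenate([np.fft.fft(s) for s in subbands])
    return np.fft.ifft(X)

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
bands = [2, 2, 4, 8]         # the sizes m_k may be chosen freely
y = harmonic_synthesis(harmonic_analysis(x, bands))
```

Reconstruction is exact (up to floating point) because each step is simply inverted, in line with the ideal-bandpass counterexample above.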
From a computational perspective there is not much difference between this method and a complex wavelet implemented with a tree structure. The complexity of the Fourier transform is order N log N while a wavelet transform is order N, but for signals of modest length the harmonic wavelet may well be quicker to compute. The drawback of an ideal bandpass response is that the associated wavelets have a poor localisation in time. In practice, the boxcar spectrum of the orthogonal harmonic wavelets is smoothed to improve this localisation and the spectra of adjacent wavelet levels are overlapped to give oversampling to improve the resolution of time frequency maps generated from the wavelets. These more practical systems are known as harmonic wavelets. The design freedom of harmonic wavelets makes them well suited for analysis and a careful design would also permit a stable reconstruction transform to be generated. We have actually selected a different form of complex wavelet transform as a representative for the experiments, but we would expect the results to be very similar for the harmonic wavelets.

2.3.6 Nonredundant, directionally selective, complex wavelets

Recently a new complex wavelet transform has been proposed [120] that applies filters differentiating between positive and negative frequencies to the subbands from a standard wavelet transform. The outputs of these filters are again subsampled so that the complete complex transform is nonredundant. The multiple subbands produced by this complex filtering can be recombined to give a perfect reconstruction. (Recall that we cannot hope to reconstruct from just a single branch, such as the top-right quadrant.) We proposed two main reasons for wanting to use complex wavelets: increased directionality and reduced shift dependence. This type of complex wavelet transform has increased directionality, but as it is based on the shift dependent DWT subbands it naturally retains the DWT shift dependence. Furthermore, the additional filtering discriminating between the different quadrants of frequency space will cause additional shift dependence errors, for the reasons given in section 2.3.5. Using increased redundancy in this method (the resulting transform would be a redundant complex wavelet system) could reduce this additional shift dependence, but will never remove the shift dependence caused by basing the transform on the output of a standard decimated wavelet transform.
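The key idea of discriminating positive from negative frequencies while keeping perfect reconstruction can be caricatured with ideal frequency-domain masks. This is only an idealised sketch (the transform of [120] uses FIR filters and subsampling), but it shows that the two complex subbands recombine exactly.

```python
import numpy as np

# Idealised positive/negative frequency discrimination with exact recombination.
def split_pos_neg(x):
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x))
    pos = np.where(f > 0, X, 0)     # positive-frequency subband
    neg = np.where(f < 0, X, 0)     # negative-frequency subband (incl. Nyquist)
    dc = np.where(f == 0, X, 0)     # share the DC term between the two
    return np.fft.ifft(pos + dc / 2), np.fft.ifft(neg + dc / 2)

rng = np.random.default_rng(2)
x = rng.standard_normal(16)
x_pos, x_neg = split_pos_neg(x)     # two complex subbands; x_pos + x_neg == x
```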
2.3.7 Prolate spheroidal sequences

Slepian introduced the use of the prolate spheroidal wavefunctions for signal processing [109]. These functions are the solutions of a simple energy form of the uncertainty principle. For practical application it is useful for the functions to be defined on finite 2D lattices [126]. Such functions are called finite prolate spheroidal sequences (FPSSs). A FPSS can be defined as the eigenvectors of a linear transform that first bandlimits a signal and then truncates the signal. The eigenvector corresponding to the largest eigenvalue can then be translated and frequency shifted to construct a set of basis functions that tile the time-frequency plane. We will call these basis wavefunctions the wavelets of this transform. The only design criterion for the wavelets is good time-frequency energy concentration. There is no direct consideration of the reconstruction filters or of the effect of constructing a wavelet pyramid from the functions. The frequency responses of the functions tend to be Gaussian shaped and well separated. Although this is good for reducing aliasing, it leads to problems during reconstruction.

Wilson has examined the properties of a critically sampled FPSS [126]. In this case critically sampled means that if we have an input containing N complex numbers, then the output also contains N complex numbers. He defines a matrix Q = W^H W, where W represents the transform (i.e. calculating the correlation of the signal with each wavelet) and W^H represents the Hermitian transpose of W. If the wavelets were orthogonal then Q would be equal to the identity matrix. The condition number µ of this matrix is measured. The condition number is defined as

µ(Q) = σ_0 / σ_{N−1}    (2.9)

where σ_k is the (k + 1)th eigenvalue, in descending numerical order, of Q. Wilson found condition numbers ranging between 1.348 and 1.406 [126]. This means that the ratio of the eigenvalues varies by more than a factor of 1.8 (the square of the condition number). As this is greater than 1, the reconstruction filters will not be balanced. The significance of the condition number will be seen in section 2.5.

The main problems with the FPSS occur when we need to reconstruct the signal from the transform coefficients. It is suggested [126] that truncating the inverse filters results in an almost perfect reconstruction, but errors will be magnified by the tree structure or by iterative techniques. Another problem occurs if we try and use the FPSS wavefunctions as the low and high pass analysis filters of a wavelet transform. For perfect reconstruction we need an overall frequency response that sums to unity for all frequencies. The low pass filter does not necessarily have a zero at a frequency of π and therefore the coarse level scaling functions can develop oscillations [30]. This results in large amplification being necessary for some frequencies and hence bad noise amplification properties due to the unbalance between analysis and reconstruction filters. There is also no guarantee that it will be possible to have finite impulse response (FIR) reconstruction filters.
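A toy numerical version of this construction is sketched below. The sizes, bandwidth and tiling are purely illustrative (Wilson's actual 2D lattices differ); the point is the recipe: eigenvectors of bandlimit-then-truncate, a critically sampled dictionary of translates and modulates, and the condition number of Q = W^H W.

```python
import numpy as np

N = 16
# Bandlimit-then-truncate operator; its eigenvectors are (toy) FPSSs.
F = np.fft.fft(np.eye(N)) / np.sqrt(N)                      # unitary DFT matrix
band = np.diag((np.abs(np.fft.fftfreq(N)) < 0.25).astype(float))
trunc = np.diag(np.r_[np.zeros(4), np.ones(8), np.zeros(4)])
A = trunc @ F.conj().T @ band @ F @ trunc                   # Hermitian operator
vals, vecs = np.linalg.eigh(A)
g = vecs[:, -1]                  # top eigenvector: best time-frequency concentration

# Critically sampled dictionary: 4 translations x 4 modulations -> 16 wavelets.
n = np.arange(N)
W = np.array([np.roll(g, 4 * p) * np.exp(2j * np.pi * q * n / 4)
              for p in range(4) for q in range(4)])

# Condition number of Q = W^H W, via the singular values of W.
s = np.linalg.svd(W, compute_uv=False)
mu = (s.max() / s.min()) ** 2    # ratio of the largest and smallest eigenvalues of Q
```

As the text leads one to expect, `mu` comes out well above 1: good energy concentration does not by itself give a well-conditioned (easily inverted) transform.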
2.4 Redundant complex wavelets

We first describe the traditional formulation (called Gabor wavelets) and the problems of redundant complex wavelets, and then explain recently developed solutions. They were originally proposed in 1980 by Marcelja [77] for 1D and Daugman [31] for 2D, in order to model the receptive field profiles for simple cells in the visual cortex. The term Gabor wavelets is used for several different systems that differ in the choice of positions, orientations, and scales for the filters. This section describes the choice of Gabor wavelets that Manjunath and Ma found to be best in their texture processing experiments [76]. A two-dimensional Gabor function centred on the horizontal frequency axis can be written as

g(x, y) = (1 / (2π σ_x σ_y)) exp( −(1/2)(x²/σ_x² + y²/σ_y²) + 2πjW x )    (2.10)

where σ_x and σ_y are the bandwidths of the filter and W is the central frequency. This function can then be dilated and rotated to get a dictionary of filters by using the transformation

g_mn(x, y) = a^{−m} g(x′, y′)    (2.11)
x′ = a^{−m}(x cos θ + y sin θ)    (2.12)
y′ = a^{−m}(−x sin θ + y cos θ)    (2.13)

where θ = nπ/K and K is the total number of orientations. Given a certain number of scales and orientations, the scaling factor a and the bandwidths of the filters are chosen to ensure that the half-peak magnitude supports of the filter responses in the frequency spectrum touch each other. Manjunath and Ma found that a choice of 4 scales (with a scaling factor of a = 2) and 6 orientations at each scale was best. Figure 2.5 shows these half-peak contours.

Figure 2.5: Contours of half-peak magnitude of filters at scales 3 and 4 (axes: horizontal and vertical frequency, as a fraction of the sample frequency).

The Gabor wavelets can be implemented using the Fast Fourier Transform (FFT) to perform the filtering. The main advantage is that the frequency responses can be chosen to achieve perfect reconstruction. This requires one forward transform, and the same number of inverse transforms as there are desired subbands in the image. This process gives a very high redundancy (equal to the number of subbands) and is therefore slow to compute. Some attempts have been made to reduce the amount of redundancy. For example, Daugman [32] uses a subsampled set of Gabor wavelets on a regular grid, while Pötzsch et al [95] use sets of Gabor wavelets (that they call jets) centered on a small number of nodes. Such methods have two main problems:

1. They are inefficient. The Gabor wavelet coefficients are found by calculating the full transform and discarding the unwanted coefficients.
2. They are hard to reconstruct. Daugman achieves his reconstruction by a process of gradient descent during analysis that finds weights for the Gabor wavelets to allow simple reconstruction by filtering [32]. The problem is expressed in terms of a neural network that must be trained for each new image to be processed. This is a slow process. Pötzsch et al propose an approximate reconstruction that ignores the interaction between jets at different nodes [95]. Their exact reconstruction takes 900 times longer than the approximate method.

An alternative approach was developed by Magarey [74]. He developed a complex wavelet transform based on short 4-tap filters that had responses very close to Gabor wavelets. This efficient system was successfully used for motion estimation (as described in section 3.2) but did not possess a simple set of reconstruction filters.
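The dictionary of equations 2.10-2.13 is straightforward to generate. In the sketch below the bandwidths `sigma_x`, `sigma_y` and centre frequency `W` are illustrative placeholders, not the touching-half-peak values that Manjunath and Ma derive.

```python
import numpy as np

def gabor(x, y, sigma_x, sigma_y, W):
    """Mother Gabor function of equation 2.10 (complex valued)."""
    return (np.exp(-0.5 * (x**2 / sigma_x**2 + y**2 / sigma_y**2)
                   + 2j * np.pi * W * x)
            / (2 * np.pi * sigma_x * sigma_y))

def gabor_mn(m, n, K=6, a=2.0, sigma_x=2.0, sigma_y=2.0, W=0.4, size=33):
    """Dilated and rotated filter g_mn of equations 2.11-2.13, theta = n*pi/K."""
    theta = n * np.pi / K
    c = np.arange(size) - size // 2
    X, Y = np.meshgrid(c, c)
    xr = a**(-m) * (X * np.cos(theta) + Y * np.sin(theta))
    yr = a**(-m) * (-X * np.sin(theta) + Y * np.cos(theta))
    return a**(-m) * gabor(xr, yr, sigma_x, sigma_y, W)

# Manjunath and Ma's choice: a = 2 with 4 scales and K = 6 orientations.
bank = [gabor_mn(m, n) for m in range(4) for n in range(6)]
```

Each element of `bank` is one complex filter of the 24-filter dictionary; in an FFT implementation these would be applied in the frequency domain.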
2.4.1 Dual tree complex wavelet transform

Kingsbury's complex wavelets [62, 63] have similar shapes to Gabor wavelets. The complex wavelet transform attains these properties by replacing the tree structure of the conventional wavelet transform with the dual tree shown in figure 2.7. At each scale one tree produces the real part of the complex wavelet coefficients, while the other produces the imaginary parts. Note that all the filters in the dual tree are real. Complex coefficients only appear when the two trees are combined. By using even and odd filters alternately in the trees it is possible to achieve overall complex impulse responses with symmetric real parts and antisymmetric imaginary parts. The extra redundancy allows a significant reduction of aliasing terms and the complex wavelets are approximately shift invariant [63]. Translations cause large changes to the phase of the wavelet coefficients, but the magnitude, and hence the energy, is much more stable. As for the Gabor wavelets, there are 6 orientations at each of 4 scales (any number of scales can be used, but the number of orientations is built into the method). The frequency responses for the 2D transform are shown in figure 2.6.

Figure 2.6: Contours of 70% peak magnitude of filters at scales 3 and 4 (axes: horizontal and vertical frequency, as a fraction of the sample frequency).

The main advantages as compared to the DWT are that the complex wavelets are approximately shift invariant and that the complex wavelets have separate subbands for positive and negative orientations. Conventional separable real wavelets only have subbands for three different orientations at each level and cannot distinguish lines near 45° from those near −45°.

Figure 2.7: The complex wavelet dual tree structure (levels 1 to 4: tree a uses filters H0a, H1a at level 1 and H00a, H01a, H000a, H001a, H0000a, H0001a at higher levels; tree b uses the corresponding H..b filters; every filter output is subsampled by 2, and a 2-band reconstruction block combines the trees). This figure was provided by Dr N. G. Kingsbury.
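The phase/magnitude behaviour described above can be illustrated with a single generic complex modulated filter (a Gabor-like filter, not one of Kingsbury's actual filter sets): shifting the input by one sample rotates the coefficient phase by roughly the centre frequency while leaving the magnitude almost unchanged.

```python
import numpy as np

# A complex, approximately analytic bandpass filter with centre frequency pi/4.
n = np.arange(-16, 17)
h = np.exp(-n**2 / 32.0) * np.exp(1j * np.pi * n / 4)

k = np.arange(128)
x = np.cos(np.pi * k / 4)            # narrowband test input
x1 = np.cos(np.pi * (k - 1) / 4)     # the same input shifted by one sample

y = np.convolve(x, h, mode='same')
y1 = np.convolve(x1, h, mode='same')

core = slice(30, 98)                 # ignore filter edge effects
# Phase rotates by about -pi/4 per sample of shift...
phase_rot = np.angle(y1[core] * np.conj(y[core]))
# ...while the magnitude barely changes at each position.
mag_change = (np.max(np.abs(np.abs(y1[core]) - np.abs(y[core])))
              / np.max(np.abs(y[core])))
```

This is the property that makes complex coefficient magnitudes a stable feature under translation, in contrast to the real coefficients of the DWT.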
The filters are designed to give a number of desired properties, including strong discrimination between positive and negative frequencies. This important property means that in a 2D version of the dual tree, separable filters can be used to filter an image and still distinguish the information in the first and second quadrants of the two-dimensional frequency response - information that allows us to distinguish features at angles near 45° from those near −45°. Note that it is impossible to discriminate positive and negative frequencies when using conventional real wavelets. In d dimensions with N samples, the transform has a computational order of N·2^d. For comparison, the fully decimated transform has order N and the nondecimated wavelet transform has order N((2^d − 1)k + 1), where k is the number of scales.

2.4.2 Q-shift Dual tree complex wavelets

In each subband one tree produces the real part and the other the imaginary part of the complex wavelet coefficient, and so the filters in the two trees cannot be identical but must be designed to produce responses that are out of phase. More precisely, a delay difference of 1/2 sample is required between the outputs of the two trees. The main problems with the odd/even filter approach to achieving this delay are that [65]:

1. The subsampling structure is not very symmetrical.
2. The two trees have slightly different frequency responses.
3. The filter sets must be biorthogonal because they are linear phase.

These drawbacks have been overcome with a more recent form of the dual tree known as a Q-shift dual tree [65]. This tree is shown in figure 2.8. There are two sets of filters used: the filters at level 1, and the filters at all higher levels. The filters beyond level 1 have even length but are no longer strictly linear phase. Instead they are designed to have a group delay of approximately 1/4 sample. The required delay difference of 1/2 sample is achieved by using the time reverse of the tree a filters in tree b. The PR filters used are chosen to be orthonormal, so that the reconstruction filters are just the time reverse of the equivalent analysis filters. The filters are near-balanced and permit perfect reconstruction from either tree. The results of inverting both trees are averaged, as this achieves approximate shift invariance. There are a number of choices of possible filter combinations. We have chosen to use the (13,19)-tap near-orthogonal filters at level 1 together with the 14-tap Q-shift filters at levels ≥ 2 [65]. The Q-shift transform retains the good shift invariance and directionality properties of the original while also improving the sampling structure. When we talk about the complex wavelet transform we shall always be referring to this Q-shift version unless explicitly stated otherwise. We will often refer to this transform by the initials DTCWT.

Figure 2.8: The Q-shift dual tree structure (levels 1 to 4: tree a uses H0a, H1a at level 1 and H00a, H01a at higher levels, tree b the corresponding H..b filters, each followed by subsampling by 2). This figure was provided by Dr N. G. Kingsbury.

2.4.3 Steerable transforms

Simoncelli et al [107] highlighted the problem of translation invariance for orthogonal wavelet transforms and developed a theory of shiftability. They developed a two-dimensional pyramid transform that was shiftable in both orientation and position, which decomposes the image into several spatial frequency bands. In addition it divides each frequency band into a set of four orientation bands. At each level four output subband images are produced by using high pass filters tuned to different orientations. A radially symmetric lowpass filter is also applied and the output is subsampled by a factor of two in each direction to produce the input image for the next level. The decomposition is called "steerable" because the response of a filter tuned to any orientation at a particular level can be obtained through a linear combination of the four computed responses at that level. Note that there is no subsampling in the highpass channels, and so a three level decomposition of an N by N image will produce:

1. 4 subband images with N*N coefficients
2. 4 images with (N/2)*(N/2) coefficients
3. 4 images with (N/4)*(N/4) coefficients
4. a final low pass image with (N/8)*(N/8) coefficients.

Hence we have gone from N² numbers to (4(1 + 1/4 + 1/16) + 1/64)N² ≈ 5.25N² coefficients. The steerable pyramid is self-inverting in that the reconstruction filters are the same as the analysis filters. This decomposition has the disadvantages of nonseparable filters, nonperfect reconstruction, and being an overcomplete expansion.
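The coefficient count above can be checked directly:

```python
# Coefficient count for the three-level steerable pyramid described above:
# four orientation bands at each level's resolution (the highpass outputs
# are not subsampled) plus the final lowpass image.
def steerable_coeff_count(N, levels=3, bands=4):
    total = sum(bands * (N // 2**lev) ** 2 for lev in range(levels))
    return total + (N // 2**levels) ** 2        # final lowpass image

# For a 64x64 image: 5.265625, i.e. the approximately 5.25 N^2 quoted above.
redundancy = steerable_coeff_count(64) / 64**2
```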
This transform has been used for a number of applications including stereo matching [107], texture synthesis [45], and image denoising [107]. For some of these a quadrature pair of steerable filters was used. Using quadrature filters makes the steerable transform almost equivalent to the DTCWT, with the main differences being that the steerable transform has increased redundancy and worse reconstruction performance.

2.4.4 Multiwavelets

An alternative way of avoiding the limitations of the standard wavelet transform is known as multiwavelets [3, 46]. From the filterbank perspective the difference is that the signals are now vector valued and the scalar coefficients in the filter banks are replaced by matrices. The conversion of the original data signal to the vectorised stream is known as preprocessing and there are a number of choices. The choice of preprocessing decides the redundancy of the system and it is possible to have both critically sampled and redundant multiwavelet systems. Experimental results in the literature indicate that the redundant systems usually give better results for denoising [114] (but are less appropriate for coding applications).

Multiwavelets are closely related to the DTCWT. If we combine the signals from tree a and tree b to produce a single 2 dimensional signal, then it is clear that the DTCWT for scales 2 and above is equivalent to a multiwavelet system of multiplicity 2. The equivalent matrices in the multiwavelet filterbank are simply 2 by 2 diagonal matrices whose diagonal entries are given by the corresponding coefficients from the DTCWT filters. The DTCWT processing at scale 1 can be viewed in two ways. One way is to see it as part of a multiwavelet structure that uses repeated row preprocessing (giving the redundancy) and has different filters for the first scale. The other way is to interpret the first scale as performing the multiwavelet preprocessing while preserving (in the scale 1 subbands) the parts of the signal that are filtered out. In either case it is clear that the DTCWT is a special case of a multiwavelet transform. The advantages of this special case are:

1. The absence of signal paths between the two trees (reflected in the diagonal structure of the matrices) leads to less computation.
2. The preprocessing and filters are carefully designed to allow an interpretation of the output as complex coefficients produced by filters discriminating between positive and negative frequencies. This leads to the good shift invariance properties.
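The diagonal-matrix observation can be verified in a few lines (with random illustrative filters rather than the actual DTCWT taps): filtering the 2-vector stream with 2×2 diagonal matrix taps is exactly two independent scalar convolutions, i.e. no cross-tree signal paths.

```python
import numpy as np

rng = np.random.default_rng(3)
ha = rng.standard_normal(6)        # illustrative tree-a filter
hb = rng.standard_normal(6)        # illustrative tree-b filter
C = [np.diag([ha[m], hb[m]]) for m in range(6)]   # 2x2 diagonal matrix taps

xa = rng.standard_normal(32)
xb = rng.standard_normal(32)
v = np.stack([xa, xb])             # vector-valued signal after preprocessing

# Matrix-tap convolution of the 2-vector stream.
out = np.zeros((2, 32 + 6 - 1))
for m, Cm in enumerate(C):
    out[:, m:m + 32] += Cm @ v
```

Row 0 of `out` equals the scalar convolution of `xa` with `ha`, and row 1 that of `xb` with `hb`, which is why the dual tree costs no more than two ordinary wavelet transforms.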
2.5 Noise amplification theory

The use of redundant transforms gives a much greater design freedom but there are also extra complications. If the wavelet transform is associated with an orthogonal matrix, then an error in the wavelet coefficients translates directly into an error of the same energy in the signal. This will be the case for (appropriately scaled) orthogonal wavelet transforms. However, for biorthogonal and redundant transforms the energy of the error can change significantly during reconstruction. This section considers the effect of a very simple model of wavelet domain processing. The model and notation are described in section 2.5.1. We give results (proved in appendix A) that describe the noise gain during reconstruction, and in particular equation 2.15 that gives a formula for the noise gain in terms of the redundancy and unbalance of the transform (the noise gain is a measure of the change in error energy caused by the inverse wavelet transform).

2.5.1 Preliminaries

We consider a simple form of wavelet processing in which we have an observed image (or signal) x ∈ R^N that is processed by:

1. Calculate the wavelet transform w ∈ R^M of x: w = W x.
2. Apply some wavelet domain processing to produce new wavelet coefficients v ∈ R^M.
3. Invert the wavelet transform to recover a new signal y ∈ R^N: y = P v.

This theory applies to both real and complex wavelet transforms. For complex transforms we use the separated form in which w is still a real vector and consists of the real parts of the complex coefficients followed by the imaginary parts. In the mathematical analysis of this model we will model the wavelet domain processing by adding independent white Gaussian noise of mean zero and variance σ² to the wavelet coefficients: v ∼ N(w, σ² I_M). This model is illustrated in figure 2.9.

Figure 2.9: Model of wavelet domain processing (x → wavelet transform W → w → addition of noise, the wavelet domain processing → v → inverse wavelet transform P → y).

The total expected energy of the added noise is E{||v − w||²} = Mσ², where σ² is the variance of the added noise. The total expected energy of the error after reconstruction is given by E{||y − x||²}. We would like to define the noise gain as the ratio of these two energies, but there is a problem due to scaling. Suppose we construct a new wavelet transform W′ = sW that simply scales the values of all the wavelet coefficients by a factor of s, and a new reconstruction matrix P′ = (1/s)P that still achieves PR. During reconstruction all the coefficients are scaled down by a factor of s and hence the noise energy after reconstruction is reduced by s². However, in almost all practical applications a scaling factor of s will mean that the noise standard deviation σ is also increased by the same factor². In order to get meaningful values for the noise gain, we adopt a convention that the scaling factor s is chosen such that the transform preserves the energy of a white noise signal during the forward wavelet transform. In other words, if we use the transform to analyse a signal containing independent white Gaussian noise of mean 0 and variance α² (to give a total expected input energy of Nα²), then the total expected energy of the wavelet coefficients will be equal to Nα². We define a normalised transform to be a transform scaled in this manner. We shall prove (A.1) that this is equivalent to the requirement that tr(W^T W) = N. We assume that all transforms mentioned in this section satisfy this convention.

The noise gain g is then defined as

g = E{||y − x||²} / E{||v − w||²}    (2.14)

A low noise gain means that small changes in the wavelet coefficients lead to small changes in the reconstructed signal. In this case we will call the reconstruction robust.

²One exception to this principle is when we use fixed precision numbers to store the coefficients. The quantisation noise will remain at the same level for different scaling factors s and the argument is now a valid argument for scaling the coefficients to use the full dynamic range.
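The normalisation convention can be checked numerically for an arbitrary (random, illustrative) redundant transform: scaling so that tr(W^T W) = N makes the forward transform preserve white-noise energy on average, since E||Wn||² = α² tr(W^T W) for n with covariance α² I_N.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 16, 24
W_raw = rng.standard_normal((M, N))        # arbitrary redundant analysis transform

# Choose the scaling factor s so that tr(W^T W) = N.
s = np.sqrt(N / np.trace(W_raw.T @ W_raw))
W = s * W_raw                              # the normalised transform
trace_value = np.trace(W.T @ W)

# Monte Carlo check of the energy-preservation property for white noise.
noise = rng.standard_normal((N, 20000))
in_energy = np.mean(np.sum(noise**2, axis=0))            # about N
out_energy = np.mean(np.sum((W @ noise)**2, axis=0))     # also about N
```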
We now attempt to motivate this model by giving two examples where it might be appropriate:

1. In an image coding example the wavelet coefficients are commonly quantised to a number of discrete values. The quantisation results in errors in the wavelet coefficients and consequently errors in the decoded image. If these values are equally spaced by a distance d_Q, then we could attempt to model the errors in the wavelet coefficients as white Gaussian noise of variance d_Q²/12. Better models certainly exist: for example, if d_Q is very small then a uniform distribution of errors in the range −d_Q/2 to d_Q/2 is a closer approximation, while for large d_Q (that quantise most of the coefficients to zero) a Laplacian distribution may give a better fit, but white Gaussian noise is often a reasonable first approximation. In this case we would want a low noise gain because in a codec we wish to minimise the error, x − y, in the decoded image for a given level of quantisation error in the wavelet coefficients.

2. Consider an image restoration technique, such as denoising, for which the wavelet coefficients are significantly changed. The purpose of the technique is to get an enhanced image and initially it may seem inappropriate to use a transform that minimises the effect of the change. Nevertheless, if we now reinterpret x as representing the original image, and w as the wavelet coefficients of this original image, then the same model can be used to represent the belief that v will be noisy estimates of w. The wavelet domain processing is designed to make the output coefficients v a reasonable estimate of the wavelet coefficients of the original image, and the noise in the model represents the wavelet coefficient estimation error. The estimates v are in reality produced by some estimation technique acting on the observed data; in the model they are modelled based on the original image. This reinterpretation may seem a bit confusing but is worth understanding. From this new perspective the noise gain of the transform measures the relationship between the wavelet estimation error and the final image error, and it is clear that a low noise gain will be beneficial in order to produce a low final image error.

The theory is equally valid for any linear transforms W and P provided that the system achieves perfect reconstruction (P W = I_N) and the scaling convention (tr(W^T W) = N) is observed.
2.5.2 Theoretical results

Consider a finite linear transform represented by the matrix W. The maximum robustness of the transform W can be calculated from the numbers d_1, ..., d_N (which are defined in A.2 and are the squares of the singular values of W). Most of these results come from standard linear algebra and can be found in the literature. We present them here for interest and as a route to the final simple equation for the noise gain in terms of the unbalance. We are not aware of this final equation (2.15) appearing in the literature.

• The average of these numbers gives the gain in energy when we transform white noise signals, and so the average is one for normalised transforms (see A.3 for proof).
• The frame bounds are given by the largest and smallest of these numbers, and so the transform represents a wavelet frame if and only if the smallest is nonzero (see A.4 for proof).
• Any linear perfect reconstruction transform that is used to invert W has noise gain bounded below by (1/M) Σ_{i=1}^{N} 1/d_i, and this lower bound is achievable (see A.5 for proof).
• If the frame is tight then it can be inverted by the matrix W^T, and this inversion achieves the lower bound on noise gain (see A.6 for proof).
• The noise gain of any real linear perfect reconstruction transform, P, used to invert W is given by

g = (N + U) / M    (2.15)

where U is a nonnegative quantity given by U = tr((P − W^T)(P^T − W)) (see A.7 for proof). We will call U the unbalance between the analysis and reconstruction transform. (This result is true regardless of whether we have a real or complex wavelet because we are using the expanded form of the complex wavelets that treat the real and imaginary parts separately.)

Balanced wavelet transforms use the conjugate time-reverse of the analysis filters for the reconstruction filters and therefore P^T = W. The equivalent statement for complex matrices is that P_C = W_C^H. This results in U taking its minimum value U = 0, and we deduce from the last result that balanced wavelet transforms will have the least noise gain and hence the greatest robustness.
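Equation 2.15 is easy to verify numerically for a random (illustrative) transform, because for v = w + n the noise gain is exactly tr(P^T P)/M, and the identity tr(PW) = N together with the scaling convention gives tr(P^T P) = N + U.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 8, 12
W = rng.standard_normal((M, N))
W *= np.sqrt(N / np.trace(W.T @ W))    # scaling convention tr(W^T W) = N
P = np.linalg.pinv(W)                  # one valid PR inverse: P W = I_N

# Exact noise gain for white noise added to the coefficients:
# y - x = P n, so g = E||Pn||^2 / E||n||^2 = tr(P^T P) / M.
g_direct = np.trace(P.T @ P) / M

# Equation 2.15 with the unbalance U.
U = np.trace((P - W.T) @ (P.T - W))
g_formula = (N + U) / M
```

The two values agree to machine precision, and U is positive here because a random frame is not tight, so the pseudoinverse reconstruction is unbalanced.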
We deduce from the last result that balanced wavelet transforms will have the least noise gain and hence the greatest robustness.

2.5.3 Numerical results

Section 2.3.3 argued that it was impossible to get a complex tree that differentiated positive and negative frequencies while maintaining good noise reconstruction properties. This can also be expressed in terms of the frequency responses. The frequency response of an analysis filter will be (for a balanced transform) the conjugate of the frequency response of the corresponding reconstruction filter. Note that this means that the magnitudes of the frequency responses will be equal. Therefore a necessary condition for low noise gain is that the frequency responses of H0(z) and G0(z) must be close. The purpose of this section is merely to illustrate the problems that can occur.

We tested the robustness of the DTCWT and a variety of single tree complex wavelets. Daubechies wavelets of a certain length are designed to have the maximum number of zeros in the product filter P(z) at z = −1, subject to the constraint of satisfying the conditions necessary for perfect reconstruction [30]. The product filter is then factorised into H0(z) and G0(z). Each factor a + bz^{-1} corresponds to a zero of the product filter at z = −b/a. Usually factors are allocated in conjugate pairs in order to produce real filters, but complex wavelets can be produced by alternative factorisations [73]. For our experiment we put all the factors corresponding to zeros with a positive imaginary part into H0(z) and all those corresponding to negative imaginary parts into G0(z). The factors corresponding to zeros on the real axis are split equally between H0(z) and G0(z). This choice gives the greatest possible differentiation between positive and negative frequencies.

Each transform was designed to produce a 6 level wavelet decomposition of a real signal³ of length 128 and all produced 128 (complex-valued) wavelet output coefficients, and so all had a redundancy of 2. The transforms were normalised to preserve energy during the forward transform by a single scaling applied to all output coefficients. We characterise the single tree complex wavelets (STCWT) by the number of zeros the H0(z) filter has at z = −1. The greater this number the smoother the wavelets, and so in the table these wavelets are described by the acronym "STCWT" followed by the number of zeros.

³The single tree complex wavelets were restricted to produce a real output by taking the real part of the reconstructed signal.

Each choice of filters corresponds to a linear transform that can be represented by the matrix W. Recall that we are treating the real and imaginary parts separately and hence W is of size 2N × N. This matrix is found by transforming signals e1, ..., eN, where ek is zero everywhere except that at position k the value is 1. The wavelet transform of ek gives the kth column of W. Similarly the columns of matrix P are found by reconstructing signals from wavelet coefficients that are all zero except for a single coefficient.

The results of appendix A can be used to calculate both the noise gain for P and the minimum noise gain. There are many ways of designing a perfect reconstruction inverse of W. The matrix P represents one way (that can be efficiently implemented with a tree of filters) but other ways can give a better noise gain. The theoretical minimum reconstruction noise gain would be given if we reconstructed using the pseudoinverse solution. The numerical results are tabulated in figure 2.10.

[Figure 2.10: Comparison of noise gain for different transforms. For each transform (original dual-tree, Q-shift dual-tree, and STCWT 2 through STCWT 10) the table lists the minimum noise gain and the reconstruction noise gain. The dual-tree transforms achieve reconstruction noise gains very close to the ideal value of 0.5 (0.50028 and 0.50032), as does STCWT 2, while the STCWT reconstruction noise gain grows rapidly with the number of zeros, reaching the order of 10^10 for STCWT 10; the minimum noise gain grows at a much slower rate.]

2.5.4 Discussion

The transform labelled STCWT 2 is a special case because the product filter has no complex zeros. The analysis and reconstruction filters are all real and this transform is identical to a real (and orthogonal) Daubechies wavelet transform (of order 2).
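The construction of W described above (transforming each unit vector e_k to obtain the kth column) can be sketched as follows. The single-level Haar step below is only an illustrative stand-in for the filter-tree transforms actually tested; for this orthogonal choice the minimum and actual noise gains coincide.

```python
import numpy as np

def transform_matrix(forward, N):
    """Build the matrix of a linear transform by applying it to the unit vectors e_k."""
    cols = []
    for k in range(N):
        e = np.zeros(N)
        e[k] = 1.0                     # e_k: zero everywhere except position k
        cols.append(forward(e))        # the transform of e_k is the k-th column of W
    return np.column_stack(cols)

def haar(x):
    # One orthonormal Haar analysis step (sums and differences of pairs).
    s = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return np.concatenate([s, d])

W = transform_matrix(haar, 8)
d = np.linalg.svd(W, compute_uv=False) ** 2   # the numbers d_1, ..., d_N
min_gain = np.mean(1.0 / d)                   # (1/M) sum 1/d_i; here M = N = 8
# min_gain is approximately 1 because this W is orthogonal (all d_i = 1)
```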
To allow direct comparison we still treat these real wavelets as having complex outputs and so add complex noise to them. Half of this complex noise is lost when we take the real part at the end of the transform, which is why the noise gain is 1/2. The transform is therefore balanced and achieves low noise gain, but has the problems of large shift dependence and wavelets which are not very smooth. If we attempted to compute a wavelet transform with 1 zero in H0(z) we would obtain the Haar wavelet transform with a noise gain of 1/2 for the same reasons.

Note that this is a very unusual choice of complex Daubechies wavelet. The single tree complex wavelets, however, have a rapidly increasing noise gain for the longer (and smoother) wavelets which is likely to make the wavelets useless. Note that the minimum noise gain increases at a much slower rate, suggesting that an alternative reconstruction transform could be found with much less noise gain. However, even this alternative would not be of much practical use as the minimum noise gain is still significant. The forms more commonly used are much more balanced and do not suffer from these reconstruction problems, but consequently have poor discrimination between positive and negative frequencies.

The DTCWT achieves a very low noise gain and so will give robust reconstructions. We note that the Q-shift tree has a lower noise gain than the original dual tree. This is because of the better balanced filters in the Q-shift version.

2.6 Conclusions

The main aim of this chapter was to introduce the terminology and construction of wavelet and complex wavelet systems. The secondary aim was to explain why we want to use complex wavelets and what form of complex wavelet is appropriate. We now summarise the principal points relating to this secondary aim:

1. We want to distinguish positive and negative frequencies in order to: (a) improve the directional frequency resolution of the 2D transform while still using efficient separable filters; (b) reduce aliasing and produce shift invariant methods.

2. We introduced several forms of redundant wavelet transforms including Gabor wavelets, harmonic wavelets, multiwavelets, steerable transforms, prolate spheroidal sequences, and dual tree complex wavelets.

3. Using an elementary processing example we explain more generally why any nonredundant linear PR transform must use ideal filters in order to achieve shift invariance.

4. We define a transform to be balanced if the reconstruction filters are the conjugate time-reverse of the analysis filters. We show that the noise gain during reconstruction is closely related to the balance of the transform and, in particular, that the best (i.e. lowest) noise gains are given by balanced transforms and are equal to the reciprocal of the redundancy.

5. We explain why linear PR complex wavelets based on a single standard tree cannot simultaneously: be balanced, give shift invariant methods, and use short support filters. We illustrate numerically the problems of noise gain caused by lack of balance when we use a single tree complex wavelet that strongly differentiates between positive and negative frequencies.

6. The purpose of this dissertation is not to compare different types of complex wavelet, but rather to explore the potential of such a system against more standard approaches. For this reason we have just selected one transform, the Q-shift dual tree complex wavelet system, as a representative.
Chapter 3

Previous applications

3.1 Summary

This dissertation aims to explore the potential of the DTCWT in image processing. The purpose of this chapter is to describe applications for which the DTCWT (or a similar transform) has already been evaluated. The phase of the complex coefficients is closely related to the position of features within an image, and this property can be utilised for motion estimation [74]. The properties of the DTCWT (in particular, its extra directional frequency resolution) make it appropriate for texture classification [47, 43, 33] and give methods that are efficient in terms of computational speed and retrieval accuracy. The complex wavelets are also appropriate for use in denoising images [62]. Previous work has shown that a nondecimated wavelet transform [82] performs better than decimated transforms for denoising, and the DTCWT is able to achieve a performance similar to the nondecimated transforms. Interestingly, when the DTCWT is used in more sophisticated denoising methods [24] it is found to significantly outperform even the equivalent nondecimated methods. The only original results in this chapter come from the replication of the denoising experiments in section 3.4.

3.2 Motion Estimation

Magarey [74] developed a motion estimation algorithm based on a complex discrete wavelet transform (CDWT). The task is to try and estimate the displacement field between successive frames of an image sequence. The fundamental property of wavelets that makes this possible is that translations of an image result in phase changes for the wavelet coefficients. By measuring the phase changes it is possible to infer the motion of the image. The transform used short 4-tap complex filters but did not possess the PR property. In other words, the author had been unable to find simple FIR filters that could be used to exactly reconstruct the original signal. The filter shapes were very close to those used in the DTCWT, suggesting that the conclusions would also be valid for the DTCWT.

A major obstacle in motion estimation is that the reliability of motion estimates depends on image content. For example, it is easy to detect the motion of a single dot in an image, but it is much harder to detect the motion of a white piece of paper on a white background. Magarey developed a method for incorporating the varying degrees of confidence in the different estimates.

Many conclusions were drawn, but for the purposes of this dissertation we highlight just a couple: "In tests on synthetic sequences the optimised CDWT-based algorithm showed superior accuracy under simple perturbations such as additive noise and intensity scaling between frames." "In addition, the efficiency of the CDWT structure minimises the usual disadvantage of phase-based schemes - their computational complexity. Detailed analysis showed that the number of floating point operations required is comparable to or even less than that of standard intensity-based hierarchical algorithms."

Although not included in this dissertation, we have found such phase-based computation beneficial for constructing an adaptive contour segmentation algorithm based on the DTCWT [34].

3.3 Classification

Efficient texture representation is important for content-based retrieval of image data. The idea is to compute a small set of texture-describing features for each image in a database in order to allow a search of the database for images containing a certain texture. The DTCWT has been found by a number of authors to be useful for classification [47, 43, 33]. Each uses the DTCWT in different ways to compute texture features for an entire image:

1. Hill, Bull, and Canagarajah [47] compute the energies of the subbands at each scale. However, in order to produce rotationally invariant texture features, they use features based on either the Fourier transform or the autocorrelation of the 6 energies at each scale.

2. Hatipoglu, Mitra, and Kingsbury [43] use features of the mean and standard deviations of complex wavelet subbands. However, instead of using the DTCWT based on a fixed tree structure, they use an adaptive decomposition that continues to decompose subbands with energy greater than a given threshold. The aim is to adapt the transform to have the greatest frequency resolution where there is greatest energy.

3. de Rivaz and Kingsbury [33] compute features given by the logarithm of the energy in each subband.

All authors report significant improvements in classification performance compared to a standard real wavelet transform. The different authors used different databases so the following results are not directly comparable:

1. Hill, Bull, and Canagarajah report an improvement from 87.35% for the DWT to 93.75% for the DTCWT on a database of 16 images [47].

2. Hatipoglu, Mitra, and Kingsbury report an improvement from 69.64% for a real wavelet (with an adaptive decomposition) to 79.73% for the DTCWT (with the adaptive decomposition) on a database of 116 images [43].

3. de Rivaz and Kingsbury report an improvement from 58.8% for the DWT to 63.5% for the DTCWT on a database of 100 images [33].

3.4 Denoising

In many signal or image processing applications the input data is corrupted by some noise which we would like to remove or at least reduce. Wavelet denoising techniques work by adjusting the wavelet coefficients of the signal in such a way that the noise is reduced while the signal is preserved. There are many different methods for adjusting the coefficients, but the basic principle is to keep large coefficients while reducing small coefficients. This adjustment is known as thresholding the coefficients.
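As an illustration of the kind of feature vector involved, the sketch below computes log-energy features in the spirit of [33]; the random "subbands" merely stand in for a real DTCWT decomposition, and the mean/std features of [43] would be computed analogously.

```python
import numpy as np

def texture_features(subbands):
    """Log-energy feature vector from a list of complex subband arrays."""
    return np.array([np.log(np.sum(np.abs(w) ** 2)) for w in subbands])

# Toy stand-in for a wavelet decomposition: random complex "subbands".
rng = np.random.default_rng(1)
subbands = [rng.standard_normal((s, s)) + 1j * rng.standard_normal((s, s))
            for s in (32, 16, 8)]
f = texture_features(subbands)
# Retrieval would then rank database images by distance between such vectors.
```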
One rationale for this approach is that often real signals can be represented by a few large wavelet coefficients, while (for standard orthogonal wavelet transforms) white noise signals are represented by white noise of the same variance in the wavelet coefficients. Therefore the reconstruction of the signal from just the large coefficients will tend to contain most of the signal energy but little of the noise energy. An alternative rationale comes from considering the signal as being piecewise stationary. For each piece the optimum denoising method is a Wiener filter whose frequency response depends on the local power spectrum of the signal. Where the signal power is high, we keep most of the power; where the signal power is low, we attenuate the signal. The size of each wavelet coefficient can be interpreted as an estimate of the power in some time-frequency bin, and so again we decide to keep the large coefficients and set the small ones to zero in order to approximate adaptive Wiener filtering.

The first wavelet transform proposed for denoising was the standard orthogonal transform [39]. However, orthogonal wavelet transforms (DWT) produce results that substantially vary even for small translations in the input [63], and so a second transform was proposed: the nondecimated wavelet transform (NDWT) [82], which produced shift invariant results by effectively averaging the results of a DWT-based method over all possible positions for the origin [69, 26]. Experiments on test signals show that the NDWT is superior to the DWT. The main disadvantage of the NDWT is that even an efficient implementation takes longer to compute than the DWT, by a factor of three times the number of levels used in the decomposition.

Kingsbury has proposed the use of the DTCWT for denoising [62] because this transform not only reduces the amount of shift-variance but also may achieve better compaction of signal energy due to its increased directionality. In other words, at a given scale an object edge in an image may produce significant energy in 1 of the 3 standard wavelet subbands, but only 1 of the 6 complex wavelet subbands. The method is to attenuate the complex coefficients depending on their magnitude. As for the standard techniques, large coefficients are kept while smaller ones are reduced. It was found that this method produces similar results to the nondecimated wavelet method while being much faster to compute.

Figure 3.1 shows the results when using a simple soft denoising gain rule. White noise was added to a test image and the denoised rms error was measured for the different techniques.
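A soft gain rule of the kind referred to above can be sketched as follows; the exact rule used in the experiments of figure 3.1 may differ in detail, and the threshold value here is illustrative.

```python
import numpy as np

def soft_threshold(w, t):
    """Attenuate complex coefficients according to magnitude: coefficients well
    above the threshold t are nearly kept, small ones are set to zero."""
    mag = np.abs(w)
    gain = np.maximum(1.0 - t / np.maximum(mag, 1e-12), 0.0)
    return w * gain

# Gain rule applied to a few example coefficients (a full denoiser would
# transform the image, threshold each subband, then invert the transform).
w = np.array([0.1 + 0.1j, 3.0 - 4.0j, -0.2j])
denoised = soft_threshold(w, 1.0)
# the small coefficients go to zero; the coefficient of magnitude 5 is
# shrunk to magnitude 4 with its phase preserved
```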
Method         RMS error   SNR improvement
No denoising   26.3        0dB
DWT            14.47       5.19dB
NDWT           12.66       6.35dB
DTCWT          12.54       6.43dB

Figure 3.1: Top left: Original image. Top right: Noisy image. Bottom left: DWT results. Bottom middle: DTCWT results. Bottom right: NDWT results.

In this case we can see that the DTCWT has a slightly better SNR than the NDWT method, but that the difference is not visually noticeable.

There are often significant correlations between the wavelet coefficients in the transforms of real images. In particular, it is found that large coefficient values cascade along the branches of the wavelet tree. This property is known as persistence [104]. A model known as the Hidden Markov Tree (HMT) proposed by Crouse, Nowak, and Baraniuk attempts to capture the key features of the joint statistics of the wavelet coefficients (including persistence) [28]. This is achieved by means of hidden state variables that describe the likely characteristics of the wavelet coefficients (e.g. whether they are likely to be large or small). A Markov model is used to describe the relative probability of transitions between different states along a branch of the wavelet tree (moving from coarse to fine scales).

The initial model was based on a decimated wavelet transform (DWT). Romberg, Choi, and Baraniuk proposed a shift-invariant denoising model based on the HMT using the nondecimated wavelet transform (NDWT) [104]. In experiments this HMT shift invariant denoising was shown to outperform a variety of other approaches including Wiener filtering and the shift invariant hard thresholding method mentioned above [24].

Choi et al have also tested the use of dual tree complex wavelets within the HMT framework [24]. The results were found to be consistently better than even the shift invariant HMT. Figure 3.2 shows their published results for the HMT combined with either the DWT, NDWT, or DTCWT. Three test images (Boats, Lena, and Bridge) were used with two choices of noise variance (σ = 10 or σ = 25). The table displays the peak signal to noise ratios (PSNRs) of the denoised images.

Romberg and Choi have made the source code for the HMT available on the internet [103]. Combining this code with the DTCWT we have attempted to replicate the denoising results. The table also contains the results of our experiments (shown in bold). The main reason for the difference is that we are using a slightly different version of the dual tree. Differences are partly caused by different noise realisations, but repeating the experiments produces very similar SNR levels.

[Figure 3.2: PSNRs in dB of images denoised with the HMT acting on different wavelet transforms. Columns: Noisy, HMT/DWT, HMT/NDWT, HMT/DTCWT, and our New HMT/DTCWT; rows: the Boats, Lena, and Bridge images at σ = 10 and σ = 25. The results in normal type are published in the literature [24] while the results in bold come from our replication of the same experiment.]

Looking at the σ = 25 results we see that our results are similar for the boats image, better for the Lena image, and worse for the bridge image. The main point to notice is that in this case the shift invariance given by the use of the NDWT tends to give a small improvement in results, while the DTCWT tends to give an even larger improvement.

3.5 Conclusions

Treating the coefficients as complex numbers has the result of approximately decoupling the shift dependent information from the shift invariant information. The magnitude is more robust to translations, and hence applications like classification and denoising (that should be shift invariant) process just the magnitude. The phase gives a measure of local translations and therefore permits phase-based motion estimation. For simple thresholding denoising we see that the DTCWT achieves a similar performance to methods based on the NDWT. However, for a more complicated HMT denoising the DTCWT is significantly better than even the NDWT equivalent.
Chapter 4

Complex wavelet texture features

4.1 Summary

The purpose of this chapter is to describe how to use the DTCWT to generate texture features and to begin to explore the properties of these features. We describe texture by computing features from the energy of the DTCWT wavelet subbands. One powerful method for evaluating a choice of texture model is to estimate the model parameters for a given image and then attempt to resynthesize a similar texture based on these parameters. This chapter reviews two texture synthesis methods based on filter banks. This is of interest because it suggests how complex wavelet texture synthesis might be done and because it illustrates the problems encountered when filter banks do not have the perfect reconstruction properties of wavelets. We adapt the first of these methods to demonstrate the difference between using real or complex wavelets. We find that the complex wavelets are much better at representing diagonally orientated textures. Section 4.7 describes the texture synthesis algorithm and section 4.8 contains discussion of the experimental results. The main original contributions of this chapter are the synthesis results that give an indication of when the DTCWT texture features will be appropriate.

4.2 Introduction

There are many techniques for texture synthesis. A trivial technique which is practically very useful is to simply copy the texture from a source image (used extensively in computer games). This kind of approach can be powerfully extended by a stochastic selection of appropriate sources [16]. There are also methods based on autoregressive filters or autocorrelation and histogram [10, 22]; gradient algorithms are used to simultaneously impose the autocorrelation function and the histogram. Other techniques include models based on reaction-diffusion [118, 129], frequency domain [72] or fractal techniques [40, 72]. Proper Bayesian inference usually requires extensive computation and is consequently extremely slow, but has been done, usually by means of Markov Random Fields (often at a variety of scales) [91, 134].

While texture synthesis by itself is an important topic, we also have an indirect motivation for studying these methods. Our main goal in this chapter is to use the DTCWT to produce useful texture features. The indirect motivation is that texture synthesis provides an interesting way to demonstrate visually the relative advantages of different sets of texture features. Sections 4.3 and 4.4 describe methods that have been used to synthesize textures based on the output of a filter bank. We will use (in the second half of the chapter) a technique very similar to the one described in section 4.3, while the second method is of interest mainly to show the difficulties posed by lack of perfect reconstruction in the filters.

4.3 Pyramid-based texture analysis/synthesis

Heeger and Bergen [45] describe an automatic method for synthesizing images with a similar texture to a given example. They assume that textures are difficult to discriminate when they produce a similar distribution of responses in a bank of orientation and spatial-frequency selective linear filters. Their method synthesizes textures by matching the histograms of filter outputs. The method makes use of the steerable pyramid transform [107] that was described in section 2.3.

4.3.1 Method

The method starts with an image x(0) of the desired size that is filled with white Gaussian noise of variance 1 (the histogram matching step will mean that the results will be the same whatever the choice of initial mean and variance). In order to get both the pixel and pyramid histograms to match, these steps are iterated K times. The steps for iteration n are as follows:

1. First the histogram is matched to the input texture. More precisely, we generate a new image y(n) by applying a monotonically increasing transform to x(n−1) such that the histogram of y(n) matches the histogram of the input texture.

2. Make pyramids from both y(n) and the input texture.

3. Alter the pyramid representation of y(n) in order that the histogram of each subband matches the histogram of the corresponding subband in the pyramid transform of the input texture.

4. Invert the pyramid transform to generate x(n), the next image in the sequence.

The output of the algorithm is the last image x(K). Stopping after about K = 5 iterations is suggested. As the filters are not perfect, iterating too many times introduces artefacts due to reconstruction error [45].

4.3.2 Results and applications

The algorithm is effective on "stochastic" textures (like granite or sand) but does not work well on "deterministic" textures (like a tile floor or a brick wall). The conclusion is that this texture synthesis can be useful in a variety of images which need the replacement of large areas with stochastic textures, but that the technique is inappropriate for images that need the replacement of areas with structured texture [45].

Igehy and Pereira [51] describe an application of this algorithm to replacing part of an image with a synthesized texture (this might be done to remove stains or scratches or other unsightly objects from an image). Their algorithm extends the original with the goal that the synthesized texture becomes an image which is a combination of an original image and a synthetic texture. More precisely, the combination is controlled by a mask. The algorithm remains the same as before,
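The histogram matching used in steps 1 and 3 above can be sketched with a simple rank-based mapping; this is a simplification of a lookup-table implementation (ties and interpolation between histogram bins are ignored).

```python
import numpy as np

def match_histogram(x, example):
    """Monotonically remap the values of x so that its empirical distribution
    matches that of `example` (both greyscale images as float arrays)."""
    shape = x.shape
    order = np.argsort(x, axis=None)          # rank order of the pixels of x
    # the pixel of rank r in x receives the intensity of rank r in the example
    targets = np.sort(example, axis=None)[
        np.linspace(0, example.size - 1, x.size).astype(int)]
    matched = np.empty(x.size)
    matched[order] = targets
    return matched.reshape(shape)

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 4))               # white noise starting image
example = rng.uniform(10, 20, (8, 8))         # stand-in for the input texture
y = match_histogram(x, example)
# y keeps the spatial rank ordering of x but takes intensities from `example`
```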
except that at each iteration the original image is composited back into the synthesized texture according to the mask using a multiresolution compositing technique that avoids blurring and aliasing [18].
4.4 Gabor based texture synthesis

Navarro and Portilla [83] propose a synthesis method based on sampling both the power spectrum and the histogram of a textured image. The spectrum is sampled by measuring the energy and equivalent bandwidths of 16 Gabor channels. The synthesis process consists of mixing 16 Gabor filtered independent noise signals (whose energy and bandwidths are chosen to match the measured values) into a single image.

4.4.1 Filters

The human visual system (HVS) is imitated by using a set of 4 × 4 filters (four scales, four orientations), plus a low pass residual (LPR). Each filter is a separable Gabor function of the form

g(x, y)_{f,θ,a} = exp[−πa²(x² + y²)] exp[j2πf(x cos θ + y sin θ)],   (4.1)

where θ specifies the desired orientation, f gives the radial frequency and a is a weighting factor that makes the function decay to zero as you move away from the origin. Only the real part of this function is actually used.

The filtering is applied to shrunk versions of the input image. This means that only one filter needs to be defined for each orientation. After filtering for the four highest frequency channels, the image is filtered with a lowpass filter and downsampled by a factor of two in both directions. Then the same procedure is repeated for each scale. Demodulation is applied to the Gabor channels after filtering, thus enabling a reduction in the number of samples by a factor of two in each dimension. However, the resulting channels become complex and so the effective compression ratio is 2. This means that overall a four scale decomposition of an N × N image produces:

1. 4 complex subimages of size (N/2) × (N/2)
2. 4 complex subimages of size (N/4) × (N/4)
3. 4 complex subimages of size (N/8) × (N/8)
4. 4 complex subimages of size (N/16) × (N/16)
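Equation (4.1) can be sampled directly to obtain a discrete filter mask. The parameter values below are illustrative rather than those used in [83].

```python
import numpy as np

def gabor(size, f, theta, a):
    """Sample equation (4.1) on a size x size grid centred on the origin:
    g(x, y) = exp(-pi a^2 (x^2 + y^2)) exp(j 2 pi f (x cos(theta) + y sin(theta)))."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r, indexing="ij")
    envelope = np.exp(-np.pi * a ** 2 * (x ** 2 + y ** 2))
    carrier = np.exp(2j * np.pi * f * (x * np.cos(theta) + y * np.sin(theta)))
    return envelope * carrier

g = gabor(15, f=0.25, theta=np.pi / 4, a=0.2)
real_mask = g.real   # only the real part is actually used for filtering
```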
Therefore we have gone from N² numbers to 4 × 2 × N²(1/4 + 1/16 + 1/64 + 1/256) ≈ 2.66N² coefficients. The lowest frequencies are extracted separately using the Discrete Time Fourier Transform (DTFT).

4.4.2 Extracted features

The energy in each Gabor channel is computed to give information about the main directions in the texture. Navarro et al report that the degree of spectral spreading of the spectrum at different spectral locations is an essential feature to characterize the texture. Therefore for each channel they compute the equivalent bandwidths along the u and v frequency axes. To calculate the equivalent bandwidths they first calculate the 2D power spectrum using the DTFT and then convert this into a pair of 1D normalized power spectra. This conversion is achieved by integrating the power spectrum along the two frequency axes, and dividing the results by their respective maxima. (Both integrations act on the original 2D spectrum and produce a 1D spectrum.) The equivalent bandwidths are given by the areas of these 1D spectra.

The histogram of the original image is also recorded with a resolution of 16 levels. It is calculated by lowpass filtering and subsampling the full histogram.

Finally, five parameters are extracted from the low frequency DTFT coefficients of the original image. One is the DC component, two are the averages of the modulus of the DTFT along the two frequency axes for low frequencies, and the final two are averages for low oblique frequencies. These parameters are not particularly crucial but are reported to be necessary for a complete visual description of many real textured images.

4.4.3 Method

The synthesis procedure consists of seven stages:

1. Noise generation.
2. Gaussian filtering of the noise signals.
3. Weighting of the filtered noise signals.
4. Modulation of the weighted filtered noise signals to produce the synthetic channels.
5. Merging the synthetic channels.
6. Equalization of the LPR frequencies.
7. Adjustment of the histogram.
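Before detailing the stages, the equivalent-bandwidth measurement of section 4.4.2 can be sketched as follows, using the FFT as a stand-in for the DTFT; the channel here is random data for illustration only.

```python
import numpy as np

def equivalent_bandwidths(channel):
    """Equivalent bandwidths along the u and v frequency axes of one channel:
    integrate the 2-D power spectrum along each axis, normalise each resulting
    1-D spectrum by its maximum, and take its area (here, its sum in bins)."""
    power = np.abs(np.fft.fft2(channel)) ** 2
    spec_u = power.sum(axis=1)            # integrate over v -> 1-D spectrum in u
    spec_v = power.sum(axis=0)            # integrate over u -> 1-D spectrum in v
    bw_u = spec_u.sum() / spec_u.max()    # area of the normalised spectrum
    bw_v = spec_v.sum() / spec_v.max()
    return bw_u, bw_v

rng = np.random.default_rng(3)
bw_u, bw_v = equivalent_bandwidths(rng.standard_normal((32, 32)))
# a nearly flat (white) spectrum gives bandwidths approaching the full width
```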
Noise generation

Sixteen independent signals of complex white Gaussian noise are generated.

Gaussian filtering

The noise signals are then convolved with separable elliptical Gaussian masks to provide them with an elliptical Gaussian spectral shape. The filter function for the (p,q) synthetic channel is:

gs_{pq}(x, y) = b_{u,pq} b_{v,pq} exp[−π(b²_{u,pq} x² + b²_{v,pq} y²)]   (4.2)

The factors b_{u,pq} and b_{v,pq} are chosen using an approximate formula so that when the Gabor filtering scheme is applied to the synthetic texture, the resulting equivalent bandwidths of its Gabor channels have equal values to those measured in the input image. The exact computation is hard due to the overlapping between channels, but it is reported that the approximate scheme does not significantly affect the visual quality of the results.

Weighting

It is desired to weight the signals so that when the Gabor filters are applied to the synthesized texture, the resulting energies will be equal to those measured in the original image. This task is made hard due to the overlap between channels. Since the synthetic channels are statistically independent, the energy of a sum is equal to the sum of the energies, and so it is possible to calculate a matrix which describes the effect when the channels are added together. When the matrix is multiplied by a vector of energies in the synthetic channels, the resulting vector contains the energies that would be observed using the Gabor filtering. The inverse of this matrix can be precomputed and the appropriate weights are given by multiplying this inverse matrix by the vector of the measured energies.
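The weighting computation can be sketched for a hypothetical three-channel case; the overlap matrix A below is invented for illustration and is not taken from [83].

```python
import numpy as np

# A[i, j]: energy that unit-energy synthetic channel j contributes to measured
# Gabor channel i.  Channels overlap, so A is not diagonal (values invented).
A = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
A_inv = np.linalg.inv(A)                  # precomputed once

measured = np.array([4.0, 4.0, 4.0])      # energies measured in the original image
channel_energies = A_inv @ measured       # energies to give the synthetic channels
weights = np.sqrt(channel_energies)       # amplitude weights (energy = weight^2)

# Check: these channel energies reproduce the measured energies after mixing.
assert np.allclose(A @ channel_energies, measured)
```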
Modulation

The signals are then expanded by a factor of two in both spatial dimensions and modulated by the appropriate central frequency, keeping the phase unchanged. The expansion is performed by upsampling followed by a lowpass filter. Errors occur because the frequency content of each channel is always shifted to the central location.

Merging

A pyramid structure is then used to combine the synthetic channels. This means summing the four lowest resolution synthetic channels, expanding the image by a factor of two in both spatial dimensions, and then adding the result to the four synthetic channels of the next resolution, and so on, until the highest frequency channels are added.

Equalization

The equalization is done in the frequency domain. First, the five average values of the LPR frequency moduli obtained at the feature extraction stage are decompressed, by merely replicating them in their respective spectral areas. The resulting square of the spectral moduli is imposed on the lowest frequencies of the synthetic mix obtained before.

Histogram matching

The compressed original histogram is decompressed to its former size by expanding and lowpass filtering it. Then a standard histogram matching algorithm is used to modify the histogram of the synthesized texture to match the decompressed version of the original one.

4.4.4 Results

The method described above achieves a good match in the histogram, LPR channel modulus and channel energies. The bandwidths of the channels are not so closely matched, but it is reported that the consequences of these inaccuracies are not significant compared to the inaccuracies due to the limitations of the texture model. This means that a texture with a well-defined orientation in the original
b. In other words. We deﬁne a texture feature fb. We will also compare the results when we augment our feature set with the values of a histogram of the original image.s = k wk. 4.6 Texture features We now propose a simple set of texture features based on the DTCWT and then test these features by synthesis experiments.60 CHAPTER 4. the wavelet texture features are given by the energies of the subbands.b. This method is fast and can be generalised for alternative ﬁlters but only approximately achieves the goal of producing matching feature values. The Gabor texture synthesis attempts to model the interaction between signals inserted in diﬀerent subbands. COMPLEX WAVELET TEXTURE FEATURES image will only be well reproduced if the orientation is one of the four orientations of the Gabor ﬁlters. The principle diﬀerences are that we use the invertible complex wavelet transform and that in the matching stage we match the energy of the subbands rather than their histograms. 4.7 Algorithm The structure of the algorithm is very similar to the method invented by Heeger and Bergen that was explained in section 4. .5 Discussion We have described two texture synthesis algorithms for producing synthetic textures with certain feature values. Let wk. The histogram will be calculated using 256 equally spaced bins across the range of intensity values in the image. The pyramidbased texture synthesis method achieves a better match for the feature values but is iterative and only works for transforms that can be inverted.3. 4.s2 .s be the k th wavelet coeﬃcient in subband b at scale s.s for each subband as fb. or at least have a reasonable approximate inversion.
We compare texture synthesis using just the wavelet texture features (called energy synthesis) with texture synthesis using the augmented feature set (called histogram/energy synthesis). First we describe the algorithm for histogram/energy synthesis.

The input to the algorithm is an example texture. The texture features are measured for this texture and then a new texture is synthesized from these features. The synthesis starts with an image x^{(0)} of the desired size that contains white Gaussian noise of mean 0 and variance 1. The steps for iteration n are:

1. Match the histogram of x^{(n-1)} to the input texture. In other words, generate a new image y^{(n)} by applying a monotonically increasing function to x^{(n-1)} in order that the histogram of the new image matches the histogram of the input texture.

2. Use the complex wavelet transform to generate a multiresolution decomposition for both y^{(n)} (decomposition A) and the example texture image (decomposition B).

3. Scale the contents of each noise subband (those in decomposition A) so that the resulting energy is equal to the corresponding energy for the texture subbands (those in decomposition B). If the original energy is E_A and the desired energy is E_B then the correct scaling factor is \sqrt{E_B/E_A}.

4. Invert the complex wavelet transform of decomposition A to produce the next approximation x^{(n)} to the synthesized texture.

These steps are then iterated K times where K is a positive integer. The output of the method is the image x^{(K)}. The algorithm for energy synthesis is identical except that in step 1 y^{(n)} is a direct copy of x^{(n-1)}.

Histogram matching is a relatively quick operation as it can be computed by means of two lookup tables. The first lookup table is computed from the cumulative histogram of the noise image and effectively gives a transform from pixel intensity to rank order. The second lookup table is computed once for all iterations and is the inverse to the intensity to rank transform for the example texture. Once the lookup tables have been constructed, each pixel in the noise image is transformed once by the first lookup table to get its approximate rank, and then by the second lookup table to discover the intensity in the example image that should have that rank.
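The two-lookup-table histogram matching described above can be sketched as follows; the bin count and array handling are illustrative choices rather than the thesis's implementation:

```python
import numpy as np

def match_histogram(noise, example, n_bins=256):
    """Impose the example texture's histogram on `noise` via two tables.

    Table 1 (from the cumulative histogram of the noise image) maps an
    intensity to its approximate rank; table 2 is the inverse
    rank-to-intensity map of the example texture. The composite map is
    monotonically increasing.
    """
    lo = min(noise.min(), example.min())
    hi = max(noise.max(), example.max())
    bins = np.linspace(lo, hi, n_bins + 1)

    # Table 1: intensity -> rank for the noise image.
    noise_hist, _ = np.histogram(noise, bins=bins)
    to_rank = np.cumsum(noise_hist) / noise.size

    # Table 2: rank -> intensity for the example texture (via its
    # cumulative histogram, inverted with a sorted search below).
    ex_hist, _ = np.histogram(example, bins=bins)
    ex_rank = np.cumsum(ex_hist) / example.size

    # Each noise pixel: bin index -> rank -> example intensity of that rank.
    idx = np.clip(np.digitize(noise, bins) - 1, 0, n_bins - 1)
    ranks = to_rank[idx]
    out_idx = np.clip(np.searchsorted(ex_rank, ranks), 0, n_bins - 1)
    centres = 0.5 * (bins[:-1] + bins[1:])
    return centres[out_idx]
```

The accuracy is limited by the bin width, which is why the text describes the rank as approximate.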
4.8 Results

This section contains a selection of textures synthesized by the algorithm. All the textures other than the ones in figure 4.4 are taken from the Brodatz set and are all 128 by 128 pixels. 5 level transforms and K = 3 iterations are used in the algorithms.

First some good results are shown in figure 4.1. The method used was to match histograms in the image domain and energy in the transform domain (the histogram/energy method). The original textures are on the left and the synthesized textures are on the right. The images are of grass, sand and water. These images seem fairly well synthesized.

[Figure 4.1: Results of using histogram/energy synthesis. Original sand, grass and water textures alongside the hist/energy method results.]

The same experiment was repeated for energy synthesis and the results are shown in figure 4.2. The results appear just as good as for histogram/energy synthesis.
[Figure 4.2: Results of using energy synthesis. Original sand, grass and water textures alongside their synthesized versions.]
Figure 4.3 shows an example where the results are not as good. The original texture has a strong vertical orientation. Although the texture synthesized with histogram/energy synthesis does seem to be biased towards the vertical, it looks very different because the strong orientation has been lost. Energy synthesis is a very bad choice for this texture as it makes the synthesized image much less piecewise smooth than the original and this difference is easily perceived.

[Figure 4.3: Results of using histogram/energy synthesis on a wood grain texture.]

Figure 4.4 shows an example of a texture where no performance is entirely satisfactory. The texture consists of many diagonal blobs of the same intensity. We tested energy synthesis, histogram/energy synthesis, and a version of histogram/energy synthesis based on the DWT. Histogram/energy synthesis partly captures the diagonal stripes in the texture but the variation in direction and size of the stripes again gives rise to a quite noticeable difference. However, for this texture the DTCWT possesses a clear superiority over a comparable algorithm based on a normal Discrete Wavelet Transform (DWT) as the DWT features cannot discriminate between energy near +45° and energy near −45°.

Figure 4.5 demonstrates the good convergence of the histogram/energy synthesis algorithm. The target texture in this case was the water texture (bottom left plot of figure 4.1). The energy in each subband was measured just before the subband rescaling. These measured energies are plotted against iteration number with one plot for each of the subbands. There is also a horizontal line in each plot corresponding to the target energy value for the corresponding subband. For every subband the energies rapidly converge to the target values. For energy synthesis the convergence is even more rapid.
4.9 Conclusions

The methods are not able to adequately synthesize images with either strong directional components or with a regular pattern of elements but for textures without such problems there is a clear order of performance: the DWT synthesis is worst, then energy synthesis gives reasonable performance, and histogram/energy synthesis is best. However, the differences are only noticeable in certain cases. The improvement of the DTCWT relative to the DWT is seen when the texture has diagonal components. The DTCWT can separate features near 45° from those near −45° while the DWT combines such features. The improvement of histogram/energy synthesis is seen when the histogram of the original texture has strong peaks. This occurs if the image contains regions where the intensity is constant.

[Figure 4.4: Results of using different methods on a strongly diagonal texture. Original texture, histogram/energy synthesis, energy synthesis, and DWT histogram/energy synthesis.]
[Figure 4.5: Energy before rescaling for the different subbands (subbands 1 to 6 at levels 1 to 4) during the histogram/energy synthesis algorithm. Horizontal lines represent the target energy values.]
Chapter 5

Texture segmentation

5.1 Summary

The purpose of this chapter is to explore the performance of the DTCWT features for non-Bayesian texture segmentation. In a recent comparison of features for texture classification [101] no clear winner was found, but the fast methods tended to give worse performance and it was concluded that it was important to search for fast effective features. The original scheme used a statistical classifier that required significant quantities of reliable training data. This is unreasonable in practice and we propose an alternative simpler method. This simple method gives poor experimental performance for the DWT but is reasonable for the NDWT, and when used with the DTCWT the results are better than any one of the schemes used in the original classification. We explain the reason for the power of the DTCWT in terms of its directionality and aliasing properties. Finally we show how simple multiscale techniques yield a fast algorithm based on the DTCWT with even better results.

The algorithms in this chapter are simple extensions to existing methods. The original contribution of this chapter is principally the experimental comparison of DTCWT features with other features. The chapter is of interest because it demonstrates that complex wavelets are easy to use in existing algorithms and because the experimental results suggest that the DTCWT provides a powerful feature set for segmentation.
5.2 Introduction

Texture segmentation has been studied intensively and many features have been proposed. Recently Randen and Husøy [101] performed an extensive comparison of different texture feature sets within a common framework. Their study concluded that there was no clear winner among the features (although some feature sets, decimated wavelet coefficients for example, were clear losers) and that the best choice depended on the application. They also claimed that computational complexity was one of the biggest problems with the more successful approaches and that therefore research into efficient and powerful classification would be very useful.

The comparison was performed on a supervised texture segmentation task. The input to the algorithm is a set of training images, each containing a single texture, and a mosaic made up from sections of these textures. The problem is to assign each pixel in the mosaic to the correct texture class. This task is called supervised segmentation because of the availability of examples from each texture class. Randen and Husøy note that it is important to have disjoint test and training data. This means that the textures present in the mosaic are new examples of the same texture as the training data, rather than being direct copies of a portion of the training texture.

Unsupervised segmentation attempts to solve the same problem without these examples. An example of this is work by Kam and Fitzgerald who have produced results of using DTCWT features for unsupervised segmentation [59]. Care must be taken when comparing results of supervised and unsupervised techniques because they are actually attempting slightly different tasks. A supervised segmentation is considered perfect when every pixel is classified into the right class while an unsupervised segmentation is considered perfect when the boundaries are in the correct positions. This may seem very similar but suppose that one class contains very bright images, while the other contains very dark images. Now consider a test image whose every pixel intensity is an identical medium brightness. The supervised method may have a 50% chance of classifying the pixels correctly while the unsupervised method will classify every pixel into the same class and hence be deemed to achieve a 100% accuracy.

First the original method used in the comparison is described in section 5.3. Section 5.4 describes how we simplify the training stage of the algorithm to give a more practically useful scheme. Section 5.5 explains in detail the method tested. Section 5.6 contains the results of the proposed method and also the published results using the more advanced training scheme. Sections 5.7 and 5.8 discuss the reasons for the relative performance. In section 5.9 we propose and test a multiscale segmentation algorithm in order to show that the benefits of the DTCWT features are retained for these more powerful techniques.

5.3 Original Classification Method

This section describes the method used in the comparison paper [101] and brief details of the features tested. The main steps in the method are:

1. Filter the input image.
2. Square the filter outputs.
3. Smooth the resulting features.
4. Take the logarithm of the smoothed features.
5. Classify each pixel to the closest class.

The principal difference between the compared methods was in the choice of the filters used in the first step. Different filters result in different feature sets. The features were smoothed in step 3 using a Gaussian lowpass filter with σ_s = 8. The comparison used pixel by pixel classification in an attempt to isolate the effect of the features. However, alternative classification schemes such as multiscale classification can give faster and better results. We will discuss the effect of this choice later in section 5.8.

The classification method used was "Type One Learning Vector Quantization" (LVQ) [66]. This scheme results in a classifier that chooses the closest class where the distance is defined as the standard Euclidean metric to the class centre. This scheme requires time consuming training in order to select class centres that give good classification results on the training data.

Amongst the filters examined were:

1. Laws filter masks.
2. Ring and Wedge filters.
3. Dyadic Gabor filter bank.
4. Nondyadic Gabor filter bank.
5. Discrete Cosine Transform.
6. Quadrature Mirror Filters.
7. Wavelet transform, packets, and frames based on the Daubechies family of wavelets.
8. Eigenfilters derived from autocorrelation functions.
9. Optimized representation Gabor filter bank.
10. Prediction error filter.
11. Optimized FIR filter bank.

Randen and Husøy also tried a few nonfiltering approaches including:

1. Statistical features (angular second moment, contrast, correlation, and entropy).
2. AR model-based features.
3. Training neural networks using back propagation and median filtering the resulting classification.

5.4 Training simplification

The LVQ training method used by Randen and Husøy results in a simple classification scheme but makes use of the training data to improve the classification accuracy. The training data is repeatedly classified and the parts incorrectly classified are used to update the class centres to give better results. This is a very slow procedure and experiments suggested that it gave only slightly improved results. In the experiments we have instead used a simpler system that simply sets the class centres to be the average over the training data of the feature vectors. This is clearly very quick to compute and only a small amount of training data will be needed to obtain a reasonable estimate.
The first goal of this chapter is to demonstrate the superiority of the DTCWT features and so we are allowed to alter the comparison technique only if, as is the case here, the change will not make the results better.

5.5 Detailed description of method

We describe the method for the NDWT and then explain the necessary modifications for the DWT and the DTCWT. We first describe the generation of the feature vector and then the classification scheme.

Suppose that we wish to calculate the feature vectors for an image of size M × N. We use X(x, y) to represent the image. For x ∈ {0, ..., M−1}, y ∈ {0, ..., N−1}, X(x, y) is defined as the value of the pixel at position x, y in the image. For all other values of x and y we define X(x, y) = 0 (this is called zero padding). Let W_s(x, y) for s = 1, ..., S be the subbands produced by the NDWT acting on X(x, y). As the transform is nondecimated each of these subbands will be the same size as the original image.

We define a smoothing filter h(x, y) for x ∈ {−K, ..., K}, y ∈ {−K, ..., K} as

h(x, y) = \frac{1}{2\pi\sigma_s^2} \exp\left(-\frac{x^2 + y^2}{2\sigma_s^2}\right)    (5.1)

where σ_s controls the amount of smoothing and K sets the point at which we truncate the filter. We use a value of K = 24. As mentioned before a value of σ_s = 8 was used by Randen and Husøy and we choose the same value.

Smoothed subbands \tilde{W}_s(x, y) are produced by convolving the rectified subbands with this smoothing filter:

\tilde{W}_s(x, y) = \sum_{u=-K}^{K} \sum_{v=-K}^{K} h(u, v) W_s(x - u, y - v)^2    (5.2)

Finally a feature vector f(x, y) ∈ R^S is defined for x ∈ {0, ..., M−1}, y ∈ {0, ..., N−1} as

f_s(x, y) = \log\left(\tilde{W}_s(x, y)\right)    (5.3)

Now suppose that there are C classes and let f^{(c)}(x, y) be the feature vectors calculated from the training image for class c. The class centres µ_c are defined as

\mu_c = \frac{1}{NM} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f^{(c)}(x, y)    (5.4)
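A minimal numpy sketch of equations (5.1) to (5.4), assuming a real-valued subband is already available (the wavelet transform itself is not implemented here):

```python
import numpy as np

def smoothed_log_features(subband, sigma=8.0, K=24):
    """Equations (5.1)-(5.3): square the subband (rectification), smooth
    with a truncated Gaussian, then take the logarithm."""
    x = np.arange(-K, K + 1)
    # One-dimensional factor of the separable Gaussian of equation (5.1).
    g = np.exp(-(x ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    rect = subband ** 2                 # rectified subband
    padded = np.pad(rect, K)            # zero padding outside the image
    # Separable convolution: filter the rows, then the columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, g, mode="valid"), 1, padded)
    smooth = np.apply_along_axis(
        lambda c: np.convolve(c, g, mode="valid"), 0, rows)
    return np.log(smooth)

def class_centre(features):
    """Equation (5.4): mean feature vector over a training image, given
    a feature array of shape (M, N, S)."""
    return features.mean(axis=(0, 1))
```

Because the 2-D Gaussian of equation (5.1) is separable, two 1-D convolutions give the same result as one 2-D convolution at much lower cost.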
The classification scheme has the following steps:

1. Calculate the feature vectors f(x, y) for the test image.

2. For x ∈ {0, ..., M−1}, y ∈ {0, ..., N−1}, c ∈ {1, ..., C} calculate the class distances d_c(x, y):

d_c(x, y) = \|f(x, y) - \mu_c\|^2    (5.5)

3. Classify each pixel at position x, y as belonging to the closest class r(x, y):

r(x, y) = \mathrm{argmin}_{c \in \{1, ..., C\}} d_c(x, y)    (5.6)

For decimated transforms the subbands are reduced in size. The subbands at level k are only of size M/2^k × N/2^k. In order to apply the same method we first expand the subbands until they are of size M × N. Let P_s(x, y) be a subband at level k of size M/2^k × N/2^k. We define the expanded subband W_s(x, y) by

W_s(x, y) = P_s\left(\lfloor x/2^k \rfloor, \lfloor y/2^k \rfloor\right)    (5.7)

where \lfloor z \rfloor represents the largest integer not greater than z. This expansion means that the value of W_s(x, y) is equal to the value of the wavelet coefficient (of subband s) that is closest to the location x, y. The rest of the method is identical. Note that this expansion of the DWT is not equivalent to the NDWT. The expanded DWT subbands at level k are piecewise constant on squares of size 2^k × 2^k while the NDWT subbands have no such restriction.

This may seem a strange way of expanding the subbands: usually some form of interpolation such as lowpass filtering is performed during expansion operations. Two possible justifications for this approach are:

1. The main reason we use this very simple expansion is in order to provide the fairest comparison with the original experiments. A better interpolation might improve the performance of the DTCWT method but could also provoke the criticism that the performance gain is caused merely by the additional smoothing rather than the choice of wavelet transform.

2. The feature values will be smoothed in subsequent processing.

These are not very compelling reasons. In fact, it is quite likely that an alternative expansion will improve the performance. The results of our experiments indicate that even this crude expansion allows the DTCWT to perform better than the alternative features.

5.6 Experiments

The classification was tested on the same twelve test images and sets of training data as used in the original comparison [101]. Figure 5.1 shows the different mosaics. We tested features generated from the DWT, NDWT, and DTCWT. We use 4 levels of decomposition of the biorthogonal (6,8) filters in the DWT and NDWT [30]. The lowpass filter has 17 taps, and the highpass filter has 11 taps. Randen and Husøy compared other wavelet types but found that the main difference was between decimated and nondecimated wavelets rather than the filters used.

The error in an image is defined as the proportion of incorrectly classified pixels. The error therefore varies between 0 for a perfect classification and 1 for a completely wrong classification. For a C class experiment, random classification would get 1 in C pixels correct and would have an expected error of 1 − 1/C.

The classification errors for every mosaic and every feature set were published and we summarise this information in two ways. The first measure we extract is the average performance for each feature set averaged over all mosaics. This measure is called the mean error rate. The problem with this approach is that the average may be dominated by the performance on the mosaics with a large number of classes (as these will have the largest errors). The second measure is designed to be fairer but is slightly more complicated. For each image we rank the different methods according to their performance (with a rank of 1 given to the best), and then we average these ranks over all 12 mosaics.

Table 5.2 contains the results of using the different features for the different mosaics. These results are plotted in figure 5.3. The figure contains 4 bar chart plots comparing the different feature sets. Each plot is dedicated to mosaics with a particular number of textures. Within each plot there is one cluster of bars for each of the mosaics. In each cluster of bars the left bar shows the error for the DWT features, the centre bar shows the error for the NDWT features, and the right bar shows the error for the DTCWT features. All errors are plotted as percentages. Inspecting the results in figure 5.3 reveals that for almost all experiments the DTCWT does better than the NDWT, and the NDWT does better than the DWT. In the published comparison there was no clear winner; different texture features performed best on different textures.
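The classification and subband-expansion steps of equations (5.5) to (5.7) can be sketched as follows; the (M, N, S) feature layout is an assumed convention:

```python
import numpy as np

def expand_subband(P, k):
    """Zero-order expansion of a decimated level-k subband, equation (5.7):
    each expanded pixel copies the nearest wavelet coefficient, so the
    result is piecewise constant on 2^k x 2^k squares."""
    return np.repeat(np.repeat(P, 2 ** k, axis=0), 2 ** k, axis=1)

def classify(features, centres):
    """Assign each pixel to the closest class centre under the Euclidean
    metric, equations (5.5) and (5.6). `features` has shape (M, N, S) and
    `centres` has shape (C, S)."""
    # Squared distances to every centre, shape (M, N, C).
    d = np.sum((features[..., None, :] - centres) ** 2, axis=-1)
    return np.argmin(d, axis=-1)
```

The broadcasting in `classify` computes all pixel-to-centre distances in one step, matching the pixel by pixel rule without explicit loops.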
[Figure 5.1: Mosaics tested (labelled a to l).]
[Figure 5.2: Comparison of segmentation results for different transforms: DWT, NDWT, and DTCWT percentage errors for each of the mosaics a to l.]

In addition to the three new methods described above, the following methods from the published comparison are used in the ranking:

1. Laws filters
2. Ring/wedge filters
3. Dyadic Gabor filter bank
4. Gabor filter bank
5. DCT
6. Critically decimated Daubechies 4 wavelet
7. Undecimated Daubechies 4 wavelet
8. Undecimated 16 tap quadrature mirror filters (QMF)
9. Cooccurrence features
[Figure 5.3: Percentage errors for the (DWT, NDWT, DTCWT) features. Four bar chart plots (two, five, ten, and sixteen textures), with one cluster of bars per mosaic a to l.]
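The two summary measures (mean error rate and average rank) can be sketched as follows; the row/column layout of the error matrix is an assumed convention, not taken from the thesis tables:

```python
import numpy as np

def summary_measures(errors):
    """Compute the mean error rate of each method and its rank averaged
    over mosaics. `errors[i, j]` is the error of method i on mosaic j."""
    errors = np.asarray(errors, dtype=float)
    mean_error = errors.mean(axis=1)
    # Rank 1 goes to the best (lowest-error) method on each mosaic;
    # the double argsort converts errors to 0-based ranks (ties broken
    # by order).
    ranks = np.argsort(np.argsort(errors, axis=0), axis=0) + 1
    return mean_error, ranks.mean(axis=1)
```

The average rank weights every mosaic equally, which is what makes it fairer than the mean error when the hardest mosaics dominate the average.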
10. Autoregressive features
11. Eigenfilter
12. Prediction error filter
13. Optimized FIR filter
14. Back propagation neural network

We have omitted some of the badly performing methods and have taken just the best of families of methods (such as all the different choices of wavelet transform). This gives a total of 17 methods compared in the ranking.

Table 5.4 tabulates the two measures of performance¹. The results in bold are original while the others are taken from the published study [101]. All the published results make use of the LVQ training algorithm. Table 5.4 shows that the NDWT features with simple training give a mean error of 23.9% while the nondecimated QMF filters give a mean error of 20.8%. The biorthogonal filters we use in the NDWT are similar to the quadrature mirror filters used in the comparison and therefore it seems that the much simpler training method results in only a small decrease in performance. Nevertheless, the DTCWT features with an average error of 18% outperform all the other methods despite the simpler training. The next best features are the nondecimated QMF filters while the worst results are given by the neural network classifier.

¹ The mean error rates were used as a measure of performance in the comparison [101]. However, the values given here do not quite agree because there was an error in the published results for one mosaic. The values given here are based on the corrected experimental results [100].

[Figure 5.4: Performance measures (mean error and average rank) for the different methods: Laws, ring/wedge, dyadic Gabor filter bank, Gabor filter bank, DCT, critically decimated Daubechies 4, nondecimated Daubechies 4, nondecimated 16 tap QMF, cooccurrence, autoregressive, eigenfilter, prediction error filter, optimized FIR filter, backpropagation neural network, and the DWT, NDWT, and DTCWT with simple training.]

5.7 Discussion of relative performance

The features are calculated from the energy in the wavelet subbands. For a shift dependent transform (the DWT for example) a translation causes the energy to be redistributed between subbands due to aliasing. This effectively contributes an extra noise source to the feature values and hence increases the classification error.

There are two main reasons for the improved DTCWT performance compared to the NDWT. The first is the increased directionality of the DTCWT. There are 6 subbands at each scale rather than 3 and the DTCWT is able to distinguish between features near 45° and those near −45°. It is certainly plausible that the extra features from these extra subbands should allow better classification.

The second reason relates to the smoothing step. The NDWT highpass subbands will contain slowly oscillating values. Rectification will convert these to a series of bumps which are finally smoothed. For coarse scales it is possible that the lowpass filter does not have a narrow enough bandwidth to fully smooth these bumps and so some residual rectification noise may remain. In contrast the magnitude of the DTCWT coefficients is expected to be fairly steady and we would expect much less rectification noise.

In 1D it is quite easy to see the effect of the aliasing and rectification problems. Consider the sine wave shown in figure 5.5. We have chosen its frequency so that most of its energy is contained in the scale 4 highpass subband.

[Figure 5.5: Sine wave input.]

We compute the scale 4 highpass coefficients for both the nondecimated real wavelet transform and a nondecimated version of the complex wavelet transform. These coefficients are shown in figure 5.6 (the imaginary part of the complex wavelet coefficients is plotted with a dashed line). Figure 5.7 shows the rectified values (before smoothing) that are calculated by squaring the transform coefficients. The rectified real wavelet values show a strong oscillation of 100% of the average value while the rectified complex wavelet values have only a small variation of about 2.5%. The very low variation of the complex wavelet coefficients is no accident. It occurs
[Figure 5.6: Nondecimated scale 4 wavelet coefficients (real wavelet and complex wavelet).]

[Figure 5.7: Rectified nondecimated scale 4 wavelet coefficients (real wavelet and complex wavelet).]
because the complex wavelets have been designed to differentiate between positive and negative frequencies. A sine wave can be represented as the combination of two complex exponentials, one representing a positive frequency component, and one representing the negative frequency component:

\cos(\omega t) = \frac{\exp\{j\omega t\} + \exp\{-j\omega t\}}{2}    (5.8)

Standard filter theory says that the output y(t) of a linear filter applied to this signal will be

y(t) = \frac{A}{2} \exp\{j\omega t\} + \frac{B}{2} \exp\{-j\omega t\}    (5.9)

where A is the response of the filter to frequency ω and B is the response for −ω. If the linear filter had zero response for negative frequencies then (assuming ω > 0) the output would be simply

y(t) = \frac{A}{2} \exp\{j\omega t\}    (5.10)

and hence the rectified signal would be constant and equal to

|y(t)|^2 = \frac{A^2}{4}    (5.11)

These graphs have been plotted for scale 4 coefficients. At finer scales the coefficients will oscillate faster and we would therefore expect less rectification noise. For σ_s = 8 the smoothing filter has a half peak width of \sigma_s 2\sqrt{2\log 2} \approx 19. This should be about sufficient to remove the variation for the NDWT but is clearly insufficient for the DWT.

The low variation for the DTCWT means we can afford to decimate the output. Figure 5.8 plots every 16th sample from the nondecimated outputs. These plots correspond to the rectified outputs for the decimated transforms (i.e. the DWT and the DTCWT). There is a huge variation in the DWT rectified outputs while the DTCWT outputs are almost constant.

[Figure 5.8: Rectified decimated scale 4 wavelet coefficients (real wavelet and complex wavelet).]

It is natural to ask about the relative significance of these effects. We have advanced two effects (directionality and rectification noise) to explain the relative performance. To answer this question we performed two further experiments both using a cut down version of the DTCWT:

HalfCWT In the first experiment we halved the size of the feature vector by combining the energy in pairs of subbands. The 15°, 45°, and 75° subbands were paired respectively with the −15°, −45°, and −75° subbands. This reduced the transform to only distinguishing 3 directions, like real wavelet transforms. More precisely, equation 5.3 was altered to

f_s(x, y) = \log\left(\tilde{W}_a(x, y) + \tilde{W}_b(x, y)\right)    (5.12)

where a and b are the two subbands that are combined to give feature s.

RealCWT In the second experiment we set the imaginary part of all wavelet coefficients to zero before rectification. This should introduce rectification noise to the features.

Note that these two modifications are intended to be harmful to the performance of the method and such transforms should never be used for a real application. The results for these new experiments are in table 5.9 and shown in figure 5.10 (compared to the original DTCWT results). We conclude that rectification noise is not too significant because the results for the RealCWT are similar to those for the DTCWT. However, the results for the HalfCWT are significantly worse, demonstrating that the main reason for the improved DTCWT performance is its improved directional filtering.
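Equations (5.8) to (5.11) can be checked numerically: squaring the output of a real (two-sided) filter oscillates, while the squared magnitude of a one-sided (positive-frequency) output is constant. The unit amplitudes A = B = 1 below are illustrative choices:

```python
import numpy as np

t = np.linspace(0.0, 10.0, 1000)
omega = 2 * np.pi

# Two-sided response: (A/2) e^{jwt} + (B/2) e^{-jwt} with A = B = 1.
real_out = np.cos(omega * t)
# One-sided response: only the positive-frequency exponential survives.
complex_out = 0.5 * np.exp(1j * omega * t)

rectified_real = real_out ** 2                 # oscillates between 0 and 1
rectified_complex = np.abs(complex_out) ** 2   # constant A^2/4 = 0.25
```

The constant rectified magnitude is exactly the property that lets the complex-wavelet features avoid rectification noise after smoothing.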
7 11.11.5 17. In order to resolve the position of the bound .2 37. This is not surprising because the feature values near the borders will be unreliable for two reasons: 1.7 39.9 21.6 RealCWT error *100% 10. Near the border the smoothing ﬁlter will be averaging rectiﬁed coeﬃcients from both sides of the border to produce some intermediate value. The output of the DTCWT method for this mosaic is shown in ﬁgure 5.9 45.1 20.9: Comparison of segmentation results for altered DTCWT 5. These two defects in the classiﬁcation are closely related to the size of the smoothing ﬁlter and place contradictory requirements on its size.5 17 17. The original is shown on the left. Notice that there is usually a border of misclassiﬁed pixels around each segment.8. For example on the 16 texture mosaic “f” there is still a 34% error.9 5. 2.8 Figure 5.5 36.2 23.1 8. DISCUSSION OF DTCWT PERFORMANCE 83 Mosaic a b c d e f g h i j k l HalfCWT error *100% 11 25. Each colour represents a diﬀerent class.9 30 1 1.7 21. Notice also that there are often fairly small groups of pixels assigned to some class.6 22. Near the border the impulse response for the coarser wavelets will straddle both sides and hence be unreliable partly because the value will average the response from both textures and partly because there will often be discontinuities at the border giving extra energy to the highpass coeﬃcients. it is not perfect.8 33.3 41. the classiﬁcation results on the right.8 Discussion of DTCWT performance Although the DTCWT gives better results than the other methods.1 0.5.
RealCWT.84 CHAPTER 5. TEXTURE SEGMENTATION Two textures 40 20 0 HalfCWT RealCWT DT−CWT 40 20 0 Five textures j k l a b c d e Ten textures 40 20 0 40 20 0 Sixteen textures h i f g Figure 5.10: Percentage errors for (HalfCWT.DTCWT) .
5. but in order to accurately determine the class the smoothing ﬁlter should be large to give reliable feature estimates. The basic concept is to use additional information about the nature of segments. However. or we may know that the segments should have smooth boundaries. The ﬁnal sections of this chapter test this prediction for a multiscale segmentation method. There are several methods that address the issue. The author has also used the DTCWT for implementing a level set version of an active contour model [34]. we may not expect to see very small separate segments. Active contour models [60] are useful for encoding information about the smoothness of boundaries while multiscale methods [128] are useful for describing expectations about the spatial extent of segments. DISCUSSION OF DTCWT PERFORMANCE 85 Figure 5. We expect the advantages gained by these methods to be complementary to the advantages of the DTCWT feature set. Diﬀerent methods make use of diﬀerent types of information. Methods also diﬀer in whether the information is explicitly contained in a Bayesian image model (such as Markov Random Field approaches [9]) or just implicitly used [128]. . in the level set paper the emphasis is on using the DTCWT to describe contour shape rather than the texture. This is a wellknown problem in image segmentation [127]. For example.11: Segmentation results for mosaic “f” using the DTCWT aries accurately the smoothing ﬁlter should be small.8.
5.9 Multiscale Segmentation

We extend a previous multiscale segmentation method [128] to use the DTCWT features. The basic idea is to first calculate an approximate segmentation at a coarse scale, and then to refine the estimate of the boundary location at more detailed scales. During the refinement the classification reduces the importance given to the wavelet coefficients that will be significantly affected by the boundary. This can be thought of as an adaptive smoothing filter whose size depends on the estimate of boundary position. This method implicitly assumes that regions will have a reasonably large spatial extent and are unlikely to contain small interior regions of alternative textures. These assumptions are true for the mosaics tested and consequently the method works well. Care must be taken when applying such methods in the real world to ensure that appropriate assumptions are made. This multiscale segmentation is also faster than the pixel by pixel segmentation described above because the size of the feature set decreases for the more detailed levels. Section 5.9.1 defines the features used and 5.9.2 describes the multiscale classification procedure.

5.9.1 Texture features for multiscale segmentation

Different texture feature sets are used during the multiscale algorithm. To define these sets it is useful to distinguish between the wavelet subbands at different scales. Let W_{s,k}(x, y) be the sth subband (s in {1, 2, 3, 4, 5, 6}) at scale k in {1, 2, 3, 4} (scale 1 being the most detailed scale, scale 4 the coarsest that we shall use). For an initial image of size N x M the complex subbands at scale k will be of size N/2^k x M/2^k for the DTCWT. We form a quadtree from each subband, denoted q_{s,k}(x, y, l) for k <= l:

    q_{s,k}(x, y, l) = |W_{s,k}(x, y)|^2                                                    if l = k
    q_{s,k}(x, y, l) = (1/4) sum_{a in {0,1}} sum_{b in {0,1}} q_{s,k}(2x+a, 2y+b, l-1)     if l > k    (5.13)

Suppose the segmentation algorithm is operating at scale L. The features are defined from the quadtrees by

    f^{(L)}_{s,k}(x, y) = (L + 1 - k) log( q_{s,k}(x, y, L) )

It is useful to index this scale L feature set with indices s in {1, 2, 3, 4, 5, 6} (for subband) and k in {1, 2, ..., L} (for feature scale). The corresponding feature set for position x, y is denoted f^{(L)}(x, y).
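The quadtree energy averaging and log-energy features described above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch; the function names `quadtree` and `multiscale_features` are ours, not from the thesis.

```python
import numpy as np

def quadtree(w, levels):
    """Quadtree of equation (5.13) for one complex subband w at scale k:
    entry 0 holds |w|^2 (the l = k level), and each coarser level is the
    average of its four children q(2x+a, 2y+b, l-1)."""
    q = [np.abs(w) ** 2]
    for _ in range(levels):
        p = q[-1]
        q.append(0.25 * (p[0::2, 0::2] + p[0::2, 1::2]
                         + p[1::2, 0::2] + p[1::2, 1::2]))
    return q

def multiscale_features(w, k, L):
    """Scale-L feature map f_{s,k}(x, y) = (L + 1 - k) log q_{s,k}(x, y, L)
    for one subband w at scale k."""
    return (L + 1 - k) * np.log(quadtree(w, L - k)[L - k])
```

For example, a scale k = 1 subband of size 8 x 8 yields a 2 x 2 feature map when the algorithm operates at scale L = 3, matching the size of the scale 3 subbands.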
Notice the scaling factor L + 1 - k. The values in the quadtree can be considered to be local estimates of the average energy in the wavelet subbands. Naturally, we would expect an estimate formed by averaging many numbers to be more accurate. The scaling factor provides a simple way of favouring the more reliable estimates. Alternative scaling factors may well give better results but section 5.9.3 explains why we do not try and optimise these factors.

For each class c in {1, ..., C} the features are calculated for the corresponding training image and used to calculate feature means

    mu^{(L)}_{c,s,k} = (2^{2k} / (MN)) sum_{x=0}^{M/2^k - 1} sum_{y=0}^{N/2^k - 1} f^{(L)}_{s,k}(x, y)    (5.14)

5.9.2 Multiscale classification

Once the test image feature vectors f^{(L)}_{s,k}(x, y) have been calculated, the image is first classified at the coarsest level and then the segmentation is refined by reclassifying at more detailed scales. For x in {0, ..., N/2^L - 1}, y in {0, ..., M/2^L - 1} the classification at scale L has the following steps:

1. For each class c in {1, ..., C} calculate the class distances d^{(L)}_c(x, y):

    d^{(L)}_c(x, y) = (1/(6L)) sum_{s=1}^{6} sum_{k=1}^{L} ( f^{(L)}_{s,k}(x, y) - mu^{(L)}_{c,s,k} )^2    (5.15)

2. Classify each pixel at position x, y as belonging to the closest class:

    r^{(L)}(x, y) = argmin_{c in {1,...,C}} d^{(L)}_c(x, y) - b^{(L)}_c(x, y)    (5.16)

All that remains is to define b^{(L)}_c(x, y). This represents the information (from higher scales and notions of continuity of regions) about the probability that the scale L block at x, y belongs to class c. For classification at the coarsest scale b_c(x, y) = 0, as there is no coarser information. At more detailed scales a reasonable first approximation is a^{(L)}_c(x, y), defined to be 1 if the corresponding parent block at scale L + 1 belongs to class c or 0 otherwise:

    a^{(L)}_c(x, y) = 1 if r^{(L+1)}(floor(x/2), floor(y/2)) = c
    a^{(L)}_c(x, y) = 0 if r^{(L+1)}(floor(x/2), floor(y/2)) != c    (5.17)
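Classification steps 1 and 2 (equations 5.15 and 5.16) can be sketched as below. The array layout and the function name `classify_scale` are our own illustrative choices.

```python
import numpy as np

def classify_scale(feats, means, b=None):
    """feats: (P, X, Y) stack of the P = 6L features per pixel.
    means: (C, P) per-class feature means as in equation (5.14).
    b:     optional (C, X, Y) smoothed parent-class information b_c;
           omitted (i.e. zero) at the coarsest scale.
    Returns r(x, y) = argmin_c d_c(x, y) - b_c(x, y)."""
    C = means.shape[0]
    # d_c of equation (5.15): mean squared feature distance to class c
    d = np.stack([((feats - means[c][:, None, None]) ** 2).mean(axis=0)
                  for c in range(C)])
    if b is not None:
        d = d - b                  # favour the class suggested by the parent
    return d.argmin(axis=0)        # equation (5.16)
```

With one feature and two classes this reduces to nearest-mean classification; the parent bonus b only matters for pixels whose feature distances are nearly tied.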
Near the boundaries we should be less confident in the class assignment and we soften the function to reflect this uncertainty. The softening is done by smoothing with a Gaussian filter with smoothing parameter lambda = 3:

    b^{(L)}_c(x, y) = alpha sum_{u=-K}^{K} sum_{v=-K}^{K} exp( -(u^2 + v^2) / (2 lambda^2) ) a^{(L)}_c(x - u, y - v)    (5.18)

where alpha is a parameter that controls the amount of information incorporated from previous scales. For our experiments we used alpha = 1/4.

5.9.3 Choice of parameter values

There are a number of parameters in the method (such as the scaling factors, lambda, and alpha) that will affect the performance. The values were chosen during development of the algorithm to give sensible results on some mosaics (but not the test mosaics used by Randen and Husøy). These values are certainly not theoretically or experimentally optimal. An approximate treatment of a similar quadtree algorithm for white Gaussian noise fields is given by Spann and Wilson [111]. An analysis of the effect of the parameters would be interesting but is beyond the scope of this dissertation. There is also a danger in an experimental optimisation that the method will work very well on the mosaics used during the optimisation but poorly on alternative mosaics (of different shapes or texture content). The aim of this section on multiscale segmentation is to demonstrate that the DTCWT features are also useful for the more advanced classification schemes. This aim is satisfied better by using a "typical" multiscale algorithm than by testing a version optimised for a particular dataset.

5.9.4 Multiscale results

The multiscale algorithm was tested on the 12 test mosaics as before, using both the DTCWT features as explained above and features calculated in an analogous way from the DWT. The average error for the DTCWT multiscale method is 9.5%, as compared to 18.0% for the non-multiscale DTCWT method. The average for the multiscale DWT method is 16.4%. On most of the test images the DTCWT performs substantially better than the DWT. There is just one case (image l) where the DWT gives better results than the DTCWT, and even in this case the difference is only 0.5%.
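The Gaussian softening of equation (5.18) can be sketched directly. This is a hedged sketch: `soften` is our own helper name, the truncation radius K of the Gaussian window is left as an argument, and zero padding is assumed outside the image.

```python
import numpy as np

def soften(a, lam=3.0, alpha=0.25, K=9):
    """b_c(x, y) = alpha * sum_{u,v = -K..K} exp(-(u^2+v^2)/(2 lam^2))
    * a_c(x - u, y - v), as in equation (5.18), with a_c taken as zero
    outside the image."""
    u = np.arange(-K, K + 1)
    g = np.exp(-(u ** 2) / (2 * lam ** 2))
    kern = alpha * np.outer(g, g)              # separable Gaussian window
    X, Y = a.shape
    pad = np.pad(np.asarray(a, dtype=float), K)
    b = np.zeros((X, Y))
    for du in range(2 * K + 1):
        for dv in range(2 * K + 1):
            b += kern[du, dv] * pad[du:du + X, dv:dv + Y]
    return b
```

Applied to the hard 0/1 parent indicator a_c, this yields a bonus b_c that is largest deep inside a parent region and decays smoothly towards region boundaries, which is exactly the intended softening.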
[Figure 5.12: Comparison of segmentation results for multiscale methods (table of Multiscale DTCWT and Multiscale DWT percentage errors for mosaics a-l).]

The difference in performance of the method can be clearly seen in the results for mosaic "f" in figure 5.14. There are many fewer small segments and the boundary errors are greatly reduced.

5.10 Conclusions

The experimental results clearly show that the NDWT features are better than the DWT features, and that the DTCWT features are better than the NDWT features. The main reason for the DTCWT outperforming the NDWT features is the increased number of subbands that allow more accurate orientation discrimination. For the pixel by pixel classification experiments the average error was 26.8% for DWT features, 23.9% for NDWT features, and 18.0% for the DTCWT features. A comparison with published results [101] reveals that the simpler training scheme gives almost as good results as LVQ training and that, despite the simpler training, the DTCWT features performed better than any of the feature sets tested in the published study. Tests on a multiscale algorithm indicated that the superior performance of the DTCWT features is preserved even for more sophisticated classification methods. For the test mosaics used the multiscale classification reduced the average error to 9.5%.
[Figure 5.13: Percentage errors for single scale DTCWT, multiscale DTCWT, and multiscale DWT on the two, five, ten, and sixteen texture mosaics.]
[Figure 5.14: Segmentation results for mosaic "f" using the multiscale DTCWT.]
Chapter 6

Correlation modelling

6.1 Summary

The purpose of this chapter is to give an example of the use of the phase of the complex coefficients. We described in chapter 4 a synthesis technique that generated textures with matching subband energies. This method works well for many stochastic textures but fails when there is more structure in the image such as lines or repeated patterns. Simoncelli has demonstrated good performance with a similar synthesis technique when more parameters are extracted from an image than merely the energy [108]. Simoncelli used over 1000 parameters to describe his textures. The main parameters came from the autocorrelations of the wavelet coefficients and the crosscorrelations of the magnitudes of subbands at different orientations and scales. We compare the relative effect of these different parameters for the DTCWT. The autocorrelation allows better texture synthesis and experiments indicate that sometimes autocorrelation based features can also give improved segmentation performance. The method described is substantially based on a previous algorithm [108] and is not claimed as original. The original contributions of this chapter are the experimental synthesis and segmentation results.

6.2 Autocorrelation Method

The basic method is to repeatedly match both the image statistics and the transform statistics by alternately matching statistics and transforming to and from wavelet space and image space.
We start by measuring the parameters of a target image and generating a random (white noise) image of the correct size. Simoncelli measured the following statistics of the image pixels: mean, variance, skewness, kurtosis, minimum and maximum values [108]. These 6 values capture the general shape of the histogram and we would expect them to give results very similar to using the full histogram. However, to avoid mixing changes caused by matching correlation with changes caused by a new set of image statistics we choose, as in chapter 4, to simply measure and match the image histogram.

Simoncelli based his synthesis upon the oriented complex filters of the steerable pyramid described in section 2.3. We use instead the DTCWT. The DTCWT subbands contain complex coefficients. We denote by w_k(x, y) the subband k wavelet coefficient at position x, y, where x and y are integers. (In this section we use the more compact notation of a single number k to index subbands at different scales and orientations. For example, k in {1, ..., 6} indexes the 6 orientated subbands at scale 1, while k in {7, ..., 12} indexes the subbands at scale 2.) To make the equations simpler it is also convenient to define w_k(x, y) = 0 for any positions that lie outside the subband. We have tested two methods based on autocorrelation statistics. We first describe the raw autocorrelation matching method.

Raw Autocorrelation. The method generates the statistics r_k(dx, dy) for subband k directly from the complex valued autocorrelation of the subband:

    r_k(dx, dy) = sum_x sum_y w_k(x, y)^* w_k(x + dx, y + dy)

Magnitude Autocorrelation. The second method reduces the size of the parameter set by calculating real valued statistics r_k(dx, dy) based on the autocorrelation of the magnitude of the complex wavelet coefficients:

    r_k(dx, dy) = sum_x sum_y |w_k(x, y)| |w_k(x + dx, y + dy)|

In both cases we match the appropriate statistics in essentially the same way. We solve for an appropriate filter to apply to the subbands that will change the autocorrelation by roughly the required amount. Let H(omega) be the spectrum of the filter. The spectrum is a function of both horizontal (omega_x) and vertical (omega_y) frequency. We use omega = (omega_x, omega_y) as shorthand for these two variables. Let P_im(omega) and P_ref(omega) be the Fourier transforms of the autocorrelations of, respectively, a subband and the corresponding subband from the transform of the target texture.
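The two autocorrelation statistics can be computed as in the sketch below, where zero extension outside the subband is implemented by padding and K is the odd width of the retained central block of lags. The function names are our own.

```python
import numpy as np

def raw_autocorr(w, K):
    """Central K x K block (odd K) of r(dx, dy) = sum_{x,y}
    w(x,y)* w(x+dx, y+dy), with w taken as zero outside the subband."""
    R = K // 2
    X, Y = w.shape
    pad = np.zeros((X + 2 * R, Y + 2 * R), dtype=complex)
    pad[R:R + X, R:R + Y] = w
    r = np.empty((K, K), dtype=complex)
    for i, dx in enumerate(range(-R, R + 1)):
        for j, dy in enumerate(range(-R, R + 1)):
            shifted = pad[R + dx:R + dx + X, R + dy:R + dy + Y]
            r[i, j] = np.sum(np.conj(w) * shifted)
    return r

def mag_autocorr(w, K):
    """The same statistic computed from the coefficient magnitudes only."""
    return raw_autocorr(np.abs(w).astype(complex), K).real
```

The central sample of the raw autocorrelation is the subband energy, and the block has the conjugate symmetry r(dx, dy) = r(-dx, -dy)^* discussed below.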
P_im(omega) and P_ref(omega) are known as the power spectra of the subbands. We only have the central samples of the autocorrelations and so we estimate the power spectra by zero-padding the autocorrelation matrices to twice their size before taking Fourier transforms. Note that as we only retain a few autocorrelation coefficients the Fourier transforms involved are small and consequently fast. Standard filter theory tells us that the power spectrum of the filtered image is given by:

    P_filt(omega) = |H(omega)|^2 P_im(omega)    (6.1)

We require this output spectrum to be close to the reference spectrum and so the natural filter to use is given by:

    H(omega) = sqrt( P_ref(omega) / (P_im(omega) + delta) )    (6.2)

To avoid divisions by small numbers we increase the denominator by a small amount delta (in the experiments we use delta = 0.01 P_im(0)). The definition of the power spectrum ensures that P_im(omega) and P_ref(omega) are always real. After using equation 6.2 to produce the filter spectrum we use an inverse Fourier transform to produce the actual filter coefficients. We then convolve the subband with the filter (this is actually done in the Fourier domain by multiplication) in order to produce a new subband with, hopefully, a better matched autocorrelation. Although we have not proved that this will always improve the match, in practice we found that this scheme converged within a few iterations.

For the magnitude autocorrelation method we first compute the magnitude and phase of each coefficient in the subband. Then the above matching method is applied to a subband consisting of just the magnitudes. Finally the new magnitudes are recombined with the original phases to produce the new complex subband.

Throughout this chapter we will always use a 5 scale DTCWT decomposition. This results in 5 x 6 = 30 complex subbands plus a real lowpass subband of scaling coefficients. We only impose correlations on the complex subbands: the scaling coefficients are not changed during the matching of transform statistics. For counting purposes we will treat the real and imaginary parts as separate parameters. The form of the autocorrelation is such that r(x, y) = r(-x, -y)^* and so for an autocorrelation of size K by K (for odd K) there are only (K^2 + 1)/2 independent complex numbers. Moreover, the central sample r(0, 0) is always real.
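One matching step might look like the following sketch. For simplicity it builds H from the full periodograms of the two subbands rather than from zero-padded central autocorrelation blocks, and it uses the mean power rather than P_im(0) for the guard term delta; the square root follows from requiring |H|^2 P_im to approximate P_ref as in equation (6.1).

```python
import numpy as np

def matching_filter(sub, ref, delta_frac=0.01):
    """Filter `sub` so that its power spectrum moves towards that of
    `ref`: H = sqrt(P_ref / (P_im + delta)), applied by multiplication
    in the Fourier domain (a simplified version of equation 6.2)."""
    F = np.fft.fft2(sub)
    P_im = np.abs(F) ** 2
    P_ref = np.abs(np.fft.fft2(ref)) ** 2
    delta = delta_frac * P_im.mean()      # guard against division by zero
    H = np.sqrt(P_ref / (P_im + delta))
    return np.fft.ifft2(H * F)
```

Because the filter only rescales spectral magnitudes, the phases of the Fourier coefficients of the subband are preserved, which is why repeated alternation with the image-domain histogram matching can converge.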
We conclude that we need to record K^2 numbers to record the raw autocorrelation for a single subband, and hence we have a total of 30K^2 parameters describing the transform statistics. For the magnitude autocorrelation this is reduced to 15(K^2 + 1). For comparison, the energy synthesis method of chapter 4 needs only 30 parameters to describe a texture.

6.3 Autocorrelation Results

One texture (of wood grain) on which the energy synthesis method performs poorly is shown in figure 6.1. The problem is the highly correlated lines of grain that cross the texture and, although the general diagonal orientation of the texture is reproduced, the strong correlation is lost.

[Figure 6.1: Results of original energy matching synthesis (matching size 1x1 magnitude autocorrelation): target texture and processed picture after 3 iterations.]

We first test the raw autocorrelation method. Recording and matching merely the central 3 by 3 samples of the autocorrelation matrix results in the improved results shown in figure 6.2. The diagonal lines are longer and the image is more tightly constrained to one orientation. The improvement is even greater if we use the central 5 by 5 samples, as shown in figure 6.3 where the synthetic texture appears very similar in texture to the original.

Next we test the magnitude autocorrelation method. Figure 6.4 shows the results of using the central 5 by 5 samples of the magnitude autocorrelation (there is no noticeable difference if we just match a 3 by 3 autocorrelation). These results are just as bad as the original energy synthesis.
[Figure 6.2: Results of matching 3 by 3 raw autocorrelation values: target texture and processed picture after 3 iterations.]

[Figure 6.3: Results of matching 5 by 5 raw autocorrelation values: target texture and processed picture after 3 iterations.]
The original texture contains alternating stripes of strongly orientated material and although the synthetic texture does contain some patches of strongly orientated texture, it also contains several places where there seems to be a more checkerboard appearance. This is because at these places there is energy both at an orientation of 45° and of −45°.

[Figure 6.4: Results of matching 5 by 5 magnitude autocorrelation values: target texture and processed picture after 3 iterations.]

We have shown the results after 3 iterations as these methods were found to converge very quickly. More iterations produced negligible changes to the synthesized images. There are some textures for which the autocorrelation does not work as well, such as the hessian pattern in figure 6.5. We added measures of crosscorrelation between subbands in an attempt to solve this problem, as described in the following sections.

6.4 Discussion

The raw autocorrelation matching gives a significant improvement and so is managing to capture the correlation in the image, while magnitude matching fails to help. We described in chapter 3 how complex wavelets can be used for motion estimation because the change in phase of a wavelet coefficient between frames is roughly proportional to the shift of an underlying image. This means that significant information is contained in the phases of the wavelet coefficients. In a similar way, if a subimage is responding to lines in the image then the phases of the autocorrelation coefficients encode the relative positions of the line segments. Therefore when we match the raw autocorrelation we are ensuring that the line segments will be correctly aligned.
k (a. Unfortunately. b) to be the correlation . there is not any signiﬁcant raw correlation between subbands. y)wk (x. The between scales crosscorrelation measures the correlation between a subband at one scale and the subbands at the next coarser scale. Suppose that there is a particular pattern that is repeated many times in a certain texture and suppose we break the original image up into lots of smaller subimages containing examples of this pattern. This was not a problem for autocorrelation as the coeﬃcients within a single band all respond in the same way to a translation and hence the relative phase contains useful information. will rapidly change if the image is translated at an angle of −45◦ . The single scale crosscorrelation cjk between subband j and subband k is given by cjk = x y wj (x. while the phase of the coeﬃcients in the −45◦ subband will be much less aﬀected by such a translation [74].5 Crosscorrelation method We measure the crosscorrelations between subbands at one scale. The phase of coeﬃcients in the 45◦ subband. and between subbands at one scale and those at the next scale.5. We deﬁne bl. The problem is that the phase of wavelet coeﬃcients gives information about the location of features in a particular direction.j. CROSSCORRELATION METHOD 99 Target texture Processed picture after 3 iterations Figure 6. The translations therefore alter the phase relationship between subbands and hence when the crosscorrelation is averaged across the entire texture the individual correlations will tend to cancel out.5: Results of matching 5 by 5 raw autocorrelation values 6. It may seem odd that we use the magnitudes when we have discovered the importance of phase for autocorrelation matching. y). say. There is no reason for the pattern to have any particular alignment within the subimages and we can interpret the subimages as being a single prototype image distorted by diﬀerent translations.6.
between subband j at scale l and subband k at scale l + 1:

    b_{l,j,k}(a, b) = sum_x sum_y |w_{l,j}(2x + a, 2y + b)| |w_{l+1,k}(x, y)|

where a and b take values 0 or 1. Due to the down sampling in the tree structure each position at scale l + 1 is effectively the parent of 4 positions at scale l. We use a and b to measure a separate correlation for each of these 4 choices. Again we use a measure of magnitude correlations as the raw phases will tend to cancel out. (A way has been proposed to avoid the cancellation when computing the crosscorrelation between two subbands at different scales but the same orientation. The phases at the coarser scale will change at half the rate of the finer scale and so to make the comparison meaningful the coarser scale coefficients must have their phase doubled. The experiments presented here do not make use of this modification but details can be found elsewhere [96].)

We need 6 x 5/2 = 15 parameters to describe the cross correlations at a single scale, and 6 x 6 x 4 = 144 parameters to describe the cross correlations between subbands at one scale and the next coarser scale. Naturally, we do not use a between scales correlation for the coarsest scale subbands as these wavelet coefficients have no parents. For the 5 scale decomposition this gives a total of 15 x 5 + 144 x 4 = 651 parameters to describe cross correlations, in addition to the parameters used to describe the image statistics and autocorrelations.

The matching procedure first matches the raw autocorrelation values in the way described earlier and then attempts to match the crosscorrelations. The details of the crosscorrelation matching method can be found in Simoncelli's paper [108].

6.6 Crosscorrelation results and discussion

We compare three methods in this section:

Energy: The subband energy matching method from chapter 4, using 30 parameters to describe the wavelet statistics.

Raw autocorrelation: The 7 by 7 raw autocorrelation matching method from the start of this chapter, using 30 x 7 x 7 = 1470 parameters to describe the wavelet statistics.
Crosscorrelation: The 7 by 7 raw autocorrelation matching together with crosscorrelation matching, to give a total of 30 x 7 x 7 + 651 = 2121 parameters to describe the wavelet statistics.

Figure 6.6 displays the results of the different methods applied to 4 test images.

[Figure 6.6: Comparison of different synthesis methods: original, energy, raw autocorrelation, and crosscorrelation results for the 4 test textures.]

On all of these textures the autocorrelation method gives a clear improvement compared to the original energy synthesis method, but it has a significant increase in the size of the parameter set. Including the crosscorrelation statistics leaves the first three textures essentially unaltered. The last hessian texture may be considered to be slightly improved but the improvement is certainly not very large.

There are several penalties associated with the increased feature set size. The most obvious drawbacks are that the storage and computation requirements are increased, but there is also a possible decrease in performance. For texture synthesis it is reasonable to expect an increase in quality for each new feature matched, but synthesis itself is only of secondary interest. The main interest is in using the features for texture analysis, and for analysis applications extra features can be a disadvantage.
The principal problems with extra features are that:

1. large textural regions are needed to reliably estimate the feature values;

2. the features may have significant correlations, which causes problems if we want to use simple metrics to compare feature sets;

3. the features may be modelling variation within a single texture class.

The next section examines the performance of a larger feature set for the segmentation task.

6.7 Large feature set segmentation

For the reasons mentioned above it would be inappropriate to try and use the more than 2000 auto and cross correlation parameters for texture segmentation, but it would be interesting to see the effect of using features based on the 3 by 3 raw autocorrelation. We cannot directly use the autocorrelation to provide features because we just get one autocorrelation value for the entire subband, while for segmentation we clearly need local estimates of feature values. Instead we consider a very simple extension of the DTCWT that performs extra filtering on each subband to determine four features per subband. We propose using a simple shift invariant extension to the DTCWT based on the Haar wavelet transform. We first describe this algorithm and then explain why it is approximately equivalent to calculating a local autocorrelation estimate.

Each subband W_k(u, v) is split into four by:

1. Filter the subband W_k(u, v) horizontally with the filter 1 + z^{-1} to produce L_k(u, v).

2. Filter the subband W_k(u, v) horizontally with the filter 1 - z^{-1} to produce H_k(u, v).

3. Filter the subband L_k(u, v) vertically with the filter 1 + z^{-1} to produce A_k(u, v).

4. Filter the subband L_k(u, v) vertically with the filter 1 - z^{-1} to produce B_k(u, v).

5. Filter the subband H_k(u, v) vertically with the filter 1 + z^{-1} to produce C_k(u, v).

6. Filter the subband H_k(u, v) vertically with the filter 1 - z^{-1} to produce D_k(u, v).
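The six filtering steps can be sketched as one function. This is an illustrative sketch: the 2-tap filters are applied non-decimated, with a repeated border sample standing in for the symmetric edge extension discussed below.

```python
import numpy as np

def split_four(W):
    """Split one complex subband W into A, B, C, D: 1 + z^-1 and
    1 - z^-1 horizontally give L and H, then the same filter pair
    vertically gives A, B (from L) and C, D (from H)."""
    eh = np.concatenate([W[:, :1], W], axis=1)      # repeat border column
    L = eh[:, 1:] + eh[:, :-1]                      # 1 + z^-1 horizontally
    H = eh[:, 1:] - eh[:, :-1]                      # 1 - z^-1 horizontally
    out = []
    for S in (L, H):
        ev = np.concatenate([S[:1, :], S], axis=0)  # repeat border row
        out.append(ev[1:, :] + ev[:-1, :])          # 1 + z^-1 vertically
        out.append(ev[1:, :] - ev[:-1, :])          # 1 - z^-1 vertically
    return tuple(out)                               # (A, B, C, D)
```

Each output has the same size as W (the filtering is non-decimated), so this split gives four times as many feature values per subband.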
All the filtering operations are non-decimated and use symmetric edge extension. Edge extension is important when filters overlap the edge of an image. For the very short filters used here we merely need to repeat the last row (or column) of the image. The features are then calculated as before but based on A_k, B_k, C_k, D_k rather than on the original subbands. This leads to four times as many features.

Figure 6.7 shows contours of the frequency responses for the four subbands derived from the original 45° subband at scale 2. The new subbands approximately quarter the frequency response of the original subband.

[Figure 6.7: 2D frequency responses for the four subbands derived from the level 2 45° subband. Contours are plotted at 90%, 75%, 50%, and 25% of the peak amplitude. A dashed contour at the 25% peak level for the original 45° scale 2 subband is also shown in each plot.]

The short filters mean that the features will be local, but the connection with autocorrelation is not as clear. Now we explain why this is an appropriate measure of local autocorrelation. To motivate our choice we consider a one dimensional example. Suppose we filter a signal containing white Gaussian noise of mean zero and variance 1, whose Z transform is X(z), with a filter H(z) = 1 + az^{-1} where we assume a is real. Let {y_k} denote the samples of the filtered signal and {x_k} the original white noise samples.
The output samples {y_k} will contain coloured noise. In other words, the filtering introduces correlations between the samples. More precisely, as y_k = x_k + a x_{k-1} and y_{k+1} = x_{k+1} + a x_k, we can calculate the following expected correlations:

    E{y_k y_k} = E{x_k^2} + 2a E{x_k x_{k-1}} + a^2 E{x_{k-1}^2} = 1 + a^2
    E{y_k y_{k+1}} = E{x_k x_{k+1}} + a E{x_k^2} + a E{x_{k-1} x_{k+1}} + a^2 E{x_{k-1} x_k} = a

This means that the average autocorrelation for lag ±1 is a, the average autocorrelation for lag 0 is 1 + a^2, and it can easily be shown that the autocorrelation for any other lag is 0.

Now consider the expected energy after using the Haar filters. The output {f_k} of using the 1 + z^{-1} filter is equivalent to filtering the original white noise with a combined filter of

    (1 + z^{-1}) H(z) = 1 + (1 + a) z^{-1} + a z^{-2}

This will therefore produce an output with expected energy

    E{f_k^2} = 1 + (1 + a)^2 + a^2 = 2(1 + a + a^2)

If instead we filter with 1 - z^{-1} to get {g_k} we have an expected energy of

    E{g_k^2} = 1 + (-1 + a)^2 + a^2 = 2(1 - a + a^2)

The sum of these two energies is 4 times the average autocorrelation at lag 0, while the difference is 4 times the average autocorrelation at lag 1. This illustrates the close link between the extra filtering and autocorrelation and suggests why the filtering provides an appropriate measure of local autocorrelation statistics.

Figure 6.8 presents the results of using the enlarged feature set for the pixel by pixel segmentation experiment described in chapter 5. Also tabulated are the results for the DTCWT repeated from chapter 5. The extended feature set is better for 9 out of the 12 test mosaics but 5.5% worse for mosaic a. The average of the errors for the extended feature set is 17.7% compared to 18.0% for the DTCWT. This agrees with the argument in the previous section that although extra features can sometimes provide improvements in segmentation, this gain is not automatic and great care must be taken in choosing features.
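The 1D white-noise argument above is easy to verify numerically. The following Monte Carlo sketch uses an arbitrary a = 0.7 and checks that the sum and difference of the two filtered energies recover 4 times the lag-0 and lag-1 autocorrelations.

```python
import numpy as np

rng = np.random.default_rng(42)
a = 0.7
x = rng.normal(size=1_000_000)      # white Gaussian noise, variance 1
y = x[1:] + a * x[:-1]              # coloured noise y_k = x_k + a x_{k-1}

f = y[1:] + y[:-1]                  # 1 + z^-1 filter applied to y
g = y[1:] - y[:-1]                  # 1 - z^-1 filter applied to y
Ef = np.mean(f ** 2)
Eg = np.mean(g ** 2)

# predicted energies: 2(1 + a + a^2) and 2(1 - a + a^2)
assert abs(Ef - 2 * (1 + a + a * a)) < 0.05
assert abs(Eg - 2 * (1 - a + a * a)) < 0.05
# sum and difference recover 4x the lag-0 and lag-1 autocorrelations
assert abs((Ef + Eg) / 4 - (1 + a * a)) < 0.05
assert abs((Ef - Eg) / 4 - a) < 0.05
```

The same check works for any real a, since the derivation nowhere used the particular value of the filter coefficient.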
[Figure 6.8: Comparison of segmentation results for different transforms (table of DTCWT error and extended DTCWT error, x100%, for mosaics a-l).]

6.8 Conclusions

The extra features generated by measuring the autocorrelation of the subbands are useful for modelling longer-range correlations and allow good synthesis of strongly directional textures. The phase is an important part of the correlation, because matching features based solely on the magnitude autocorrelation gave inferior results. Matching crosscorrelation only slightly changed the results. These conclusions are all based on the subjective quality of synthesized textures. Numerical experiments using an extended feature set confirmed that autocorrelation related features can sometimes increase segmentation performance, but that they can also decrease performance in other cases.
Chapter 7

Bayesian modelling in the wavelet domain

7.1 Introduction

The previous chapters have been concerned with non-Bayesian image processing techniques. The general approach has been to design what we hope to be an appropriate algorithm for addressing a particular problem and then examine the experimental results. The remaining chapters have a very different flavour. We will now approach image processing from a Bayesian perspective. The aim of this dissertation is to explore the use of complex wavelets in image processing. The previous chapters have illustrated their use within certain non-Bayesian methods and now we wish to explore their use within Bayesian processing. Note that we are not directly aiming to compare Bayesian and non-Bayesian methodologies. Both approaches are commonly used and both approaches have different advantages. The specific motivation for using the Bayesian methodology to address the problems in the following chapters is to provide a mathematical framework within which we can place and compare techniques.

The purpose of this chapter is to introduce a complex wavelet model and compare this with alternative model formulations and alternative wavelet choices. The concepts of probability distributions and Bayes' theorem are briefly stated and then used to construct a common framework for a range of different texture models. We consider a number of different ways in which we can define a prior distribution for images and re-express each model in terms of a multivariate Gaussian. Within this context the choice of wavelet transform is considered in section 7.4.
transform is considered in section 7.4. The main original contributions are: the description of shift invariant wavelet models in terms of a Gaussian random process; the identification of the autocorrelation of the process in terms of wavelet smoothing; and the experimental measurements in 1D and 2D of the effects of using a decimated transform.

7.2 Introduction to Bayesian inference

This section gives a brief introduction to the terminology used for Bayesian inference. Basic familiarity with the concepts of probability theory and with the ideas of random processes and their covariance and correlation functions [116] is assumed. We will use the terms pdf or cdf to describe the distribution of a random variable. These terms are defined as:

Cumulative Distribution Function (cdf): The cdf for a random variable X is a function F_X : R → R defined by F_X(x) = P(X ≤ x).

Probability Density Function (pdf): The pdf for a random variable X is a function f_X : R → R defined by f_X(x) = ∂F_X(x)/∂x, where F_X(x) is the cdf for the random variable.

Joint cdfs are defined as F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) and joint pdfs as f_{X,Y}(x, y) = ∂^2 F_{X,Y}(x, y)/∂x∂y. Conditional pdfs are defined as f_{X|Y}(x|y) = ∂P(X ≤ x | Y = y)/∂x.
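These definitions are easy to check numerically. The following sketch (in Python, our illustration language throughout) uses the standard Gaussian, whose cdf can be written with the error function, and confirms that a central difference of the cdf reproduces the pdf:

```python
from math import erf, exp, pi, sqrt

# Sanity check of f_X(x) = dF_X(x)/dx for a standard Gaussian
# (an illustrative choice; the cdf is expressed with the error function).
def cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def pdf(x):
    return exp(-x * x / 2) / sqrt(2 * pi)

h = 1e-6
errors = [abs((cdf(x + h) - cdf(x - h)) / (2 * h) - pdf(x))
          for x in (-1.0, 0.0, 2.0)]
print(max(errors))   # tiny: the finite difference of the cdf is the pdf
```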
Using these definitions Bayes' theorem for pdfs can be written as

f(x|y) = f(y|x) f(x) / f(y)   (7.1)

where we have dropped the subscripts for clarity (this will usually be done within the dissertation when there is little risk of confusion). In equation 7.1, x represents the unknown parameters and y the observed image data. There are special words that refer to particular parts of the formula:

Prior: The prior distribution is the name for f(x). This represents the information we have about the model parameters before observing the image data.

Likelihood: The likelihood is the name for f(y|x). This represents our knowledge about the model. Generating a sample from f(y|x) is equivalent to using our model to generate some typical noisy image data based on some particular values of the parameters x.

Posterior: The posterior distribution is the name for f(x|y). This represents all the information we have about the model parameters after observing the image data.

Bayes' theorem provides a way of estimating the parameters of the model (such as the clean image before noise was added) from the observed data.

7.3 Bayesian image modelling

In this chapter we are only concerned with specifying the prior distribution for images. In other words, we are trying to mathematically express the expectations we have for the images that are likely to occur. For example, a simple prior for an image might be that all pixels have intensity values independently and uniformly distributed between 0 and 1. Clearly this will be intimately related with the precise application, and the models we describe will have a number of parameters that can be tuned depending on the circumstances. For notational simplicity we represent images by placing the pixels into column vectors. We will use upper case for vectors of random variables and lower case for observations of such vectors. (Later, in chapters 8 and 9, we explain how, once equipped with a suitable prior,
we can proceed to define the likelihood for a particular problem and finally infer estimates from observed data.) We will make use of Bayes' theorem when we have a model that generates images based on some unknown parameters.
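As a concrete miniature of this machinery, the following sketch works through Bayes' theorem for a single scalar parameter with a Gaussian prior and Gaussian likelihood. The closed-form posterior mean and variance quoted in the comments are the standard conjugate-Gaussian results, stated here for checking rather than taken from the dissertation:

```python
import numpy as np

# Scalar conjugate-Gaussian example: prior x ~ N(0, p), observation
# y = x + q with q ~ N(0, v).  The posterior is Gaussian with mean
# y*p/(p+v) and variance p*v/(p+v) -- the standard conjugate result.
p, v = 2.0, 0.5
y = 1.2
post_mean = y * p / (p + v)
post_var = p * v / (p + v)

# Numerical Bayes: evaluate prior * likelihood on a grid and normalise.
x = np.linspace(-10, 10, 20001)
unnorm = np.exp(-x**2 / (2 * p)) * np.exp(-(y - x)**2 / (2 * v))
w = unnorm / unnorm.sum()
grid_mean = w @ x
grid_var = w @ (x - grid_mean)**2
print(grid_mean, grid_var)      # matches the closed-form posterior
```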
such vectors. Let x_a ∈ R^2 be the position of the a-th pixel and let N be the number of locations within the image. The intensity values are given by continuous random variables; there is no assumption that the intensity values must be, for example, integers. We shall make extensive use of the multivariate Gaussian distribution (also known as a multivariate Normal distribution). We use the notation Z ~ N(µ, B) to denote that the random variables contained in Z are drawn from a multivariate Gaussian distribution with mean µ and covariance matrix B. The multivariate Gaussian is defined to have the following pdf for Z ~ N(µ, B):

p(Z = z) = (1 / ((2π)^{N/2} |B|^{1/2})) exp( −(1/2) (z − µ)^T B^{−1} (z − µ) )   (7.2)

where N is the length of the vector Z and |B| is the determinant of B. One useful standard result is that if Z ~ N(µ, B) and Y = AZ, where A is a real matrix with N columns and S ≤ N rows, then Y ~ N(Aµ, ABA^T).

Images generated with a multivariate Gaussian distribution are also known as realisations from a discrete Gaussian random process (the process is called discrete because values are only defined on the grid of pixel positions that cover the image). We are particularly interested in wide sense stationary processes. A process is defined to be stationary in the wide sense if:

1. the expectation is independent of position: there exists a c ∈ R such that for all a ∈ {1, . . . , N}, E{Z_a} = c;

2. the correlation of two random variables Z_a and Z_b is a function only of their relative position: there exists a function R : R^2 → R such that for all a, b, E{Z_a Z_b} = R(x_a − x_b).

R is called the autocorrelation function for the random process. The first condition means that µ_a = E{Z_a} = c and for simplicity we shall assume that the data has been shifted to ensure c = 0. The second condition means that the covariance matrix B has a specific structure: E{Z_a Z_b} = B_{a,b} = R(x_a − x_b).
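The standard result quoted above (Y = AZ ~ N(Aµ, ABA^T)) is easy to verify by simulation. In the sketch below the exponential autocorrelation function R and the matrix A are illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior: Z ~ N(0, B) for a small 1D "image" of N pixels, with a
# stationary covariance B_ab = R(x_a - x_b); R here is an illustrative
# exponential autocorrelation function.
N = 6
R = lambda d: 0.5 ** abs(d)
B = np.array([[R(a - b) for b in range(N)] for a in range(N)])

# Any real A with N columns and S <= N rows gives Y = AZ ~ N(0, A B A^T).
A = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]])
cov_Y = A @ B @ A.T

# Monte Carlo check of the standard result.
Z = rng.multivariate_normal(np.zeros(N), B, size=200000)
Y = Z @ A.T
cov_est = (Y.T @ Y) / len(Y)
print(np.max(np.abs(cov_est - cov_Y)))   # small sampling error
```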
7.3. The i. we call this a generative speciﬁcation.1 Filter model A simple example of the generative method is the ﬁlter model. for example. The signals produced by such a system are examples of wide sense stationary discrete Gaussian random processes. Then let A be a matrix that applies the ﬁltering operation so Z = AR.3. The values Z in the ﬁltered image will be distributed as Z s N 0. j entry in the covariance matrix is equal to the covariance function of the random process evaluated for the diﬀerence in position between pixel i and pixel j. we shall call this the direct speciﬁcation. We now consider a number of standard prior models and convert each case to the equivalent multivariate Gaussian form for the distribution. the ARMA (autoregressive moving average) model. In other cases a method for generating samples from the prior is given. AAT .7. BAYESIAN IMAGE MODELLING 111 The covariance function of the random process is deﬁned to be E {(Za − E {Za })(Zb − E {Zb })} and is equal to R(xa − xb ) because E {Za } = E {Zb } = c = 0. Sometimes a formula for the pdf is explicitly stated.2 Fourier model The Fourier model is another generative speciﬁcation closely related to the previous model. This relationship is well known [116] and we highlight a couple of standard results (assuming that the ﬁlter is stationary.3. Imagine that we have an image ﬁlled with random white noise of variance 1 which is then ﬁltered. We imagine generating complex white noise in the Fourier domain with variance depending . 7. There are two styles of distribution speciﬁcation that are often encountered. The covariance function of the random process is equal to the autocorrelation function of the ﬁlter impulse response. This process generates sample images from the prior.e. Let R be a column vector containing all the white noise samples. that the same ﬁlter is used across the entire image): 1. Many models are special cases of this system including. 2. i.
on the frequency according to some known law, then inverting the Fourier transform, and finally taking the real part of the output to generate a sample image. We define a matrix F to represent the Fourier transform:

F_{a,b} = (1/√M) exp( −2πj x_a^T x_b / √M )

where we assume that the images are square of dimensions √M by √M. This definition of the Fourier transform has the following properties:

1. F^H F = I_M. The inverse is the Hermitian transpose of F.

2. The energy of the Fourier coefficients is equal to the energy in the image. This is known as Parseval's theorem: ||Fa||^2 = a^H F^H F a = a^H a = ||a||^2.

Let D be the (real) diagonal matrix that scales white noise of variance 1 to give white noise in the Fourier components of the desired variances. The images generated by the Fourier model can be written as Z = F^H D (R_R + jR_I) where R_R and R_I are distributed N(0, I_M). Denote the real part of F by F_R and the imaginary part by F_I. Taking the real part of

Z = F^H D (R_R + jR_I) = (F_R^T − jF_I^T) D (R_R + jR_I)

the images can be expressed as

Z = F_R^T D R_R + F_I^T D R_I = [ F_R^T D   F_I^T D ] [ R_R ; R_I ]

and we deduce that the prior distribution is Z ~ N(0, F_R^T D^2 F_R + F_I^T D^2 F_I).
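The Fourier generative model can be sketched in a few lines. The variance law D below is an arbitrary illustrative choice; the check confirms the wide sense stationarity of the resulting process (the covariance of a pixel pair depends only on their separation, and the pixel variance equals the mean of D^2):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fourier-model sketch (1D for brevity): scale complex white noise by a
# diagonal variance law D in the Fourier domain, invert the transform,
# and keep the real part.  The law below is an arbitrary illustration.
M = 128
freqs = np.fft.fftfreq(M)
D = 1.0 / (1.0 + (8.0 * freqs) ** 2)

trials = 20000
w = rng.standard_normal((trials, M)) + 1j * rng.standard_normal((trials, M))
Z = np.real(np.fft.ifft(D * w, axis=1)) * np.sqrt(M)   # unitary scaling

# Wide sense stationarity: covariance depends only on pixel separation.
c_0_5 = np.mean(Z[:, 0] * Z[:, 5])
c_40_45 = np.mean(Z[:, 40] * Z[:, 45])
var_0 = np.mean(Z[:, 0] ** 2)          # theory: the mean of D^2
print(c_0_5, c_40_45, var_0)
```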
This is again a multivariate Gaussian. Let B = F_R^T D^2 F_R + F_I^T D^2 F_I be the covariance matrix. The entry B_ij can be written as

B_ij = e_i^T B e_j = e_i^T (F_R^T D^2 F_R + F_I^T D^2 F_I) e_j = Re{ e_i^T F^H D^2 (F_R + jF_I) e_j }

where e_i is the column vector containing all zeros except for a one in the ith place. (We also assume that D is a real matrix.) This equation represents the following process:

1. Take the Fourier transform of an impulse at location x_j.
2. Multiply the Fourier coefficients by the diagonal entries of D^2.
3. Invert the Fourier transform of the scaled coefficients.
4. Extract the real part of the entry at location x_i.

This process represents a simple blurring operation and we conclude that the covariance function of the generated random process is given by such a blurred impulse.

7.3.3 Wavelet direct specification

One way of using a wavelet transform to define the prior is to specify a pdf defined on a weighted sum of the squares of the wavelet coefficients of the transform of the image. If we have a complex wavelet transform then it is convenient to treat the real and imaginary parts of the output coefficients as separate real outputs of the transform. If we let the (real) matrix W represent the forward wavelet transform, P the reverse wavelet transform, and D a diagonal matrix containing the weights we apply to the wavelet coefficients, then

p(Z = z) ∝ exp( −(1/2) ||DWz||^2 ) = exp( −(1/2) z^T W^T D^2 W z )

which we can recognise as the form of a multivariate Gaussian, Z ~ N(0, (W^T D^2 W)^{−1}). The covariance matrix C = (W^T D^2 W)^{−1} has a strange form. One way to understand the covariance is via the equation

W^T D^2 W C = W^T D^2 W (W^T D^2 W)^{−1} = I_N.
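This identity can be made concrete with a tiny example. Below, W is a one-level orthogonal Haar transform on four samples and D is an illustrative weighting; the covariance C implied by the direct specification is exactly the matrix that the "sharpening" operator W^T D^2 W maps to the identity:

```python
import numpy as np

# Direct-specification sketch with a tiny one-level orthogonal Haar
# transform on 4 samples (W W^T = I); D is an illustrative weighting.
s = 1 / np.sqrt(2)
W = np.array([[s,  s,  0,  0],    # lowpass coefficients
              [0,  0,  s,  s],
              [s, -s,  0,  0],    # highpass coefficients
              [0,  0,  s, -s]])
D = np.diag([1.0, 1.0, 2.0, 2.0])   # heavier weights on highpass

# Covariance implied by p(z) proportional to exp(-0.5 z^T W^T D^2 W z):
C = np.linalg.inv(W.T @ D @ D @ W)

# Applying the wavelet "sharpening" operator W^T D^2 W to C gives I_N:
sharpened = W.T @ D @ D @ W @ C
print(np.round(sharpened, 6))
```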
This means that the ith column of C is transformed by a wavelet sharpening process W^T D^2 W to become e_i. In other words, the covariance function is such that if we apply a wavelet sharpening process we produce an impulse. For a balanced wavelet the wavelet sharpening algorithm consists of the following steps:

1. Take the wavelet transform of the image.
2. Multiply the wavelet coefficients by the diagonal entries of D^2.
3. Invert the wavelet transform.

For nonbalanced wavelets W^T is not the same as the reconstruction transform P. However, there is a natural interpretation for W^T in terms of the filter tree used to compute wavelets. Recall that for a standard wavelet transform H0(z) and H1(z) define the analysis filters while G0(z) and G1(z) define the reconstruction filters. We define a new set of reconstruction filters G0(z) = H0*(z^{−1}), G1(z) = H1*(z^{−1}), where the conjugation operation in these equations is applied only to the coefficients of z, but not to z itself. In other words, we use reconstruction filters given by the conjugate time reverse of the analysis filters. These may no longer correspond to a perfect reconstruction system, but if we nevertheless use the reconstruction filter tree with these new filters then we effectively perform a multiplication by W^T.

If we are using an orthogonal wavelet transform (i.e. a nonredundant balanced transform) then the inverse of a wavelet sharpening process will give the same results as a wavelet smoothing process, but this is not necessarily true for a redundant balanced wavelet transform.

7.3.4 Wavelet generative specification

We can also generate sample images using wavelets by generating white noise samples of variance 1 for each wavelet coefficient, scaling them by a diagonal matrix D, and then inverting the wavelet transform.
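For the balanced (orthogonal) case this generative recipe can be sketched directly: white wavelet-domain noise is scaled by D and reconstructed, and the sample covariance of the generated images matches the model covariance. The Haar matrix and the per-subband weights below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# Generative sketch: white wavelet-domain noise, scaled by D, then
# reconstructed.  Balanced orthogonal Haar case, so P = W^T.
s = 1 / np.sqrt(2)
W = np.array([[s,  s,  0,  0],
              [0,  0,  s,  s],
              [s, -s,  0,  0],
              [0,  0,  s, -s]])
P = W.T
D = np.diag([2.0, 2.0, 0.5, 0.5])     # strong lowpass, weak highpass

cov_model = P @ D @ D @ P.T           # predicted covariance P D^2 P^T

R = rng.standard_normal((100000, 4))  # unit-variance wavelet coefficients
Z = R @ (P @ D).T                     # sample images Z = P D R
cov_est = (Z.T @ Z) / len(Z)
print(np.max(np.abs(cov_est - cov_model)))
```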
If we assume that the same weighting is applied to all the coeﬃcients in a particular subband then (for a shift invariant transform) the prior will correspond to a stationary discrete Gaussian random process with some covariance function. 3. The mathematics translates to saying that the shape of this covariance function is given by the inverse of the sharpening algorithm applied to an impulse.
The images are generated by Z = PDR and the prior distribution will be given by Z ~ N(0, P D^2 P^T). This is the most important model for our purposes as it is the model that will be used in the next chapter. We will assume that the same weighting is used for all the coefficients in a particular subband and that the choice of wavelet transform is such that the sample images are implicitly drawn from a stationary prior; this second assumption is discussed in section 7.4.2. For a balanced wavelet W = P^T and, just as for the Fourier method, the covariance function of this process is given by a smoothing procedure applied to an impulse:

1. Forward wavelet transform the impulse.
2. Scale the wavelet coefficients by D^2.
3. Invert the wavelet transform.

An alternative view of this method for an S subband wavelet transform is to consider the images as being the sum of the S reconstructions, one from each subband. Each subband has a common scaling applied to the wavelet coefficients and so can be viewed as a special case of the filtering method, with the filter being the corresponding reconstruction wavelet. The covariance function for images generated from noise in a single subband is therefore given by the autocovariance of this wavelet. Lemma 1 in Appendix B shows that the autocovariance function for a sum of two independent images is given by the sum of the individual covariance functions. Consequently, the covariance of the sum of the reconstructions will be the sum of the individual covariances, because the scales all have independent noise sources.

7.4 Choice of wavelet

There are a number of factors to consider in our choice of wavelet. The next sections report on two factors relating to the accuracy of the results:
Shift invariance: the wavelet generative model is only appropriate for transforms with low shift dependence.

Flexibility in prior model: the covariance structure of the prior model is determined partly by the choice of scaling factors and partly by the choice of wavelet. We should choose a wavelet that allows us to generate the covariance structure of a given application. Further discussion more tightly linked to the nature of the application can be found in section 8.7.

Section 7.4.1 proposes five possibilities for the choice and the following two sections estimate the importance of the factors for each of these choices.

7.4.1 Possible basis functions

We based our discussion of the wavelet generative model on wavelet transforms, but the discussion is also applicable for any set of filter coefficients used in the pyramid structure even if the choices do not belong to a wavelet system. We will consider five wavelet-like systems:

1. A real fully decimated wavelet transform (DWT) based on the Daubechies filters of order 8.
2. A real nondecimated wavelet transform (NDWT) based on the Daubechies filters of order 8.
3. The W transform (WWT).
4. The dual tree complex wavelet transform (DTCWT).
5. The Gaussian pyramid transform (GPT).

The GPT is one of the oldest wavelet-like transforms. Adelson et al. [1] suggested using either the Gaussian or Laplacian pyramid to analyse images. The analysis filters for the Gaussian pyramid are Gaussians, while the analysis filters for the Laplacian pyramid are (short FIR approximations to) the differences between Gaussians of different widths. To reconstruct an image from a Laplacian pyramid we use Gaussian reconstruction filters.
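A single reconstruction stage of such a pyramid is just upsampling followed by lowpass filtering. The one dimensional sketch below uses a binomial filter as a common stand-in for a sampled Gaussian (our choice, not a filter from the text) and checks that a flat coarse-scale surface reconstructs to a flat fine-scale surface:

```python
import numpy as np

# One stage of pyramid reconstruction in 1D: upsample the coarse
# coefficients by two, then smooth with a Gaussian-like lowpass.
# The binomial filter below is an illustrative stand-in for a
# sampled Gaussian.
g = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 8.0   # DC gain 2 compensates upsampling

def pyramid_reconstruct_step(coarse):
    up = np.zeros(2 * len(coarse))
    up[0::2] = coarse                           # insert zeros between samples
    return np.convolve(up, g, mode="same")      # lowpass interpolation

coarse = np.ones(8)                             # a flat coarse-scale surface
fine = pyramid_reconstruct_step(coarse)
print(fine[4:12])                               # interior stays flat at 1.0
```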
We will be using the pyramid to reconstruct surfaces, and it can be implemented in the same way as a normal dyadic wavelet transform by a succession of upsampling operations and filtering operations. Figure 7.1 shows the sequence of operations involved in reconstructing a surface from wavelet coefficients at 3 scales.

Figure 7.1: Sequence of operations to reconstruct using a Gaussian Pyramid. (At each scale the rows are upsampled and filtered with G(z) and then the columns are upsampled and filtered with G(z), starting from the wavelet coefficients at scale 3 and ending with the output surface.)

Choosing G(z) to be a simple 5-tap filter

G(z) = (z^{−2} + 3z^{−1} + 4 + 3z + z^2) / (6√2)   (7.3)

gives a close approximation to Gaussian shape and provides good results for very little computation.

The WWT (W wavelet transform) [67] is a biorthogonal wavelet transform. The analysis lowpass filter is

H0(z) = (1/(2√2)) (−z^{−1} + 3 + 3z − z^2)
and the analysis highpass filter is

H1(z) = (1/(2√2)) (−z^{−1} + 3 − 3z + z^{−2})

The reconstruction filters are also very simple. We shall use the WWT in a manner analogous to the pyramid transform with G(z) = z^{−1} + 3 + 3z + z^2.

7.4.2 Shift invariance

The wavelet generative model produces a stationary random process if the aliasing is assumed to be negligible. However, for many choices of wavelet the aliasing will be significant, so it is useful to consider when a transform is shift invariant. The direct specification will have a shift invariant prior if the energy within each scale is invariant to translations. The energy will be shift invariant if there is no aliasing during the wavelet transform; similarly, for the generative specification there will be no aliasing, and hence shift invariance, as long as this condition holds. For the original signal the Nyquist frequency is half the sampling frequency, and each time we subsample the output of a wavelet filter we halve the Nyquist frequency. For shift invariance we require the bandwidth of each wavelet filter to be less than the Nyquist frequency for the corresponding subband [63]. Standard wavelet transforms (DWT) repeatedly split the spectrum into two halves and downsample by a factor of two; the finite length of the filters means that the bandwidth of each channel will be slightly greater than half the spectrum, making these transforms shift dependent. The NDWT will be shift invariant because it has no subsampling. The real wavelets in the DWT have both positive and negative passbands; by discriminating between positive and negative frequencies the DTCWT wavelets only have a single passband.
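A minimal illustration of why subsampling causes shift dependence: with the decimated Haar highpass, the subband energy of a small feature depends on how it aligns with the decimation grid, while the undecimated version is unaffected (a toy example using Haar filters only):

```python
import numpy as np

# Toy illustration of shift dependence from subsampling.  A two-pixel
# feature either straddles a decimation pair or falls inside one, so
# the decimated Haar highpass energy depends on alignment; without
# subsampling the energy is translation invariant.
x = np.zeros(16)
x[7] = x[8] = 1.0                 # feature straddles a pair boundary
x_shift = np.roll(x, 1)           # now it falls inside one pair

def haar_high_decimated(v):
    d = (v[0::2] - v[1::2]) / np.sqrt(2)      # filter + keep every 2nd
    return float(np.sum(d ** 2))

def haar_high_undecimated(v):
    d = (v - np.roll(v, 1)) / np.sqrt(2)      # filter, no subsampling
    return float(np.sum(d ** 2))

print(haar_high_decimated(x), haar_high_decimated(x_shift))      # 1.0 vs 0.0
print(haar_high_undecimated(x), haar_high_undecimated(x_shift))  # equal
```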
The lowpass reconstruction filter is

G0(z) = (1/(2√2)) (z^{−1} + 3 + 3z + z^2)

and the reconstruction highpass filter is

G1(z) = (1/(2√2)) (z^{−1} + 3 − 3z − z^{−2})

Unlike a standard wavelet transform, the wavelets at a particular scale are not orthogonal to each other: the orthogonality is sacrificed in order to produce smoother reconstruction wavelets.
This reduces the bandwidth to within the Nyquist limit and hence allows the reduction of aliasing [63]. As the generative specification only uses the reconstruction filters, the increased analysis bandwidth does not matter.

We demonstrate the effect of shift invariance with two experiments. The first gives a qualitative feel for the effect by using a simple one dimensional example; the second gives a quantitative estimate of the amount of variation in the two-dimensional case.

7.4.3 One dimensional approximation

The purpose of this section is merely to give an illustration of the effect of shift dependence. We make use of the approximation method that will be developed in chapter 8, and readers unfamiliar with approximation can safely skip this section. The first experiment compares approximation methods in one dimension using the different transforms. We chose the positions and values of six points and then approximated 128 regularly spaced values; the method described in chapter 8 is used to perform the approximation. The variances for the different scales were (from coarsest to finest) 4.0, 2.0, 1.0, 0.5, 0.1 and we used a variance of 0.01 for the measurement noise (for the NDWT the variance for each scale was reduced by the amount of oversampling in order to produce equivalent results). These precise values are not critical because this experiment is just meant to give a feel for the relative performance. We repeat the experiment 8 times; the results for different origin positions are stacked above one another. The NDWT, GPT, and the DTCWT all appear very close to being shift invariant; the WWT has a small amount of shift dependence, and the standard DWT has a large amount of shift dependence. These are the qualitative results that we wanted to demonstrate with this experiment. The actual shape of the approximation is not significant, as it is highly dependent on the scaling factors chosen. The different transforms produce different approximations because
The GPT and WWT reduce aliasing by decreasing the bandwidth of the lowpass reconstruction filters at the cost of increasing the bandwidth of the lowpass analysis filters. Figure 7.2 shows the estimated values; we use a five scale decomposition for each transform. The only difference between repetitions is that each time the origin is slightly shifted, and we have marked the original positions and data values with crosses.
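A toy one dimensional version of the quantitative measure used below: reconstruct an impulse from a single decimated Haar subband for every translation, compare each result with the translation average, and express the relative error energy as an SNR. All the choices here (one level, Haar filters, N = 32) are ours for illustration:

```python
import numpy as np

# 1D miniature of the shift-dependence measure: reconstruct an impulse
# from the level-1 highpass subband of a decimated Haar transform, do
# this for every translation (translating back afterwards), and compare
# each result against the average over translations.
N = 32
d = np.zeros(N)
d[N // 2] = 1.0

def level1_highpass_smooth(v):
    c = (v[0::2] - v[1::2]) / np.sqrt(2)       # analysis highpass, decimated
    out = np.zeros_like(v)                     # synthesis from this subband
    out[0::2] = c / np.sqrt(2)
    out[1::2] = -c / np.sqrt(2)
    return out

zs = np.array([np.roll(level1_highpass_smooth(np.roll(d, t)), -t)
               for t in range(N)])
z_ave = zs.mean(axis=0)
E_ave = np.mean(np.sum((zs - z_ave) ** 2, axis=1))
f = E_ave / np.sum(z_ave ** 2)
snr_db = -10 * np.log10(f)
print(snr_db)        # about 4.77 dB: the decimated transform is far from ideal
```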
Figure 7.2: One dimensional approximation results for different origin positions. Crosses show location and values of measured data points. (The panels show the Gaussian Pyramid, W transform, dual tree complex wavelet transform, orthogonal real wavelet, and nondecimated real wavelet.)

the associated covariance structures are different. The WWT and the GPT use lowpass filters with a smaller bandwidth than is usually used for wavelets and therefore the results are smoother. This is not an important difference because we could also make the DTCWT and NDWT results smoother by changing the scalings used.

7.4.4 Two-dimensional shift dependence

Section 7.3.4 described the link between a wavelet smoothed impulse and the covariance function of surfaces produced by the wavelet generative model. The associated random process will only be stationary if the result of smoothing is independent of location. The basic idea is simple: shift dependence means that smoothing a translated impulse will not be the same as translating a smoothed impulse. This section describes the results of an experiment to measure the variation in such a smoothed impulse. We measure the amount of shift dependence by examining the energy of the difference. Using the same notation as in section 7.3.4, the wavelet smoothing produces the output z = P D^2 P^T d
where d represents the input image (0 everywhere apart from a single 1 in the centre of the image). The amount of smoothing is determined by the diagonal entries of the matrix D. Suppose that we have a K level transform and that at level k all the scaling factors for the different subbands are equal to σ_k. This will give approximately circular priors (the quality of the approximation is demonstrated in the next section). Define diagonal matrices E_k whose entries are (E_k)_ii = 1 if the ith wavelet coefficient belongs to a subband at level k and zero otherwise. This allows us to decompose D as

D = Σ_{k=1}^{K} σ_k E_k   (7.4)

The output of the wavelet smoothing is

z = P D^2 P^T d = Σ_{k=1}^{K} σ_k^2 P E_k P^T d

Now define S(x, y) to be a matrix that performs a translation of the data by x pixels horizontally and y pixels vertically (assuming periodic extension at the edges); the inverse of this transform is S(x, y)^T. By averaging over all translations we can compute a shift invariant estimate that we call z_ave:

z_ave = Σ_{k=1}^{K} σ_k^2 z_k,   where   z_k = (1/2^{2K}) Σ_{x=0}^{2^K−1} Σ_{y=0}^{2^K−1} S(x, y)^T P E_k P^T S(x, y) d

is the average result of reconstructing the data from just the scale k coefficients. For a particular translation of the data by x, y pixels we define the energy E(x, y) of the error between the wavelet smoothed image and the shift invariant estimate as

E(x, y) = || Σ_{k=1}^{K} σ_k^2 S(x, y)^T P E_k P^T S(x, y) d − z_ave ||^2
= || Σ_{k=1}^{K} σ_k^2 ( S(x, y)^T P E_k P^T S(x, y) d − z_k ) ||^2 ≈ Σ_{k=1}^{K} σ_k^4 || S(x, y)^T P E_k P^T S(x, y) d − z_k ||^2

where in the last step we have assumed that the errors from each scale will be approximately uncorrelated. It may seem strange to have the fourth power of σ_k; this is a consequence of weighting by σ_k^2 during the smoothing step. If we define e_k(x, y) = S(x, y)^T P E_k P^T S(x, y) d − z_k to be the error at scale k due to shift dependence, we can write the energy E_ave of the error averaged over all translations as

E_ave ≈ Σ_{k=1}^{K} σ_k^4 (1/2^{2K}) Σ_{x=0}^{2^K−1} Σ_{y=0}^{2^K−1} || e_k(x, y) ||^2

Different applications will have different priors. The error energy depends on the parameters σ_k and will tend to be dominated by the level k with the largest σ_k. To give a quantitative estimate of the importance of shift dependence for different priors we carry out the following procedure for K varying between 1 and 4:

1. Set σ_k = 0 for k ≠ K and σ_K = 1.
2. Evaluate z_K, z_ave, and E_ave using the above equations.
3. Measure the amount of shift dependence f = E_ave / ||z_ave||^2.

The results are converted to signal to noise ratios given by −10 log10(f). The results of this experiment for the different transforms are shown in table 7.3.

Figure 7.3: Shift dependence for different scales/dB. (The table lists SNR values for the DWT, NDWT, WWT, DTCWT and GPT at K = 1 to 4.)

The NDWT has a
SNR of ∞ because this transform has no downsampling and is shift invariant. Similarly, the multiple trees mean that there is effectively no downsampling in the first level of the DTCWT, and it also has infinite SNR for K = 1. Both the higher scales of the DTCWT and the GPT have very low amounts of shift dependence, with SNR levels around 30dB. The WWT only manages about 18dB, while the DWT has a very poor performance with 7dB; 7dB corresponds to a shift dependence error energy of about 20% of the signal energy. Care must be taken when interpreting the SNR values tabulated: the final solution to a problem is based on the posterior density, and the posterior is a combination of the likelihood and the prior. A quantitative example of the importance of this effect is given in section 8.7.2, which explains the significance of these measurements for a particular application (of interpolation).

7.4.5 Flexibility

Each choice of scalings for the subbands implicitly defines the signal model as a stationary process with a certain covariance function. The increased directionality of the DTCWT means that it is much more flexible than any of the other methods for modelling anisotropic images: none of the other methods can produce priors that favour images containing correlations at angles near 45° without also favouring correlations at angles near −45°. However, in many applications it will be reasonable to assume that the prior for the images is isotropic, and so one way of testing the flexibility is to measure how close the covariance function is to being circularly symmetric. As in section 7.4.4 the covariance is calculated by a wavelet smoothing method applied to an impulse. For each wavelet transform the following process is used:

1. Generate a blank image of size 64 × 64. (Here we use blank to mean every pixel value is 0.)

If a prior is required to be anisotropic (i.e.
some directions are allowed more variation than others) then we alter the model so that we can separately vary the scaling factors for each subband. The measurements above describe only the degree to which the wavelet generative model produces a stationary prior; in some circumstances the information in the likelihood can counteract the deficiencies in the prior to produce a good quality posterior.
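The reason real separable wavelets cannot favour +45° over −45° correlations can be seen directly in the frequency domain: their 2D response factorises as H(ω1)H(ω2), and any real filter satisfies |H(−ω)| = |H(ω)|, so the responses at (ω, ω) and (ω, −ω) are identical. A small check with the 1D Haar highpass (our illustrative choice):

```python
import numpy as np

# Real separable 2D wavelets: response at (w, w) vs (w, -w).
# The 2D frequency response factorises as H(w1) H(w2), and a real
# filter h satisfies |H(-w)| = |H(w)|, so +45 and -45 degree
# orientations are weighted identically.
h = np.array([1.0, -1.0]) / np.sqrt(2)        # real 1D highpass filter

def H(w):
    return np.sum(h * np.exp(-1j * w * np.arange(len(h))))

diffs = [abs(abs(H(w)) * abs(H(w)) - abs(H(w)) * abs(H(-w)))
         for w in (0.5, 1.0, 2.0)]
print(diffs)          # all zero: the two diagonal orientations match
```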
2. Set the pixel at position (32, 32) to have value 1.
3. Wavelet transform the image using 4 levels.
4. Scale the wavelet coefficients at level l by σ_l^2 = 4^{2l}.
5. Scale the low pass coefficients by σ_5^2 = 4^9.
6. Invert the wavelet transform.

The final image produced is proportional to the covariance function. The exact values used for the scaling factors in this experiment are not crucial and are just chosen to give results large enough for the symmetry to be seen. (The Gaussian pyramid and W transform methods have smoother lowpass filters and we change the scaling factors slightly in order to give a similar shaped covariance.) The results are shown in figures 7.4, 7.5, 7.6, 7.7, and 7.8. The most important part of these diagrams is the section near the centre: for large distances the contours are not circular, but at such points the correlation is weak and hence not as important. The Gaussian pyramid produces the most circularly symmetric covariance. The W transform, DTCWT, and the NDWT all produce reasonable approximations to circular symmetry. The DWT has a significantly noncircular covariance function.

Figure 7.4: Covariance structure for an orthogonal real wavelet.
Figure 7.5: Covariance structure for a nondecimated real wavelet.

Figure 7.6: Covariance structure for the Gaussian pyramid.
.7: Covariance structure for the W transform. BAYESIAN MODELLING IN THE WAVELET DOMAIN 50 60 50 30 40 20 30 20 10 0 80 60 40 40 20 0 0 20 60 80 10 20 30 40 50 60 40 10 Figure 7.126 CHAPTER 7.8: Covariance structure for the DTCWT . 35 30 25 50 20 15 10 5 0 10 −5 80 60 40 40 20 0 0 20 60 80 10 20 30 40 50 60 40 30 20 60 Figure 7.
It is possible to improve the circularity of the wavelet transform results by adjusting the scaling factors for the ±45° subbands. These subbands should be treated differently because their frequency responses have centres (in 2D frequency space) further from the origin than the centres of the other subbands at the same scale. However, these changes will do nothing to alleviate the problems of shift dependence found for the DWT. Figure 7.9 shows the same experiment (still using the DWT) except that we change step 2 to act on the pixel at position (28, 28). The covariance changes to a different (noncircular) shape.

Figure 7.9: Covariance structure for a translated orthogonal real wavelet.

7.4.6 Summary of results

Table 7.10 summarises the flexibility and shift dependence properties of the different transforms. A √ indicates good behaviour with respect to the property, a × indicates bad behaviour, and a ? indicates intermediate behaviour.

7.5 Conclusions

The first part of the chapter is based on the assumption that the chosen wavelet transform is shift invariant. Based on this assumption, we conclude that each of the four image models
discussed are equivalent to a wide sense stationary discrete Gaussian random process. In particular we conclude that:

• The filter model corresponds to a process with covariance function given by the autocovariance function of the filter impulse response.

• The wavelet direct specification corresponds to a covariance function that transforms to an impulse when a wavelet sharpening operation is applied.

• The wavelet generative specification of the prior corresponds to a process with covariance function given by a wavelet smoothed impulse.

For a shift dependent transform the wavelet generative prior model will be corrupted by aliasing. The experiments in 1D and 2D suggest that these errors are relatively small for the DTCWT but large for the DWT. The NDWT, WWT, DTCWT, and GPT all possess reasonably isotropic covariance functions even without tuning the scaling for the ±45° subbands.

Transform   Shift invariant   Isotropic modelling   Anisotropic modelling
DWT         ×                 ×                     ×
NDWT        √                 √                     ×
WWT         ?                 √                     ×
DTCWT       √                 √                     √
GPT         √                 √                     ×

Figure 7.10: Summary of properties for different transforms
Chapter 8

Interpolation and Approximation

8.1 Summary

The purpose of this chapter is to explore the use of the DTCWT for Bayesian approximation and interpolation in order to illustrate the kind of theoretical results that can be obtained. We assume that a simple stationary process is an adequate prior model for the data but that observations are only available for a small number of positions. After a brief description of the problem area we place a number of different interpolation and approximation techniques into the Bayesian framework. This framework reveals the implicit assumptions of the different methods. We propose an efficient wavelet approximation scheme and discuss the effect of shift dependence on the results. Finally we describe two refinements to the method: one that increases speed at the cost of accuracy, and one that allows efficient Bayesian sampling of approximated surfaces from the posterior distribution.

The main original contributions are: the Bayesian interpretations of spline processing and minimum smoothness norm solutions; the theoretical estimates for aesthetic and statistical quality; the experimental measures of these qualities; the proposed wavelet approximation method; the method for fast conditional simulation; and the method for trading speed and accuracy.

We originally developed these methods for the determination of subsurface structure from a combination of seismic recordings and well logs. Further details about the solution of this problem and the performance of the wavelet method can be found in [35].
8.2 Introduction

The task is to estimate the contents of an image from just a few noisy point samples. This is an example of an approximation problem. The approximation is called an interpolation in the special case when the estimated image is constrained to precisely honour the sample values.

Using the same conventions as in chapter 7 we use Z to represent the (unknown) contents of the image. In this chapter we assume that the image is a realisation of a 2D (wide sense) stationary discrete Gaussian random process. The prior distribution is Z ~ N(0, B) where B is an N × N covariance matrix and, as before, we assume that the image is shifted to have zero mean.

Suppose we have S observations at locations xs(1), ..., xs(S). It is notationally convenient to reorder the locations so that s(a) = a. In other words, we assume we have observations at the locations x1, ..., xS. There is no restriction on the relative positions (e.g. we do not assume x1 must be next to x2) and so this reordering does not reduce the generality of the analysis. We also assume that the observations are all at distinct locations within the grid.

Denote the S observations by Y ∈ R^S and define an S by N matrix T as

    Tab = 1 if a = b and a ≤ S, and 0 otherwise    (8.1)

When applied to a vector representing an image, the matrix T selects out the values corresponding to the observed locations. Let Q ∈ R^S be a vector of random variables representing the measurement noise: Q ~ N(0, σ²I_S) where σ² is the variance of the measurement noise. The observations are given by

    Y = TZ + Q    (8.2)

The assumption that the process is stationary means that B can be expressed in terms of a covariance function R as

    Bab = R(xa − xb)    (8.3)

Section 8.9 discusses the implications of the assumptions in this model.
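The selection matrix T and the observation model can be made concrete with a small sketch (the grid size, noise level, and number of observations below are arbitrary toy choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N, S = 16, 4                 # toy grid size and number of observations
sigma = 0.1                  # measurement noise standard deviation

# With the observed locations reordered first, T picks out the first S entries
T = np.zeros((S, N))
T[np.arange(S), np.arange(S)] = 1.0

Z = rng.standard_normal(N)           # stand-in for the unknown image vector
Q = sigma * rng.standard_normal(S)   # Q ~ N(0, sigma^2 I_S)
Y = T @ Z + Q                        # the observation model of equation 8.2
```

Applying T to the image vector simply reads off the sampled values, so Y differs from the true values only by the added noise.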
In this chapter we will assume that both the covariance structure of the process and the variance σ² of the noise added to the samples are known. In practical applications it is usually possible to estimate these from the sample values [110]. Strictly speaking our methods should therefore be described as empirical Bayes because the prior is based on estimated values. A completely Bayesian approach would involve treating the parameters of the model as random variables and setting priors for their distributions. However, this introduces further complications during inference that would distract from the main aim of evaluating complex wavelets. A description of a fully Bayesian approach to the problem can be found in the literature [12].

This problem has been extensively studied and many possible interpolation methods have been proposed. There are several crude methods such as nearest neighbour, linear triangulation, and inverse distance that work reasonably when the surfaces are smooth and there is little noise but are inappropriate otherwise. There are also more advanced methods such as Kriging [13], Splines [119], Radial Basis Functions [97], and Wiener filtering. We will define the basic problem (with known covariance) to be "stationary approximation" (or "stationary interpolation" when σ = 0) but we will usually shorten this to simply "approximation" (or "interpolation"). This chapter describes a wavelet based Bayesian method for (approximately) solving the stationary approximation problem and shows how a number of the alternative techniques are solutions to particular cases of the problem.

The first part of this chapter discusses the alternative techniques from a Bayesian perspective. The second part describes the wavelet method and experimental results. As a first step towards relating the techniques, section 8.3 describes the form of the posterior distribution for the problem.

8.3 Posterior distribution

Suppose we wish to obtain a point estimate for the random variable Zk corresponding to location xk (we assume that there is no available observation at this location). A reasonable estimate is the mean of the posterior distribution. Appendix B proves that the posterior distribution for such a point given observations Y = y is Gaussian and gives an expression for the mean. Let C be the S by S leading submatrix of the covariance matrix B that expresses the correlations between the observation locations.
Let D ∈ R^S be the vector of covariances between the observation locations and location xk:

    Da = E{Zs(a) Zk} = Bs(a),k = R(xs(a) − xk) = R(xa − xk)

Appendix B shows that the estimated value Ẑk is given by

    Ẑk = E{Zk | y} = D^T (σ²I_S + C)^{-1} y    (8.4)

If we define a vector λ ∈ R^S as λ = (σ²I_S + C)^{-1} y then we can express the estimate as

    Ẑk = D^T λ = Σ_{a=1}^{S} R(xa − xk) λa = Σ_{a=1}^{N} R(xa − xk) Λa

where Λ ∈ R^N is a vector whose first S elements are given by λ1, ..., λS and whose other elements are all zero. Λ represents an image that is blank except at the observation locations. The equation for the estimate can be interpreted as filtering the image Λ with the filter h(x) = R(−x) and then extracting the value at location xk. We express the estimate in this form because λ (and hence Λ) does not depend on the location being estimated and therefore point estimates for every location are simultaneously generated by the filtering of Λ.

8.4 Approximation techniques

8.4.1 Kriging

Kriging is a collection of general purpose approximation techniques for irregularly sampled data points [13]. Its basic form, known as Simple Kriging, considers an estimator Kk for the random variable Zk that is a linear combination of the observed data values:

    Kk = w^T Y

where w is an S × 1 vector containing the coefficients of the linear combination associated with position xk. The technique is based on the assumption that the mean and covariance structure of the data is known. As before we suppose that the data has been preprocessed so that the mean is zero. More precisely, w is chosen to achieve the minimum expected energy of the error between the estimate and the true value. The expected energy of the error F is given by:

    F = E{(Kk − Zk)²}
      = E{(Σ_{a=1}^{S} wa Ya)(Σ_{b=1}^{S} wb Yb)} − 2 E{(Σ_{a=1}^{S} wa Ya) Zk} + E{Zk Zk}
      = Σ_{a=1}^{S} Σ_{b=1}^{S} wa E{Ya Yb} wb − 2 Σ_{a=1}^{S} wa E{Ya Zk} + E{Zk Zk}

F is minimised by setting ∇w F = 0:

    ∂F/∂wa = 2 Σ_{b=1}^{S} E{Ya Yb} wb − 2 E{Ya Zk} = 0

This gives a set of S linear equations that can be inverted to solve for w. The covariance assumption means that we know E{Zk Ya} and E{Ya Yb} for a, b ∈ {1, ..., S}. However, there is no assumption that the data is necessarily distributed according to a multivariate Gaussian; this is all that is assumed about the prior distribution of the data. It is impossible to calculate the posterior distribution because the precise prior distribution is unknown. However, there is sufficient information to calculate the estimator with the nicest properties among the restricted choice of purely linear estimators.

We can calculate

    E{Zk Ya} = E{Zk (Za + Qa)} = E{Zk Za} + E{Zk}E{Qa} = E{Zk Za} = Da    (8.5)
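The posterior mean / Simple Kriging estimate can be sketched in one dimension; the squared-exponential covariance function, the locations, and the noise level below are all hypothetical illustrative choices:

```python
import numpy as np

def R(d, ell=3.0):
    """Hypothetical squared-exponential covariance function."""
    return np.exp(-(d / ell) ** 2)

x_obs = np.array([2.0, 5.0, 11.0])   # observation locations
y = np.array([1.0, -0.5, 0.3])       # observed values (zero-mean data assumed)
sigma2 = 0.01                        # measurement noise variance

C = R(x_obs[:, None] - x_obs[None, :])               # S x S matrix C
lam = np.linalg.solve(C + sigma2 * np.eye(3), y)     # lambda = (sigma^2 I_S + C)^{-1} y

def estimate(xk):
    """Posterior mean estimate Z_k = D^T lambda at location xk."""
    D = R(x_obs - xk)    # covariances between the observations and x_k
    return float(D @ lam)
```

Because λ does not depend on xk, calling `estimate` at every grid point is exactly the filtering of the impulse image Λ by R(−x) described above; with small noise the estimate nearly honours the samples and decays to the prior mean far from them.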
where we have used the fact that Zk and the measurement noise Qa are independent and that the noise has zero mean. If a ≠ b then similarly

    E{Ya Yb} = E{(Za + Qa)(Zb + Qb)} = E{Za Zb} = Ca,b

while if a = b

    E{Ya Yb} = Ca,b + σ²

Using these results we can rewrite equation 8.5 in matrix form as

    2(C + σ²I_S)w − 2D = 0

and hence deduce that the solution is

    w = (C + σ²I_S)^{-1} D

The estimator corresponding to these minimising parameters is

    Kk = D^T (C + σ²I_S)^{-1} y

This estimator is exactly the same as the Bayesian estimate based on a multivariate Gaussian distribution. We have shown the well-known [4] result that if the random process is a multivariate Gaussian then the simple Kriging estimate is equal to the mean of the posterior distribution.

8.4.2 Radial Basis Functions

Recall that the Bayesian solution can be implemented by placing weighted impulses (in an otherwise blank image) at the sample locations and then filtering this image with a filter whose impulse response is given by the covariance function. The link with Radial Basis Functions (RBFs) is straightforward. An interpolated image based on RBFs is assumed to be a linear combination of S functions, where the functions are all of the same shape and centred on the S known data points [97]. The weights in the linear combination are chosen to honour the known values. Additionally, in noiseless conditions (i.e. for interpolation) the S weights take precisely the values needed to honour the S known data values (for simplicity we ignore the possibility of these equations being degenerate). In practice exactly the same equations are used to solve RBF and Kriging problems and the equivalence between these techniques is well-known.

8.4.3 Bandlimited interpolation

Another approach is to assume that the image is bandlimited (only contains low frequencies) and then calculate the bandlimited image of minimum energy that goes through the data points [37]. Let Ẑ denote the estimate produced by this bandlimited interpolation. As before let F be an N by N matrix that represents the (2 dimensional) Fourier transform. Define D to be an N by N diagonal matrix whose a-th diagonal entry Daa is 1 if the corresponding frequency is within the allowed band, but zero otherwise. Using this notation we can write:

    Ẑ = argmin_{Z∈Ω} Z^T Z    (8.6)

where Ω is the space of images that are both band limited and honour the known observations:

    Ω = {Z : TZ = y, (I_N − D)FZ = 0}

In order to show the equivalence it is convenient to introduce two additional variables α and σ that represent the degree to which the constraints are imposed. Assuming that the limits are well behaved then equation 8.6 can be rewritten as

    Ẑ = lim_{α→0} lim_{σ→0} argmin_{Z∈R^N} Z^T Z + (1/σ²)‖TZ − y‖² + (1/α²)‖(I_N − D)FZ‖²
      = lim_{α→0} lim_{σ→0} argmin_{Z∈R^N} Z^T Z + (1/σ²)‖TZ − y‖² + (1/α²) Z^H F^H (I_N − D)(I_N − D) FZ
      = lim_{α→0} lim_{σ→0} argmin_{Z∈R^N} Z^T Z + (1/σ²)‖TZ − y‖² + (1/α²)‖FZ‖² − (1/α²)‖DFZ‖²
      = lim_{α→0} lim_{σ→0} argmin_{Z∈R^N} (1 + 1/α²) Z^T Z + (1/σ²)‖TZ − y‖² − (1/α²)‖DFZ‖²

where we have made use of the identity (I_N − D)(I_N − D) = I_N − 2D + D² = I_N − D, which follows from D² = D.

Now consider expanding the following expression, where a = α/√(1 + α²):

    ‖(a(I_N − D) + D)^{-1} FZ‖² = ‖((1/a)(I_N − D) + D) FZ‖²
      = (1/a²)‖(I_N − D)FZ‖² + ‖DFZ‖²
      = (1/a²) Z^T Z + (1 − 1/a²)‖DFZ‖²
      = (1 + 1/α²) Z^T Z − (1/α²)‖DFZ‖²

The previous algebra proves that the RHS of this equation is equal to the expression within the earlier minimisation and we conclude that the estimate

    Ẑ_{σ,α} = argmin_{Z∈R^N} (1/σ²)‖TZ − y‖² + ‖((α/√(1+α²))(I_N − D) + D)^{-1} FZ‖²

is equal to the MAP (Maximum A Posteriori) estimate using the Fourier model. Finally consider the Fourier model from section 7.3.2 with a coefficient weighting matrix Dα = a(I_N − D) + D. If we define the prior pdf for Z with this Fourier model and assume that we have white measurement noise of variance σ² then Bayes' theorem can be used to show that:

    −2 log(p(Z|y)) + k(y) = (1/σ²)‖TZ − y‖² + ‖Dα^{-1} FZ‖²

where k(y) is a function of y corresponding to a normalisation constant. Additionally, for a Gaussian density function the MAP estimate is equal to the posterior mean estimate. In the limit σ → 0 the measurement noise is reduced to zero and Ẑ_{σ,α} becomes the interpolation solution. Finally, in the limit as α → 0, the prior parameters for the Fourier model tend to Dα = D. We conclude that the estimate produced by the bandlimited interpolation is equivalent to interpolation for a particular Fourier model prior, which in turn is equivalent to the multivariate Gaussian model. Section 7.3.2 shows that this is equivalent to the multivariate Gaussian with a covariance function given by a lowpass filtered impulse (which will be oscillatory due to the rectangular frequency response).
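A small numerical sketch of bandlimited interpolation (sizes, band, and locations below are arbitrary): we build an orthonormal basis for the allowed lowpass band and take the minimum-norm combination that honours the samples.

```python
import numpy as np

rng = np.random.default_rng(2)
N, S, kmax = 64, 8, 5                  # grid size, samples, highest frequency kept
idx = np.sort(rng.choice(N, size=S, replace=False))
y = rng.standard_normal(S)

# Orthonormal real Fourier basis for the allowed band (DC plus cos/sin pairs)
t = np.arange(N)
cols = [np.ones(N)]
for k in range(1, kmax + 1):
    cols += [np.cos(2 * np.pi * k * t / N), np.sin(2 * np.pi * k * t / N)]
B = np.stack(cols, axis=1)
B /= np.linalg.norm(B, axis=0)

# Minimum-energy bandlimited image honouring the samples: z = B a with B[idx] a = y
a, *_ = np.linalg.lstsq(B[idx], y, rcond=None)   # min-norm solution
z = B @ a
```

Here the 2·kmax + 1 = 11 basis columns exceed the S = 8 constraints, so an exact interpolant exists; `lstsq` returns the minimum-norm coefficients, which (because the columns are orthonormal) give the minimum-energy image in Ω.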
8.4.4 Minimum smoothness norm interpolation

Choi and Baraniuk [23] have proposed a wavelet-based algorithm that finds the signal that goes through the data points with minimum norm in Besov spaces. They write that "the interpolated signal obtained by minimum-smoothness norm interpolation is difficult to characterize, even if the noise in the signal samples is white Gaussian". This section briefly describes their algorithm and suggests a characterisation of the interpolated signal as the sum of the posterior mean estimate for stationary interpolation (as described in section 8.2) and an error term caused by shift dependence.

We give the definitions of Besov and Sobolev norms first in the notation of the paper [23] and then rewrite the definitions in the notation of this dissertation. In the original notation¹ the Besov norm ‖f‖_{B^α_q(L^p(I))} for a continuous-time signal f(t), t ∈ I = [0, 1], is defined as

    ‖f‖_{B^α_q(L^p(I))} = [ ‖u_{j0}‖_p^q + Σ_{j≥j0} ( 2^{αjp} 2^{j(p−2)/2} Σ_k |w_{j,k}|^p )^{q/p} ]^{1/q}    (8.7)

where

1. the L^p(I) space is the set of all functions on I with bounded norm ‖f‖_p = (∫_I |f(t)|^p dt)^{1/p};
2. α, p, q are hyperparameters of the norm;
3. w_{j,k} is the value of the k-th wavelet coefficient at scale j;
4. u_{j0,k} is the value of the k-th scaling coefficient;
5. the scale j0 represents the coarsest scale under consideration.

The paper concentrates on the special case of p = q = 2. In this case the Besov norm is called the Sobolev norm and is written

    ‖f‖_{W^α_2(L^2(I))} = [ Σ_k u_{j0,k}² + Σ_{j≥j0} Σ_k 2^{2αj} w_{j,k}² ]^{1/2}    (8.8)

¹There are a number of different treatments of the scaling coefficients in Besov norm definitions. We choose the given definition as being the most natural. In practical algorithms the choice is not important because the scaling coefficients are generally preserved.

Prefiltering is used when converting from continuous-time to discrete-time. It is necessary in order for the wavelet coefficients of Z to match the wavelet coefficients of f(t). We do not consider prefiltering in this dissertation because we assume that data will always be provided in sampled form. Further details can be found in the literature [113, 23].

Let Z be a vector of signal samples from the (prefiltered) continuous-time signal f(t). As before we define the wavelet coefficients (from a one dimensional wavelet transform) to be given by w = WZ. We now define a diagonal matrix D to have diagonal entries Daa = 1 if wa is a scaling coefficient and Daa = 2^{αk} if wa is a scale k wavelet coefficient. Armed with this notation the Sobolev norm can be expressed as

    ‖f‖_{W^α_2(L^2(I))} = ‖DWZ‖    (8.9)

Although we have defined the Sobolev norm in terms of the one dimensional transform, the same equation describes the norm for two dimensional wavelet transforms if we use the earlier notation for which Z is a vector representing an entire image, and W represents a two-dimensional transform.

The algorithm [23] found (using a least squares calculation) the wavelet coefficients with minimum Sobolev norm that interpolated the known points. The published paper made use of a fully decimated wavelet transform.

Recall that the wavelet direct specification defined the prior as

    p(Z = z) ∝ exp(−½ ‖DWz‖²)

The minimum smoothness norm solution is therefore equivalent to selecting the highest probability image that honours the observations. We conclude that the solution is equal to the MAP estimate using the wavelet direct specification to generate the prior. Section 7.3.3 showed that this is equivalent to a stationary Gaussian discrete random process assuming that the wavelet transform is sufficiently shift invariant. Also recall from section 7.3.3 that for an orthogonal transform the covariance function can also be expressed in terms of a wavelet smoothed impulse. We conclude that the minimum smoothness norm interpolation is equivalent to solving the stationary interpolation problem (with covariance function given by an inverse wavelet sharpened impulse) with the quality of the solution determined by the amount of shift dependence. Later we display experiments comparing the performance of the DWT with alternative transforms to show the considerable effect of shift dependence.

We have described the minimum smoothness norm (MSN) solution as a Bayesian solution to stationary interpolation plus an error caused by shift dependence. It may be thought that this is unfair, that MSN should really be considered as performing non-Gaussian interpolation, and that what we have called the "error" due to shift dependence
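A toy version of the minimum Sobolev norm interpolation of equation 8.9 can be written down directly; the Haar transform, the signal length, the sample positions, and α below are all illustrative choices rather than the paper's setup:

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal multi-level Haar analysis matrix (n a power of two)."""
    if n == 1:
        return np.array([[1.0]])
    H = haar_matrix(n // 2)
    top = np.kron(H, [1.0, 1.0]) / np.sqrt(2)                 # coarser scales
    bot = np.kron(np.eye(n // 2), [1.0, -1.0]) / np.sqrt(2)   # finest details
    return np.vstack([top, bot])

N, alpha = 8, 1.0
W = haar_matrix(N)
# Scale index of each Haar row: scaling coefficient first, then coarse-to-fine wavelets
scale = [0, 1, 2, 2, 3, 3, 3, 3]
D = np.diag([1.0 if j == 0 else 2.0 ** (alpha * j) for j in scale])

idx = np.array([0, 3, 6])            # sample locations
y = np.array([1.0, -1.0, 0.5])       # sample values

# Substitute z = W^T D^{-1} u so that ||D W z|| = ||u||; the minimum-norm u
# honouring the samples is then the min-norm least squares solution below.
M = (W.T @ np.linalg.inv(D))[idx]
u, *_ = np.linalg.lstsq(M, y, rcond=None)
z = W.T @ np.linalg.inv(D) @ u
```

The change of variables turns the constrained Sobolev-norm minimisation into an ordinary minimum-norm linear system, which is essentially the least squares calculation the algorithm performs.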
is actually an additional term that makes the technique superior to standard methods. Our defence to this criticism is that there is no prior information about the absolute location of signal features and shifting the origin location should not affect the output. Therefore the solution provided by this method should be considered as the average over all positions of the origin, plus an error term due to the particular choice of origin position used in the algorithm. This last statement may need a little further support as it could be argued that the average always gives smooth answers while MSN will be able to model discontinuities better. We finish this section by proving that the accusation will never hold. In this context we measure the quality of a solution by means of the energy of the error. More precisely, we show that when all the origin positions are considered the average solution will always be a better estimate than using the basic MSN solutions.

The proof is straightforward. Let Z represent the true values of the signal (or image – this proof is valid for both signals and images) and let Ẑi represent the MSN estimate for the i-th origin position (out of a total of N_O possible positions). The average solution Ẑ0 is defined as

    Ẑ0 = (1/N_O) Σ_{i=1}^{N_O} Ẑi

Then it is required to prove that the energy of the error for the average solution ‖Z − Ẑ0‖² is always less than the average energy for the individual solutions (1/N_O) Σ_{i=1}^{N_O} ‖Z − Ẑi‖². The error for the individual solutions can be written as

    (1/N_O) Σ_{i=1}^{N_O} ‖Z − Ẑi‖²
      = (1/N_O) Σ_{i=1}^{N_O} ‖(Z − Ẑ0) + (Ẑ0 − Ẑi)‖²
      = (1/N_O) Σ_{i=1}^{N_O} [ ‖Z − Ẑ0‖² + ‖Ẑ0 − Ẑi‖² + 2(Z − Ẑ0)^T (Ẑ0 − Ẑi) ]
      = ‖Z − Ẑ0‖² + (1/N_O) Σ_{i=1}^{N_O} ‖Ẑ0 − Ẑi‖²
      ≥ ‖Z − Ẑ0‖²

where the cross term vanishes because Σ_{i=1}^{N_O} (Ẑ0 − Ẑi) = N_O Ẑ0 − Σ_{i=1}^{N_O} Ẑi = 0.
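The inequality in this proof is easy to confirm numerically; here ten noisy estimates stand in for the MSN solutions at different origin positions (purely illustrative data):

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.standard_normal(32)                 # "true" signal
ests = Z + rng.standard_normal((10, 32))    # ten simulated shift-dependent estimates
Z0 = ests.mean(axis=0)                      # the averaged solution

avg_err = np.sum((Z - Z0) ** 2)                       # energy of error for the average
mean_err = np.mean(np.sum((Z - ests) ** 2, axis=1))   # average energy of individual errors
```

The averaged solution's error energy never exceeds the mean individual error energy, with equality only when all the estimates coincide.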
where the inequality in the last line is strict unless the transform is shift invariant. We have therefore shown that the energy of the error will always be greater (when averaged over all origin positions) if the MSN solution is used rather than the smoothed solution. Note that this proof is very general; in particular no assumption had to be made about the true prior distribution of the data. The result is equally valid for good and bad models. It does not claim that the average solution will be a good solution, but it does show that it will always be better than the shift dependent solutions.

8.4.5 Large spatial prediction for nonstationary random fields

Nychka et al. [88] have proposed a method for dealing with nonstationarity by using the W transform basis [67]. Nychka allows the weighting factors to vary within a single scale, and this variation produces nonstationary surfaces. Equation 8.4 shows that the solution involves inverting the matrix C + σ²I_S. This matrix can be very large and so is hard to invert. However, the matrix C can be written as PD²P^T (using the notation of section 7.3.4) and so multiplication by C can be efficiently calculated using wavelet transform techniques. Nychka makes use of this result by solving using a conjugate gradient algorithm. Such an algorithm only uses forward multiplication by C and so can be much more efficient than inverting C. If C is a square matrix of width S then in the worst case the gradient algorithm takes S steps to converge to the solution, but usually the convergence is much faster than this, especially if some preconditioning methods are used.

To generate conditional simulations of a surface they use the method described in section 8.1, which requires a Kriging-style interpolation to be performed for each realisation. If their method takes K iterations of the conjugate gradient algorithm to converge, then to generate P realisations they will require 2KP wavelet transforms.

8.4.6 Comparison with Spline methods

This section discusses the link with B-spline methods. B-splines have been proposed for solving both interpolation and approximation problems. The main source for the description of splines is [119] while the Bayesian interpretation is original. We first give an overview of the technique and then discuss the methods from a Bayesian perspective.
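The conjugate gradient idea can be sketched as follows. The solver only needs a routine that multiplies a vector by C + σ²I_S; in Nychka's setting that routine would be implemented with wavelet transforms, while here a dense stand-in matrix is used and all sizes are illustrative:

```python
import numpy as np

def conjugate_gradient(matvec, b, iters=200, tol=1e-10):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(4)
S = 20
G = rng.standard_normal((S, S))
C = G @ G.T / S                  # stand-in for the observation covariance matrix
A = C + 0.5 * np.eye(S)          # C + sigma^2 I_S with an illustrative sigma^2
y = rng.standard_normal(S)

w = conjugate_gradient(lambda v: A @ v, y)
```

In exact arithmetic the method converges in at most S steps, and the σ²I_S term keeps the system well conditioned so convergence is much faster in practice.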
B-spline

A B-spline is a continuous piecewise polynomial function. From a signal processing point of view B-splines can be produced by repeatedly filtering a centred normalized rectangular pulse of width d with itself. For the uniform sampling case the width d is chosen to be the distance between data points and the height so that the pulse has total area equal to 1. More precisely, a B-spline of order n can be generated by convolving this rectangular pulse with itself n times. A zero order spline will therefore be the square pulse and a first order spline will be a triangular pulse. Higher order splines converge to a Gaussian "bell" shape.

The B-splines can be used for either interpolation or approximation. For interpolation the problem is to choose the spline coefficients so that the reconstructed signal passes through the data points. If the coefficients are represented as delta functions (of area equal to the value of the coefficients) at the appropriate locations it is possible to construct the interpolation by filtering with the B-spline function. These coefficients can be calculated by applying a simple IIR filter to the data. For high order splines this combination of the IIR filter with the B-spline filtering can be viewed as a single filtering operation applied to the original data points (also represented as impulses). This combined filter is called a cardinal spline (or sometimes the fundamental spline) of order n and converges to a sinc function that effectively performs lowpass filtering of the signal to remove aliased components.

There are two main techniques for approximation. The first is called smoothing splines approximation and involves minimising the energy of the error in the approximation plus an additional energy term. This second energy term is the energy of the r-th derivative of the approximation. The second technique is a least squares approximation and is derived by restricting the number of spline coefficients that are to be used to generate the approximation, and then solving for the least energy of the error.

Each technique produces an estimate for the true signal, and we attempt to describe the prior model for the signal that would produce the same estimates. We discuss each of the three (one for interpolation, two for approximation) main techniques from a Bayesian perspective.
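The repeated-convolution construction is easy to sketch numerically (the sampling density k is an arbitrary choice):

```python
import numpy as np

def bspline(order, k=64):
    """Sampled B-spline of the given order: a unit-area box convolved with itself."""
    box = np.ones(k) / k          # centred normalized rectangular pulse (sampled)
    b = box.copy()
    for _ in range(order):
        b = np.convolve(b, box)   # each convolution raises the order by one
    return b

b0 = bspline(0)   # square pulse
b1 = bspline(1)   # triangular pulse
b3 = bspline(3)   # cubic B-spline, already close to a Gaussian bell
```

Each convolution preserves the unit area while widening and smoothing the pulse, which is why the shape tends to a Gaussian bell as the order grows.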
Interpolation

The interpolation solution of order n consists of a spline with knot points at the data points. The similarity to the Bayesian estimate described in section 8.3 is clear in that the solution is produced by filtering impulses at the data points with the weights chosen by the requirement of exactly fitting the data. The Bayesian interpretation is that the prior for the signal is a zero mean wide sense stationary discrete Gaussian random process with covariance function equal to the B-spline function of order n. B-spline filtering of order n is given by smoothing with a unit pulse n + 1 times. Therefore for high orders the high frequencies become less and less likely a priori and the solution will use the lowest frequencies possible that satisfy the data points. This solution will therefore converge to the bandlimited signal and hence the cardinal splines will tend to the sinc interpolator.

Smoothing splines

Given a set of discrete signal values {g(k)}, the smoothing spline ĝ(x) of order 2r − 1 is defined as the function that minimizes

    S = Σ_{k=−∞}^{+∞} (g(k) − ĝ(k))² + λ ∫_{−∞}^{+∞} (∂^r ĝ(x) / ∂x^r)² dx    (8.10)

where λ is a given positive parameter. Schoenberg has proved the result that the minimising function (even for the general case of non-uniform sampling) is a spline of order n = 2r − 1 with simple knots at the data points [105]. By analogy with the interpolation case it is tempting to think that smoothing splines will correspond to a random process prior with covariance function equal to a 2r − 1 B-spline and measurement noise depending on λ. However, while the analogy is reasonable when close to the sample points, it is inappropriate at long distances. Therefore, except for very careful choices of the data values, the smoothing spline estimate at long distances will tend to infinity.
The problem is that the integral is zero for polynomials of degree less than r and hence there is no prior information about the likely low order shape of the signal.

Least squares

Least squares techniques are equivalent to the Bayesian maximum a posteriori (MAP) estimate when we assume that there is a flat prior for each parameter. The least squares approach models the data using a set of K spline coefficients where K is less than the number of observations. For the least squares spline approach we can also roughly interpret the restricted choice of coefficients as indicating that we know a priori that the original image should be smooth and contain only low frequencies. The least squares approach can be thought of as calculating a MAP estimate based on this prior and the observation model of additive white Gaussian noise.

There is an interesting way of seeing that such an estimate cannot be shift invariant. The explanation is somewhat convoluted and not needed for the rest of this dissertation; if the following is confusing then it can be safely ignored. This argument is included for interest only and is not meant to be a rigorous mathematical argument (for example, a rigorous treatment would need to consider edge effects).

First note that the B-splines are linearly independent. This is clear because in any set of B-splines we can always find an "end" B-spline whose support is not covered by the rest of the splines. In other words, a B-spline at the end of the set will contain at least one location k such that

1. this "end" B-spline is nonzero at location k,
2. all the other B-splines are zero at location k.

Now suppose we have the smallest nonempty set of B-splines that possesses a nontrivial (i.e. not all coefficients equal to zero) linear combination that is equal to zero at all locations. Clearly the coefficient of the "end" B-spline must be zero to avoid having a nonzero value at location k, and hence we can construct a smaller set by excluding the "end" B-spline. This contradiction proves that the B-splines are linearly independent.

Next suppose that we have a discrete grid containing N locations. The consequence of the linear independence of B-splines is that we must be able to model any image if we can choose the values of all N B-spline coefficients. In particular, if we are allowed to use all N B-spline coefficients then our previous claims show that we can make any image we want, including an image that interpolates all the observations. Now note that the model includes the case of an image generated from a single nonzero spline coefficient. If the prior is shift invariant then the model must also include the cases of nonzero coefficients centered on any location. We have argued that a shift invariant least squares method must interpolate the observations. The least squares spline approach is an approximation method and hence we conclude that it must be shift dependent.

8.5 Wavelet posterior distribution

We now change track and describe an approximation scheme based on the wavelet generative model for images (see 7.3.4). Instead of modelling the values at every point on the surface, it is better to model the wavelet coefficients directly, with the surface indirectly defined as the reconstruction from these wavelet coefficients. This section derives the posterior distribution for the images using Bayes' theorem. The following sections will describe an efficient solution, discuss the choice of wavelet, and show results of some numerical experiments that test our predictions.

Recall that the wavelet generative specification uses the prior

    Z ~ N(0, PD²P^T)

where P represents the wavelet reconstruction transform and D a diagonal weighting matrix. In wavelet space the wavelet generative specification corresponds to the prior

    w ~ N(0, D²)

where w is a column vector containing all the wavelet coefficients (with the real and imaginary parts treated as separate real coefficients).

Suppose we have measurements y1, y2, ..., yS which we stack into a column vector y, and that the measurement noise is independent and Gaussian of variance σ² and mean zero. Let T be a matrix of ones and zeros that extracts the values at the S measurement locations. Now we wish to derive the posterior distribution for the wavelet coefficients. We can use Bayes' theorem:

    p(w|y) = p(y|w)p(w) / p(y) ∝ p(y|w)p(w)

The likelihood p(y|w) is the pdf that the measurement errors are y − TPw, and so we can write the likelihood as

    p(y|w) ∝ exp( −(1/2σ²) (y − TPw)^T (y − TPw) )
8.6.11) (8. Solve for the posterior mean estimates of the wavelet coeﬃcients.6 Method for wavelet approximation/interpolation This section describes the wavelet method for estimating an image given a set of data points and estimates of the measurement noise and the covariance structure of the image. Reconstruct an estimate for the entire image. 8.12) and so we have shown that the posterior distribution for the wavelet coeﬃcients is a multivariate Gaussian with mean a and variance A−1 . 3. . 4. Calculate the responses at the measurement locations to impulses in the wavelet coeﬃcients (matrix T P ). 2. Calculate which wavelet coeﬃcients are important. The following sections describe how each of these steps can be performed eﬃciently. Estimate the amount of energy we expect within each scale. METHOD FOR WAVELET APPROXIMATION/INTERPOLATION 145 The prior for the wavelet coeﬃcients is a multivariate Gaussian distribution of mean zero and variance D 2 and so the prior pdf can be written as 1 p(w) ∝ exp − wT D −2 w 2 We can then calculate the posterior and use lemma 3 of appendix B to simplify the equations p(wy) ∝ exp − 1 1 (y − T P w)T (y − T P w) exp − wT D −2 w 2 2σ 2 1 ∝ exp − wT D −2 + P T T T T P/σ 2 w + wT P T T T y/σ 2 2 1 ∝ exp − (w − a)T A(w − a) 2 where A = D −2 + P T T T T P/σ 2 a = A−1 P T T T y/σ 2 (8. The method uses the following ﬁve steps: 1. 5.
8.6.1 Estimating scale energies

We need to calculate the diagonal matrix D that defines our prior model for the image. The choice will depend on the prior information we have available and will therefore depend on the application. One very crude method of choosing D would be to generate images according to the wavelet generative model for several choices of D, measure the covariance for each image, and select the choice that did best. A better method uses a result from section 7.3.4 that proves that the covariance will be given by a simple combination of the covariances from each wavelet subband. For example, suppose we have a prior estimate for the covariance structure of the data. The covariance for each subband can be calculated from the autocorrelation of the corresponding reconstruction wavelet, and a simple least squares method will allow a good choice of scaling factors to be found. This process of model fitting, and more sophisticated approaches, can be found in [110].

8.6.2 Important wavelet coefficients

It is clear that if the support of a wavelet coefficient does not overlap with any of the data points then the wavelet coefficient does not affect the likelihood of the measured data, and so it will have a posterior distribution equal to its prior distribution; in particular its mean value will be zero. We can greatly reduce the dimension of the problem by leaving out all such unimportant wavelet coefficients.

We define an importance image to be an image that is zero except at the measurement locations. One quick way to determine the important coefficients is to transform an importance image containing ones at the measurement locations and zeros elsewhere; nonzero coefficients mean that the coefficient is important. We define the important coefficients as those whose absolute value is greater than some threshold. This threshold is initially 0, but section 8.6.3 describes how larger values permit a trade between accuracy and speed.

If the wavelets used in this method have some negative values then it is possible for this method to miss important coefficients if the responses from different points cancel out. This is unlikely to happen but can be partially guarded against by transforming another importance map containing random positive numbers between 1 and 2 at the measurement locations. Any additional important coefficients found can be added to the list.
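The importance-image computation can be sketched as follows; a one-level 2-D Haar analysis stands in for the transforms actually used in this work, and the grid size, measurement locations, and random guard values are illustrative assumptions.

```python
import numpy as np

# Sketch of the importance-image trick for one level of a 2-D Haar analysis
# transform (our stand-in for the wavelet transforms in the text).
# Coefficients whose support touches a measurement location come out
# nonzero and are marked as important.

def haar2d_level(x):
    """One level of 2-D Haar analysis; returns the four subbands."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical average
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

rng = np.random.default_rng(0)
imp = np.zeros((16, 16))
# random positive values between 1 and 2 guard against cancellation
for (r, c) in [(3, 5), (10, 2), (12, 13)]:
    imp[r, c] = rng.uniform(1.0, 2.0)

threshold = 0.0
important = [np.abs(band) > threshold for band in haar2d_level(imp)]
print(sum(b.sum() for b in important))  # number of important coefficients → 12
```

Here each measurement touches exactly one coefficient per subband because the Haar filters do not overlap; longer filters such as those of the DTCWT would mark correspondingly more coefficients.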
8.6.3 Impulse responses

The impulse responses for the reconstruction filters depend only on which subband is being inverted. This means that we can generate a single lookup table for each subband, which allows us to generate the i, j element of TP by calculating the position of the ith data point relative to the jth wavelet coefficient and accessing the lookup table at this relative position. This gives a fast generation of TP.

8.6.4 Solving for wavelet coefficients

Using sparse matrix methods we can quickly generate the matrix A directly from equation 8.11 and then solve the equations

A a = PᵀTᵀy/σ²

using Gaussian elimination, which is fast for sparse matrices [98].

8.6.5 Reconstruct image estimate

The reconstruction is a straightforward application of the inverse wavelet transform using the values in a to determine the important coefficient values, and 0 in all the unimportant coefficients.

8.7 Choice of wavelet

Section 7.4 discussed some general principles concerning the choice of transform in the wavelet generative specification. This section examines the effect on the speed of the method and investigates the significance of shift dependence. Section 8.7.2 contains a theoretical discussion about the probable significance of shift dependence, while section 8.7.3 contains some experimental results that test the predictions of the theory. Section 8.7.4 discusses the results of these comparisons.

8.7.1 Speed

The main computational burden is the solution of the linear equations. The number of equations is given by the number of measurement locations plus the number of important wavelet coefficients. All the decimated systems will have a similar number of important coefficients, but the nondecimated system will have very many more. To illustrate this we generated 20 random sample locations for a 128 by 128 image and counted the number of important coefficients for a 4 scale decomposition. Table 8.1 shows the results. Notice that the lack of subsampling in the NDWT produces about ten times more important coefficients than the DTCWT and will therefore be much slower.

Transform   number of important coefficients
DWT         1327
NDWT        40860
WWT         372
DTCWT       4692
GPT         540

Table 8.1: Count of important coefficients for different transforms

8.7.2 Shift Invariance

In this section we use simple approximations to predict the effect of shift dependence on the quality of the interpolated results. It is important to distinguish two types of quality. The first type, the aesthetic quality, measures quality relative to the shift invariant solution produced with a nondecimated version of the wavelet transform. If S_NDWT is the energy of the shift invariant estimate (we assume that the mean is 0) and E_shift is the average energy of the error for a shift dependent estimate (relative to the shift invariant estimate) then the aesthetic quality is defined in decibels as

Q_A = 10 log₁₀ ( S_NDWT / E_shift )

We call this the aesthetic quality because the shift invariant methods tend to give the nicest looking contours (of constant intensity). These contours are nice in the sense that they do not have the artefacts (of arbitrary deviations in the contour) seen in figure 7.4 produced by aliasing. The second type, the statistical quality, measures quality relative to the actual surface being estimated. Even the shift invariant estimate will only approximate the unknown surface and it may be the case that the errors due to this statistical uncertainty are much
larger than the errors due to shift dependence. If S_NDWT is the energy of the shift invariant estimate and E_NDWT is the average energy of the error between the shift invariant estimate and the surface being estimated then the statistical quality is defined in decibels as

Q_S = 10 log₁₀ ( S_NDWT / (E_NDWT + E_shift) )
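The two quality measures are simple functions of the measured energies. A minimal sketch (the energy values in the example are invented; only the formulas follow the definitions above):

```python
import math

# The two quality measures as tiny helpers.  Energies are assumed to be
# precomputed scalars, with names mirroring the definitions in the text.

def aesthetic_quality(S_ndwt, E_shift):
    """Q_A = 10 log10(S_NDWT / E_shift), in dB."""
    return 10.0 * math.log10(S_ndwt / E_shift)

def statistical_quality(S_ndwt, E_ndwt, E_shift):
    """Q_S = 10 log10(S_NDWT / (E_NDWT + E_shift)), in dB."""
    return 10.0 * math.log10(S_ndwt / (E_ndwt + E_shift))

# Example: shift-dependence error 100x smaller than the signal energy
print(round(aesthetic_quality(1.0, 0.01), 1))        # → 20.0
print(round(statistical_quality(1.0, 0.1, 0.01), 1)) # → 9.6
```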
To predict these qualities we need to estimate a number of energies. First consider the simple interpolation case when the mean of the data is 0 and the correct value is known at a single point. We have already claimed that:

1. The (posterior mean) estimate will be a scaled version of the covariance function (section 8.3).
2. The covariance function for the wavelet method will be given by smoothed impulses (section 7.3.4).

It is therefore reasonable to expect the aesthetic quality (the degradation caused by shift dependence) to be equal to the measured degradation for smoothed impulses given by the values in table 7.3.

Now consider the multiple data point case. Widely spaced sample locations will naturally lead to a proportionate increase in the shift dependence error energy. However, there are two main reasons why this may not hold for closer points:

1. The errors may cancel out.
2. There is a correction applied to the size of the impulses so that the interpolated image will honour the known values.

The first reason may apply when there are two points close together. The aliasing terms could cancel out to give a lower error, but it is just as likely that they will reinforce each other and give an even higher energy error than the sum. In general this effect is not expected to greatly change the shift dependence error.

The second reason is more important. The correction ensures that there will be zero error at the sample values. This zero error has two consequences; first, that there is no uncertainty in the value at the point and, second, that naturally there is zero shift dependence error at the point. This effect will reduce the amount of shift dependence error as the density of data points increases. In the limit when we have samples at every
location there will be no shift dependence error at all, as the output is equal to the input. Although the precise amount of error will depend on the locations of the sample points and covariance function, it is still possible to obtain a rough estimate of the amount of error that reveals the problem. Consider solving an interpolation problem with a standard decimated wavelet transform for a grid of N by N pixels. At level k and above there will be N² 4^(1−k) wavelet coefficients. Now suppose that we have N² 4^(1−p) sample points spread roughly evenly across our grid (for some integer p ≥ 1). For a standard decaying covariance function these points will define the coarse coefficients (those at scales k > p) fairly accurately but provide only weak information about the more detailed coefficients. The coefficients at scale k = p will have on average about one sample point per coefficient. These coefficients will therefore tend to produce about the same amount of shift dependence as in the single sample case, weighted by the proportion of energy at scale p. The statistical uncertainty in the estimates will be roughly the amount of energy that is expected to be found in the coefficients at scale p and the more detailed scales. Let E_p be the expected energy of the coefficients at scale p. Let E_≤p be the total expected energy of the coefficients at scales 1 to p. Let r be the ratio E_p/E_≤p. This ratio will be close to one for rapidly decaying covariance functions. The discussion above suggests that, approximately, the statistical uncertainty will correspond to a noise energy of E_≤p, while the shift dependence will correspond to a noise energy of f E_p, where f is the measure of the amount of shift dependence for the transform. An estimate for the aesthetic quality is therefore:

Q_A ≈ 10 log₁₀ ( S_NDWT / (f E_p) ) = 10 log₁₀ ( S_NDWT / (f r E_≤p) ) ≈ Q₀ − 10 log₁₀ f − 10 log₁₀ r
where Q₀ ≈ 10 log₁₀ ( S_NDWT / E_≤p ) is the expected statistical quality of the shift invariant estimate. The aesthetic quality is therefore predicted to be the values in table 7.3 with an offset given by the statistical quality of the estimate plus a constant depending on r. The offset is the same whatever the choice of transform and hence the different transforms should
maintain a constant relative aesthetic quality. For example, the table gives a value for −10 log₁₀ f of about 32dB for the DTCWT, but only 6.8dB for the DWT, and we would therefore expect the aesthetic quality for the DTCWT to be about 25dB better than for the DWT. As the density of points increases the statistical quality and hence the aesthetic quality will also increase. By using the approximation log(1 + x) ≈ x (valid for small x) we can also write a simple approximation for the statistical quality:

Q_S = 10 log₁₀ ( S_NDWT / (E_≤p + f E_p) )
    = 10 log₁₀ ( (S_NDWT / E_≤p) · (E_≤p / (E_≤p + f r E_≤p)) )
    = Q₀ − 10 log₁₀ (1 + f r)
    ≈ Q₀ − (10 / ln 10) f r
In order to judge the significance of this we must know values for f and r. The measure of shift dependence f has been tabulated earlier converted to decibels. This is convenient for the aesthetic quality formula, but for the statistical quality we need to know the precise value of this factor. For convenience, the actual values are shown in table 8.2.

        DWT     NDWT    WWT     DTCWT    GPT
K=1     0.21    0       0.044   0        0.0021
K=2     0.21    0       0.018   0.0011   0.0005
K=3     0.21    0       0.015   0.0005   0.0003
K=4     0.21    0       0.014   0.0006   0.0003

Table 8.2: Shift dependence for different scales.

Now suppose that r = 1/2. This is roughly the value for the covariance function plotted in figure 7.4 because at each coarser level there are four times fewer coefficients, but σ_l² is eight times larger. There is therefore approximately twice the energy at the next coarser level than at just the previous level, and hence approximately equal energy at the next coarser level to all the previous levels combined. Substituting for the values in equation 8.13 allows the prediction of the reduction in statistical quality caused by shift dependence. For the DWT the predicted reduction is
−(10/ln 10)(0.21)(0.5) = −0.46dB, while for the WWT it is about −0.03dB and for the DTCWT it is only about −0.001dB. These estimates are not very trustworthy due to the large number of approximations used to obtain them, but they do suggest that we should expect shift dependence to cause a significant decrease in both statistical and aesthetic quality when the standard wavelet transform is used.
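These predicted reductions can be reproduced numerically from the K=4 values of f in table 8.2 with r = 1/2; the helper functions below are our own sketch of the formulas above.

```python
import math

# Numerical check of the predicted reduction Q_S - Q_0 using the f values
# from table 8.2 (K=4 column) and r = 1/2.

def predicted_reduction(f, r):
    """Exact reduction -10 log10(1 + f r), in dB."""
    return -10.0 * math.log10(1.0 + f * r)

def approx_reduction(f, r):
    """Small-x approximation -(10 / ln 10) f r, in dB."""
    return -(10.0 / math.log(10.0)) * f * r

r = 0.5
for name, f in [("DWT", 0.21), ("WWT", 0.014), ("DTCWT", 0.0006)]:
    print(name, round(approx_reduction(f, r), 3), round(predicted_reduction(f, r), 3))
# The DWT line prints -0.456, matching the -0.46dB figure quoted above
```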
8.7.3 Experiments on shift dependence
We measured the statistical and aesthetic qualities for the DTCWT and the DWT for a variety of sample densities. The wavelet generative model (section 7.3.4) was used to generate the data. In order to produce shift invariant surfaces we use the NDWT transform in the generation. We use equation 7.4 to define the scaling factors with the same choice of σ_l values as in section 7.4.5. The sample locations were arranged in a grid with equal horizontal and vertical spacing between samples. Let this spacing be s pixels. The sample locations were at the points {(as + δx, bs + δy)} within the image, where a, b ∈ ℤ. The constants δx, δy ∈ ℤ effectively adjust the origin for the transforms. For each realisation of the surface these values were chosen uniformly from the set {0, 1, ..., 15}. To avoid possible edge effects we measured energies averaged only over a square grid of size 3s by 3s centred on a square of four data points away from the edges. For a range of spacings we performed the following procedure:

1. For i ∈ {1, 2, ..., 32}:
   (a) Generate a random surface Z_i of size 128 by 128.
   (b) Generate random values for δx, δy.
   (c) Sample the surface at the points {(as + δx, bs + δy) : a, b ∈ ℤ} that are within the image.
   (d) Interpolate the sampled values using the method described in section 8.6. The interpolation is repeated for three different transforms: the NDWT, the DWT, and the DTCWT.
   (e) Measure the energies needed to calculate the measures of quality:
      • S_NDWT,i, the energy of the NDWT solution.
      • E_NDWT,i, the energy of the error between the NDWT estimate and the original image values.
      • E_DWT,i, the energy of the error between the DWT estimate and the NDWT estimate.
      • E_DTCWT,i, the energy of the error between the DTCWT estimate and the NDWT estimate.

2. Average the energies over all values of i. For example,

   S_NDWT = (1/32) Σ_{i=1}^{32} S_NDWT,i

3. Calculate the relative statistical and aesthetic qualities based on the averaged energies.

The aesthetic quality for the DWT is calculated as Q_A = 10 log₁₀ ( S_NDWT / E_DWT ) and similarly for the DTCWT. In order to highlight the difference in the absolute statistical quality we compute a relative quality measure R_S, defined as the difference between the statistical quality for the shift dependent estimate and the statistical quality of the shift invariant estimate Q₀:

R_S = Q_S − Q₀
    = 10 log₁₀ ( S_NDWT / (E_NDWT + E_DWT) ) − 10 log₁₀ ( S_NDWT / E_NDWT )
    = 10 log₁₀ ( E_NDWT / (E_NDWT + E_DWT) )

R_S will be a negative quantity that measures the loss of statistical quality caused by shift dependence. Figure 8.3 plots the aesthetic quality against the density of points. Figure 8.4 plots the relative statistical quality. In both figures a cross represents the DTCWT estimate while a circle represents the DWT estimate. Results are not shown for the NDWT since the definitions ensure that this transform will always have an infinite aesthetic quality and a zero relative statistical quality.
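Steps 2 and 3 amount to averaging the energies first and only then forming the ratios. A minimal sketch with invented energy values (four realisations instead of 32):

```python
import math

# Sketch of steps 2-3 of the procedure: average the per-realisation energies
# first, then form the quality ratio.  The energy lists are made-up
# stand-ins for measured values.

E_ndwt_i = [0.80, 1.10, 0.95, 1.05]   # NDWT error energies per realisation
E_dwt_i  = [0.05, 0.09, 0.06, 0.08]   # DWT-vs-NDWT error energies

E_ndwt = sum(E_ndwt_i) / len(E_ndwt_i)
E_dwt = sum(E_dwt_i) / len(E_dwt_i)

# R_S = 10 log10(E_NDWT / (E_NDWT + E_DWT)): loss of statistical quality
R_S = 10.0 * math.log10(E_ndwt / (E_ndwt + E_dwt))
print(round(R_S, 2))  # → -0.3, a small negative loss in dB
```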
8.7.4 Discussion of the significance of shift dependence
The simple theory we proposed predicted (in section 8.7.2) that the aesthetic quality for the DTCWT would be about 25dB better than for the DWT. The experimental results shown
Figure 8.3: Aesthetic quality for DWT (o) and DTCWT (x) /dB, plotted against samples per pixel.

Figure 8.4: Relative statistical quality for DWT (o) and DTCWT (x) /dB, plotted against samples per pixel.
in figure 8.3 suggest that the improvement in aesthetic quality is actually about 20dB for the DTCWT. Considering the large number of approximations made in predicting the value this is a reasonable match. The absolute value of the aesthetic quality for the DWT varies from about 20dB for high densities to 9dB for low sample densities, while the DTCWT gives a much improved quality. These experimental results confirm that aesthetically the quality of the DWT is low.

The theory predicted that the relative statistical quality for the DWT would be about −0.46dB for high sample densities (this corresponds to an error of about 11%). The results in figure 8.4 suggest that the relative quality is only about −0.25dB, but that for lower densities the relative quality becomes much smaller (in magnitude). Bear in mind that a larger (in magnitude) relative quality means worse results. This is a fairly poor match with the predicted value. Nevertheless, the experiments confirm the qualitative prediction that the DWT has an appreciably lower statistical quality (of around 0.6%) while the DTCWT has almost negligible errors (less than 0.2%) caused by shift dependence.

For densities lower than about 1 in 16² = 256 the sample positions are so widely spaced that they will have little effect on each other. The estimates will be very uninformative and the statistical error will be roughly constant (and equal to the variance of the original image). However, the shift dependence energy will be proportional to the density, and therefore for low densities the shift dependence will be relatively insignificant and so the relative quality improves (decreases in magnitude).

Finally we discuss the expected effect of some of the approximations on the discrepancy between the predicted and observed results for the relative quality.

1. The estimate that r = 1/2 is very crude. We use 4 level transforms and the theory only applies when the critical level is close to one of these levels. For the most detailed scale E₁ = E_≤1 and the ratio must be one; for the other scales the ratio will be somewhere between 1 and 1/2. The effect of this is to predict that the relative quality will actually be worse (larger in magnitude) than −0.46dB.

2. We assume that the wavelet coefficients at scales coarser than p are accurately estimated. In practice there will still be some error in these. This effect will tend to increase the statistical error and hence decrease the significance of the shift dependence. Therefore this will produce a slight improvement (decrease in magnitude) in the relative quality.

3. We assume that the wavelet coefficients at scales ≤ p are inaccurately estimated. In practice there will still be some information in these. This effect will tend to decrease the statistical error and hence increase the significance of the shift dependence. Therefore this will produce a worse (larger in magnitude) relative quality.

4. We assume that the level p coefficients will produce an expected shift dependent energy of f E_p. E_p is the expected energy of the level p coefficients in the prior, but the actual energy of the coefficients in the interpolated image will tend to be less than this due to the limited information available. The effect of this is to predict less shift dependence and hence a better (smaller in magnitude) relative quality.

The most significant of these effects are probably the first and last.

8.8 Extensions

8.8.1 Background

The Kriging mean estimate gives biased results when estimating a nonlinear function of the image [130] (such as the proportion of pixels above a certain threshold). This is because the estimate is given by the mean of the posterior distribution. The mean of the prior distribution is zero, and when there is little information the mean of the posterior will also be close to zero. It is better to generate a range of sample images from the posterior and average the results. In the context of Kriging approximation methods this generation is called conditional simulation [36] and works as follows:

1. Construct a queue of locations that cover a grid of positions at which we wish to estimate the intensity values.
2. Remove a point from the queue and use Kriging to estimate the mean and variance of the posterior distribution at that point, conditional on all the data values and on all the previous locations for which we have estimated values.
3. Generate a sample from the Gaussian distribution with the estimated mean and variance and use this to set the value at the new point.
4. Repeat steps 2 and 3 until the queue is empty.

This process generates a single sampled image and can be repeated to generate as many samples as are desired. Each step in this process involves inverting a square matrix whose width increases steadily from the number of measurements to the number of locations we wish to know about. This calculation involves a huge amount of computation for more than a few hundred locations.

As described, the estimate for the value at a position is based on all the data points and all the previously calculated points, but to increase the speed of the process it is possible to base the estimate just on some of the close points. It is necessary to include points up to the range of the variogram if the results are to be valid and again the computation rapidly becomes prohibitively long. A multiple grid approach has been proposed [41] that first simulates the values on a coarse grid using a large data neighbourhood and then simulates the remaining nodes with a finer neighbourhood.

An efficient simulation method has been proposed [57, 89] that first generates a random image with the correct covariance function (but that does not depend on the known data values), and then adds on a Kriged image based on the data values minus the values of the random image at the known locations. This will generate a conditional simulation of the surface and the process can be repeated to generate many different realisations.

8.8.2 Proposal

We now describe a similar method that acts in wavelet space. In order to efficiently calculate image samples we calculate samples of the wavelet coefficients and then apply the wavelet reconstruction transform. Our approach can be viewed as a multigrid approach, with the additional advantage of wavelet filtering to give better decoupling of the different resolutions, making our method faster and more accurate. It is easy to generate samples of the unimportant wavelet coefficients (whose posterior distribution is the same as their prior) by simulating independent Gaussian noise of the correct variance.
To generate samples of the important wavelet coefficients, consider solving the equations

A Ẑ = PᵀTᵀy/σ² + Mᵀ R,   where M = [ TP/σ ; D⁻¹ ]

is the matrix formed by stacking TP/σ on top of D⁻¹, and R is a vector of random samples from a Gaussian of mean zero and variance 1, with length equal to the number of measurements plus the number of important wavelet coefficients. The sparsity of A means that Gaussian elimination allows us to quickly solve these equations.

The vector of random variables Ẑ given by

Ẑ = A⁻¹ ( PᵀTᵀy/σ² + Mᵀ R )

will have a multivariate Gaussian distribution, and noting that A is a symmetric matrix we can simplify as follows:

Ẑ ∼ N( A⁻¹PᵀTᵀy/σ² , A⁻¹ Mᵀ M A⁻¹ )
  ∼ N( a , A⁻¹ (D⁻² + PᵀTᵀTP/σ²) A⁻¹ )
  ∼ N( a , A⁻¹ )

Recall from equation 8.11 that A = D⁻² + PᵀTᵀTP/σ². This shows that such solutions will be samples from the posterior distribution for the wavelet coefficients.

Gaussian elimination is equivalent to LU factorisation (the representation of a matrix by the product of a lower triangular matrix L with an upper triangular matrix U) and so we can generate many samples quickly by calculating this factorisation once and then calculating Ẑ for several values of R. This is fast because triangular matrices can be quickly inverted using back substitution. A similar method of LU factorisation has been used to quickly generate many samples for the Kriging Conditional Simulation method [2], but that method can only simulate a few hundred grid nodes before the cost becomes prohibitive. We have generated simulated images with a quarter of a million grid nodes using the wavelet method in less than a minute on a single processor. The improvement is possible because the wavelet transform achieves a good measure of decorrelation between different ranges of the covariance function and so can interpolate each scale with an appropriate number of coefficients. We also get a very sparse set of equations which can be solved much faster than the fuller system that Kriging methods produce.
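The sampling scheme can be sketched on a small dense problem in Python. The matrices below are random stand-ins, and a Cholesky factorisation (computed once and reused for every sample) plays the role of the LU factorisation; a real implementation would use sparse factors.

```python
import numpy as np

# Sketch of the sampling scheme: factor A once, then draw many samples
# Zhat = A^{-1}(P^T T^T y / sigma^2 + M^T R) with M = [TP/sigma; D^{-1}]
# stacked.  Small dense toy problem with invented stand-in matrices.

rng = np.random.default_rng(1)
n, S = 6, 3                      # important coefficients, measurements
sigma = 0.2
TP = rng.standard_normal((S, n)) # stand-in for the wavelet responses T P
Dinv = np.diag(1.0 / rng.uniform(0.5, 2.0, n))

M = np.vstack([TP / sigma, Dinv])        # (S + n) x n stacked matrix
A = M.T @ M                              # equals D^-2 + (TP)^T (TP) / sigma^2
y = rng.standard_normal(S)
rhs0 = TP.T @ y / sigma**2

L = np.linalg.cholesky(A)                # factor once (LU would also do)

def sample():
    R = rng.standard_normal(S + n)       # unit Gaussian, one per stacked row
    b = rhs0 + M.T @ R
    return np.linalg.solve(L.T, np.linalg.solve(L, b))  # back substitutions

samples = np.array([sample() for _ in range(20000)])
a = np.linalg.solve(A, rhs0)             # posterior mean
print(np.max(np.abs(samples.mean(axis=0) - a)))  # small: sample mean tends to a
```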
8.8.3 Trading accuracy for speed

We examine the tradeoff between accuracy and speed by adjusting the threshold used to determine the importance map. We use a grid of 512 by 512 and generate a mean posterior estimated image using the wavelet method of section 8.6. We calculate the approximation while adjusting the threshold mentioned in section 8.6.2 and measure the total time taken to produce this image (including the determination of the importance map and the wavelet transforms). The correct image C is defined to be that produced by the method with zero threshold. For each output image S we define the signal to noise ratio (SNR) to be

SNR = 10 log₁₀ ( Σ_i Σ_j C²_ij / Σ_i Σ_j (S_ij − C_ij)² )    (8.13)

For example, a SNR of 30dB is equivalent to saying that the energy of the error is only 0.1% of the energy of the surface. We perform the experiment twice, once with 128 measurements and once with 256 measurements.

Figure 8.5 plots the time taken for the interpolation versus the SNR of the results. For some of the points we have also displayed the associated threshold level. There is also a horizontal dashed line drawn at the time taken for a threshold of zero. It can be seen that the computation decreases for minor increases in the threshold while producing little additional error. For example, a threshold of 0.3 reduces the time by a factor of 3 while giving a SNR of 27dB. Figure 8.6 shows the results of the experiment when we have twice as many data points. Consider the threshold of 0.3: the same computation decrease is evident, but notice that the same threshold gives greater accuracy with more measurements. Therefore when we have more measurements we can reduce the computation by using a higher threshold while maintaining the same accuracy.

An increased threshold makes the equations become sparser, less correlated, and thus easier to solve. This tradeoff only affects the time for setting up and solving the equations; the time for determining the importance map and inverting the wavelet transform depends only on the size of the grid.

For 128 measurements, the interpolation takes about 6 seconds (SNR=27.8dB). For 256 measurements, it takes about 14 seconds (SNR=29.3dB). We also timed an interpolation of 512 measurements, which took 32 seconds (SNR=33.8dB). This is not quite linear, but on balance the amount of computation is roughly linear in the number of measurements for a constant accuracy.
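Equation (8.13) and the 30dB example can be checked directly; the images in the sketch below are random stand-ins, with the error scaled to exactly 0.1% of the surface energy.

```python
import numpy as np

# The SNR of equation (8.13) as a helper, checked on an invented image pair.

def snr_db(C, S):
    """SNR = 10 log10( sum C_ij^2 / sum (S_ij - C_ij)^2 ), in dB."""
    return 10.0 * np.log10(np.sum(C**2) / np.sum((S - C)**2))

rng = np.random.default_rng(2)
C = rng.standard_normal((64, 64))                  # stand-in "correct" image
e = rng.standard_normal((64, 64))
e *= np.sqrt(1e-3 * np.sum(C**2) / np.sum(e**2))   # error energy = 0.1% of C's
print(round(snr_db(C, C + e), 1))  # → 30.0
```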
Figure 8.5: Computation time versus SNR (128 measurements).

Figure 8.6: Computation time versus SNR (256 measurements).
In contrast, Kriging scales very badly with the number of data points. The bottleneck in Kriging is the inversion of a matrix: to solve a system with S measurements involves the inversion of a S by S matrix, and such an inversion involves computation roughly cubic in the number of points. For example, inverting a 256 by 256 matrix in Matlab takes 0.43 seconds, a 512 by 512 matrix takes 5.9 seconds, and a 1024 by 1024 matrix takes 50 seconds. For any problem that is taking a considerable amount of time to solve it will clearly be preferable to use the new wavelet method.

8.9 Discussion of model

This section discusses the effect of the model assumptions. The model assumes that:

1. the image is a realisation of a 2D stationary discrete Gaussian random process;
2. the mean and covariance of the random process are known;
3. independent Gaussian noise corrupts the measurements;
4. the variance of the measurement noise is known;
5. the measurements lie on grid positions;
6. the measurements are at distinct locations.

The first assumption is the most significant. For most applications the original data will only be approximately modelled as a stationary Gaussian random process. Often more information may be known about the likely structure of the data, and a more sophisticated model using this information will almost certainly give better results but will probably also require much more computation to solve.

We would expect an assumption of Gaussian measurement noise to be reasonably accurate in most cases even for nonGaussian noise distributions of zero mean and equal variance. However, in certain circumstances this expectation may not hold. Two examples are:

1. Infinite variance noise processes (such as alpha-stable noise).
2. A measurement model in which a certain proportion of the measurements are accurate, but the others are badly corrupted. In this case methods that jointly estimate the surface and the reliable measurements should be able to give significantly better results.

The assumption of known mean and covariance of the process will almost never be true, but there are many methods available for obtaining estimates of these parameters [110]. Results in the literature related to radial basis functions [97] suggest that the precise shape of the covariance function only has a small effect on the results, and therefore we do not expect the errors in the parameter estimates to be significant. The same argument applies to the variance of the measurement noise.

The restriction that the measurement locations are distinct is an unimportant constraint that is needed only for interpolation. The problem is that it is impossible to interpolate two different values at the same position. A simple approach to the problem is to replace repeats by a single sample with value given by the average of the repeats. For approximation all the methods work equally well even with the original samples without this constraint.

The assumption that the sample locations lie on grid points is another unimportant constraint, because approximating sample locations by the nearest grid point should give sufficiently accurate results for most applications.

8.10 Conclusions

The first part of this chapter considered alternative interpolation and approximation techniques from a Bayesian viewpoint. We argued that Kriging, radial basis functions, bandlimited interpolation, and spline interpolation can all be viewed as calculating Bayesian posterior mean estimates based on particular assumptions about the prior distribution for the images. More precisely, each method can be viewed as assuming a stationary discrete Gaussian random process for the prior, with the only theoretical difference between the methods being the assumed covariance function. The reason that this is important for complex wavelets is that our proposed method based on the DTCWT uses a prior of the same form. Our method has a number of parameters associated with it and these parameters could be tuned in order that the complex
We discussed and predicted the eﬀect of shift dependence on measures of aesthetic and statistical quality. it is more useful to use the freedom to tune the DTCWT method for a particular application.10. The second part of the chapter proposed a wavelet method for interpolation/approximation. in practice this is not an appropriate application for any of these wavelet methods. The minimum smoothness norm interpolation based on a decimated wavelet will suﬀer from shift dependence. In particular. CONCLUSIONS 163 wavelet method is an approximate implementation of any of the previously mentioned interpolation techniques.8. This chapter has argued that the DTCWT gives much better results than the DWT and much faster results than the NDWT. Using this method to achieve a constant accuracy we found that the time to solve the equations is roughly linear in the number of data points. We also prove from a Bayesian perspective that shift dependence will always be an additional source of error in estimates. while the computation for Kriging is roughly cubic in the number of data points. However. However. Finally we found a simple method that can be used to increase the speed of the method at the cost of a slight decrease in accuracy. The ﬁrst part also pointed out problems with some other techniques: 1. Smoothing spline estimates tend to inﬁnity when extrapolated away from the data points. even compared to the expected statistical error. For contrast. Better and faster results (at least for the . We also developed a method for generating samples from the posterior distribution that can generate large numbers of sample images at a cost of one wavelet reconstruction per sample image. while the DTCWT produced estimates with statistically insigniﬁcant errors due to shift dependence. Least squares spline approximation techniques cannot be shift invariant. 3. the DWT was found to give signiﬁcantly shift dependent results. 2. 
a comparable method based on the conjugate gradient algorithm [88] requires 2K wavelet transforms per sample image where K is the number of iterations used in the conjugate gradient algorithm. These predictions were tested experimentally and found to be rather inaccurate but they did give a reasonable guide to the relative importance of shift dependence for the diﬀerent methods.
164 CHAPTER 8. but also superior to the leading alternative methods. the next chapter will describe an application for which the DTCWT is not only better than the DWT and the NDWT. INTERPOLATION AND APPROXIMATION isotropic case) could be obtained with the GPT. In contrast. .
Chapter 9

Deconvolution

The purpose of this chapter is to give an example of a Bayesian application that illustrates the performance gains possible with complex wavelets. We explain how to use a complex wavelet image model to enhance blurred images. We construct an empirical Bayes image prior using complex wavelets and experimentally compare a number of different techniques for solving the resulting equations. We compare the results with alternative deconvolution algorithms, including a Bayesian approach based on decimated wavelets and a leading minimax approach based on a special nondecimated wavelet [58]. The background for this chapter is largely contained in appendix C, which reviews a number of deconvolution methods from a Bayesian perspective. The main original contributions are: the new iterative deconvolution method, the experimental results comparing the results for alternative transforms within the method, and the experimental comparison with alternative techniques.

9.1 Introduction

Images are often distorted by the measurement process. For example, if a camera lens is distorted, or incorrectly focused, then the captured images will be blurred. We will assume that the measurement process can be represented by a known stationary linear filter followed by the addition of white noise of mean 0 and variance σ². This model can be written as

y = Hx + n    (9.1)
where some lexicographic ordering of the original image, x, the observed image, y, and the observation noise, n, is used. The known square matrix H represents the linear distortion. As it is assumed to be stationary we can write it using the Fourier transform matrix F as

H = F^H M F    (9.2)
where M is a diagonal matrix. For an image with P pixels, y, x, and n will all be P × 1 column vectors while F, M, and H will be P × P matrices. As both x and n are unknown, equation 9.1 represents P linear equations in 2P unknowns and there are many possible solutions. This is known as an ill-conditioned problem. The best solution method depends on what is known about the likely structure of the images. If the original images are well modelled as a stationary Gaussian random process then it is well-known that the optimal (in a least squares sense) solution is given by the Wiener filter. However, for many real world images this model is inappropriate because there is often a significant change in image statistics for different parts of an image. For example, in a wavelet transform of an image most of the high frequency wavelet coefficients tend to have values close to zero, except near object edges where they have much larger values. There have been many proposed methods for restoring images that have been degraded in this way. We restrict our attention to the more mathematically justifiable methods, ignoring the cruder "sharpening" techniques such as using a fixed highpass filter or some simple modification of wavelet coefficients [14]. (These ignored techniques provide a quick, approximate answer but are less scientifically useful because often they will not provide an accurate reconstruction even in very low noise conditions.) For astronomical imaging deconvolution there are three main strands: the CLEAN method proposed by Högbom [44], maximum-entropy deconvolution proposed by Jaynes [54, 29], and iterative reconstruction algorithms such as the Richardson-Lucy method [102]. For images containing a few point sources (stars) the CLEAN algorithm can give very accurate reconstructions, but for images of real world scenes these methods are less appropriate. Alternative image models are found to give better results.
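The diagonalisation in equation 9.2 is easy to check numerically: a stationary (circular) blur is a circulant matrix, and the DFT of its kernel gives the diagonal of M. A minimal 1-D numpy sketch with a hypothetical 3-tap blur (the kernel and sizes are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
P = 64
x = rng.standard_normal(P)              # 1-D stand-in for an image

# Hypothetical symmetric 3-tap blur, stored as the first column of a circulant H.
c = np.zeros(P)
c[0], c[1], c[-1] = 0.5, 0.25, 0.25

# Explicit circulant matrix: H[i, j] = c[(i - j) mod P]
H = np.array([[c[(i - j) % P] for j in range(P)] for i in range(P)])
y_matrix = H @ x

# Diagonal form H = F^H M F: M holds the DFT of the kernel on its diagonal.
m = np.fft.fft(c)
y_fourier = np.fft.ifft(m * np.fft.fft(x)).real
```

The two computations agree, which is exactly why stationary deconvolution problems are usually manipulated in the Fourier domain.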
Constrained least squares methods [15] use a filter based regularisation, such as a Laplacian filter, but this tends to give over-smoothed results when the image contains sharp edges. More recently there have been attempts to improve the performance near edges. These methods include total variation [123], Markov Random Field (MRF) [56, 132], and wavelet based approaches. There are
two main contrasting methodologies for using wavelets. The ﬁrst group is based on a minimax perspective [38, 52, 58, 84, 87]. The second group is based on a Bayesian perspective using wavelets to represent the prior expectations for the data [8, 11, 94, 124]. We ﬁrst describe a general Bayesian framework for image deconvolution. In appendix C we draw out the connections between the diﬀerent approaches by reviewing the papers mentioned above with reference to the Bayesian framework. Section 9.1.2 summarises the main results from this review. Section 9.1.3 discusses the reasons guiding our choice of prior model based on the material covered in the appendix. This model is detailed in section 9.2 and then we describe the basic minimisation method in section 9.3. We propose a number of alternative choices for minimisation that are experimentally compared in section 9.4. Section 9.5 compares the results to alternative deconvolution methods and section 9.6 presents our conclusions.
9.1.1 Bayesian framework
To treat image deconvolution from the Bayesian perspective we must construct a full model for the problem in which all the images are treated as random variables. We shall use upper case (Y, X, N) to represent the images as random variables, and lower case (y, x, n) to represent specific values for the variables. To specify a full model requires two probability density functions to be specified:

1. The prior p(x) encodes our expectations about which images are more likely to occur in the real world.
2. The likelihood p(y|x) encodes our knowledge about the observation model.

(As before, we use the abbreviation x for the event that the random variable X takes value x.) All the reviewed methods (except for the Richardson-Lucy method described in section C.5) use the same observation model and so the likelihood is the same for all of the methods and is given by

p(y|x) = ∏_{i=1}^{P} (2πσ²)^{−1/2} exp( −([Hx]_i − y_i)² / (2σ²) ).
Given observations y, Bayes' theorem can be used to calculate the a posteriori probability density function (known as the posterior pdf):

p(x|y) = p(y|x) p(x) / p(y).    (9.3)
There are several techniques available to construct an estimate from the posterior pdf. Normally a Bayes estimator is based on a function L(θ̂, θ) that gives the cost of choosing the estimate θ̂ when the true value is θ. The corresponding Bayes estimator is the choice of θ̂ that minimises the expected value of the function based on the posterior pdf. However, for the purposes of the review it is most convenient to consider the MAP (maximum a posteriori) estimate. The MAP estimate is given by the image x that maximises the posterior pdf p(x|y). Usually a logarithmic transform is used to convert this maximisation into a more tractable form:

x_MAP = argmax_x p(x|y)
      = argmax_x p(y|x) p(x) / p(y) = argmax_x p(y|x) p(x)
      = argmin_x −log( p(y|x) p(x) ) = argmin_x −log( p(x) ) − log( p(y|x) )
      = argmin_x f(x) + Σ_{i=1}^{P} ([Hx]_i − y_i)² / (2σ²)
      = argmin_x f(x) + (1/2σ²) ‖Hx − y‖²

where f(x) is defined by

f(x) = −log( p(x) ).    (9.4)

In summary, the MAP estimate is given by minimising a cost function 2σ² f(x) + ‖Hx − y‖², where the choice of f(x) corresponds to the expectations about image structure.
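For a Gaussian (quadratic) prior the MAP cost above is a quadratic in x, so the minimiser solves a linear system. A small numpy sketch with hypothetical H and a hypothetical prior f(x) = ½‖x‖² (both chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
H = rng.standard_normal((n, n))      # hypothetical blur matrix
y = rng.standard_normal(n)
sigma2 = 0.5                         # noise variance

def f(x):
    # Hypothetical Gaussian prior term: f(x) = -log p(x) + const = 0.5*||x||^2
    return 0.5 * np.sum(x ** 2)

def cost(x):
    # MAP cost: 2*sigma^2*f(x) + ||Hx - y||^2
    return 2 * sigma2 * f(x) + np.sum((H @ x - y) ** 2)

# Zero gradient: 2*sigma2*x + 2*H^T(Hx - y) = 0  =>  (H^T H + sigma2*I) x = H^T y
x_map = np.linalg.solve(H.T @ H + sigma2 * np.eye(n), H.T @ y)
```

Perturbing x_map in any direction can only raise the cost, confirming it is the minimiser.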
This minimisation problem often appears in the regularisation literature in one of two alternative forms. The first is known as Tikhonov regularisation [117]. A class of feasible solutions Q is defined as those images for which the norm of the residual image (the difference between the observed data and the blurred estimate) is bounded by some ε:

Q = { x : ‖y − Hx‖ ≤ ε }
Tikhonov defined the regularised solution as the one which minimises a stabilising functional f(x):

x_TIKHONOV = argmin_{x ∈ Q} f(x).

The second form is known as Miller regularisation [80]. In this approach the energy of the residual is minimised subject to a constraint on the value of f(x):

x_MILLER = argmin_{x : f(x) ≤ E} ‖y − Hx‖.

Using the method of undetermined Lagrangian multipliers it can be shown [15] that both problems are equivalent to the MAP minimisation (for particular choices of σ).
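The Lagrangian equivalence can be checked numerically: the penalised minimiser of ‖y − Hx‖² + λf(x) also solves the Miller problem for the bound E = f(x*), i.e. no candidate with f(x) ≤ E achieves a smaller residual. A sketch under hypothetical quadratic assumptions (random H, f(x) = ½‖x‖², λ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
H = rng.standard_normal((10, 6))     # hypothetical blur matrix
y = rng.standard_normal(10)
lam = 0.7                            # Lagrange multiplier linking the two forms

def f(x):
    return 0.5 * np.sum(x ** 2)      # hypothetical stabilising functional

def resid2(x):
    return np.sum((y - H @ x) ** 2)  # squared residual norm

# Penalised solution of resid2(x) + lam*f(x):
x_star = np.linalg.solve(H.T @ H + 0.5 * lam * np.eye(6), H.T @ y)
E = f(x_star)                        # Miller bound implied by this solution

# Scale random candidates so that f(candidate) == E exactly.
candidates = [c * np.sqrt(E / f(c)) for c in rng.standard_normal((200, 6))]
beaten = any(resid2(c) < resid2(x_star) - 1e-9 for c in candidates)
```

By the duality argument, `beaten` is always False: any x with f(x) ≤ f(x*) must have a residual at least as large as that of x*.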
9.1.2 Summary of review
Appendix C explains the principles behind the standard deconvolution techniques and attempts to explain the differences from within the Bayesian framework. Several of the methods are equivalent to a particular choice of f(x) (in some cases with additional constraints to ensure the image is positive). These choices are displayed in table 9.1; explanations of these formulae can be found in the appendix, and the numbers in brackets indicate the corresponding section.

Figure 9.1: Prior cost function f(x) expressions for standard deconvolution techniques: CLEAN (C.1), Maximum Entropy (C.2), Wiener filtering (C.4), Van Cittert (C.5), Landweber (C.5), Wang (C.10), Starck and Pantin (C.10), Belge (C.10), and Piña and Puetter (C.10).

The Landweber and Van Cittert algorithms are special cases of Wiener filtering (if we remove the positivity constraint). In appendix C we also explain why we
can approximate both constrained least squares (section C.6) and the Richardson-Lucy algorithm (section C.5) as alternative special cases of the Wiener filter. The reason for making these connections is that we can predict an upper bound for the performance of all these methods by evaluating just the best case of Wiener filtering (the oracle Wiener filter). Expressions for f(x) for the total variation (section C.7), Markov Random Field (section C.7), and Banham and Katsaggelos' methods (section C.10) can also be written down¹ but the projection (section C.3) and minimax (section C.8) methods are more difficult to fit into the framework. The minimax methods are an alternative approach motivated by the belief that Bayesian methods are inappropriate for use on natural images. Section C.8 discusses the two approaches and explains why we prefer the Bayesian method.
9.1.3 Discussion
This section discusses the reasons guiding the choice of prior based on the review presented in appendix C. The main issue is to identify the nature of the dependence between the pixels in the original image. For astronomical images of sparsely distributed stars an independence assumption may be reasonable, while for many other kinds of images (including astronomical images of galaxies) such an assumption is inappropriate. If independence is a reasonable assumption then the CLEAN, maximum entropy, and maximally sparse methods are appropriate and the choice largely depends on the desired balance between accuracy and speed. For example, the CLEAN method is fast but can make mistakes for images containing clustered stars. For images that are expected to be relatively smooth, the Wiener filter and iterative methods are appropriate. If the images are known to satisfy some additional constraints (for example, the intensities are often known to be nonnegative for physical reasons), or if the blurring function is space varying, then iterative methods such as Richardson-Lucy or constrained least squares are appropriate. Otherwise it is better to use the Wiener filter because it is fast and approximately includes the iterative methods as special cases.
¹ We have not included these expressions because, firstly, they require a considerable amount of specialised notation to be defined and, secondly, these expressions can easily be found in the literature [8, 56, 90].
For images of scenes containing discontinuities the total variation and wavelet methods are appropriate. The Markov Random Field and total variation methods are good for images that are well-modelled as being piecewise flat, but for many natural images this model is only correct for certain parts of the image while in other parts there may be textured or smoothly varying intensities. The wavelet methods tend to give a good compromise for images containing such a mixture of discontinuities and texture. We are interested in examining the potential of the DTCWT within deconvolution and therefore we choose to study the restoration of real-world images rather than sparse starfields. As mentioned in the previous section, we choose to use a simple prior model based on an adaptive quadratic cost function [124]. This choice means that our proposed method will be an empirical Bayes approach based on the nonstationary Gaussian random process model.

9.2 Image model

We will assume that we have a balanced wavelet transform so that P^H = W, where P^H represents the Hermitian transpose of P. We will assume that the real and imaginary parts of the transform's outputs are treated as separate coefficients so that W and P are real matrices. The previous section described several ways of constructing a prior with wavelets. For simplicity we choose to use the quadratic cost function proposed by Wang et al [124]. Specifically, we use a generative specification in which the real and imaginary parts of the wavelet coefficients are independently distributed according to Gaussian distribution laws of zero mean and known variance. This can be considered as a simple extension to the model of chapter 7. The assumption of a linear, additive, white Gaussian noise model means that the likelihood p(y|w) is proportional to

exp( −(1/2σ²) ‖HPw − y‖² )    (9.5)

We can write that the prior pdf p(w) for the wavelet coefficients is proportional to

exp( −(1/2) w^H A w )    (9.6)

where A is a diagonal matrix with A_ii^{-1} being the variance of w_i. Realisations from the prior pdf for images could be calculated by generating wavelet coefficients according to this distribution and then inverting the wavelet transform x = Pw.
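Drawing such realisations is cheap: sample each coefficient from N(0, A_ii⁻¹) and apply the synthesis transform. A toy sketch using an orthonormal 4-point Haar matrix as a stand-in for the DTCWT synthesis operator P (the Haar matrix and the variances are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
r2 = 1.0 / np.sqrt(2.0)
# Orthonormal 4-point Haar synthesis matrix, a toy stand-in for the thesis's P.
P = np.array([[0.5,  0.5,  r2, 0.0],
              [0.5,  0.5, -r2, 0.0],
              [0.5, -0.5, 0.0,  r2],
              [0.5, -0.5, 0.0, -r2]])

var = np.array([4.0, 2.0, 1.0, 1.0])                 # hypothetical variances A_ii^{-1}
w = rng.standard_normal((50000, 4)) * np.sqrt(var)   # draws of wavelet coefficients
x = w @ P.T                                          # realisations from the image prior

cov_expected = P @ np.diag(var) @ P.T                # implied image-domain covariance
cov_empirical = x.T @ x / len(x)
```

The empirical covariance of the samples matches P diag(A_ii⁻¹) Pᵀ, which is the image-domain covariance implied by the wavelet-domain prior.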
We assume that each coefficient has an equal variance in its real and imaginary parts. The only difference to the previous model is that the variances are allowed to vary between coefficients rather than being the same for all coefficients in a given subband.

9.3 Iterative Solution

The simple assumptions made mean that it is possible to write down the solution to the problem using a matrix pseudoinverse. For this problem the MAP estimate will be identical to the Bayesian posterior mean estimate because the posterior pdf is a multivariate Gaussian. However, the number of pixels in such problems will mean that the pseudoinverse will take a very long time to evaluate and a much quicker approach is needed. We will attempt to minimise the energy by repeating low dimensional searches in sensible search directions.

We define an energy E to be the negative logarithm of the likelihood times the prior (ignoring any constant offsets). For simplicity we assume that the image has been scaled so that σ = 1. With this scaling the energy function is given by combining equations 9.5 and 9.6:

E = (1/2) ‖HPw − y‖² + (1/2) w^H A w    (9.7)

The MAP (maximum a posteriori) answer is given by minimising this energy. The steps in our method are:

1. Estimate the PSD of the image.
2. Estimate the variances of the wavelet coefficients.
3. Initialise the wavelet coefficients.
4. Calculate a search direction.
5. Minimise the energy along a line defined by the search direction.
6. Repeat steps 4 and 5 ten times.

During the estimation we compute a first estimate x₀ of the original image. Section 9.3.1 explains the estimation steps. Figure 9.2 contains a flow diagram of this method.
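The energy of equation 9.7 can be evaluated directly on a toy problem, and the pseudoinverse solution mentioned above checked against it. A sketch with small random stand-ins for H, P, and A (all hypothetical; a real implementation would use filters and the DTCWT rather than explicit matrices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
H = rng.standard_normal((n, n))                       # hypothetical blur matrix
P = np.linalg.qr(rng.standard_normal((n, n)))[0]      # orthogonal stand-in for synthesis
A = np.diag(rng.uniform(0.5, 2.0, n))                 # inverse prior variances
y = rng.standard_normal(n)

def energy(w):
    # E = 0.5*||HPw - y||^2 + 0.5*w^T A w   (equation 9.7 with sigma = 1)
    return 0.5 * np.sum((H @ P @ w - y) ** 2) + 0.5 * w @ A @ w

# Setting the gradient of E to zero gives the direct (pseudoinverse-style) solution.
w_opt = np.linalg.solve(P.T @ H.T @ H @ P + A, P.T @ H.T @ y)
```

At realistic image sizes this linear solve is infeasible, which is exactly why the chapter minimises E by repeated line searches instead.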
Figure 9.2: Flow diagram for the proposed wavelet deconvolution method (Start → Estimate model parameters → Image initialisation → loop: Calculate search direction → Minimise energy along search direction → until enough iterations have been done → Stop).
We initialise the wavelet coefficients to zero. Later in section 9.4 we will propose a better initialisation. Section 9.3.2 explains how the search direction is chosen. Section 9.3.3 explains how to minimise the energy within a one dimensional subspace.

9.3.1 Variance estimation

The method of Wang et al [124] used a similar prior model (based on a real wavelet transform). Their method for variance estimation was to perform an edge detection operation on the original image and then increase the variances of coefficients near edges. Details of this process were not given and so we use an alternative estimation technique. This process corresponds to the step Estimate Model Parameters in the flow diagram of figure 9.2; figure 9.3 contains a block diagram of the estimation process.

The variance estimates are given by the energy of the wavelet coefficients of the wavelet transform of an estimate of the original image. In other words, we need to:

1. Obtain an estimate x̂ of the original image.
2. Compute the wavelet transform ŵ = W x̂ of this image.
3. Estimate the variances via A_ii^{-1} = (1/2)(ŵ²_R(i) + ŵ²_I(i)), where R(i) is the index of the real part, and I(i) the index of the imaginary part, of the complex wavelet coefficient corresponding to index i.

These steps are represented in figure 9.3 by the Estimate Wavelet Variances block.

A simple estimate for the original image would be the observed data x̂ = y. However, for a typical blurring operation the detail of this image will probably be unreliable (though the lowpass information should be fairly accurate) and this would underestimate the variances of the coefficients at detailed scales. Alternatively we could compute a deconvolved image via the filter of equation C.5. Full regularisation (α = 1) corresponds to using a Wiener denoised estimate; Wiener estimates tend to have smoothed edges and will therefore tend to produce underestimates of the variances near edges. Smaller values of α will preserve the signal more, but also contain more noise and thus produce overestimates of the variances. Our chosen approach is to use the under-regularised inverse (with α = 0.1) followed by soft thresholding wavelet denoising.
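Step 3 above simply pools the energy of the two parts of each complex coefficient under the equal-variance assumption. A tiny sketch with hypothetical coefficients:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical complex wavelet coefficients of the image estimate x-hat.
w_hat = rng.standard_normal(16) + 1j * rng.standard_normal(16)

# Shared variance estimate for the real and imaginary parts of each coefficient:
# A_ii^{-1} = 0.5 * (w_R(i)^2 + w_I(i)^2), i.e. half the squared magnitude.
var_hat = 0.5 * (w_hat.real ** 2 + w_hat.imag ** 2)
A_diag = 1.0 / var_hat          # A is diagonal, holding the inverse variances
```

Note that the pooled estimate is just half the squared magnitude of the complex coefficient, so it is invariant to the phase of the coefficient.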
Figure 9.3: Block diagram of the deconvolution estimation process (Observed image y → Estimate PSD → Under-regularised deconvolution → initial estimate of x → Wavelet denoising → second estimate of x → Estimate wavelet variances A).
The filter of equation C.5 requires (for α = 0) an estimate of the power spectrum of the original image. This kind of estimate is often required in deconvolution [84, 48]. In some experiments we will use the oracle estimate given by the squared magnitudes of the Fourier coefficients of the original image (before convolution). This is called an oracle estimate because it requires knowledge of the original image; this information will naturally not be available in any real application. Nevertheless, such an estimate is useful in testing methods as it removes the errors caused by bad spectrum estimates. We will always make it clear when we are using such an oracle estimate. In particular, the comparisons of the DTCWT method with other published results will never cheat by using an oracle estimate.

In a real application we need a different estimation technique. Autoregressive and Markov image models have been used to estimate image statistics [21] but it is reported that the method only works well in noise reduction and not in blur removal [48]. The constrained least squares method is a variant in which the autocorrelation is assumed to be of a known form [5]. Hillery and Chin propose an iterative Wiener filter which successively uses the Wiener-filtered signal as an improved prototype to update the power spectrum estimate [48]. Within this dissertation we are more concerned with the performance of wavelet methods than classical estimation theory and we use a fast and simple alternative estimation technique:

1. Calculate the Fourier transform of the observed data Fy.
2. Square to get an estimate of the observed power spectrum p̂_{y,i} = |[Fy]_i|².
3. Estimate the power spectrum of the original image by

p̂_x = ( M^H M + β I_N )^{-1} ( p̂_y − σ² 1_N )

where β is a parameter used to avoid over-amplification near zeros of the blurring filter. By Fourier transforming the observation equation 9.1 it is straightforward to show that the expected value of this estimate is given by

E[ p̂_y ] = M^H M p_x + σ² 1_N

where 1_N is an N × 1 vector of ones. This represents the Estimate PSD block in figure 9.3.
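The three-step spectrum estimator can be sketched directly in 1-D. In the noiseless case it should recover the true power spectrum wherever the blur response is not small, up to the β regularisation (kernel and sizes are hypothetical; note numpy's unnormalised FFT puts the noise floor at N·σ² rather than σ²):

```python
import numpy as np

rng = np.random.default_rng(6)
N, beta, sigma2 = 128, 0.01, 0.0        # noiseless case so the check is exact
x = rng.standard_normal(N)              # 1-D stand-in for the original image

c = np.zeros(N)
c[0], c[1], c[-1] = 0.5, 0.25, 0.25     # hypothetical circular blur kernel
m = np.fft.fft(c)                       # diagonal entries of M
y = np.fft.ifft(m * np.fft.fft(x)).real # observed (blurred) data, no noise added

p_y = np.abs(np.fft.fft(y)) ** 2        # step 2: observed power spectrum
# Step 3, with negative elements clipped to zero as in the text:
p_x_hat = np.maximum(p_y - N * sigma2, 0.0) / (np.abs(m) ** 2 + beta)
p_x_true = np.abs(np.fft.fft(x)) ** 2
```

Near zeros of m the estimate is deliberately damped by β rather than blown up, which is the point of the regularisation.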
Once we have the power spectrum, the estimate of the original image is given by the under-regularised filter. Any negative elements in the estimated power spectrum are set to zero. We use β = 0.01 in the experiments. This filtering can be expressed in matrix form as

x̂₀ = F^H P̂_x M^H ( M^H M P̂_x + ασ² I_N )^{-1} F y    (9.8)

where P̂_x is a diagonal matrix whose diagonal entries are given by p̂_x. This filtering is represented by the Under-regularised Deconvolution block in figure 9.3, and gives the under-regularised image estimate x̂₀ that is further denoised using wavelets.

A similar approach is used to perform the initial wavelet denoising. First the signal strengths are estimated for each wavelet coefficient and then a Wiener-style gain is applied to each coefficient. In the rest of this chapter we use the separated real form of the transform, but for the four steps of this algorithm it is more convenient to use the complex form. The details of this algorithm are:

1. Calculate the complex wavelet transform of the image estimate x̂₀. Let w_i be the ith complex wavelet coefficient in the output of this transform. (It is important that w_i is complex-valued here.)
2. Calculate an estimate â_i of the signal power in these coefficients:

â_i = |w_i|² − γσ_i²    (9.9)

where σ_i² is the variance of the noise in the wavelet coefficient and γ takes some constant value. As before, negative values of â_i are set to zero.
3. New wavelet coefficients are generated using a Wiener-style gain law:

ŵ_i = ( â_i / (â_i + σ_i²) ) w_i    (9.10)

4. The inverse DTCWT is applied to the new wavelet coefficients to give an image x̂.

The original white noise of variance σ² is coloured by both the wavelet transform and the inverse filtering. The parameters of both these processes are known, which in theory allows the exact calculation of σ_i². The value of σ_i² will be the same for all coefficients within the same subband (because the filtering is a stationary filter and different coefficients in a subband correspond to translated impulse responses). In practice it is easier to estimate these values by calculating the DTCWT of an image containing white noise of variance σ² that has been filtered according to equation 9.8; the average energy of the wavelet coefficients in the corresponding subbands then provides estimates of σ_i².

A choice of γ = 1 would seem to give a good estimate of the original signal power (this is because the noise only corrupts the coefficients with an average energy of σ_i²). However, with this choice there is a significant probability that a low power coefficient will be incorrectly estimated as having a high energy. In practice we find it is better to use a larger value to avoid this problem. In the experiments we will always use γ = 3. These steps are represented by the Wavelet Denoising block of figure 9.3.

9.3.2 Choice of search direction

This section describes the contents of the Calculate Search Direction step in figure 9.2. The choice of search direction will only affect the speed of convergence but not the final result. One obvious choice is the gradient, but better search directions are usually produced by the conjugate gradient algorithm [98]. Both algorithms work best for well-conditioned Hessians, i.e. for Hessians close to a multiple of the identity matrix. Convergence can be improved for a badly conditioned system via a preconditioner. We will test three types of preconditioning in both gradient and conjugate gradient algorithms for a total of six alternatives. We first describe the conjugate gradient algorithm and then our preconditioning choices.
We construct a sequence of search directions h(0), h(1), . . . , either by the steepest descent algorithm

h(i) = g(i),

or by the conjugate gradient algorithm

h(i) = g(i) + ( ‖g(i)‖² / ‖g(i−1)‖² ) h(i−1).

This formula is valid for i > 0; for the first pass, i = 0, the search direction for the conjugate gradient algorithm is given by h(0) = g(0). Here g(i) is the preconditioned descent direction at the ith step of the algorithm.

We compare three types of preconditioning. The first type corresponds to no preconditioning, and g(i) is given by the negative gradient of the energy function (E was defined in equation 9.7):

g(i) = −∇_w E = P^H H^H y − P^H H^H H P w − A w.

The Hessian for our system is P^H H^H H P + A, and so the ideal preconditioner would be (P^H H^H H P + A)^{-1}, which would transform the Hessian to the identity matrix, but this matrix inversion is far too large to be numerically calculated. Instead, for the second type we choose a simpler type of preconditioning that scales the energy function gradient in order to produce a Hessian with diagonal entries equal to 1. Define scaled wavelet coefficients as v = S^{-1} w, where S = diag{s} for some vector of scaling coefficients s. The Hessian of the energy expressed as a function of v is

∇²_v E = S^H P^H H^H H P S + S^H A S.

The ith diagonal entry of this equation is [∇²_v E]_ii = s_i² t_i, where t_i is the ith diagonal entry of the matrix P^H H^H H P + A. The required scaling is therefore s_i = 1/√t_i. The gradient is given by ∇_v E = S^H ∇_w E. This defines appropriate directions for changes in the preconditioned coefficients v. Appropriate directions for changes to the original coefficients w are therefore given by g(i) = S ∇_v E = S² ∇_w E.
This method requires the precomputation of t_i. The diagonal entries of A are known (these are the inverses of the variance estimates), so consider the matrix P^H H^H H P. The entry t_i can be calculated by:

1. Set all wavelet coefficients to zero, except for the ith coefficient which is set to 1, to get a unit vector e_i.
2. Invert the wavelet transform to get P e_i.
3. Apply the blurring filter H to get H P e_i.
4. Apply the spatially reversed blurring filter to get H^H H P e_i.
5. Apply the wavelet transform to get P^H H^H H P e_i.
6. Pick out the value of the ith coefficient p_i = [P^H H^H H P e_i]_i.
7. Calculate t_i = A_ii + p_i.

The value of p_i depends only on which subband contains the nonzero coefficient. We can therefore compute all the p_i by applying this process once for each subband. Also note that because these values (for p_i) depend on the choice of blurring filter and wavelet transform, but not on the observed data, they can be computed once and used for many different test images.

The third type of preconditioning is based on analogy with the WaRD method [84]. To explain the analogy we first derive the analytic solution to the energy minimisation problem. The expression for energy (equation 9.7) is a quadratic function of w and hence the optimum can be found by setting the gradient equal to zero. The vector gradient of the energy is

∇E(w) = −P^H H^H y + P^H H^H H P w + A w,

therefore the solution to the problem, w_opt, is given by

w_opt = ( P^H H^H H P + A )^{-1} P^H H^H y    (9.11)

Note that the factor (P^H H^H H P + A)^{-1} is exactly the same as the ideal preconditioner. A negative way of looking at this is to say that calculating a good preconditioner involves the same effort as solving the original equations directly. However, we can also reverse the logic to say that a reasonable method for solving the original equations will probably also give a reasonable preconditioning method.
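The seven-step recipe for t_i can be sketched with toy matrices; a real implementation would apply filters and the DTCWT as operators, but the matrix form (hypothetical random stand-ins for H and P) makes it easy to check that the probing really returns the diagonal of PᴴHᴴHP + A:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 8
H = rng.standard_normal((n, n))                    # hypothetical blur matrix
P = np.linalg.qr(rng.standard_normal((n, n)))[0]   # orthogonal stand-in for synthesis
A_diag = rng.uniform(0.5, 2.0, n)                  # known inverse variances A_ii

def t_entry(i):
    e = np.zeros(n)
    e[i] = 1.0                       # step 1: unit coefficient vector e_i
    img = P @ e                      # step 2: invert the wavelet transform
    blurred = H @ img                # step 3: apply the blurring filter
    back = H.T @ blurred             # step 4: spatially reversed filter (H^H)
    coeffs = P.T @ back              # step 5: forward wavelet transform
    p_i = coeffs[i]                  # step 6: pick out the ith coefficient
    return A_diag[i] + p_i           # step 7: t_i = A_ii + p_i

t = np.array([t_entry(i) for i in range(n)])
```

Since p_i = ‖HPe_i‖² ≥ 0 and A_ii > 0, every t_i is positive, so the scaling s_i = 1/√t_i is always well defined.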
3. The WaRD search direction is therefore given by ˆ g(i) = w. ITERATIVE SOLUTION 181 the logic to say that a reasonable method for solving the original equations will probably also give a reasonable preconditioning method. Wavelet transform the image xα . We can write the regularised linear ﬁltering used in the WaRD method as xα = F H Px F HHy 2I x + ασ N Px F P P HHHy = FH H M MPx + ασ 2 IN M H MP where Px is a diagonal matrix containing the estimated PSD of the image along the diagonal entries.9. We now give the details of how this idea is applied.13) where βi2 is the estimated variance of the wavelet coeﬃcients due to our image model and γi2 is the estimated variance of the wavelets due to the noise (ampliﬁed by the inverse ﬁlter). We deduce that in the WaRD method the rest of the terms together with the wavelet denoising should provide an approximation to the ideal preconditioner. Invert the wavelet transform of the image. On the basis of this logic we propose using the WaRD method as a preconditioner because ee have found that it gives a good ﬁrst approximation to solving the original equations. . Modify each wavelet coeﬃcient by wi = ˆ βi2 βi2 + γi2 wi (9. If we compare this equation with equation 9. 2.3. The WaRD method consists of a linear ﬁltering stage followed by a wavelet denoising stage. On the basis of this analogy we will choose a search direction that is the wavelet denoised version of the image xα = F H M H MP Px F P (−∇Ew ) 2 x + ασ IN (9.11 we spot the term P H H H y on the right of both equation. We use the same strategy except that we are calculating a search direction in wavelet space and so we can omit the ﬁnal step.12) The denoising strategy used by the WaRD estimate is as follows: 1.
9.3 One dimensional search

This section describes the contents of the Minimise Energy along search direction step in figure 9. Suppose we have an estimate w0 and a search direction δw. Then if we add on a times the search direction, w = w0 + a δw, we can express the energy as a function of a as

2E(a) = ||H P w0 + a H P δw − y||² + (w0 + a δw)^H A (w0 + a δw)    (9.14)

We can minimise this expression by setting the derivative with respect to a equal to zero:

d(E(a))/da = a ||H P δw||² + a δw^H A δw − δw^H P^H H^H (y − H P w0) + δw^H A w0 = 0

therefore

a = [δw^H P^H H^H (y − H P w0) − δw^H A w0] / [||H P δw||² + δw^H A δw]

When we want to evaluate this expression we never need to do any matrix multiplications because:

1. Multiplication by P is performed by an inverse wavelet transform.
2. Multiplication by P^H = W is performed by a forward wavelet transform.
3. Multiplication by H is performed by the original blurring filter.
4. Multiplication by H^H is performed by a spatially reversed version h(−x, −y) of the blurring filter.

For large blurring filters it is quicker to implement the linear filters using a Fourier transform.

9.4 Convergence experiments and discussion

We consider the same problem studied by Neelamani et al [84] of the 256 × 256 Cameraman image blurred by a square 9 × 9-point smoother. The blurred signal to noise ratio (BSNR) is defined as 10 log10(||y||² / (256 · 256 σ²)) and noise is added to make this 40dB (we assume that images have been scaled to have zero mean). We compare the performance of the six different search direction choices. These will be called
Figure 9. x(10) } of restored images x ˆ produced by these algorithms. 2. The ISNR actually decreases on several of the steps when the WaRD direction is used. We use the oracle estimate for the power spectrum of the original image in order that the SNR will be a measure of the convergence of the algorithm rather than of the quality of the power spectrum estimate. Convegence is very slow without preconditioning (NOPRE). . NOPRECG for the conjugate gradient algorithm with no preconditioning. The CG algorithm gives better results than the SD algorithm for the PRE and NOPRE methods but not for the WaRD iterations. The results from the preconditioned method (PRE) start at a low ISNR but steadily improve (a later experiment will show the performance over many more iterations). CONVERGENCE EXPERIMENTS AND DISCUSSION 183 NOPRESD for the steepest descent method with no preconditioning. PRESD for the steepest descent preconditioned to have ones along the diagonal of the Hessian matrix. We observe the following characteristics of the plot: 1. 4. WaRDSD for search directions deﬁned by the WaRD method. 6. 3.4 plots the improvement in SNR (ISNR) deﬁned by 10 log 10 ˆ x − y 2 / x − x(n) 2 ˆ for the sequence {ˆ (1) .9. . 5. The same initialisation is used for all methods and therefore all methods have the same performance at the start of iteration 1. The WaRD method achieves a high ISNR after the ﬁrst pass. x(2) .4. but there is little subsequent improvement. WaRDCG for search directions deﬁned by the conjugate gradient algorithm acting on the WaRD directions. 7. PRECG for the conjugate gradient algorithm used with the preconditioned system. The ﬁrst pass of the conjugate gradient algorithm uses a steepest descent search direction and therefore the CG and SD methods give the same performance at the start of iteration 2. . . .
Figure 9.4: Performance of different search directions using the steepest descent (x) or the conjugate gradient algorithm (o). (Axes: ISNR in dB against iteration number, for the WaRD, PRE, and NOPRE methods.)
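The two quality measures used throughout these experiments are easy to state in code. A small sketch (assuming zero-mean images, as the text does; the stand-in "blurred" image below is illustrative only):

```python
import numpy as np

def bsnr(y, sigma2):
    """Blurred signal-to-noise ratio: 10 log10(||y||^2 / (N * sigma^2))."""
    return 10 * np.log10(np.sum(y ** 2) / (y.size * sigma2))

def isnr(x, y, x_hat):
    """Improvement in SNR of the estimate x_hat over the blurred observation y."""
    return 10 * np.log10(np.sum((x - y) ** 2) / np.sum((x - x_hat) ** 2))

rng = np.random.default_rng(1)
x = rng.standard_normal((32, 32))
y = 0.5 * x                          # a crude stand-in for a blurred version of x
assert abs(isnr(x, y, y)) < 1e-12    # returning the observation unchanged gives 0dB
assert isnr(x, y, 0.9 * x) > 0       # a better estimate gives a positive ISNR

# Choosing sigma^2 to hit a 40dB BSNR, as in the experiments.
sigma2 = np.sum(y ** 2) / (y.size * 10 ** 4.0)
assert abs(bsnr(y, sigma2) - 40.0) < 1e-9
```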
In the second experiment we use the WaRD method to initialise the wavelet coefficients (step Initialisation of figure 9.8). More precisely, we calculate the wavelet transform of the image x0 (defined in equation 9.2) and then use the WaRD modification step of equation 9.13 to generate our initial wavelet coefficient estimates. We will call this "WaRD initialisation". Note that this is the same as using a single pass of the algorithm with a WaRD direction based on an initialisation of both scaling and wavelet coefficients to zero.

Figure 9.5 plots the results of this experiment, in which we compare different choices for the second search direction. Note that we have a much narrower vertical axis range in this figure than before. Note from figure 9.5 that the WaRD initialisation means that the ISNR is about 11.3dB at the start of the first iteration, while a single iteration of the WaRD direction (starting from the original initialisation of just the scaling coefficients) only reached about 10.8dB in figure 9.4. The WaRDCG method displays an oscillation of ISNR with increasing iteration. The PRECG method performs best initially. There is an improvement of about 0.05dB from using the preconditioned direction rather than the WaRD direction after 10 iterations, and another 0.05dB from using the conjugate gradient algorithm rather than the steepest descent algorithm. We will discuss the unusual performance of the WaRD direction more at the end of this section.

The ISNR is still improving after ten iterations and the third experiment examines the improvement over 100 iterations. Figure 9.6 compares the performance of the PRECG, WaRDCG, and PRESD methods on the same image. The preconditioned conjugate gradient search direction gives the best final results, reaching its peak of 11.95dB within about 20 iterations and finally settling around 11.9dB, while the PRESD method requires about 100 iterations to reach the same ISNR level. Figure 9.7 plots the value of the energy function E(w) of equation 9.7 at the start of each iteration. It can be seen that the PRECG method reaches the lowest energy, as suggested by the ISNR results. This again supports the argument that the WaRD method works best when used as it was originally designed rather than to construct search directions.

The WaRD direction is designed to give an estimate of the deconvolved image based on the assumption that the signal and noise are diagonalised in wavelet space [58]. This is a reasonable initial approximation and consequently the WaRD direction works well the first time. However, for subsequent iterations we expect the off-diagonal elements to become more significant and therefore it is not surprising that the WaRD direction is less effective. In all of the ISNR plots of this section we have seen that
Figure 9.5: Performance of different search directions using the steepest descent (x) or the conjugate gradient algorithm (o) starting from a WaRD initialisation. (Axes: ISNR in dB against iteration number.)
Figure 9.6: Performance of different search directions over 100 iterations. (Axes: ISNR in dB against iteration number, for the PRECG, PRESD, and WaRDCG methods.)

Figure 9.7: Value of the energy function over 100 iterations. (Axes: energy per pixel against iteration number, for the PRECG, PRESD, and WaRDCG methods.)
the WaRDCG method behaves strangely in that the ISNR often decreases with increasing iterations. We have already argued that the WaRD direction should not be expected to produce sensible search directions except for the first pass (for which it was designed), but it may still seem strange that the ISNR decreases. Nevertheless, figure 9.7 confirms that the energy function decreases with every iteration. Recall that the energy function of equation 9.7 has two terms: an "observation energy" that measures the degree to which the current estimate matches the observations, and a "prior energy" that measures how well the wavelet coefficients match our prior expectations³. We now suggest an explanation for how bad directions can cause such problems:

1. At the start of the method the wavelet coefficients are all zero and the estimated image is a relatively poor fit to the observations. The observation energy is high, and the prior energy is zero.

2. The first few search directions correct for the most significant places where the observations disagree with the current estimate at the cost of increasing the size of some wavelet coefficients. The observation energy decreases and the prior energy increases. This tends to improve the quality of the estimate. However, the poor choice of search direction means that some wavelet coefficients are made large during these initial stages despite having an expected low variance according to the prior distribution.

3. Subsequent directions attempt to correct for these incorrectly large wavelet coefficients, but the poor choice of search direction means that the direction also introduces errors elsewhere in the image. The prior energy decreases but now at the cost of increasing the observation energy. This will tend to reduce the quality of the estimate.

4. In each iteration the total energy (observation energy plus prior energy) decreases, but the internal redistribution of energy between the two terms can cause a corresponding fluctuation in the ISNR.

³Note that we are using the word "prior" in a very loose sense. In our model the matrix A that defines the prior for a particular image has been generated from the image itself.

We choose ten iterations of the PRECG search direction with a WaRD initialisation as a reasonably fast and robust choice for the comparisons with alternative methods.
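The claim that the total energy decreases at every iteration, whatever the search directions do to the ISNR, follows from the exact line search and is easy to check on a toy quadratic problem. The sketch below again uses illustrative stand-in matrices, and uses steepest-descent directions for simplicity rather than the WaRD directions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
B = rng.standard_normal((n, n))        # stands in for HP
A = np.diag(rng.uniform(0.5, 2.0, n))  # prior-precision matrix
y = rng.standard_normal(n)

def energy(w):
    obs = np.linalg.norm(B @ w - y) ** 2  # "observation energy"
    prior = w @ A @ w                     # "prior energy"
    return obs + prior

w = np.zeros(n)                           # start: all coefficients zero
energies = [energy(w)]
for _ in range(25):
    g = 2 * (B.T @ (B @ w - y) + A @ w)   # gradient of the energy
    d = -g                                 # steepest-descent direction
    # Exact line search along d, as derived earlier.
    a = (d @ B.T @ (y - B @ w) - d @ A @ w) / (
        np.linalg.norm(B @ d) ** 2 + d @ A @ d)
    w = w + a * d
    energies.append(energy(w))

diffs = np.diff(energies)
assert np.all(diffs <= 1e-9)  # total energy never increases
```

The same monotone decrease holds for any descent direction with exact line search; only the split between the two energy terms, and hence the ISNR, can fluctuate.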
9.5 Comparison experiments

We will compare the performance of algorithms on two images: the cameraman image used in the previous section and an aerial image of size 256 by 256. The aerial image is provided by the French Geographical Institute (IGN). The two test images (before blurring) are shown in figure 9.8.

Figure 9.8: Test images used in the experiments.

We compare two point spread functions (PSF): the 9 by 9 smoother used above, and an alternative 15 by 15 PSF more like a satellite blurring filter. This alternative PSF is defined as

h(x, y) = 1 / ((1 + x²)(1 + y²))    for −7 ≤ x, y ≤ 7.

This PSF is plotted in figure 9.9. This gives a product of four test data sets which we will call

CMSQ for the cameraman image blurred with the square PSF.
CMST for the cameraman image blurred with the satellite-like PSF.
IGNSQ for the satellite image blurred with the square PSF.
IGNST for the satellite image blurred with the satellite-like PSF.

We will test the following algorithms:

Landweber 235 iterations of the Landweber method. The number of iterations was chosen to maximise the ISNR for the CMSQ image.
Wiener A Wiener filter using a power spectrum estimated from the observed data [48].
Oracle Wiener A Wiener filter using the (unrealisable) oracle power spectrum estimate.
Mirror The non-decimated form of mirror wavelet deconvolution. This algorithm is described in detail in appendix C.
PRECGDTCWT Ten iterations of the PRECG search direction starting from a WaRD estimate (using the standard DTCWT filters of the (13,19)-tap near-orthogonal filters at level 1 together with the 14-tap Q-shift filters at levels ≥ 2).
PRECGDWT The same algorithm as described in the earlier sections for complex wavelets, but using a real decimated wavelet formed from a biorthogonal 6,8-tap filter set.
PRECGNDWT The same algorithm as described in the earlier sections for complex wavelets, but using a real non-decimated wavelet formed from a biorthogonal 6,8-tap filter set. The operation of each step is equivalent to averaging the operation of a DWT step over all possible translations.

Figure 9.9: Alternative PSF used in experiments.
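The alternative PSF defined above is straightforward to construct directly from its formula. A short sketch — the unit-sum normalisation here is our assumption for use as a blurring filter, as the text does not state how the PSF is scaled:

```python
import numpy as np

coords = np.arange(-7, 8)                        # 15 taps: x, y in -7..7
xx, yy = np.meshgrid(coords, coords)
psf = 1.0 / ((1.0 + xx ** 2) * (1.0 + yy ** 2))  # h(x, y) from the text
psf /= psf.sum()                                 # unit DC gain (our assumption)

assert psf.shape == (15, 15)
assert np.allclose(psf, psf.T)                   # symmetric in x and y
assert psf[7, 7] == psf.max()                    # peak at the centre
assert abs(psf.sum() - 1.0) < 1e-12
```

Unlike the square smoother, this PSF decays smoothly away from the centre, which is why its frequency response has no extra zeros.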
In each experiment we use a variance of σ² = 2. (For the convergence experiments the variance was about 1.7.) Except for the Oracle Wiener method we will use realisable estimates of the power spectrum. In other words, all of the algorithms (apart from Oracle Wiener) could be performed on real data. The results of these experiments are tabulated in figure 9.10.

Algorithm        CMSQ   CMST   IGNSQ   IGNST
Landweber        7.48   3.53   7.49    0.489
Wiener           7.31   2.70   8.00    3.14
Oracle Wiener    8.51   6.38   9.11    6.07
Mirror           3.88   5.28   4.71    4.61
PRECGDTCWT       9.36   7.3    9.83    6.72
PRECGDWT         8.36   5.43   8.84    4.89
PRECGNDWT        8.64   5.59   9.28    5.09

Notice the following features of these results:
Figure 9.10: Comparison of ISNR (in dB) for different algorithms and images.

1. The realisable Wiener filter always performs worse than the Oracle Wiener by at least 1dB.
2. The Landweber method always performs worse than the Oracle Wiener, but sometimes beats the standard Wiener filter.
3. The Landweber method has a very poor performance on the IGNST image. This illustrates the problems of an incorrect choice for the number of iterations. Further tests reveal that for the IGNST image the optimum performance is reached after 36 iterations, reaching an ISNR of 5.4dB.
4. The Mirror wavelet algorithm beats the standard Wiener filter for the satellite-like blurring function (ST), but not for the square blur (SQ).
5. The nondecimated wavelet always performs better than the decimated wavelet.
6. The DTCWT always performs at least 0.5dB better than any of the other tested algorithms.

It is possible to rank these performances into three groups:
1. The Landweber, Mirror, and standard Wiener methods.
2. The PRECGDWT, PRECGNDWT, and Oracle Wiener methods.
3. The PRECGDTCWT method.

The methods in the second group are always better (in these experiments) than those in the first group, and the method in the third group is always better than any other.

Finally we attempt to duplicate the experimental setups of published results. Some authors [124] only present results as images and thus it is hard to compare directly, but usually a measure of mean squared error (MSE) or improved signal to noise ratio (ISNR) is reported. The cameraman image with a uniform 9 by 9 blur and a blurred signal to noise ratio (BSNR) of 40dB was originally used by Banham and Katsaggelos [8], who report an ISNR of 3.58dB for Wiener restoration, 1.71dB for CLS restoration, and 6.68dB for their adaptive multiscale wavelet-based restoration (Constrained Least Squares, CLS, restoration was described in section C.6 and is another deterministic restoration algorithm whose performance on this task will always be worse than the Oracle Wiener solution). Neelamani et al claim [84] that they use the same experimental setup and quote an ISNR of 8.8dB for the Wiener filter and 10.6dB for the WaRD method. There is a large discrepancy in the Wiener filter results. A small discrepancy is expected as different power spectrum estimates result in different restorations; Banham and Katsaggelos [8] explicitly state that high frequency components of the spectrum are often lost or inaccurately estimated, while Neelamani et al add a small positive constant to the estimate to boost it at high frequencies. A close examination of the published figures reveals that in fact the setup is slightly different in the two cases. Banham and Katsaggelos use a filter that averages the contents of a square neighbourhood centred on each pixel, while Neelamani et al use a filter that averages the contents of a square neighbourhood whose corner is at the pixel.
This change in setup does not aﬀect the amount of noise added as the BSNR is insensitive to shifts, nor does it aﬀect the generated estimates. However, it does aﬀect the ISNR because the starting SNR (for the blurred image) is considerably lowered by a translation. Fortunately, the diﬀerence merely results in a constant oﬀset to the ISNR values. This oﬀset is given by the ISNR that results from using the centred blurred image rather than the displaced one. Let H represent the oﬀset ﬁlter and S the translation that centres the impulse response. The blurred image produced by Neelamani et al is given by Hx + n1 ,
while Banham and Katsaggelos produce an image of S H x + n2, where n1 and n2 are vectors of the noise added to the images. The difference ISNRoffset in the ISNR values for an image estimate x̂ is therefore

ISNRoffset = 10 log10(||x − H x − n1||² / ||x − x̂||²) − 10 log10(||x − S H x − n2||² / ||x − x̂||²)
           = 10 log10(||x − H x − n1||² / ||x − S H x − n2||²)
For the Cameraman image the value of this offset is ISNRoffset = 3.4260dB when there is no noise added. Using a typical noise realisation reduces this to ISNRoffset = 3.4257dB. We see that the noise levels are very low compared to the errors caused by blurring. In our experiments we have followed the setup of Neelamani et al in using the displaced filter. For comparison we have calculated the results of the PRECGDTCWT method and our version of standard Wiener filtering on the same image. The results from the literature (including the adjusted results of Banham and Katsaggelos) are shown in table 9.11; our original results are printed in bold.

Algorithm                         ISNR /dB
CLS                               5.14 (1.71+3.43)
Wiener (Banham and Katsaggelos)   7.01 (3.58+3.43)
Multiscale Kalman filter          10.11 (6.68+3.43)
Wiener (Neelamani et al)          8.8
WaRD                              10.6
Wiener (our version)              8.96
PRECGDTCWT                        11.32

Figure 9.11: Comparison of different published ISNR results for a 9 by 9 uniform blur applied to the Cameraman image with 40dB BSNR.

In these results the CLS method gives the worst results while the PRECGDTCWT method gives the best. The adjustment for the offset filter shows that the WaRD method is 0.5dB better than the multiscale Kalman filter (instead of the claimed 4dB improvement), while the PRECGDTCWT method is 0.7dB better than the WaRD method (and 1.2dB better than the multiscale Kalman filter).
Figure 9.12 displays the deconvolved images for our Wiener and PRECGDTCWT approaches. This setup is exactly the same as for the initial experiments in section 9.4.
Original image Observed image
Wiener denoised image (ISNR=8.96)
Restored image after 10 iterations (ISNR=11.32)
Figure 9.12: Deconvolution results for a 9 by 9 uniform blur applied to the Cameraman image with 40dB BSNR using the PRECGDTCWT method with WaRD initialisation.

The results for our method are slightly worse here (11.32dB instead of 11.9dB) because we are now using a realisable estimate of the power spectrum. From figure 9.12 we can see that the results of the PRECGDTCWT method are considerably sharper and possess less residual noise than the results of the Wiener filter.

Belge et al use a Gaussian convolutional kernel [11]

h(x, y) = (1 / (4 σx σy)) exp(−(x² + y²) / (2 σx σy))
with σx = σy = 2 to blur the standard 256 × 256 Mandrill image and add zero mean white Gaussian noise to achieve a BSNR of 30dB. This is an unusual way of writing the kernel
(normally there would be a factor of π for normalisation and σx², σy² would be used to divide x² and y²) but this is the kernel specified in the paper [11]. Their results are presented as root mean square errors (RMSE = sqrt((1/N²) ||x − x̂||²)) which we have converted to ISNR values (ISNR = −20 log10(RMSE/R0), where R0 is the RMSE of the blurred image). They compare their method to CLS and a total variation algorithm. Table 9.13 compares these results with the PRECGDTCWT method (using a realisable power spectrum estimate). Our original results are written in bold.

Algorithm                                 ISNR /dB
CLS                                       0.716
TV                                        0.854
Adaptive edge-preserving regularization   0.862
PRECGDTCWT                                1.326

Figure 9.13: Comparison of different published ISNR results for a Gaussian blur applied to the Mandrill image with 30dB BSNR.

We see that in this experiment the PRECGDTCWT method improves the results by 0.46dB compared to the adaptive edge-preserving regularization method. The original, blurred, and restored images using our method are shown in figure 9.14. One warning should be attached to these results: the definition of SNR is not explicitly stated in the reference; we assume the definition (based on the variance of the image) given in section 9.4.

Sun [115] used a uniform 3 by 3 blur and 40dB BSNR acting on a 128 by 128 version of Lenna to test a variety of Modified Hopfield Neural Network methods for solving the Constrained Least Squares formulation. Sun tested three new algorithms ("Alg. 1", "Alg. 2", "Alg. 3") proposed in the paper [115] plus three algorithms from other sources (the "SA" and "ZCVJ" algorithms [133], and the "PK" algorithm [92]). We claim in section C.6 that the converged solution must be worse than the Oracle Wiener estimate, but acknowledge that intermediate results may be better. We tested the PRECGDTCWT method (using a realisable power spectrum estimate) and two choices of Wiener filter (the oracle Wiener filter, and a Wiener filter based on the same power spectrum estimate as used in the PRECGDTCWT method) on this problem. The results of all these comparisons are shown in figure 9.15. Our original results are written in bold.
Figure 9.14: Deconvolution results for a Gaussian blur applied to the Mandrill image with 30dB BSNR using the PRECGDTCWT method with WaRD initialisation. (Panels: original, blurred, restored.)
Figure 9.15: Comparison of the PRECGDTCWT and Wiener filtering with published results of Modified Hopfield Neural Network algorithms for a 3 × 3 uniform blur applied to the Lenna image with 40dB BSNR. (ISNR in dB for Alg. 1, Alg. 2, Alg. 3, SA, ZCVJ, PK, the Oracle Wiener filter, the realisable Wiener filter, and the PRECGDTCWT method.)

The best of the previously published results is the "SA" algorithm, which attains an ISNR of 7.7dB. This is better than the realisable Wiener filter results, but almost 1dB worse than the Oracle Wiener estimate. The PRECGDTCWT method does particularly well in this case⁴, outperforming the "SA" algorithm by 2.19dB.

⁴In this method we use a centred blurring filter to avoid the artificial BSNR improvement described earlier.

9.5.1 Discussion

Mirror wavelets are designed for hyperbolic deconvolution. The inverse filtering produces large noise amplification for high frequencies and, in order to achieve a bounded variation in the amplification within a particular subband, the subbands have a tight frequency localisation for high frequencies. This is an appropriate model for the satellite-like PSF. However, the 9 by 9 smoothing filter (SQ) has many zeros in its frequency response and consequently the mirror wavelets are inappropriate and give poor results. A better performance could be achieved by designing a more appropriate wavelet transform, but no single wavelet transform will be best for all blurring functions.

In contrast, the Bayesian approach uses one term (the prior) to encode the information about the image using wavelets, while a second term (the likelihood) is used to describe the observation model and makes explicit use of the PSF. A change in PSF requires a change in the likelihood term but the same
prior wavelet model should remain appropriate. Therefore the same wavelet method gives good performance for both of the blurring functions.

Note in particular that the PRECGDTCWT method outperforms even the Oracle Wiener method and consequently will perform better than any version of standard Wiener filtering, including the Landweber or Van Cittert iterative techniques.

If we now look at the relative performance of different wavelets we see a familiar result. The decimated wavelet gives shift dependent results, and shift dependence will always tend to cause worse performance for estimation tasks like this one (as discussed in section 8.4). The nondecimated wavelet transform outperforms the decimated wavelet transform. However, the improvement is only about 0.2dB. A much larger improvement is gained from using the DTCWT, which is an average of 1.15dB better than the NDWT. It is not hard to see a plausible reason for this. Deconvolution methods tend to produce rather blurred estimates near edges in the original image that significantly affect both the SNR and the perceived quality of the restoration. In other words, the worst errors occur near edges, and the DTCWT is able to distinguish edges near 45° from those near −45°. The signal energy near diagonal edges will therefore tend to be concentrated in a smaller proportion of the wavelet coefficients than for a real wavelet transform and hence will be easier to detect.

The most promising direction for further research is taking account of the correlations between wavelet coefficients. It may be possible to use the HMT (Hidden Markov Tree [24]) to deduce the likely presence of large wavelet coefficients (at a fine detail scale) from the presence of large coefficients at coarser scales and hence improve the estimation. This approach has already been shown to be promising for standard denoising when there is no blur (as described in chapter 3).

All the results here are based on simulated data in order to allow the performance to be objectively measured. In real world applications the following issues would need to be addressed:

1. The estimation of the noise variance and the Point Spread Function (PSF) of the blurring filter.
2. The degree to which the PSF is linear and stationary.
3. The degree to which a Gaussian noise model is appropriate.

We have achieved our goal of comparing the performance of complex wavelets with decimated and non-decimated real wavelets in a practical deconvolution method, but we have certainly not "solved" deconvolution or even fully exploited the potential of complex wavelets in this application.

9.6 Conclusions

We conclude that:

1. The preconditioned conjugate gradient algorithm provides sensible search directions that achieve good results within ten iterations.
2. The WaRD algorithm gives a good starting point for the method but provides inadequate subsequent directions.
3. The Landweber and Van Cittert iterative algorithms are special cases of Wiener filtering, and hence are worse than Oracle Wiener filtering.
4. The Mirror wavelet method performed badly on the 9 by 9 uniform blur due to the presence of extra zeros in the response. Such minimax algorithms must be tuned to the particular blurring case.
5. The NDWT outperformed the DWT in this approach by about 0.2dB. Shift dependence therefore has a small effect on performance.
6. The DTCWT outperformed the NDWT by an average of 1.15dB.
7. The DTCWT method performed better than the Oracle Wiener and hence better than all versions of standard Wiener filtering, including methods based on the Landweber or Van Cittert iteration.
8. The method based on the DTCWT performed better than all the other methods tested and better than the published results on similar deconvolution experiments.

In summary, complex wavelets appear to provide a useful Bayesian image model that is both powerful and requires relatively little computation.
Chapter 10

Discussion and Conclusions

The aim of the dissertation is to investigate the use of complex wavelets for image processing. In particular, we aim to compare complex wavelets with alternative wavelet transforms. In this chapter we explain how the contents of the dissertation support the thesis that complex wavelets are a useful tool and then discuss the wider implications of the research. Many peripheral results have already been mentioned in the conclusions section at the end of each chapter and we will not repeat them here.

First we describe the experimental support for the thesis. We have examined four main image processing tasks that are described in chapters 5, 6, 8, and 9. For each application we have compared wavelet methods with alternative methods to determine when the wavelets are useful. The complex wavelet models were found to be particularly good for segmentation of differently textured regions and image deconvolution. These cases provide the main experimental justification that complex wavelets are useful for image processing. For every application the experimental results for the complex wavelet methods display an improvement in accuracy over the standard decimated wavelet methods. These improvements can be seen qualitatively in the synthesized textures of chapter 4 and most clearly quantitatively in chapter 9. For most of the applications the complex wavelet method also gives better results than the nondecimated wavelet transform. The exception is chapter 8 on interpolation. For this application the complex wavelet method produces almost exactly the same results as the nondecimated transform. Nevertheless this application still supports the thesis because the new method is much faster than the nondecimated method.

Now we describe the theoretical support for the thesis. There are two main reasons
why the complex wavelets are expected to give better results. The first is increased directionality. In chapter 4 we explain why the extra subbands are necessary for synthesizing texture with a diagonal orientation. As a natural corollary we explain in chapter 5 why the extra features are useful for segmenting textures with diagonal features. The extra subbands also give a better model for the object edges that commonly appear in images, and this results in the improved deconvolution performance of chapter 9. The second main reason for improved results is the reduction in shift dependence as compared to the standard fully decimated wavelet transform. Chapter 2 proves that any nonredundant wavelet system based on short filters will have significant shift dependence. It is usually clear why this should increase performance, although the precise amount of improvement will be strongly dependent on the nature of the input images. Chapter 8 calculates an approximation for the reduction in SNR caused by shift dependence for interpolation that predicts that complex wavelets should achieve a significant increase in quality compared to a typical real decimated wavelet.

10.1 Discussion

The experimental comparisons did not always favour a complex wavelet model. The models were found to be too simple for synthesizing more regular textures like brickwork. The models were also found to be unnecessarily complicated for interpolating a stationary random process with an isotropic autocorrelation function. In this case a solution based on the Gaussian Pyramid transform would be both faster and less shift dependent.

The DTCWT has a number of properties that are beneficial in different circumstances:

1. Perfect reconstruction
2. Six directional subbands at each scale (when used to analyse images)
3. Near balanced transform
4. Approximate shift invariance
5. Complex outputs
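The shift-dependence point can be illustrated with a one-line experiment: the detail-band energy of a decimated transform changes when the input is shifted, while the corresponding nondecimated measure does not. The sketch below is a toy stand-in using a one-level Haar detail filter (the thesis's argument concerns general short-filter wavelet systems, not this specific filter):

```python
import numpy as np

def detail_energy_decimated(x):
    """Energy in the level-1 Haar detail band of a decimated transform."""
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return np.sum(d ** 2)

def detail_energy_nondecimated(x):
    """Same filter evaluated at every (circular) shift: shift invariant."""
    d = (x - np.roll(x, -1)) / np.sqrt(2)
    return np.sum(d ** 2) / 2  # halve to match the decimated sampling density

x = np.zeros(16); x[8:] = 1.0   # a step edge
x_shift = np.roll(x, 1)          # the same edge, shifted one sample

# The decimated detail energy depends on where the edge falls...
assert detail_energy_decimated(x) != detail_energy_decimated(x_shift)
# ...while the nondecimated energy is unchanged by the shift.
assert np.isclose(detail_energy_nondecimated(x),
                  detail_energy_nondecimated(x_shift))
```

Here the decimated energy even drops to zero when the edge happens to fall between decimation pairs, which is exactly the kind of shift-dependent behaviour that degrades estimation performance.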
Table 10.1 summarises the importance of these properties for a number of applications, where a √ indicates that the property is useful for the corresponding application. The applications covered are texture synthesis, texture segmentation, autocorrelation synthesis, interpolation, deconvolution, the texture database, level-set segmentation, motion estimation, and denoising. We include three types of application:

A The applications we have tested that are described in the main body of the dissertation.

B The applications we have tested that are not described in this dissertation but have been published elsewhere [33, 34].

C Applications in the literature described in chapter 3.

Figure 10.1: Summary of useful properties for different applications.

We would expect complex wavelets to be most useful for the applications (such as deconvolution) that require all of the properties.

10.2 Implications of the research

It is well known that shift dependence often causes a significant degradation in performance. The experimental results of this dissertation confirm this connection. Consequently many wavelet based methods rely on the nondecimated form of the wavelet transform. This dissertation provides evidence that the benefits of such a nondecimated transform can usually be achieved with only a minor increase in redundancy and computational time by means of a decimated complex wavelet transform. It is the author's hope that this
This is not only on the grounds that the DTCWT provides an eﬃcient substitute for the NDWT. but on the stronger claim that the DTCWT often improves the results over even nondecimated real transforms. DISCUSSION AND CONCLUSIONS dissertation will help to add complex wavelets to the list of good general purpose transforms that are automatically considered for a new problem. .204 CHAPTER 10.
Chapter 11

Future work

In this dissertation we have shown how complex wavelets can be used on a small selection of image processing tasks. We have proposed a number of simple methods based on the DTCWT that produce good results. This chapter discusses possible future research directions for these tasks and suggests some additional applications.

11.1 Segmentation

We have demonstrated the power of the complex wavelet features for supervised texture segmentation. This method is appropriate for the task of classifying satellite imagery into urban and nonurban regions, but images taken at more moderate distances of real world objects will contain a mixture of image types. Texture-based segmentation will not always be appropriate and it may be possible to combine the segmentation method described here with more traditional edge-based segmentation algorithms. A single complex wavelet transform could be used for detecting both texture and the object edges. Large complex wavelet coefficients correspond to edges in an image and a simple edge detector can be built by simply detecting such large coefficients. At each position we have responses in the six bandpass subbands which allow the orientation to be estimated. In some preliminary work along these lines we have found that it is possible to significantly improve the coarse estimates from this simple scheme using a reassignment technique [6].

Another possibility is to extend the Hidden Markov Tree (HMT) model [28] to unify texture and edge based segmentation by adding extra hidden states that encode a form of image grammar. This grammar could be used to generate regions that can be recognised by their texture content or, for example, by the presence of large wavelet coefficients near their boundaries, or even by a combination of these. It may even be possible to use the HMT directly for object recognition if a sufficiently sophisticated grammar is used.

11.2 Texture synthesis

It would be interesting to determine additional features that can be used to improve the synthesis performance. For example, the texture synthesis method performs poorly on piecewise smooth images. A piecewise smooth image has large valued wavelet coefficients at the edges of regions while the wavelet coefficients inside the regions have small values. In the synthesized images the energy tends to be spread more uniformly across the image. It may be possible to encode measures of persistence that will tend to group large valued wavelet coefficients together.

There has been a large amount of research into how the human visual system detects and processes structure in images. Hubel and Wiesel performed a famous study into the responses of single neurons at an early stage in the visual system and discovered responses that were frequency and orientation selective and were similar to Gabor functions [49, 50]. Such Gabor functions have shapes similar to complex wavelets. Marr [78] proposed the concept of the "full primal sketch", in which groups between similar items are made and information about their spatial arrangement is extracted. It may be possible to connect the low level features provided by complex wavelets with higher level structure extraction processes in order to achieve better texture synthesis or better image understanding.

11.3 Deconvolution

We mentioned in chapter 9 that the most promising direction for future research in deconvolution is taking account of the correlations between wavelet coefficients. By using models such as the HMT to give a better prior model for images we hope that the results will also improve.

An alternative direction is the possibility of using the method for a "superresolution" application. Superresolution is the name for the process of constructing a single high quality image from multiple low quality views of the same image. The basic mathematics is very similar but there is the added complication of having to estimate an appropriate transformation to place all of the images onto a common reference grid. For example, if a low resolution video camera is used to film a document then, while individual frames may be unreadable, it may be possible to construct a readable estimate by combining the information from several frames.

11.4 Other possibilities

We have been using the complex wavelet transform for image processing but it can also be used in spaces of both greater and smaller dimension. For example, the author is also investigating the application of the DTCWT to 3D data sets (such as those from medical imaging techniques). For these data sets analogous segmentation techniques can be used to identify different biological structures.

The DTCWT can also be used for 1D applications. For example, we have used the DTCWT to represent regions with smooth boundaries via a levelset approach [34]. An alternative approach to shape modelling is to represent each point (x, y) along a contour by the complex number x + jy to produce a one-dimensional sequence of complex numbers. For real signals the 1D DTCWT only needs to possess one high frequency subband for each scale as the spectrum of real signals is the same for positive and negative frequencies. However, the 1D DTCWT can easily be extended to produce subbands for both positive and negative frequencies. Using such an extended DTCWT on the sequence of complex numbers mentioned above will produce (complex valued) wavelet coefficients with the following properties:

1. Anticlockwise curves will be represented by wavelet coefficients corresponding to positive frequencies.
2. Clockwise curves will be represented by wavelet coefficients corresponding to negative frequencies.
3. A translation of the contour will only alter the lowpass coefficients.
4. A rotation by θ of the contour will alter the phase of every coefficient by θ.
5. A rescaling by a factor of r of the contour will change the magnitude of every coefficient by a factor of r.

Such a model should be useful for recognition or coding of contours.
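These properties can be checked on a toy example. The extended transform itself is not reproduced here; instead the sketch below (NumPy; an illustrative stand-in) uses the plain DFT of the complex contour sequence, whose coefficients separate positive and negative frequencies in the same way, with the DC term playing the role of the lowpass band:

```python
import numpy as np

# An anticlockwise circular contour represented as x + jy
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
z = np.exp(1j * t)
Z = np.fft.fft(z)

# An anticlockwise contour has its energy at positive frequencies;
# the conjugate (clockwise) contour at negative frequencies (bin N-1 = -1)
assert np.argmax(np.abs(Z)) == 1
assert np.argmax(np.abs(np.fft.fft(np.conj(z)))) == 63

# A translation only alters the lowpass (here DC) coefficient
assert np.allclose(np.fft.fft(z + (2 + 3j))[1:], Z[1:])

# A rotation by theta changes every coefficient's phase by theta
theta = 0.3
assert np.allclose(np.fft.fft(z * np.exp(1j * theta)), Z * np.exp(1j * theta))

# A rescaling by r scales every coefficient's magnitude by r
assert np.allclose(np.abs(np.fft.fft(1.5 * z)), 1.5 * np.abs(Z))
```

The DTCWT version would behave analogously but with the added multiscale localisation of the wavelet subbands.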
Appendix A

Balance and noise gain

A.1 Amplification of white noise by a transform

Consider applying independent white noise of variance σ² to each of the inputs {x_1, …, x_N} of a real transform represented by w = Wx, where x ∈ R^N, w ∈ R^M, and W is a real M × N matrix. M and N are positive integers but there is no restriction on their relative sizes.

We first calculate an expression for the expected energy in the wavelet coefficients. The ith wavelet coefficient w_i is given by

w_i = \sum_{j=1}^{N} W_{ij} x_j   (A.1)

and so w_i has a N(0, σ² \sum_{j=1}^{N} W_{ij}²) distribution. E{w_i²} is given by the variance because E{w_i} = 0, and so the expected energy in the wavelet coefficients is

E{ \sum_{i=1}^{M} w_i² } = \sum_{i=1}^{M} E{w_i²}   (A.2)
= σ² \sum_{i=1}^{M} \sum_{j=1}^{N} W_{ij}²   (A.3)

Now consider the trace of W^T W (given by the sum of the diagonal elements):

tr(W^T W) = \sum_{j=1}^{N} [W^T W]_{jj}   (A.4)
= \sum_{j=1}^{N} \sum_{i=1}^{M} [W^T]_{ji} W_{ij}   (A.5)
= \sum_{j=1}^{N} \sum_{i=1}^{M} W_{ij} W_{ij}   (A.6)
= \sum_{i=1}^{M} \sum_{j=1}^{N} W_{ij}²   (A.7)

Putting equations A.3 and A.7 together we get the desired result that the expected total output energy is tr(W^T W) σ².

Suppose now we consider the model described in chapter 2. We have wavelet coefficients v consisting of the original wavelet coefficients w = Wx plus a vector n containing independent white Gaussian noise of mean zero and variance σ²:

v = w + n

If P is a matrix representing reconstruction filtering (that satisfies PR so that PW = I_N) then the expected energy of the error is given by

E{ ||x − y||² } = E{ ||x − Pv||² } = E{ ||PWx − P(w + n)||² } = E{ ||Pn||² }

In words, the energy of the error is given by the energy of the output of the linear transform P applied to a vector of white noise. Using the first result of this section we can write that the expected energy of the error is given by tr(P^T P) σ². Using definition 2.14 we are now in a position to write down a first expression for the noise gain g:

g = E{||y − x||²} / E{||v − w||²} = tr(P^T P) σ² / (M σ²) = tr(P^T P)/M = (1/M) \sum_{i=1}^{N} \sum_{j=1}^{M} P_{ij}²

A.2 Definition of d_1, …, d_N

Consider a real linear transform represented by w = Wx. If W has size M by N and now M ≥ N then we can take the singular value decomposition to give three real matrices U, S, V such that W = U S V^T where

• U has size M by M and is an orthogonal matrix.
• S has size M by N and is zero except for entries on the main diagonal S_{1,1}, …, S_{N,N}.
• V has size N by N and is an orthogonal matrix.

(A matrix is defined to be orthogonal if U^T = U^{−1}. For a matrix of size n × n this is equivalent [17] to saying that the rows form an orthonormal basis of R^n.) d_i is defined to be the square of the ith singular value:

d_i = S_{i,i}²   (A.8)

Without loss of generality we can order these such that

d_1 ≥ d_2 ≥ … ≥ d_N.   (A.9)

A.3 Determining frame bounds

If we write a set of wavelets operating on a finite number of input samples in matrix form, then the definition of a frame [30] says that the set of wavelets gives a frame if and only if there exist real values A and B such that 0 < A ≤ B < ∞ and

∀x ∈ R^N:  A ||x||² ≤ ||Wx||² ≤ B ||x||²   (A.10)

Consider ||Wx||²:

||Wx||² = x^T W^T W x   (A.11)
= x^T V S^T U^T U S V^T x   (A.12)
= x^T V S^T S V^T x   (A.13)
= y^T S^T S y   (A.14)
= \sum_{i=1}^{N} d_i y_i²   (A.15)

where y = V^T x. All the d_i are real and nonnegative and so \sum_i d_i y_i² clearly varies between d_N ||y||² and d_1 ||y||², as d_N and d_1 are the smallest and largest of the d_i. Writing e_k for the vector in R^N with a 1 in the kth place and zeros elsewhere, we can attain these bounds by y = e_N or y = e_1. Note that because V^T V = V V^T = I,

||x||² = x^T x = x^T V V^T x = ||V^T x||² = ||y||².

Putting these last results together we discover that

∀x ∈ R^N:  d_N ||x||² ≤ ||Wx||² ≤ d_1 ||x||²   (A.16)

with the bounds attainable by x = V e_N or x = V e_1. This means that the tightest possible frame bounds for the transform are d_N and d_1 and that the wavelets associated with the transform form a frame if and only if d_N > 0.

A.4 Signal energy gain

The definition of a frame says that (except for tight frames) the energy of a signal can change as it is transformed (within certain bounds) and so we define the signal energy gain as the gain in energy for a theoretical white noise input. From A.1 we know that the expected energy in the wavelet coefficients is given by tr(W^T W) σ². The input energy is σ² for each of the N inputs and so the signal energy gain is

tr(W^T W) σ² / (N σ²) = (1/N) tr(W^T W)   (A.17)
= (1/N) tr(V S^T U^T U S V^T)   (A.18)
= (1/N) tr(V S^T S V^T)   (A.19)
= (1/N) tr((S V^T)^T S V^T)   (A.20)
= (1/N) tr(S V^T (S V^T)^T)   (A.21)
= (1/N) tr(S V^T V S^T)   (A.22)
= (1/N) tr(S S^T)   (A.23)
= (1/N) \sum_{i=1}^{N} d_i   (A.24)

where we have made use of the result tr(A^T A) = tr(A A^T). Therefore the gain in signal energy is given by the average of the d_i.

A.5 Reconstruction noise gain bound

Consider the problem of inverting w = Wx. We assume that the wavelets form a frame in order for a PR system to be achievable. If M > N then the transform is redundant and there are many choices for the reconstruction matrix. We seek the solution with minimum noise gain. Section A.1 shows that the noise gain is given by the sum of the squares of the entries in the inversion matrix P. We can solve for the minimum noise gain reconstruction by separately solving the problem of inverting to get each of the x_i. Suppose that the inversion matrix P has rows q_1^T, …, q_N^T. The ith row q_i^T is involved only in producing x_i and so we need to minimise ||q_i||² subject to q_i^T W x = x_i. This is equivalent to minimising ||q_i||² subject to (q_i^T W)^T = W^T q_i = e_i, where e_i is defined as in A.3. This is a standard problem whose solution is given by q_i = W (W^T W)^{−1} e_i, i.e. q_i^T = e_i^T (W^T W)^{−1} W^T. Note that W^T W must be invertible for this solution to exist. W^T W will be invertible if and only if all the singular values are nonzero, which will be true if and only if the associated wavelets form a frame.

Putting all the separate solutions together we discover that the minimum noise gain perfect reconstruction matrix Q is given by

Q = (W^T W)^{−1} W^T.   (A.25)

We can now substitute the singular value decomposition for W to get

Q = (V S^T U^T U S V^T)^{−1} V S^T U^T   (A.26)
= (V S^T S V^T)^{−1} V S^T U^T   (A.27)
= (V^T)^{−1} (S^T S)^{−1} V^{−1} V S^T U^T   (A.28)
= V (S^T S)^{−1} S^T U^T   (A.29)

S^T S is the diagonal N by N matrix with diagonal entries d_i, and so S^T S is always invertible as these d_i are all nonzero.

Putting white noise of variance σ² in each wavelet coefficient we have a total expected energy of M σ² in the wavelets. From A.1 we expect an energy of tr(Q^T Q) σ² after reconstruction and so we can now calculate the expected noise gain:

tr(Q^T Q) σ² / (M σ²) = (1/M) tr(Q^T Q)   (A.30)
= (1/M) tr((V (S^T S)^{−1} S^T U^T)^T V (S^T S)^{−1} S^T U^T)   (A.31)
= (1/M) tr((U S (S^T S)^{−1} V^T) V (S^T S)^{−1} S^T U^T)   (A.32)
= (1/M) tr(U S (S^T S)^{−1} (S^T S)^{−1} S^T U^T)   (A.33)
= (1/M) tr((S^T S)^{−1} S^T U^T U S (S^T S)^{−1})   (A.34)
= (1/M) tr((S^T S)^{−1} S^T S (S^T S)^{−1})   (A.35)
= (1/M) tr((S^T S)^{−1})   (A.36)
= (1/M) \sum_{i=1}^{N} 1/d_i

Q represents the transform with minimum noise gain and we conclude that any linear perfect reconstruction transform that is used to invert W has noise gain bounded below by (1/M) \sum_{i=1}^{N} 1/d_i, and this lower bound is achievable.

A.6 Consequences of a tight frame

If the frame is tight then d_1 = d_N and so all the singular values are equal. From before we know that the sum of the d_i is N and so every d_i = 1. We can write S^T S = I_N and so the matrix Q representing a reconstruction with the least noise gain becomes

Q = V (S^T S)^{−1} S^T U^T = V S^T U^T = W^T.

Therefore for a tight frame the transform can be inverted by the matrix W^T, and this inversion achieves the lower bound on noise gain.

A.7 Relation between noise gain and unbalance

From section A.1 we know that the noise gain of any linear perfect reconstruction transform P used to invert W is given by tr(P^T P)/M. Consider the unbalance between P^T and W. We define the unbalance to be the sum of the squares of the differences between the two matrices, and so from equation A.7 we find that the unbalance is tr((P^T − W)^T (P^T − W)). We can expand this expression to get:

tr((P^T − W)^T (P^T − W)) = tr(P P^T − P W − W^T P^T + W^T W)   (A.37)
= tr(P P^T) − tr(P W) − tr(W^T P^T) + tr(W^T W)   (A.38)
= tr(P^T P) − 2 tr(P W) + tr(W^T W)   (A.39)
= tr(P^T P) − 2N + N   (A.40)
= tr(P^T P) − N   (A.41)

where we have used that PW = I_N (as P is a perfect reconstruction transform) and that tr(W^T W) = N (from the normalisation condition). We can rearrange this last result to find that

g = (N + U)/M   (A.42)

where g = tr(P^T P)/M is the noise gain of the reconstruction and U = tr((P − W^T)(P^T − W)) is the unbalance.
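These identities are easy to check numerically. The sketch below (NumPy; the matrix W is a random stand-in for a wavelet transform, normalised so that tr(W^T W) = N as the normalisation condition requires) verifies the minimum noise gain reconstruction of A.25, the lower bound of A.36, and the relation g = (N + U)/M:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 8, 5
W = rng.standard_normal((M, N))
W *= np.sqrt(N / np.trace(W.T @ W))        # normalise so tr(W^T W) = N

# Squared singular values d_i give the tightest frame bounds d_N, d_1
d = np.linalg.svd(W, compute_uv=False) ** 2

# Minimum noise gain reconstruction Q = (W^T W)^{-1} W^T  (A.25)
Q = np.linalg.inv(W.T @ W) @ W.T
assert np.allclose(Q @ W, np.eye(N))       # perfect reconstruction Q W = I_N

g = np.trace(Q.T @ Q) / M                  # noise gain of the reconstruction
assert np.isclose(g, np.sum(1.0 / d) / M)  # lower bound of A.36 is attained

# Relation g = (N + U)/M with the unbalance U between Q^T and W  (A.42)
U = np.trace((Q - W.T) @ (Q.T - W))
assert np.isclose(g, (N + U) / M)
```

Replacing Q by any other perfect reconstruction matrix increases both the noise gain and the unbalance, in accordance with A.42.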
Appendix B

Useful results

B.1 Summary

This appendix contains a number of useful mathematical results. Although some of the proofs are long, they are all relatively straightforward. The results are not original but are included (expressed in the notation of this dissertation) for the sake of completeness.

Lemma 1 If Z and Y are independent zero mean wide sense stationary discrete Gaussian random processes, then the covariance of Z + Y is given by the sum of the covariances of Z and Y.

Proof. Let R_Z(d) be the covariance of Z for vector displacement d and similarly let R_Y(d) be the covariance of Y. The covariance R_{Z+Y} of the sum of the random processes is given by

R_{Z+Y}(d) = E{(Z_a + Y_a)(Z_b + Y_b) | b − a = d}
= E{Z_a Z_b + Z_a Y_b + Y_a Z_b + Y_a Y_b | b − a = d}
= E{Z_a Z_b | b − a = d} + E{Z_a Y_b | b − a = d} + E{Y_a Z_b | b − a = d} + E{Y_a Y_b | b − a = d}
= R_Z(d) + E{Z_a | b − a = d} E{Y_b | b − a = d} + E{Y_a | b − a = d} E{Z_b | b − a = d} + R_Y(d)
= R_Z(d) + R_Y(d)

where we have made use of E{AB} = E{A}E{B} for independent random variables, and that E{Z_a} = E{Z_b} = 0 as the processes are zero mean. We have also made use of the equivalence between correlation and covariance for zero mean processes, noting that Z + Y will also have zero mean.

Lemma 2 The only separable circularly symmetric filter is a Gaussian.

Proof. Suppose the filter is given by f(r) where r is the radius. As the filter is separable we know that f(r) = g(x)h(y), where r² = x² + y², and if we assume f(0) = 1 we can adjust the scaling of g and h such that g(0) = h(0) = f(0) = 1. Then we can set y = 0 to find f(x) = g(x)h(0) = g(x), and similarly x = 0 to find h = g = f. This means that we have the following relationship:

f(r) = f(x) f(y)

Taking logarithms and defining w(x²) = log f(x) we find

w(x² + y²) = w(x²) + w(y²)

and so w(a) is a linear function with w(0) = log f(0) = log 1 = 0. Therefore w(a) = ka for some constant k and we can write the filter as f(r) = exp{w(r²)} = exp{kr²}, thus showing that the filter must be a Gaussian. If f(0) = A then it is easy to show the filter is of the form f(r) = A exp{kr²}.

Lemma 3

exp{−½ (z − a)^T A (z − a)} exp{−½ (z − b)^T B (z − b)} ∝ exp{−½ (z − c)^T (A + B) (z − c)}

where c = (A + B)^{−1} (Aa + Bb).

Proof.

exp{−½ (z − a)^T A (z − a)} exp{−½ (z − b)^T B (z − b)}
= exp{−½ [(z − a)^T A (z − a) + (z − b)^T B (z − b)]}
= exp{−½ [z^T A z − 2 z^T A a + a^T A a + z^T B z − 2 z^T B b + b^T B b]}
= exp{−½ [z^T (A + B) z − 2 z^T (Aa + Bb) + a^T A a + b^T B b]}
= k exp{−½ (z − c)^T (A + B) (z − c)}

where c = (A + B)^{−1} (Aa + Bb) and

k = exp{−½ [a^T A a + b^T B b − (Aa + Bb)^T (A + B)^{−1} (Aa + Bb)]}.

Lemma 4 If C is a S × S symmetric matrix, D is a S × 1 vector, 0 is a S × 1 zero vector, and I is the S × S identity matrix, then

\left( \begin{pmatrix} I/\sigma_M^2 & 0 \\ 0^T & 0 \end{pmatrix} + \begin{pmatrix} C & D \\ D^T & \sigma_k^2 \end{pmatrix}^{-1} \right)^{-1} = \begin{pmatrix} C & D \\ D^T & \sigma_k^2 \end{pmatrix} - \begin{pmatrix} C \\ D^T \end{pmatrix} (I\sigma_M^2 + C)^{-1} \begin{pmatrix} C & D \end{pmatrix}

Proof. This is a special case of the matrix inversion lemma, but it can be proved directly by multiplying the right-hand side by the bracketed matrix on the left-hand side: expanding the product block by block and simplifying (using the symmetry of C) shows that every term cancels except the identity, so the product is the (S + 1) × (S + 1) identity matrix and the two sides are indeed inverses of each other.

B.2 Bayesian point inference

This section describes the construction of the posterior distribution for the value at a point in the case of noisy irregularly sampled data from a wide sense stationary discrete Gaussian random process. The observation model has been described in section 8.2 and we use the same notation as in chapter 8.

Suppose we wish to obtain a point estimate for the random variable Z_k corresponding to location x_k (we assume k > S). Consider the a priori joint distribution of this random variable with the random variables Z_1, Z_2, …, Z_S corresponding to the sample locations. This is a multivariate Gaussian distribution that can be written in the following form:

(Γ; Z_k) ~ N( 0, \begin{pmatrix} C & D \\ D^T & \sigma_k^2 \end{pmatrix} )   (B.1)

where we have stacked the first S random variables into a column vector Γ, C is the covariance between the values at the first S locations, D is the covariance between Z_k and the random variables for the first S locations, and σ_k² is the variance of Z_k.

The measurement equation 8.3 can be expressed as

Y ~ N(Γ, σ² I_N)   (B.2)

where σ² is the (known) variance of the measurement noise. Using Bayes' theorem we can write

p(Z_k = z, Γ = γ | Y = y) = p(Y = y | Z_k = z, Γ = γ) p(Z_k = z, Γ = γ) / p(Y = y)

The prior pdf p(Z_k = z, Γ = γ) is defined by the distribution in equation B.1 and the likelihood pdf p(Y = y | Z_k = z, Γ = γ) is defined by equation B.2, and so we can expand this equation to

p(Z_k = z, Γ = γ | Y = y) = k exp{−||y − γ||²/2σ²} exp{−½ (γ; z)^T \begin{pmatrix} C & D \\ D^T & \sigma_k^2 \end{pmatrix}^{-1} (γ; z)}
= k exp{−½ (γ − y; z)^T \begin{pmatrix} I_S/\sigma^2 & 0 \\ 0^T & 0 \end{pmatrix} (γ − y; z)} exp{−½ (γ; z)^T \begin{pmatrix} C & D \\ D^T & \sigma_k^2 \end{pmatrix}^{-1} (γ; z)}

where k is a normalisation constant that ensures that the expression is a valid pdf. Using lemma 3 we can write that

p(Z_k = z, Γ = γ | Y = y) ∝ exp{−½ ((γ; z) − a)^T A^{-1} ((γ; z) − a)}

where

A^{-1} = \begin{pmatrix} I_S/\sigma^2 & 0 \\ 0^T & 0 \end{pmatrix} + \begin{pmatrix} C & D \\ D^T & \sigma_k^2 \end{pmatrix}^{-1},  a = A \begin{pmatrix} I_S/\sigma^2 & 0 \\ 0^T & 0 \end{pmatrix} (y; 0) = A (y/σ²; 0)

and so we have proved that the joint posterior distribution is a multivariate Gaussian distribution with mean a and covariance matrix A.

We are now able to compute the posterior distribution for Z_k alone. If we write Z_k as Z_k = (0; 1)^T (Γ; Z_k) we can use the algebraic identity proved in lemma 4 to simplify A and calculate that Z_k has a normal distribution with mean

(0; 1)^T a = (0; 1)^T A (y/σ²; 0)
= (0; 1)^T \left( \begin{pmatrix} C & D \\ D^T & \sigma_k^2 \end{pmatrix} - \begin{pmatrix} C \\ D^T \end{pmatrix} (σ² I_S + C)^{-1} \begin{pmatrix} C & D \end{pmatrix} \right) (y/σ²; 0)
= (D^T − D^T (σ² I_S + C)^{-1} C) y/σ²
= D^T (σ² I_S + C)^{-1} ((σ² I_S + C) − C) y/σ²
= D^T (σ² I_S + C)^{-1} y.
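The closed form for the posterior mean, and the lemma 4 identity behind it, can be verified numerically. In the sketch below (NumPy; the Gaussian-shaped covariance and the particular locations are illustrative assumptions, not values from chapter 8):

```python
import numpy as np

rng = np.random.default_rng(1)
S = 6
xs = np.sort(rng.uniform(0.0, 10.0, S))    # the S sample locations
xk = 5.0                                    # the prediction location x_k
sigma2 = 0.1                                # measurement noise variance

def cov(a, b):
    # A stand-in stationary covariance (Gaussian shaped, cf. lemma 2)
    return np.exp(-0.5 * (a - b) ** 2)

C = cov(xs[:, None], xs[None, :])           # covariance of Z_1 .. Z_S
D = cov(xs, xk)                             # cross-covariance with Z_k
sigk2 = cov(xk, xk)                         # variance of Z_k
y = rng.standard_normal(S)                  # some observed values

# Posterior mean of Z_k from the closed form D^T (sigma^2 I + C)^{-1} y
mean_k = D @ np.linalg.solve(sigma2 * np.eye(S) + C, y)

# Cross-check via lemma 4: build A explicitly and form a = A (y/sigma^2; 0)
big = np.block([[C, D[:, None]], [D[None, :], np.array([[sigk2]])]])
P = np.zeros((S + 1, S + 1))
P[:S, :S] = np.eye(S) / sigma2
A = np.linalg.inv(P + np.linalg.inv(big))   # left-hand side of lemma 4
a = A @ np.concatenate([y / sigma2, [0.0]])
assert np.isclose(a[-1], mean_k)

# and check the right-hand side of lemma 4 directly
CD = np.hstack([C, D[:, None]])
rhs = big - CD.T @ np.linalg.solve(sigma2 * np.eye(S) + C, CD)
assert np.allclose(A, rhs)
```

The solve-based form of the posterior mean avoids forming any explicit inverse, which is the numerically preferable way to use the result in practice.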
Appendix C

Review of deconvolution techniques

This appendix reviews a number of deconvolution methods from a Bayesian perspective. We attempt to relate each deconvolution technique to the Bayesian framework described in section 9.1. The emphasis in this review is on the estimates produced by the different approaches rather than on the methods used to solve the minimisation problem.

C.1 CLEAN

The CLEAN algorithm [44] consists of two steps that are repeated until the energy of the residual is below a certain level. The algorithm starts with a blank image x^(0) and produces a sequence of restored images x^(1), x^(2), …. The steps at the kth iteration are:

1. Find the strength s and position of the greatest intensity in the residual image y − Hx^(k−1). Let m(k) be the index of this greatest intensity.
2. Add a point to the restored image at the peak position of strength s multiplied by a damping factor γ known as the loop gain:

[x^(k)]_i = [x^(k−1)]_i + sγ   for i = m(k)
[x^(k)]_i = [x^(k−1)]_i        for i ≠ m(k)

Let K be the number of iterations before convergence. The output of the algorithm is x^(K), the last restored image (for cosmetic reasons this output is often smoothed with a Gaussian after this restoration process). More efficient implementations of this method exist; two examples are the Clark algorithm [25] and the Cotton-Schwab algorithm [106].

Marsh and Richardson have proved [79] that under certain conditions the CLEAN estimate is equivalent to the MAP maximisation described in section 9.1.1 (with the additional constraint that x_i ≥ 0 for all i) with the choice of

f(x) = −log(β) + α \sum_i x_i

where α is chosen to make the algorithm terminate with K nonzero point sources in the estimate and β is a constant chosen so that the prior corresponds to a valid pdf (with total integral equal to one). This choice of f(x) corresponds to a prior pdf of

p(x) = β exp{−α \sum_i x_i} I(x) = β \prod_i I(x_i) exp{−α x_i}

where we define I(x) to be an indicator function that is zero if any component of x is negative, and one otherwise. The conditions for the proof to hold are essentially that the original image consists of sufficiently well-separated point sources. In summary, under certain conditions the CLEAN algorithm uses a prior assumption that all the pixels are independently identically distributed with an exponential distribution.

This equivalence does not always hold. Jeffs and Gunsay provide a counter example [55] of two close point sources that are restored by the CLEAN algorithm to three sources (including a false one half way between the two true sources) whereas the corresponding Bayesian MAP estimate correctly restores the two original sources. As the MAP estimate seems to perform better the authors proposed a maximally sparse restoration technique [55] that explicitly uses the following image prior:

p(x) ∝ \prod_i I(x_i) exp{−α x_i^p}

where p is a shape parameter that is normally in the range 0 < p < 1 for astronomical denoising. This corresponds to a model in which the prior distribution for each pixel's intensity is an independent and identically distributed one sided generalised p-Gaussian. The generalised p-Gaussian distribution is also known as the Box-Tiao distribution and includes many other distributions for specific choices of the shape parameter (e.g. p = 1 for the exponential, p = 2 for the Gaussian, p → ∞ for the uniform distribution).

C.2 Maximum Entropy

Briefly stated, the maximum entropy principle (MAXENT) is [53, 54]: when we make inferences based on incomplete information, we should draw them from that probability distribution that has the maximum entropy permitted by the information we do have. One way of applying this principle to image deconvolution [29] results in the following definition of prior probability:

p(x) ∝ exp{S(x)/λ} I(x)

where

S(x) = − \sum_i (x_i / \sum_j x_j) log(x_i / \sum_j x_j)

and λ is an undetermined parameter. S(x) is known as the configurational entropy of the image. There is some variation in the choice of entropy definition [125] and another common definition used for image deconvolution is

S(x) = − \sum_i x_i log(x_i / (m_i e))

where m_i is the ith component of a default image m. This default image is often chosen to be a low resolution image of the object and will be the maximum entropy estimate in the limit as the measurement noise increases to infinity.

The maximum entropy method fits naturally into our common Bayesian framework with the choice of f(x) = −S(x). For this choice of entropy the prior pdf can be factorised as

p(x) ∝ I(x) \prod_i exp{(x_i/λ) log(m_i e / x_i)} = \prod_i (m_i e / x_i)^{x_i/λ} I(x_i).

A factorised joint pdf means that the distribution of each pixel is independent. While this may be appropriate for astronomical images, we have argued that real world images contain significant correlations and that therefore such a maximum entropy deconvolution is less appropriate. Note that such a conclusion rejects merely the precise application rather than the MAXENT principle itself. The grounds for the rejection are that these methods do not make use of all the available information about the nature of real images. Section C.10 gives an example of using the MAXENT principle together with a wavelet transform to give a more appropriate algorithm.

C.3 Projection methods

There have been several attempts [42, 71, 131] to perform image recovery by the method of projection onto convex sets. The convex sets specify required properties for the restored image and then some algorithm is used to find a solution that is in the intersection of all of the sets. The simplest method for this is to sequentially project an estimate onto each of the sets in turn and then repeat until all the constraints are satisfied. A convex set is a set for which any linear interpolation between two points in the set will also be in the set. In mathematics, a set S is convex if and only if

∀x_1 ∈ S, ∀x_2 ∈ S, ∀λ ∈ [0, 1]:  λx_1 + (1 − λ)x_2 ∈ S.

Combettes uses this method for the problem of restoring an image blurred with a 9 × 9 uniform kernel by means of the following constraints (that each correspond to convex sets) [27]:

1. The image x should contain nonnegative pixel values.
2. It is assumed that the DFT of the image x is known on one fourth of its support for low frequencies in both directions. (These known values are taken from the DFT of the observed image divided by the gain of the blurring filter at the corresponding frequencies.)
3. The assumption of Gaussian noise means that, with a 95% confidence coefficient, the image satisfies the constraint ||y − Hx||² ≤ ρ, where ρ takes some value that can be calculated from statistical tables based on the variance σ² of the measurement noise and the number of pixels. It is assumed that this constraint will be satisfied by the restored image.

For a general value of ρ this scheme does not naturally fit into the Bayesian framework, partly because the final solution can depend on the starting conditions (normally chosen to be the observed image). However, if ρ is reduced until the intersection of the constraint sets contains a single point then the method becomes equivalent to Miller regularisation. The corresponding prior is proportional to the characteristic function of the intersection of the first two constraints described above. In other words, the prior takes some constant value within the intersection, but is zero outside.

C.4 Wiener filtering

Wiener filtering [5] can be implemented by the following algorithm:

1. Compute the Fourier transform coefficients f_i = [Fy]_i of the data.
2. Compute the Fourier transform coefficients m_i of the blurring filter.
3. Multiply each coefficient f_i by a gain g_i given by

g_i = m_i* / (|m_i|² + 1/SNR_i)

where SNR_i is the estimated ratio of signal energy to noise energy for this coefficient and m_i* represents the complex conjugate of m_i.
4. Invert the Fourier transform of the new coefficients g_i f_i.

The problem with Wiener filtering is the estimation of the signal to noise ratios. It can be easily shown that the best gain (in terms of minimising the expected energy of the error) for a given image x is given by SNR_i = |[Fx]_i|²/σ². We call the corresponding gain the Oracle gain, but this cannot be used in practice because it requires access to the original image.
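The Wiener algorithm is a few lines in practice. The sketch below (NumPy, one dimensional for brevity; the blur kernel, noise level, and sparse test signal are illustrative assumptions) applies the Oracle gain, which uses the true signal to set SNR_i and therefore serves only as a benchmark:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 128
x = np.zeros(N)
x[20], x[75] = 1.0, 0.5                     # a sparse test 'image' (1D)

# Circular Gaussian blur H followed by white noise: y = Hx + n
kernel = np.exp(-0.5 * (np.arange(N) - N // 2) ** 2.0)
kernel = np.roll(kernel / kernel.sum(), -(N // 2))   # centre at sample 0
m = np.fft.fft(kernel)                      # blur gains m_i
sigma2 = 1e-6
y = np.fft.ifft(m * np.fft.fft(x)).real
y = y + rng.normal(scale=np.sqrt(sigma2), size=N)

# Wiener filter with the Oracle choice SNR_i = |[Fx]_i|^2 / sigma^2
f = np.fft.fft(y)
snr = np.abs(np.fft.fft(x)) ** 2 / sigma2   # needs the true signal x
gain = np.conj(m) / (np.abs(m) ** 2 + 1.0 / snr)
x_hat = np.fft.ifft(gain * f).real

mse_blurred = np.mean((y - x) ** 2)
mse_restored = np.mean((x_hat - x) ** 2)    # smaller than mse_blurred
```

Practical schemes must replace the Oracle SNR with an estimate, which is exactly the difficulty noted above.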
C.1) The previous methods were equivalent to assuming that the pixel intensity values were independent and indentically distributed. These iterative algorithms deﬁne a sequence x(0) . . and then a better way.5 Iterative methods The three most common iterative deconvolution methods are the Van Cittert [112]. but misleading. When these algorithms are used for astronomical images there is also a step that after each iteration sets all the negative entries in xk to zero. . the above cost function cannot be factorised in the same way. x(K) of restored images evolving according to some equation [112].228 Appendix C. Van Cittert proposed the following iteration: x(n+1) = x(n) + α y − Hx(n) (C. In contrast. Landweber [68]. way of viewing these methods within the Bayesian framework. . but with diﬀerent Gaussian distributions for each component. The restored solution is taken to be the restored image at a particular iteration. . and RichardsonLucy [102] methods. If these algorithms converge then the converged solution can be either considered as the MAP estimate corresponding to a ﬂat (improper) prior p(x) ∝ 1 . We ﬁrst describe the usual [81].5 In terms of our Bayesian viewpoint the cost function that corresponds to Wiener ﬁltering is f (x) = i 1 σ 2 SNR i  [F x]i 2 (C. Instead it is the Fourier components of the image that are assumed independent.4) R N to represent a diagonal matrix of size N by N with the entries of a along the diagonal.2) where α is a convergence parameter generally taken as 1. Consider the Van Cittert and Landweber methods.3) y (C. Landweber proposed the iteration x(n+1) = x(n) + αH T y − Hx(n) The RichardsonLucy method uses x(n+1) = diag x(n) H T diag Hx(n) We use the notation diag {a} for a ∈ −1 (C.
The RichardsonLucy method can be viewed in the same way except that it uses a diﬀerent observation model. Based on the idea of random photon arrival times the observations are modelled as independent samples from Poisson distributions where the parameters of the Poisson processes are the unknown source intensities x. Recall that the blurring ﬁlter can be expressed as F H MF .5 229 or the maximum likelihood estimate. In this model the observations in y consist of nonnegative integer counts of the number of photons detected. We can write the iteration separately for each Fourier coeﬃcient as oi (0) (n+1) = oi (1 − αmi ) + αfi (n) Using the initialisation of oi = 0 we can solve this equation to ﬁnd K−1 (K) oi = αfi n=0 (1 − αmi )n . (If the nonnegativity constraint is used then the solution is the MAP estimate for a prior of p(x) ∝ I(x)). To explain the eﬀect of early convergence we assume that we are deconvolving images without using the positivity constraint. The methods are therefore unusual in that it is crucial to terminate the algorithm before convergence is reached. The eﬀect has been explained [15] with an eigenvector analysis but here we give an alternative treatment based on the Fourier transform. The likelihood function is p(yx) = i [F x]yi i exp {− [F x]i } yi ! Although the iterative methods are sometimes justiﬁed [81] by these choices of prior and likelihood the converged estimates can be severely corrupted due to the large ampliﬁcation of noise [15] while the intermediate restorations are better.Appendix C. Let o(n) be the Fourier transform coeﬃcients of the restored images: o(n) = F x(n) Taking the Fourier transform of the Van Cittert iteration we ﬁnd F x(n+1) = F x(n) + α y − Hx(n) o(n+1) = F x(n) + α F y − F F H MF x(n) = o(n) + α f − Mo(n) = o(n) (IN − αM) + αf This is a very simple expression because all the matrices are diagonal.
Summing the geometric series gives

o_i^{(K)} = αf_i (1 − (1 − αm_i)^K)/(αm_i) = f_i (1 − (1 − αm_i)^K)/m_i

The restored image produced by the Van Cittert iteration is given by x^{(K)} = F^H o^{(K)} and by comparison with the algorithm for Wiener filtering above we conclude that Van Cittert restoration is equivalent to Wiener filtering with a gain g_i given by

g_i = (1 − (1 − αm_i)^K)/m_i

The corresponding assumption about the signal to noise ratio is

SNR_i = 1/(m_i*/g_i − |m_i|²) = 1/(|m_i|²/(1 − (1 − αm_i)^K) − |m_i|²)

Assuming that α is small enough for the algorithm to converge, it is clear that g_i → 1/m_i in the limit as K → ∞, but the algorithm is designed to terminate long before convergence. The assumption about the SNR level of a Fourier coefficient is a function of α, K, and the level of blurring m_i for that coefficient. Figure C.1 shows the assumed SNR values for gains m_i varying between 0 and 1 if K = 3 and α = 1. For a typical blurring function that decays with increasing frequency the Van Cittert method effectively assumes that the data has a power spectrum that also decays with increasing frequency. However, for small gains the assumed SNR actually increases. This may lead to high frequency noise artefacts in reconstructed images.

Similarly we can also take the Fourier transform of the Landweber method. The only difference is the multiplication by H^T. As this is a real matrix we can write

H^T = H^H = (F^H MF)^H = F^H M^H F

Using this result we can rewrite the Landweber method in terms of Fourier coefficients as

o_i^{(n+1)} = o_i^{(n)}(1 − α|m_i|²) + αm_i* f_i
Figure C.1: Effective assumption about SNR levels for Van Cittert restoration (K = 3, α = 1). (The plot shows assumed SNR in dB against filter gain.)
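The closed-form gain and the implied SNR assumption are easy to check numerically. The sketch below (using real-valued m_i for simplicity) verifies the geometric-series solution against direct iteration of the per-coefficient recurrence, and reproduces the rise in assumed SNR at small gains visible in Figure C.1.

```python
import numpy as np

def van_cittert_gain(m, alpha=1.0, K=3):
    # g_i = (1 - (1 - alpha m_i)^K) / m_i
    return (1.0 - (1.0 - alpha * m) ** K) / m

def assumed_snr(m, alpha=1.0, K=3):
    # SNR_i = 1 / (m_i/g_i - m_i^2) for real m_i, interpreting g as a Wiener gain.
    g = van_cittert_gain(m, alpha, K)
    return 1.0 / (m / g - m * m)

def iterate_coefficient(f, m, alpha=1.0, K=3):
    # Direct iteration o^(n+1) = o^(n)(1 - alpha m) + alpha f with o^(0) = 0.
    o = 0.0
    for _ in range(K):
        o = o * (1.0 - alpha * m) + alpha * f
    return o
```

With K = 3 and α = 1 the assumed SNR at a gain of 0.05 exceeds that at a gain of 0.2, which is the anomaly responsible for the high frequency artefacts discussed above.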
Solving this recurrence with o_i^{(0)} = 0 gives

o_i^{(K)} = f_i m_i* (1 − (1 − α|m_i|²)^K)/|m_i|² = f_i (1 − (1 − α|m_i|²)^K)/m_i

Again we conclude that the Landweber method is equivalent to Wiener filtering, with the following assumption about the SNR:

SNR_i = 1/(m_i*/g_i − |m_i|²) = 1/(|m_i|²/(1 − (1 − α|m_i|²)^K) − |m_i|²)

Figure C.2 shows the assumed SNR values for gains m_i varying between 0 and 1 if K = 3 and α = 1. The figure shows that the Landweber method has a smooth decrease in assumed SNR levels even for low gains and therefore the restored results should avoid the high frequency artifacts of the Van Cittert method.

Figure C.2: Effective assumption about SNR levels for Landweber restoration (K = 3, α = 1).
The Richardson-Lucy method is not as simple to express in the Fourier domain due to the presence of multiplications and divisions that are implemented pixelwise on images. Nevertheless, we claim that early termination corresponds to making approximately the same assumption (of a stationary Gaussian random process) about image structure as in the Landweber method, and consequently that early termination of the Richardson-Lucy method is approximately a particular case of Wiener filtering.

To demonstrate this claim we imagine applying the Richardson-Lucy method to an image after increasing all the intensity values (in the data and the intermediate restorations) by a constant 1/ε. We perform this shift in order to be able to approximate multiplications and divisions by additions and subtractions. We will assume that the image has been rescaled such that the blurring filter has unity response at zero frequency and thus H1 = H^T 1 = 1 (H^T corresponds to filtering with a filter h(−x, −y) rather than h(x, y) and so will also have unity response at zero frequency). The iteration becomes

x^{(n+1)} + (1/ε)1 = diag{x^{(n)} + (1/ε)1} H^T diag{H(x^{(n)} + (1/ε)1)}^{−1} (y + (1/ε)1)

We use the notation O(ε^a) to represent terms that are of order ε^a. In other words, O(ε^a) represents a polynomial function of ε in which every exponent of ε is at least a. For sufficiently small values of ε we can expand diag{H(x^{(n)} + (1/ε)1)}^{−1} = ε(I_N − ε diag{Hx^{(n)}} + O(ε²)), so the iteration can be written as

x^{(n+1)} = −(1/ε)1 + diag{x^{(n)} + (1/ε)1} H^T (εI_N − ε² diag{Hx^{(n)}} + O(ε³)) (y + (1/ε)1)
= −(1/ε)1 + diag{x^{(n)} + (1/ε)1} (1 + εH^T y − εH^T diag{Hx^{(n)}}1 + O(ε²))
= diag{x^{(n)}}1 + H^T y − H^T Hx^{(n)} + O(ε)
= x^{(n)} + H^T (y − Hx^{(n)}) + O(ε)

using H^T 1 = 1 and diag{Hx^{(n)}}1 = Hx^{(n)}. Comparing this with equation C.3 we conclude that (if the algorithm is initialised to have x^{(0)} = 0) the shifted Richardson-Lucy method (with the positivity constraints removed) is within order ε of the Landweber method. In particular, the restored image from this shifted Richardson-Lucy method will tend to the Landweber solution in the limit as ε → 0.

We have described the link between the iterative methods and Wiener filtering. An explicit definition of the assumed cost function is given by substituting the above expressions for the assumed SNR levels into equation C.1.
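The order-ε claim can be checked numerically. The sketch below compares a single shifted Richardson-Lucy step (positivity constraint removed) with a single Landweber step (α = 1) for a small filter with unity response at zero frequency; the gap shrinks roughly in proportion to ε. The filter and test signals are arbitrary illustrative choices.

```python
import numpy as np

def blur(x, m):
    # Circular convolution with frequency response m (H = F^H diag(m) F).
    return np.real(np.fft.ifft(m * np.fft.fft(x)))

def landweber_step(x, y, m):
    # x + H^T (y - H x) with alpha = 1.
    return x + blur(y - blur(x, m), np.conj(m))

def shifted_rl_step(x, y, m, eps):
    # One Richardson-Lucy step applied after shifting all intensities by 1/eps.
    s = 1.0 / eps
    ratio = (y + s) / blur(x + s, m)   # unity DC response gives H(x + s1) = Hx + s1
    return -s + (x + s) * blur(ratio, np.conj(m))
```

The test below confirms both that the two steps agree closely for small ε and that the discrepancy grows roughly linearly with ε.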
C.6 Constrained Least Squares

Constrained least squares methods use f(x) = ||Cx||² for some square matrix C known as the regularising operator [15, 115, 61]. This operator is chosen to apply little regularisation where the signal energy is expected to be high, but significant regularisation where the noise energy dominates the signal. For many images the signal energy is concentrated at low frequencies and so the operator is chosen to act like a high pass filter. One common choice for the coefficients is a discrete approximation to a 2D Laplacian filter such as [115]

0.7   1   0.7
1   −6.8   1
0.7   1   0.7

Recent attempts to solve this problem have been based on Hopfield neural networks [133, 115, 92] where the idea is to perform gradient descent minimisation of the energy function with the restriction that the change to each intensity value must always belong to the set {−1, 0, 1}. If both the blurring operator H and the regularising operator C represent linear, space-invariant filters then the energy function can be efficiently represented in terms of the Fourier coefficients of the image x. In this case it is straightforward to show that the estimate that minimises the energy function is given by a Wiener filter (with the estimated SNR values inversely proportional to the squares of the Fourier coefficients of the regularising operator). This proves that the performance of the Oracle Wiener filter is an upper bound on the performance of such methods unless stopping before convergence gives improved results.

C.7 Total variation and Markov Random Fields

The total variation functional J_TV(u) is defined on a continuous function u(x, y) as [90]

J_TV(u) = ∫∫ |∇u| dx dy
J_TV(u) is used as the stabilising functional within Tikhonov regularisation and therefore the corresponding f(x) is equal to a discretized version of the above integration. The prior pdf corresponding to total variation denoising therefore penalises oscillations in an image as this increases the total variation, and instead favours edges and tends to produce piecewise constant images. To avoid difficulties with the non-differentiability the functional J_β(u) is often used instead [123]

J_β(u) = ∫∫ √(|∇u|² + β²) dx dy

Markov Random Fields [56, 132] provide a more general way of constructing a prior pdf based on local neighbourhoods that again aims to favour smooth regions while permitting occasional discontinuities.

C.8 Minimax wavelet deconvolution

The minimax wavelet methods generally work via a two stage process that uses a stationary inverse filter followed by a wavelet denoising technique. Both the inverse filter and the wavelet denoising can be used to reduce the level of noise. Several choices for the amount of regularisation and the wavelet denoising method have been proposed. Donoho [38] uses α = 0 so that all the noise removal must be done by the wavelet denoising. Kalifa and Mallat [58] use α = 0 together with a mirror wavelet basis. Their mirror wavelet transform is similar to a standard real wavelet transform except that additional filtering steps are performed on the highpass subbands. This algorithm and the mirror wavelet basis are described in detail in section C.9. An algorithm of this type is used in the production channel of CNES satellite images [58]. Wiener filtering corresponds to α = 1 with no wavelet denoising. Nowak and Thul used an underregularized linear filter (0 < α < 1) [87] and Neelamani et al studied the effect of the amount of regularization and found α ≈ 0.25 usually gave good results [84]. This approach is known as WaRD, standing for Wavelet-based Regularized Deconvolution.
The frequency response G_α(f) of the inverse filter is given by

G_α(f) = (1/H(f)) · |H(f)|² P_x(f) / (|H(f)|² P_x(f) + ασ²)  (C.5)

where P_x(f) is the power spectral density of the signal, H(f) is the Fourier transform of the linear blurring filter, σ² is the amount of noise added after blurring, and α is a parameter that controls the amount of regularisation.
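Equation (C.5) interpolates between the pure inverse filter (α = 0) and the Wiener filter (α = 1), and a direct transcription makes this explicit. The arrays in the test are sample frequency responses for illustration only, not values from the cited experiments.

```python
import numpy as np

def regularised_inverse(H, Px, sigma2, alpha):
    # G_alpha(f) = (1/H) * |H|^2 Px / (|H|^2 Px + alpha sigma^2)   (C.5)
    H = np.asarray(H, dtype=complex)
    return (1.0 / H) * (np.abs(H) ** 2 * Px) / (np.abs(H) ** 2 * Px + alpha * sigma2)
```

Setting α = 0 leaves all the noise removal to the wavelet stage, as in [38, 58]; α = 1 recovers the Wiener filter conj(H)P_x/(|H|²P_x + σ²) and requires no wavelet denoising at all.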
The extra filtering produces greater frequency localisation for high frequencies. Extra filtering can also be applied to the DT-CWT to produce a complex wavelet packet transform [52]. Such a transform has been found to give slightly better performance than the nondecimated version of the mirror wavelet algorithm (and much superior performance to the decimated version) [52].

These methods are harder to place within our common framework because they are motivated by the belief that such a framework is fundamentally flawed and that minimax methods offer an alternative solution. Bayesian techniques require stochastic models of the expected signals. It is claimed [58] that there is no "good" stochastic model for natural images. The minimax approach therefore avoids using a prior pdf on the grounds that the prior is not a sufficiently "good" model. Instead minimax estimation makes use of a prior set Θ where the signal is guaranteed to be. Suppose that we have observations y ∈ R^N from which we wish to construct an estimator F̂(y) of the original signal x ∈ R^N. The risk of the estimator is defined to be [58]

r(F̂, x) = E { ||F̂(y) − x||² }

Note that this is a function of the original signal x. The original signal is fixed and the expectation is taken over all possible values for the noise in the observation model. In the Bayesian approach we have an estimate for the relative probability of different signals (the prior) and we can compute the total expected risk for an estimator with the expectation taken over all possible signals. A standard result is that this total risk is minimised by using the posterior mean estimate. Instead the minimax estimator is based on the maximum risk, defined as

r(F̂, Θ) = sup_{x ∈ Θ} E { ||F̂(y) − x||² }

The estimator is then designed by trying to minimise the maximum risk over Θ. This approach is standard in the theory of games. Consider the following game. Player A has to design a deconvolution algorithm to maximise the SNR of deconvolved images. Player B has to choose the test images¹. Suppose that after A has designed the algorithm B is allowed to see the program and construct a test image deliberately designed to produce the algorithm's worst possible performance. It can easily be shown that the best approach for A (in terms of maximising SNR) is to use the minimax approach to design the algorithm. However, if player B is not so malicious and simply decides to produce test images according to some stochastic model, then the best approach for A is to use a Bayesian posterior mean estimate, using B's stochastic model for the prior pdf.

¹The idea is that B chooses the test image x. This test image is then blurred and degraded according to equation 9.1 before being given to A.

In summary, the minimax approach gives results with a known worst case performance. The methods tend to be robust but take no advantage of the probable structure within the data (other than limiting the data to the set Θ). Bayesian methods attempt to model the prior information about likely image structures and thus give results whose quality depends on the accuracy of the model. The Bayesian method has the potential to give better reconstructions but it is possible that certain images will be very badly restored. The conclusions in these two cases are widely accepted. However, what is not agreed is an appropriate approach for the case when player B is not malicious, but when the model used to produce images is unknown.

We mentioned earlier the claim that there is no "good" model for natural images. The problem with this claim is that it is not clear what "good" means. We agree that realisations from typical Bayesian models do not produce realistic images, but models can often give a reasonable guide to the relative probability of small deviations from a real world image. For example, wavelet models will prefer a priori a portion of the image being smooth rather than containing high frequency noise. This is the case for most real world images. Furthermore, if "good" is taken to mean that the resulting algorithms give high accuracy results then the claim can be experimentally tested. Later results will show that the Bayesian model is good in this sense. As we are interested in getting good results on typical images we select the Bayesian method.

C.9 Mirror wavelet deconvolution

This section describes the algorithm for deconvolution using mirror wavelets proposed by Kalifa and Mallat [58]. The performance of the method depends on the choice of parameter values. Unfortunately, for commercial reasons, the values used in the published experiments were not given. We make use of what we believe to be a reasonable choice but it is possible that other choices could give better performance.
We first give a brief description of the mirror wavelet transform and then explain how this transform is used for deconvolution.
C.9.1 Mirror Wavelet Transform
Figure 2.2 of chapter 2 shows the subband decomposition tree for a standard wavelet transform. Such a tree produces some very short wavelets whose frequency response covers the upper half of the spectrum. For some blurring functions there can be a considerable difference in amplification across this range of frequencies. Consequently, at the lower end there may be a high SNR, while for the highest frequencies the SNR may be very low. It therefore seems inappropriate to group all these frequencies within a single subband. The mirror wavelet decomposition addresses this problem by performing recursive filtering on the most detailed subband [58]. Figure C.3 shows the subband decomposition structure for the mirror wavelet. The filters are given by the symlet wavelets of order 4.
Figure C.3: The mirror wavelet tree structure. (The standard tree applies the filters H0a and H1a, each followed by ↓2 decimation, recursively to the lowpass branch, producing subbands x0a, x00a, x000a, x0000a, x0001a, x001a, x01a; the mirror tree applies the filters H0b and H1b recursively to the level 1 highpass output, producing subbands x0b, x00b, x000b, x0000b, x0001b, x001b, x01b.)
These orthogonal wavelets possess the least asymmetry and highest number of vanishing moments for a given support width [30]. The filters in the mirror tree (for levels above 1) are given by the time reverse of the filters in the standard tree:

H0a(z) = H0b(z^{−1}) = −0.0758z³ − 0.0296z² + 0.4976z¹ + 0.8037z⁰ + 0.2979z^{−1} − 0.0992z^{−2} − 0.0126z^{−3} + 0.0322z^{−4}

H1a(z) = H1b(z^{−1}) = −0.0322z³ − 0.0126z² + 0.0992z¹ + 0.2979z⁰ − 0.8037z^{−1} + 0.4976z^{−2} + 0.0296z^{−3} − 0.0758z^{−4}

The tree can be inverted by repeated application of the reconstruction block. The mirror wavelets are extended to 2D in such a way as to achieve a fine frequency resolution for increasing frequency. Details of the 2D basis and a fast 2D transform can be found in [58]. Figure C.4 shows the frequency response contours that are given by a three level 2D mirror wavelet transform. The frequencies are normalised so that a frequency of 32 corresponds to half the sampling rate.
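The printed coefficients can be checked for internal consistency: each filter has (approximately) unit energy, and the highpass filter is the alternating-sign reverse of the lowpass filter, as expected for an orthogonal two-band filter bank. A quick sketch using the tabulated values:

```python
import numpy as np

# Coefficients of H0a(z) and H1a(z), read off from the z^3 term down to z^-4.
h0a = np.array([-0.0758, -0.0296, 0.4976, 0.8037,
                0.2979, -0.0992, -0.0126, 0.0322])
h1a = np.array([-0.0322, -0.0126, 0.0992, 0.2979,
                -0.8037, 0.4976, 0.0296, -0.0758])

# Orthogonal wavelet filters have unit energy (to the printed precision).
energy = np.sum(h0a ** 2)

# Quadrature mirror relation: h1a[n] = (-1)^(n+1) h0a[7-n].
signs = (-1.0) ** (np.arange(8) + 1)
qmf = signs * h0a[::-1]
```

The quadrature mirror relation holds exactly for the printed digits, while the unit energy condition holds to the four-decimal rounding of the table.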
C.9.2 Deconvolution Algorithm
The deconvolution algorithm consists of inverse filtering followed by wavelet denoising. As in chapter 9 suppose that the blurring operator is represented by the matrix F^H MF where M is a diagonal matrix with diagonal entries m. Define a new vector u (representing the inverse filter) by

u_i = 1/m_i  if |m_i| > ε
u_i = 0      otherwise
We choose ε = 0.01 in our experiments. The inverse filtering step produces a new image x0 given by x0 = F^H diag{u} F d (this, of course, is implemented using the Fourier transform rather than matrix multiplication).
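In one dimension this inverse filtering step can be sketched directly; the threshold guards the division exactly as in the definition of u. This is an illustrative sketch only, with the blurring filter in the test chosen arbitrarily so that all |m_i| exceed ε.

```python
import numpy as np

def pseudo_inverse_filter(d, m, eps=0.01):
    # u_i = 1/m_i where |m_i| > eps, 0 otherwise; x0 = F^H diag(u) F d.
    m = np.asarray(m, dtype=complex)
    keep = np.abs(m) > eps
    u = np.zeros_like(m)
    u[keep] = 1.0 / m[keep]
    return np.real(np.fft.ifft(u * np.fft.fft(d)))
```

When every |m_i| is above the threshold and there is no noise, this recovers the original image exactly; the wavelet denoising stage exists precisely because in practice the noise in the small-|m_i| coefficients is greatly amplified.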
The wavelet denoising is based on estimates σ_k² of the variance of the (inverse filtered) noise in the subbands of the mirror wavelet transform of x0. For typical blurring functions the inverse filtering step tends to considerably amplify the noise at high frequencies. These variances can be precisely computed from the Fourier transform of the filter [58] but in practice it is easier to estimate these values by calculating the mirror wavelet transform of an image containing white noise of variance σ² that has been inverse filtered. The average energy of the wavelet coefficients in the corresponding subbands provides estimates of σ_k².
Figure C.4: 2D frequency responses of the mirror wavelet subbands shown as contours at 75% peak energy amplitude. (The axes show normalised frequency; a frequency of 32 corresponds to half the sampling rate.)
We define a "noise subband" to be a subband for which σ_k > T where T is some threshold level. We choose T = 30 in our experiments. These subbands contain very little useful information. The wavelet denoising process consists of the following steps:

1. Compute the mirror wavelet transform of x0.
2. Set all wavelet coefficients in noise subbands to zero.
3. Apply a soft thresholding rule to all the wavelet coefficients. For a coefficient w_i belonging to subband k the new value ŵ_i is given by

ŵ_i = w_i − βσ_k  if w_i > βσ_k
ŵ_i = w_i + βσ_k  if w_i < −βσ_k
ŵ_i = 0           otherwise

where we choose β = 1.6 in our experiments.
4. Invert the mirror wavelet transform to compute the deconvolved estimate x̂.

This is a single pass algorithm involving only one forward and one inverse wavelet transform and hence is fast. In practice (and in our experiments) the shift invariant version of the mirror wavelet transform is always used as it gives better results. This can be viewed theoretically as averaging the results of the decimated version over all possible translations. This averaging is implemented by using the much slower nondecimated form of the mirror wavelet.
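Steps 2 and 3 act independently on each subband, so they can be sketched as a single per-subband function (the mirror wavelet transform itself is omitted; w is the coefficient array of one subband):

```python
import numpy as np

def denoise_subband(w, sigma_k, beta=1.6, T=30.0):
    # Step 2: a noise subband (sigma_k > T) is set entirely to zero.
    if sigma_k > T:
        return np.zeros_like(w)
    # Step 3: soft thresholding at beta * sigma_k.
    t = beta * sigma_k
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)
```

The sign/maximum form is the usual compact way of writing the three-branch soft thresholding rule above.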
C.10 Wavelet-based smoothness priors
A second group of methods use wavelet transforms to construct a prior pdf for images. We have shown that the basic model described in chapter 7 (using a common prior variance for every coefficient in a given subband) is approximately equivalent to a stationary Gaussian random process model, but there are many alternative priors that can be constructed using wavelets. Some examples of priors that have been used for deconvolution are:

1. Wang et al [124] used the output of an edge detector applied to the noisy data to alter the degree of regularisation in a multiscale smoothness constraint. This algorithm used the cost function

f(x) = Σ_i λ_i |[Wx]_i|²

where W represents a forward real wavelet transform (Daubechies' fifth order compactly supported wavelet [30]) and {λ_i} are scaling parameters chosen using the output of the edge detector.

2. Starck and Pantin have proposed [93] a multiscale maximum entropy method that uses the cost function²

f(x) = −Σ_i λ_i ( [Wx]_i − m_i − |[Wx]_i| log(|[Wx]_i| / m_i) )

where W represents a nondecimated forward transform (a form of Laplacian pyramid is used where the lowpass filtering is performed with a 5 × 5 mask based on a cubic B spline), {m_i} are some constant reference values given by m_i = σ/100, and {λ_i} are weighting factors that alter the degree of regularisation. These factors are chosen based on the size of the wavelet coefficients of the observed image. If the coefficient is below a threshold of 3σ then it is deemed to be in the multiresolution support and the factor is set to some constant value σ_j that depends only on the scale j of the wavelet coefficient. These coefficients are allowed to vary in order to match the observations. However, if the coefficient is large then the factor is set to zero and it is not allowed to vary. A constraint is added to the problem that requires all coefficients not in the multiresolution support to be equal to the value in the observed image.

3. Banham and Katsaggelos [8] use an autoregressive prior model which evolves from coarse to fine scales. The parameters of the model are based on the output of an edge detector applied to a prefiltered version of the noisy data. A multiscale Kalman filter is then used to estimate the original image.

4. Belge et al [11] use a non-Gaussian random process model for which the wavelet coefficients are modelled as being independently distributed according to generalised Gaussian distribution laws. The resulting energy function is minimised in a doubly iterative algorithm.
²This cost function appears strange because it is not a symmetric function of the wavelet coefficients. This is probably a mistake but we have chosen to keep the form as given in the reference.
In other words, the iterative minimisation algorithm is based upon another iterative algorithm, which itself requires wavelet and Fourier transforms to evaluate. This algorithm corresponds to the cost function

f(x) = Σ_i λ_i |[Wx]_i|^p

where W represents a forward real wavelet transform (Daubechies' 8 tap most symmetric wavelets were used [30]), {λ_i} are scaling parameters for the different wavelet coefficients, and p is a parameter chosen near 1.

5. Piña and Puetter use a Pixon method [94, 99] that adapts the number of parameters n describing the image to the smallest number consistent with the observed data. The parameters in the Pixon method are coefficients of certain kernel functions. These kernel functions are defined at a number of different scales (typically 4 per octave) and orientations and can be regarded as the reconstruction wavelets corresponding to some redundant wavelet transform.
Using this interpretation of the kernel functions suggests that the Pixon method is approximately equivalent to using a sparseness prior (of the sort seen in section C.1) for the wavelet coefficients:

f(x) = Σ_i |w_i|^p

where p is a real scalar parameter controlling the degree of sparseness and {w_i} is the set of parameters (wavelet coefficients) that specify the image via the relation x = Pw, where P is a reconstruction matrix built out of the kernel functions used in the Pixon method. Note that an estimate based on this objective function would only approximate the true Pixon estimate as it neglects certain features of the method (for example, for a particular position in the image there will be, say, K parameters corresponding to the K different subbands but the Pixon method only allows at most one of these parameters to be nonzero).

The first three methods use a prior that is a function of the noisy image and are therefore known as empirical Bayes methods. There are many other choices for the image prior that have been used in other applications. One example that has already been mentioned in this dissertation is the Hidden Markov Tree (HMT) model discussed in chapter 3.
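The preference such a sparseness prior expresses is easy to see numerically: among coefficient vectors of equal energy, p near 1 assigns a lower cost to the sparse one, while p = 2 cannot distinguish them. The vectors below are illustrative only.

```python
import numpy as np

def sparseness_cost(w, p):
    # f = sum_i |w_i|^p
    return float(np.sum(np.abs(w) ** p))

sparse = np.array([2.0, 0.0, 0.0, 0.0])    # energy 4, one active coefficient
diffuse = np.array([1.0, 1.0, 1.0, 1.0])   # energy 4, spread over all coefficients
```

This is why a Gaussian (p = 2) prior produces diffuse estimates while p near 1 drives most coefficients to zero, mimicking the Pixon method's preference for the fewest active parameters.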
Bibliography
on Sig. Flandrin. A ﬁlter bank for the directional decomposition of images. 43(5):1068–1089. Ogden. Andrews and B.. W. Proc. NJ: PrenticeHall. Spatially adaptive waveletbased multiscale image restoration. (Eds. H. R. Englewood Cliﬀs. 1984. A. M. [6] F. J.Bibliography [1] E. Adelson. Rayner. P. [4] T. [2] F. 245 . Alabert. [3] B. Pelillo and E. Bergen. RCA Engineer. April 1992. Auger and P. IEEE Trans. R. H. Smith.. Math Geology. The practice of fast conditional simulations through the LU decomposition of the covariance matrix. K. Digital Image Restoration. J. Banham and A. [7] R. [9] S. on Signal Proc. 40(4):882–893. Math Analysis. May 1995. Barker and P. 1993. [5] H. Anderson. An Introduction to Multivariate Statistical Analysis. and J.. T. Improving the Readability of TimeFrequency and TimeScale Representations by the Reassignment Method. 29(6):33–41. 1958. SpringerVerlag. C. C. 5:619–633. Burt. A class of bases in L2 for the sparse representation of integral operators. Pyramid methods in Image processing. R. Hancock). [8] M. SIAM J. IEEE Trans. J. chapter Unsupervised image segmentation using markov random ﬁeld models. 19(5):369–386. M. H. 1997. Image Proc. 1987. John Wiley & Sons. 1223. Katsaggelos. Alpert. Hunt. Apr 1996. 1977. Anderson. IEEE Trans. R. Bamberger and M. in Lecture Notes in Computer Science Vol. J.
L. New York. Cand`s.246 Bibliography [10] C. Discrimination. In 2000 IEEE Int. Becchetti and P. L. Verlag Harri Deutsch. 1999. Wavelet domain image restoration with adaptive edgepreserving regularization. De Oliveira. and R. R. M. pages 1111–1114. Applied and Computational e Harmonic Analysis.N. volume 2. Centro de Estadstica y Software Matemtico Universidad Simn Bolvar. Mersereau. 1997. A. Berkner. E. Master’s thesis. L. M. Lagendijk. IEEE. Miller. Biemond. [12] J. [18] P. 1997. Gormish. 6(2):197–218. A Multiresolution Spline with Application to Image Mosaics. on Im. May 1990. 78(5):856–883. 2:217–236. Springer series in statistics. 2000. Report 200005. Berger. Proc. 1958. Donoho. Schwartz. E. Royal Society London A. [17] I. and M. L. Apr 2000. and Recognition. SpringerVerlag Inc. [13] J. The Microsoft Advanced Technology Vision Research Group. on Image Proc. Trans. M. Objective Bayesian Analysis of Spatially Correlated Data Tech. Bronshtein and K. Novel Statistical Multiresolution Techniques for Image Synthesis. Campisi. 1985. Statistical Decision Theory and Bayesian Analysis. S. [15] J. Berger. [11] M. Belge. De Bonet. 9(4):597–608. J. Boliek. and B. Ridgelets: a key to higherdimensional intermite tency? Phil. A new waveletbased approach to sharpening and smoothing of images in Besov spaces with applications to deblurring. Thun and Frankfurt/Main. V. Burt and E. Adelson. .. Technical report. IEEE Trans. Iterative methods for image deblurring. [19] E. J. J. Cand`s and D. 1983. O. 2000. ACM Transactions on Graphics. Handbook of Mathematics. 1999. Proc. Kilmer.. Sans. [14] K. 357. Harmonic analysis of neural networks. [16] J. In International Conference on Digital Signal Processing 97. Binomial linear predictive approach to texture modelling. Semendyayev. Conf. [20] E.. and E.
Daugman. [27] P. Vision Research. [29] G. Nowak. Daniell and S. Hidden Markov Tree Modeling of Complex Wavelet Transforms. September 1980.. Interpolation and denoising of nonuniformly sampled data using waveletdomain processing. IEEE Trans. 127(5):170–172. and R. L. IEE Proceedings. Baraniuk. IEEE Trans. Choi. S. 6(4):493–506. Astrophys. Acoust. Chellappa and R. [30] I. Twodimensional spectral analysis of cortical receptive ﬁeld proﬁles. Acoust. [26] R. 1980. and N. G.. 1995. Clark. Kingsbury. J. Coifman and D. J. Complete Discrete 2D Gabor Transforms by Neural Networks for Image Analysis and Compression. Lecture Notes in Statistics. Convex Set Theoretic Image Recovery by Extrapolated Iterations of Parallel Subgradient Projections.Bibliography 247 [21] R. [25] B. Baraniuk. Daugman. Speech and Signal Processing . Speech. [24] H. and Signal Proc. In Proceedings of IEEE International Conference on Acoustics. [23] H. In ICASSP 99. Crouse. Speech. Istanbul. Daubechies. L.. Baraniuk. Digital image restoration using spatial interaction models. F. Romberg. IEEE Transactions on Signal Processing (Special Issue on Wavelets and Filterbanks). D. 20:847–856. SIAM. K. June 1982. Kashyap. 30:461–472. [32] J. G. chapter Translationinvariant denoising. IEEE Trans. Texture synthesis using 2D noncausal autoregressive models. Gull. 1985. Signal Processing. An eﬃcient implementation of the algorithm ‘CLEAN’ . Kashyap. [22] R. Chellappa and R. Speech.G. Choi and R. April 1998. Maximum entropy algorithm applied to image enhancement. R. [28] M. 1999.. 36(7):1169–1179.ICASSP’00. 89:377–378. Donoho. Combettes. SpringerVerlag. June 2000.L. [31] J. part E. . March 1997. G. July 1988. Signal Processing. Astron.L. Ten Lectures on Wavelets. 1980. on Acoustics. 1992. Turkey. on Image Processing. 33(1):194–203. IEEE Trans. WaveletBased Signal Processing Using Hidden Markov Models. R.
Algebraic reconstruction techniques (ART) for threedimensional electron microscopy and Xray photography.. Harmonic Anal. [41] J. V. Ideal spatial adaptation by wavelet shrinkage. [37] E. F. University of Cambridge. and N. In ICIP 99. [43] S. Complex wavelet features for Fast Texture Image Retrieval. IEEE Trans. [42] R. Communications of the ACM.. F. G. Kingsbury. T. Kingsbury. Comp. [35] P. July 1999. Stanford. [34] P. Gordon. 2000. pages 344–347. Herman. In ICIP 2000. CA. Carpenter. Munson. Fournier. Deutsch and A. G´mezHern´ndez. C. L. Moﬀatt. N. In Proceedings of the IEE conference on Image Processing and its Applications. D. [39] D. A stochastic approach to the simulation of block conductivo a ity ﬁelds conditioned upon data measured at a smaller scale.. 29:471–481.354. Biol. Texture classiﬁcation using dualtree complex wavelet transform. J. F. Jr. Department of Engineering. Kingsbury. Oxford University Press. Bender. Nonlinear solution of linear inverse problems by WaveletVaguelette Decomposition. Signal Proc. C. de Rivaz and N. Donoho and I. Johnstone. 81:425–455. Jan 1991. C. Mitra. Stanford University. G. Fast segmentation using level set curves of complex wavelet surfaces. 1982. Kingsbury. 1991. December 1970. Biometrika. de Rivaz. A linear. 1995. Theoret. Wavelets for Fast Bayesian Geophysical Interpolation CUED/FINFENG/TR. [36] C. and J. Diethorn and D. App. GSLIB Geostatistical Software Library and User’s Guide. and L. Technical report. Manchester. 1999. 1994.248 Bibliography [33] P. Fussel. July 1999. de Rivaz and N. J. PhD thesis. 1992. 39:55–68. Hatipoglu. 2:101–126. . timevarying system framework for noniterative discretetime bandlimited signal extrapolation. G. and G. C. G. [38] D. Journel. L. Computer Rendering of Stochastic Models. Donoho. S. M. [40] A. R.
[46] L. 1994.Bibliography 249 [44] J. binocular interaction and functional architecture in the cat’s visual cortex. 166:106–154. J. pages 648–651. H. Receptive ﬁelds. [54] E. R. Bergen. Aperture synthesis with a nonregular distribution of interferometer o baselines. Jeﬀs and M. In IEE seminar on timescale and timefrequency analysis and applications. Suppl. Iterative Wiener Filters for Image Restoration. IEEE. 70:939–952. 195:215–243. Igehy and L. [49] D. Gunsay. In ICIP 95. Pyramidbased texture analysis/synthesis. Physiol. volume 3. Image Replacement through Texture Synthesis. 106(4):620– 630. 1974. Jaynes. [52] A. pages 186–189. Information theory and statistical mechanics.. BlancF´raud. INRIA Research Report 3955. 1997. [55] B. pages 299–315. Heeger and J. 1982. D. Proc. T. Restoration of blurred star ﬁeld images by maximally sparse optimization.). In ICIP 97. 1962. IEEE transactions on image processing. J. Bull. Wiesel. and C. N. Signal Processing.. Chin. D. Astron. Hillery and R. J. Application to dyadic interpoe lation. Wiesel. Comput.. Pereira. L. Oct 1995. 1991. Jalobeanu. Zerubia. N. Physiol. Hubel and T. H¨gbom. Astrophys. 2(2):202–211. H. IEEE Trans. Rev. London. T. volume 3. Jaynes. April 1993. N. [45] D. (Lond. Satellite image deconvolution using e complex wavelet packets. Aug. Receptive ﬁelds and functional architecture of monkey striate cortex. Rotationally invariant texture classiﬁcation. [48] A. 39(8):1892–1899. Canagarajah. Herv´. R. Harmonic Anal. 1968. 15:417–426. R. The rationale of maximum entropy methods. D. Multiresolution analysis of multiplicity d. A.). June 2000. . Phys. T. February 2000. [51] H. Hubel and T. [47] P. Hill. (Lond. [53] E. May 1957. and J. Ser. [50] D.
In Proc. 39:914–929. G. In Proc IEE Colloquium on TimeScale and TimeFrequency Analysis and Applications. IEEE Conf. Witkin. IEE. Mining Geostatistics. Bayesian inference in wavelet based methods. 1999. Snakes: Active contour models. on Acoustics. [66] T. In Proc. W. pages 319–322. IEEE Trans.. Terzopoulos. chapter Minimax restoration and deconvolution. London. Schafer. [57] A. 39:914–929. pages 1464–1480. [59] A. G. Mersereau. pages 54–57. G. J. [58] J. [63] N. A.. Sep 1990. Sep 1998. Compound GaussMarkov random ﬁelds for image estimation. 1999. G. Int. Comput. Signal Proc. R. Kingsbury. Biemond. Kam and W. Complex wavelets for shift invariant analysis and ﬁltering of signals. Huijbregts. Submitted by invitation to Applied Computational and Harmonic Analysis. Shift invariant properties of the DualTree Complex Wavelet Transform. The SelfOrganizing Map. 1:321–332. Acoust. Kingsbury. and R. February 2000. Kingsbury. The dualtree complex wavelet transform: a new eﬃcient tool for image restoration and enhancement. [60] M. Kass.. K. A regularization iterative image restoration algorithm. [65] N. Acoust. J.. Kingsbury. Image segmentation: An unsupervised multiscale approach. 29 Feb. 1978.. volume 78. IEEE. Speech. W. AZ. 1988. Speech. 2000. Mallat. Vis. London: Academic Press. Katsaggelos. H. C. M. Kohonen. European Signal Processing Conf. Phoenix. In Proceedings of the 3rd International Conference on Computer Vision.250 Bibliography [56] F. and D. April 1991. Jeng and J. J. Journel and C. . Speech and Signal Processing. Springer. G. volume II. Fitzgerald. IEEE Trans. [61] A. Signal Processing. [64] N. Kalifa and S. April 1991. J. [62] N. Woods. In Proc. June 2000. Pattern Recognition and Image Processing (CVPRIP 2000). Complex wavelets and shift invariance.
[67] M. Kwong and P. Tang. W-Matrices, Nonorthogonal Multiresolution Analysis, and Finite Signals of Arbitrary Length. Technical report MCS-P449-0794, Argonne National Laboratory, 1994.
[68] L. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[69] M. Lang, H. Guo, J. E. Odegard, C. S. Burrus, and R. O. Wells Jr. Noise reduction using an undecimated discrete wavelet transform. IEEE Signal Processing Letters, 3(1):10–12, 1996.
[70] J. Lebrun and M. Vetterli. High Order Balanced MultiWavelets. In Proceedings of IEEE ICASSP, Seattle, pages 1529–1532, May 1998.
[71] A. Lent and H. Tuy. An iterative method for the extrapolation of bandlimited functions. J. Math. Anal. Appl., 83:554–565, October 1981.
[72] J. P. Lewis. Texture Synthesis for Digital Painting. In Proceedings of SIGGRAPH 84, 1984.
[73] J. Lina and M. Mayrand. Complex Daubechies wavelets. J. of Appl. and Comput. Harmonic Analysis, 2:219–229, 1995.
[74] J. Magarey. Motion Estimation using Complex Wavelets. PhD thesis, Cambridge University, 1997.
[75] S. Mallat. A Wavelet Tour Of Signal Processing. Academic Press, 1998.
[76] B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of image data. IEEE Trans. Patt. Anal. and Mach. Int. (Special Issue on Digital Libraries), 18(8):837–842, Aug 1996.
[77] S. Marcelja. Mathematical description of the responses of simple cortical cells. J. Opt. Soc. Am., 70:1297–1300, 1980.
[78] D. Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W. H. Freeman and Company, 1982.
[79] K. A. Marsh and J. W. Richardson. The objective function implicit in the CLEAN algorithm. Astron. Astrophys., 182:174–178, 1987.
[80] K. Miller. Least Squares methods for ill-posed problems with a prescribed bound. SIAM J. Math. Anal., 1:52–74, 1970.
[81] R. Molina, J. Mateos, and J. Abad. Prior Models and the Richardson-Lucy Restoration Method. In R. J. Hanisch and R. L. White, eds., The Restoration of HST Images and Spectra II, pages 118–122, 1994.
[82] G. P. Nason and B. W. Silverman. The Stationary wavelet transform and some statistical applications. In Wavelets and Statistics (ed. A. Antoniadis & G. Oppenheim), Springer Lecture Notes in Statistics, 103:281–300, 1995.
[83] R. Navarro and J. Portilla. Robust method for texture synthesis-by-analysis based on a multiscale Gabor scheme. In Proceedings of SPIE - The International Society for Optical Engineering, volume 2657, pages 86–97, 1996.
[84] R. Neelamani, H. Choi, and R. Baraniuk. Wavelet-Domain Regularized Deconvolution for Ill-Conditioned Systems. In ICIP 99, Kobe, 1999.
[85] D. E. Newland. Harmonic Wavelet Analysis. Proc. R. Soc. Lond. A, 443:203–225, 1993.
[86] D. E. Newland. Harmonic wavelets in vibrations and acoustics. Phil. Trans. R. Soc. Lond. A, 357(1760):2607–2625, 1999.
[87] R. Nowak and M. Thul. Wavelet-Vaguelette restoration in photon-limited imaging. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing - ICASSP 98, Seattle, pages 2869–2872, 1998.
[88] D. Nychka, C. Wikle, and J. A. Royle. Large spatial prediction problems and nonstationary random fields. http://goldhill.cgd.ucar.edu/stats/nychka/manuscripts/large5.ps, 1998.
[89] Wonho Oh. Random Field Simulation and an Application of Kriging to Image Thresholding. PhD thesis, 1998. ftp://ams.sunysb.edu/pub/papers/theses/who.ps.gz.
[90] R. Paget and D. Longstaff. Nonparametric multiscale Markov random field model for synthesizing natural textures. In Proceedings of the International Symposium on Signal Processing and its Applications, ISSPA 96, volume 2, pages 744–747, September 1996.
[91] J. K. Paik and A. K. Katsaggelos. Image restoration using a modified Hopfield network. IEEE Trans. Image Processing, 1:49–63, January 1992.
[92] E. Pantin and J. L. Starck. Deconvolution of astronomical images using the multiscale maximum entropy method. Astron. Astrophys. Suppl. Ser., 118:575–585, 1996.
[93] R. K. Piña and R. C. Puetter. Bayesian Image Reconstruction: The Pixon and Optimal Image Modeling. Publications of the Astronomical Society of the Pacific, 105:630–637, 1993.
[94] J. Portilla and E. P. Simoncelli. A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision, 2000. To appear.
[95] M. Pötzsch, T. Maurer, L. Wiskott, and C. von der Malsburg. Reconstruction from graphs labeled with responses of Gabor filters. In Proc. ICANN 1996, 1996.
[96] M. J. D. Powell. Radial basis functions for multivariable interpolation: a review. In J. C. Mason and M. G. Cox (Eds), Algorithms for Approximation, pages 143–167. Oxford: Clarendon Press, 1987.
[97] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C. Cambridge University Press, 1992.
[98] R. C. Puetter and A. Yahil. The Pixon Method of Image Reconstruction. In D. M. Mehringer, R. L. Plante, and D. A. Roberts, eds., ADASS '98, Astronomical Data Analysis Software and Systems VIII, ASP Conference Series, pages 307–316, 1999.
[99] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.
E. Anal. Schwab. K.. 1984. IEEE Trans. 21(4). Schoenberg.no/∼tranden/comments. Romberg. 1998. Filtering for Texture Classiﬁcation: A Comparative Study. Portilla. H. Journal of the Optical Society of America. J. Special Issue . Slepian. J. Husøy. Spline Functions and the problem of graduation. [106] F. In ICIP 98. J. Spann and R. Acad. applications to lowfrequency radio interferometry.254 Bibliography Comments regarding the Trans. Prolate spheroidal wavefunctions. T. Randen. Tech.edu/software/WHMT/. Bell. [108] E. ShiftInvariant Denoising using WaveletDomain Hidden Markov Trees. Apr 1999.edu/faculty/rs/papers/RLS Papers. Richardson.stat. 62(1):55–59. Adelson. and R. 40:43–64. http://www. A quadtree approach to image segmentation which combines statistical and spatial information.. 89:1076–1081. 1985. 33rd Asilomar Conference. 52:947–950. Pattern Recognition. and Mach.html. Software for Image Denoising using Waveletdomain Hidden Markov Tree Models. P. Romberg and H. April 1999. CA. [104] J. 1998. Paciﬁc Grove.ux. Nat. [110] R. 18:257–269. on Wavelets. Texture Characterization via Joint Statistics of Wavelet Coeﬃcient Magnitudes. 1964. [109] D. Relaxing the isoplanatism assumption in selfcalibration.html . 38:587–607.rice. [105] I. Astron. Sci. G.. BayesianBased Iterative Method of Image Restoration. In Proc.unc. Proc. IEEE Transactions on Information Theory. [100] T. and D. PAMI Article. [103] J. [107] E. Smith. http://www. J. R. http://www. W. on Patt. October 1999. Choi. [102] W. H. 1961. Choi. Heeger. H. [101] T. Simoncelli and J. L. January 1972.dsp.. Randen and J. Syst. Baraniuk. Fourier analysis and uncertainty I. [111] M. Int. Environmental Statistics. K. P.his. Freeman. 1999. 1992. Shiftable MultiScale Transforms. H. Simoncelli. Wilson.
[112] J. L. Starck and F. Murtagh. Image restoration with noise suppression using the wavelet transform. Astronomy and Astrophysics, 288:342–348, 1994.
[113] G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, 1997.
[114] V. Strela and A. Walden. Signal and Image denoising via Wavelet Thresholding: Orthogonal and Biorthogonal, Scalar and Multiple Wavelet Transforms. Technical report TR-98-01, Dept. of Mathematics, Imperial College of Science, Technology & Medicine, 1998.
[115] Yi Sun. Hopfield Neural Network Based Algorithms for Image Restoration and Reconstruction - Part I: Algorithms and Simulations. IEEE Transactions on Signal Processing, 48(7):2105–2118, July 2000.
[116] C. W. Therrien. Discrete random signals and statistical signal processing. Prentice Hall, 1992.
[117] A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-Posed Problems. New York: Wiley, 1977.
[118] G. Turk. Generating Textures on Arbitrary Surfaces Using Reaction-Diffusion. In Proceedings of SIGGRAPH 91, 1991.
[119] Michael Unser, Akram Aldroubi, and Murray Eden. B-Spline Signal Processing: Part I - Theory. IEEE Transactions on Signal Processing, 41(2):821–833, 1993.
[120] R. van Spaendonck, F. Fernandes, M. Coates, and C. Burrus. Non-redundant, directionally selective complex wavelets. In 2000 IEEE International Conference on Image Processing, 2000.
[121] M. Vetterli. Filter Banks allowing perfect reconstruction. Signal Processing, 10(3):219–244, 1986.
[122] M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice Hall, Englewood Cliffs: NJ, 1995.
[123] C. R. Vogel and M. E. Oman. Fast, robust total variation-based reconstruction of noisy, blurred images. IEEE Trans. Image Processing, 7(6):813–824, June 1998.
[124] G. Wang, J. Zhang, and G. W. Pan. Solution of inverse problems in image processing by wavelet expansions. IEEE Trans. Image Processing, 4:579–593, May 1995.
[125] S. J. Wernecke and L. R. D'Addario. Maximum entropy image reconstruction. IEEE Transactions on Computers, C-26(4):351–364, April 1977.
[126] R. Wilson and G. H. Granlund. The uncertainty principle in image processing. IEEE Trans. on Patt. Anal. and Mach. Int., 6(6):758–767, 1984.
[127] R. Wilson. Finite Prolate Spheroidal Sequences and Their Applications I: Generation and Properties. IEEE Trans. on Patt. Anal. and Mach. Int., 9(6):787–795, Nov. 1987.
[128] R. Wilson and M. Spann. Finite Prolate Spheroidal Sequences and Their Applications II: Image Feature Description and Segmentation. IEEE Trans. on Patt. Anal. and Mach. Int., 10(2):193–203, March 1988.
[129] A. Witkin and M. Kass. Reaction-Diffusion Textures. In Proceedings of SIGGRAPH 91, 1991.
[130] D. J. Wolf, K. D. Withers, and M. D. Burnaman. Integration of Well and Seismic Data Using Geostatistics. AAPG Computer Applications in Geology, pages 177–199, 1994.
[131] D. C. Youla and H. Webb. Image restoration by the method of convex projections: Part 1, theory. IEEE Trans. Med. Imaging, 1(2):81–94, October 1982.
[132] J. Zhang. The mean field theory for EM procedures in blind MRF image restoration. IEEE Trans. Image Processing, 2(1):27–40, January 1993.
[133] Y. T. Zhou, R. Chellappa, A. Vaid, and B. K. Jenkins. Image restoration using a neural network. IEEE Trans. Acoust., Speech, Signal Processing, 36(7):1141–1151, July 1988.
[134] S. C. Zhu, Y. Wu, and D. Mumford. Filters, random fields and minimax entropy (FRAME): towards a unified theory for texture modelling. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 686–693, 1996.