Is Denoising Dead?
Priyam Chatterjee, Student Member, IEEE, and Peyman Milanfar, Fellow, IEEE
Abstract—Image denoising has been a well-studied problem in the field of image processing. Yet researchers continue to focus attention on it to better the current state-of-the-art. Recently proposed methods take different approaches to the problem and yet their denoising performances are comparable. A pertinent question to ask, then, is whether there is a theoretical limit to denoising performance and, more importantly, are we there yet? As camera manufacturers continue to pack increasing numbers of pixels per unit area, an increase in noise sensitivity manifests itself in the form of a noisier image. We study the performance bounds for the image denoising problem. Our work in this paper estimates a lower bound on the mean squared error of the denoised result and compares the performance of current state-of-the-art denoising methods with this bound. We show that despite the phenomenal recent progress in the quality of denoising algorithms, some room for improvement still remains for a wide class of general images, and at certain signal-to-noise levels. Therefore, image denoising is not dead—yet.

Index Terms—Bayesian Cramér–Rao lower bound (CRLB), bias, bootstrapping, image denoising, mean squared error.

I. INTRODUCTION

IMAGE denoising has been a well-studied problem in the image processing community and continues to attract researchers who aim to perform better restoration in the presence of noise. With the rise in the number of image sensors (or pixels) per unit area of a chip, modern image capturing devices are increasingly sensitive to noise. Camera manufacturers, therefore, depend on image denoising algorithms to reduce the effects of such noise artifacts in the resultant image. Recently proposed denoising methods use different approaches to address the problem. Of them, the best performing methods [1]–[6] can be shown to share a common framework in that they work by combining similar patches to effect denoising, although the parameters of said framework are estimated in rather different ways. These state-of-the-art algorithms produce very impressive, though quite comparable, results; and this begs the question: Have we reached some limit of performance? Is denoising dead? Given the importance of this problem, our present work studies denoising in a statistical framework to derive a fundamental bound on the performance of denoising algorithms and compare the current state-of-the-art to it.1

Literature on such performance limits exists for some of the more complex image processing problems such as image registration [7], [8] and super-resolution [9]–[12]. Performance limits to object or feature recovery in images in the presence of pointwise degradation have been studied by Treibitz et al. [13]. In their work, the authors study the effects of noise among other degradations and formulate expressions for the optimal filtering parameters that define the resolution limits to recovering any given feature in the image. While their study is practical, it does not define statistical performance limits to denoising general images. In [14], Voloshynovskiy et al. briefly analyze the performance of MAP estimators for the denoising problem. However, our bounds are developed in a much more general setting and, to the best of our knowledge, no comparable study currently exists for the problem of denoising. The present study will enable us to understand how well state-of-the-art denoising algorithms perform as compared to these limits. From a practical perspective, it will also lead to an understanding of the fundamental limits of increasing the number of sensors in the imaging system with acceptable image quality being made possible by noise suppression algorithms.

Before we analyze image denoising statistically, we first define our image formation model as

$$y_i = z_i + \eta_i \qquad (1)$$

where z_i is the actual pixel intensity at location i and y_i is the observed pixel intensity. We assume that the corrupting noise η_i is independent identically distributed (IID) and sampled from a zero-mean density of known variance σ². The aim of denoising algorithms is to recover the noise-free pixel intensity z_i. Most recent denoising algorithms [1]–[6] work on image patches2, and, hence, we define the patch-based image model as

$$\mathbf{y}_j = \mathbf{z}_j + \boldsymbol{\eta}_j \qquad (2)$$

where z_j is the actual patch intensity written in a (column-stacked or raster-scanned) vectorized form, y_j is the vectorized noisy image patch, and η_j is a vectorized noise patch with covariance σ²I_n. Note that the noise vectors are uncorrelated only when the patches are nonoverlapping, something that we assume in developing our bounds formulation.

Although the measurement model of (2) is linear in the unknown image, the most successful methods [1]–[6] to date have taken a nonlinear estimation approach to this inverse problem, resulting in state-of-the-art performance. Of the various approaches, some of the recently proposed weighted patch-averaging filters prove to be quite effective for denoising. Of them, the recently proposed method of nonlocal means (NLM) [1] attempts to locate patches similar to a patch of

Manuscript received May 21, 2009; revised October 28, 2009. First published November 20, 2009; current version published March 17, 2010. This work was supported in part by the U.S. Air Force under Grant F49620-03-1-0387. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Michael Elad.

The authors are with the Department of Electrical Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064 USA (e-mail: priyam@soe.ucsc.edu; milanfar@soe.ucsc.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2009.2037087

1Supporting software to compute the bounds derived in this paper, and additional results, are available at http://www.soe.ucsc.edu/~priyam/bounds/.

2Patches are defined as a neighborhood of pixels in a small fixed-sized window. Such image patches can be overlapping or nonoverlapping. Most denoising algorithms work with overlapping patches.
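The observation models (1) and (2) are straightforward to simulate. The following sketch (our own NumPy illustration, not from the paper; the function name `observe_patches` is hypothetical) draws IID zero-mean Gaussian noise of known variance and adds it to vectorized patches:

```python
import numpy as np

def observe_patches(z, sigma, rng=None):
    """Simulate the patch observation model y_j = z_j + eta_j of Eq. (2):
    each column of z is a vectorized noise-free patch, and the corrupting
    noise is IID zero-mean Gaussian, so Cov(eta_j) = sigma^2 * I_n."""
    rng = np.random.default_rng(rng)
    eta = rng.normal(0.0, sigma, size=z.shape)
    return z + eta

# Toy check: with many patches, the empirical noise deviation approaches sigma.
z = np.full((121, 5000), 128.0)   # 5000 flattened 11x11 patches
y = observe_patches(z, sigma=25.0, rng=0)
print(round(float(np.std(y - z)), 1))   # close to 25.0
```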
896 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 4, APRIL 2010
$$\mathbf{b}(\mathbf{z}_j) = \mathbf{M}\mathbf{z}_j + \mathbf{u} \qquad (3)$$

as the error in modeling in each case. For a good fit, the bias should be captured effectively by the regression model. A higher value of R² thus indicates a higher level of predictability of the bias by the affine model.

5Here, local refers to the members of a cluster of geometrically similar patches across the image. As a result, for general images made up of geometrically nonhomogeneous patches, we have to use a different M and u for each cluster. That is to say, the bias is modeled as a different affine function on clusters of geometrically similar patches. This will become apparent later in Section V where we discuss the bound calculation process for general images.

7For general images such as the house and parrot images (Fig. 5) that contain patches of diverse geometric structure, the R² values are computed separately for each cluster. The mean R² values across 5 clusters are reported in Table I.
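The affine bias model of (3) can be fit to measured per-patch bias vectors by ordinary least squares, with the fit quality summarized by R² as the footnotes above describe. Below is a minimal sketch (our own illustration, not the paper's code; `fit_affine_bias` and the synthetic data are hypothetical):

```python
import numpy as np

def fit_affine_bias(Z, bias):
    """Least-squares fit of the affine bias model of Eq. (3), b(z) = M z + u,
    to measured per-patch bias vectors; returns (M, u) and an R^2 value that
    measures how well the affine model predicts the bias."""
    n, P = Z.shape
    X = np.vstack([Z, np.ones((1, P))])            # stack [z; 1] per patch
    coef, *_ = np.linalg.lstsq(X.T, bias.T, rcond=None)
    M, u = coef[:n].T, coef[n]
    resid = bias - (M @ Z + u[:, None])
    r2 = 1.0 - resid.var() / bias.var()
    return M, u, r2

# Synthetic check: bias generated by an exact affine map is recovered (R^2 ~ 1).
rng = np.random.default_rng(0)
n, P = 4, 500
Z = rng.normal(size=(n, P))
M_true = -0.3 * np.eye(n)                          # a negative definite gradient
u_true = np.full(n, 0.5)
bias = M_true @ Z + u_true[:, None]
M, u, r2 = fit_affine_bias(Z, bias)
print(round(float(r2), 6))   # 1.0
```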
Fig. 2. Visual comparison of the actual bias obtained from BM3D [6] and K-SVD [4] and the reconstructed bias using the affine model fit for the 512×512 grass image. The patch size chosen was 11×11. We can see that the histograms for the modeling errors in both cases are centered around zero and have short tails. (a) Original image; (b) noisy image; (c) denoised (BM3D); (d) denoised (K-SVD); (e) BM3D; (f) histogram of (e); (g) K-SVD; (h) histogram of (g).
III. OPTIMAL BIAS BAYESIAN CRAMÉR–RAO LOWER BOUND (OB-CRLB)

In the statistics and signal processing literature, a number of bounds exist to evaluate performance limits of estimation. While some bounds were developed for the estimation of a deterministic parameter (for instance, those proposed by Seidman [21], Cramér [22] and Rao [23], [24]), others, such as those developed by Ziv et al. [25], address the Bayesian setting where the parameter of interest is a random variable. One primary difference between the two cases lies in the meaning of the MSE for which the lower bound is established. In the deterministic case the bound is a function of the parameter of interest, whereas in the Bayesian case it is a numerical value obtained by integrating over the random parameter [18] (z in our case). As a result, Bayesian versions have been derived for many of the bounds developed for the deterministic case [26]. In our work, we build on a Bayesian version of the classical CRLB [27]. In its simplest form, the CRLB is a lower bound on the variance of an unbiased estimator of z, subject to the regularity condition

$$\mathrm{E}\!\left[\frac{\partial \ln p(\mathbf{y}\,|\,\mathbf{z})}{\partial \mathbf{z}}\,\Big|\,\mathbf{z}\right] = \mathbf{0} \qquad (5)$$

on the conditional probability density function p(y|z), assuming that p(y|z) is twice differentiable with respect to z.

An important point to note here is that our CRLB formulation differs from that defined by van Trees [26], [28] where the joint pdf p(y, z) is directly used. The two pdfs are related by

$$p(\mathbf{y}, \mathbf{z}) = p(\mathbf{y}\,|\,\mathbf{z})\,p(\mathbf{z}) \qquad (6)$$

where p(z) is the probability density function on z. We work with the conditional pdf p(y|z) to formulate a bound on the MSE in the conditional sense and integrate it to get the overall
Authorized licensd use limted to: IE Xplore. Downlade on May 13,20 at 1:475 UTC from IE Xplore. Restricon aply.
CHATTERJEE AND MILANFAR: IS DENOISING DEAD? 899
(Bayesian) MSE, as we illustrate below in (12). Assuming for now that an unbiased estimator of z exists, the bound on the (conditional) covariance of the estimate is given by the CRLB as

$$\mathrm{Cov}(\hat{\mathbf{z}}\,|\,\mathbf{z}) \succeq \mathbf{J}^{-1} \qquad (7)$$

where the ⪰ operator in the matrix case implies that the difference of the two matrices has to be positive semi-definite and where J is the conditional Fisher information matrix (FIM) given by

$$\mathbf{J} = \mathrm{E}\!\left[\left(\frac{\partial \ln p(\mathbf{y}\,|\,\mathbf{z})}{\partial \mathbf{z}}\right)\!\left(\frac{\partial \ln p(\mathbf{y}\,|\,\mathbf{z})}{\partial \mathbf{z}}\right)^{T} \Big|\;\mathbf{z}\right] \qquad (8)$$

The estimator which achieves this lower bound is said to be efficient. While this provides us with a simple method of evaluating performance limits for an estimation problem, it cannot be applied directly to our denoising problem. As illustrated previously, most denoising methods are biased in nature and this bias needs to be taken into account to obtain a useful lower bound. For such cases, the CRLB on the covariance of the biased estimate is given by

$$\mathrm{Cov}(\hat{\mathbf{z}}\,|\,\mathbf{z}) \succeq \left[\mathbf{I} + \nabla_{\mathbf{z}}\mathbf{b}(\mathbf{z})\right]\mathbf{J}^{-1}\left[\mathbf{I} + \nabla_{\mathbf{z}}\mathbf{b}(\mathbf{z})\right]^{T} \qquad (9)$$

$$= \left[\mathbf{I} + \mathbf{M}\right]\mathbf{J}^{-1}\left[\mathbf{I} + \mathbf{M}\right]^{T} \qquad (10)$$

where I denotes the identity matrix and (10) comes from making use of our bias model of (3). It is useful to note here that the estimator covariance for the affine model is only influenced by the parameter M (which can also be termed the gradient of the bias) and not by the constant term u. As such, a negative definite gradient on the bias lowers the minimum achievable estimator variance compared to that of the unbiased case given by (7). Performance limits for image reconstruction problems based on the biased CRLB have been studied by Fessler et al. [29] using a constraint on the bias gradient. Using the relation in (10), we can calculate a lower bound on the conditional MSE in estimating z as

$$\mathrm{MSE}(\hat{\mathbf{z}}\,|\,\mathbf{z}) \;\geq\; \mathrm{tr}\!\left\{\left[\mathbf{I}+\mathbf{M}\right]\mathbf{J}^{-1}\left[\mathbf{I}+\mathbf{M}\right]^{T}\right\} + \|\mathbf{M}\mathbf{z}+\mathbf{u}\|^{2} \qquad (11)$$

where tr(·) denotes the trace of a matrix. Now, by the law of total expectation, the overall Bayesian MSE can be expressed as

$$\mathrm{E}_{\mathbf{z}}\!\left[\mathrm{MSE}(\hat{\mathbf{z}}\,|\,\mathbf{z})\right] \;\geq\; \int \left(\mathrm{tr}\!\left\{\left[\mathbf{I}+\mathbf{M}\right]\mathbf{J}^{-1}\left[\mathbf{I}+\mathbf{M}\right]^{T}\right\} + \|\mathbf{M}\mathbf{z}+\mathbf{u}\|^{2}\right) p(\mathbf{z})\,d\mathbf{z}. \qquad (12)$$

It is interesting to note that in the above formulation the pdf p(z) can be thought of as the prior information on z. Most denoising methods make use of informative priors in the form of smoothness or sparsity penalties and other constraints to achieve improved performance. Our Bayesian approach thus takes into account the effect of such priors in calculating the lower bound on the MSE. Knowledge of p(z) as a prior has been used by Young et al. [30] to derive a Bayesian MSE bound for the estimation of a scalar parameter that is known to lie within a fixed interval. Recently, their results have been generalized to an unconstrained vector case by Ben-Haim et al. [31]. It would appear that the effective calculation of the above Bayesian bound necessitates complete knowledge of the prior density p(z), as is the case for [30], [31]. This is related to the subject of statistical modeling of images, which has seen much activity [32]–[38] and is still the subject of some controversy. Happily, as described in Section IV-A below, we are able to avoid the need for complete knowledge of such priors. More specifically, only a few low order moments of the density p(z) are needed for our calculations, and as we will show, these can be effectively estimated directly from a given (noise-free) image. The bound formulation of (12) used in our work is related to those used in [30] and [31] but differs from the Bayesian CRLB (B-CRLB) of van Trees8 [26], [28], as alluded to earlier. To disambiguate the two, we refer to our formulation as the Optimal Bias B-CRLB (OB-CRLB). We calculate the lower bound on the MSE based on the OB-CRLB formulation in the next section.

IV. LOWER BOUND ON THE MSE

In this section, we derive the bound using expressions for the bias model parameters (M and u) that minimize the right hand side of (12). We also derive an analytical expression for the FIM and discuss how we derive the covariance of image patches that is needed to derive the MSE bound.

A. Deriving the Bayesian MSE Bound

The MSE of any estimator is a function that depends on the variance as well as the bias term. To obtain the lower bound on the MSE, we thus need to establish optimal values for M and u that minimize (12). This is in line with the approach advocated

8The FIM used in the B-CRLB formulation of van Trees [26], [28] is calculated from the joint pdf p(y, z) whereas in our case (and also [30], [31]) it is calculated from the conditional pdf p(y|z). Hence, the B-CRLB of [28] is more restrictive in the sense that p(y, z) has to be twice differentiable. In our case twice differentiability is necessary only for the conditional pdf.
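The conditional bound of (11) is simple to evaluate numerically once M, u, and J are fixed. The sketch below is our own illustration (the function name and toy dimensions are hypothetical, not from the paper); setting M = 0 and u = 0 recovers the classical unbiased CRLB trace tr(J⁻¹):

```python
import numpy as np

def conditional_mse_bound(M, u, z, J):
    """Lower bound of Eq. (11) on the conditional MSE of an estimator whose
    bias is the affine function b(z) = M z + u: a variance term from the
    biased CRLB of Eq. (10) plus the squared bias itself."""
    n = len(z)
    A = np.eye(n) + M                          # I + M, the bias-gradient factor
    var_term = float(np.trace(A @ np.linalg.inv(J) @ A.T))
    bias_term = float((M @ z + u) @ (M @ z + u))
    return var_term + bias_term

# Unbiased case (M = 0, u = 0): recovers tr(J^{-1}) = n * sigma^2 / N.
n, sigma2, N = 4, 1.0, 10
J = (N / sigma2) * np.eye(n)                   # FIM for N similar patches
z = np.zeros(n)
print(conditional_mse_bound(np.zeros((n, n)), np.zeros(n), z, J))  # ~0.4
```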
Setting the derivatives of the Bayesian MSE

$$Q(\mathbf{M},\mathbf{u}) = \mathrm{E}_{\mathbf{z}}\!\left[\mathrm{tr}\!\left\{\left[\mathbf{I}+\mathbf{M}\right]\mathbf{J}^{-1}\left[\mathbf{I}+\mathbf{M}\right]^{T}\right\} + \|\mathbf{M}\mathbf{z}+\mathbf{u}\|^{2}\right] \qquad (13)$$

with respect to M and u to zero, and solving these simultaneous equations, results in expressions for the optimum bias model parameters

$$\mathbf{M}_{\mathrm{opt}} = -\mathbf{J}^{-1}\left(\mathbf{J}^{-1} + \mathbf{C}_{z}\right)^{-1} \qquad (15)$$

$$\mathbf{u}_{\mathrm{opt}} = -\mathbf{M}_{\mathrm{opt}}\,\mathrm{E}[\mathbf{z}]. \qquad (16)$$

It is important to note that the covariance C_z is not that of any estimated vectors but the second moment from the pdf of the random vector z. The derivations are detailed in Appendix B. Thus, we are able to obtain expressions for M and u that result in the theoretical lower bound on the MSE for any affine-biased denoiser.9 Note that it is not necessary that any denoiser with the said bias and variance characteristics actually exist. That is to say, no "Bayes-efficient" estimator that achieves this derived lower bound may actually exist. Next, we obtain an expression for the lower bound on the MSE using the optimized parameters for our bias model by inserting M_opt and u_opt in the expression for the MSE given in (13).

Once we have obtained expressions for the FIM and the parameters for the affine model of the optimal bias function, we can proceed to find an expression for the optimal lower bound on the MSE. We rewrite the right hand side of (12) by plugging in the obtained expressions of the parameters from (15) and (16) as

$$\mathrm{tr}\!\left\{\left[\mathbf{I}+\mathbf{M}_{\mathrm{opt}}\right]\mathbf{J}^{-1}\left[\mathbf{I}+\mathbf{M}_{\mathrm{opt}}\right]^{T} + \mathbf{M}_{\mathrm{opt}}\mathbf{C}_{z}\mathbf{M}_{\mathrm{opt}}^{T}\right\} = \mathrm{tr}\!\left\{\left(\mathbf{J} + \mathbf{C}_{z}^{-1}\right)^{-1}\right\} \qquad (17)$$

where the last equality is derived from the matrix inversion lemma [39]. Equation (17) thus allows us to get a neat expression for the lower bound on the MSE for the denoising problem, that is

$$\mathrm{E}_{\mathbf{z}}\!\left[\mathrm{MSE}(\hat{\mathbf{z}}\,|\,\mathbf{z})\right] \;\geq\; \mathrm{tr}\!\left\{\left(\mathbf{J} + \mathbf{C}_{z}^{-1}\right)^{-1}\right\}. \qquad (18)$$

It is interesting to analyze the implications of the obtained expression. This lower bound is a function of both the FIM and the covariance of the parameter vector z. Within a cluster of geometrically similar patches, the covariance of z is an indication of the variability of the geometric structures encountered in the image (or within a cluster). For images that are mostly smooth, we can expect C_z to have smaller variance, whereas images containing more geometric variability will yield larger C_z. This is also in keeping with our expectations and experimental findings that smooth images lacking much detail are easier to denoise than those containing much texture (Table II).

Our bounds are derived assuming an affine-biased estimator. One type of estimator having this bias is an affine estimator which, in the case of Gaussian noise, can be shown to be the only class of estimators having an affine bias function [19]. Moreover, the expression for the lower bound is precisely that of the Linear Minimum Mean Square Error (LMMSE) estimate for the problem [27]. In theory, this bound is achievable by an affine estimator with exact knowledge of the first and second order moments of p(z). In practice, however, the moments can only be estimated from the given noisy image, leading to sub-optimal performance. Also, the expression for the lower bound corresponds to the MSE of the Bayesian Minimum Mean Square Error (BMMSE) estimate of z when the prior pdf p(z) is assumed to be Gaussian [27]. We, of course, make no such assumption on the prior. Moreover, the bounds formulation does not even assume complete knowledge of the entire distribution of z, unlike the Bayesian MSE bound derived by Ben-Haim et al. [31]. Our affine model of the bias allows us to assume only the availability of the first and second order moments of z for the computation of the lower bound. Extending our approach to the case where the bias is of higher order will incorporate correspondingly higher order moments of the distribution of z (see Appendix C). For practical computation of the bound, we use the noise-free image to estimate the covariance of z. In Section IV-C, we explain this process in detail. But first, we derive an analytical expression for the FIM, assuming Gaussian white noise.

9It is interesting to note that this optimization indeed yields a negative definite M, as can be seen in (15).
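The internal consistency of (15)–(18) can be checked numerically: plugging the optimal M into the Bayesian MSE reproduces the closed form of (18), as the matrix inversion lemma predicts. A sketch (our own code with an arbitrary synthetic covariance; the function names are hypothetical):

```python
import numpy as np

def mse_bound(J, Cz):
    """Closed-form lower bound of Eq. (18): tr[(J + Cz^{-1})^{-1}]."""
    return np.trace(np.linalg.inv(J + np.linalg.inv(Cz)))

def optimal_affine_params(J, Cz, mu_z):
    """Optimal bias-model parameters of Eqs. (15)-(16):
    M = -J^{-1}(J^{-1} + Cz)^{-1} (negative definite) and u = -M mu_z."""
    Jinv = np.linalg.inv(J)
    M = -Jinv @ np.linalg.inv(Jinv + Cz)
    return M, -M @ mu_z

# Sanity check: the Bayesian MSE at the optimum,
# tr{(I+M) J^{-1} (I+M)^T + M Cz M^T}, matches Eq. (18).
rng = np.random.default_rng(1)
n = 6
A = rng.normal(size=(n, n))
Cz = A @ A.T + n * np.eye(n)          # a valid (positive definite) covariance
J = 10.0 * np.eye(n)                  # FIM for AWGN, as in Eq. (21)
M, u = optimal_affine_params(J, Cz, np.zeros(n))
I = np.eye(n)
direct = np.trace((I + M) @ np.linalg.inv(J) @ (I + M).T + M @ Cz @ M.T)
print(np.isclose(direct, mse_bound(J, Cz)))   # True
```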
TABLE II
SOME IMAGES RANKED ACCORDING TO IMPROVEMENT IN DENOISING YET TO BE ACHIEVED, AS PREDICTED BY OUR BOUNDS. THE NOISE STANDARD DEVIATION IS 25 AND THE BOUNDS ARE CALCULATED USING 11×11 PATCHES
B. Fisher Information Matrix

The expression for the MSE bound in (18) holds true for any noise distribution, which in turn needs to be taken into account in deriving an analytical expression for the FIM J. Hence, our framework can be used to derive bounds for any noise distribution. In this paper, however, we only consider the case of additive white Gaussian noise (AWGN). Although we assume the noise to be IID pointwise, this does not allow us to immediately claim statistical independence of all the noise patches across the entire image. In fact, if the patches are allowed to overlap, data from one patch will be duplicated in neighboring patches. To make our derivation of the FIM simple, we will assume the image patches to be nonoverlapping. This allows us to assert that the noise patches are mutually independent. Since the corrupting noise patches are sampled from a multivariate Gaussian, we can write the pdf as

$$p(\mathbf{y}_1,\ldots,\mathbf{y}_N\,|\,\mathbf{z}_1,\ldots,\mathbf{z}_N) = \prod_{j=1}^{N} \frac{1}{(2\pi\sigma^{2})^{n/2}} \exp\!\left(-\frac{\|\mathbf{y}_j - \mathbf{z}_j\|^{2}}{2\sigma^{2}}\right) \qquad (19)$$

where N is the total number of (nonoverlapping) patches. As explained earlier, z is a random variable and the vectors z_j are instances of the variable sampled from a certain (unknown) distribution. In the denoising problem, one is required to estimate each of the instances z_j in an image, and, hence, the FIM is calculated on a per patch basis. Many denoising algorithms [1], [2], [6] infer information about a single patch by taking into account multiple similar10 patches. Such algorithms in essence estimate the vector z_j from multiple similar noisy vectors. In such a scenario, we obtain an expression for the FIM as

$$\mathbf{J}_j = \sum_{k=1}^{N_j} \frac{1}{\sigma^{2}}\,\mathbf{I}_n \qquad (20)$$

$$= \frac{N_j}{\sigma^{2}}\,\mathbf{I}_n \qquad (21)$$

assuming that N_j similar patches are taken into account in denoising any given patch. Note that (21) is only an approximate expression for the FIM. The FIM takes this exact form only when identical patches are considered. It is also important to reiterate that (21) holds only when we assume that the patches are nonoverlapping. In the case where the image patches are considered to be overlapping, the calculation of the FIM becomes more complicated and the issue of it being singular arises. In this paper, we only deal with the nonoverlapping case where the noise patches can be considered to be IID.

The expression for the FIM [and, hence, the bound in (18)] thus takes into account the strength of the noise, as well as the number of radiometrically similar patches that are considered in denoising any given patch. In (21), we obtain an expression for the FIM considering N_j radiometrically similar patches being available in denoising any given patch. However, the number of such similar patches will vary widely from image to image, and also from patch to patch within the same image. For example, the corner regions of the box image [Fig. 1(a)] have fewer matching patches than the smoother regions. Thus, using a fixed value of N_j for the entire image is unwise. In this section, we describe how a value of N_j is chosen adaptively in a given image. But first, we define a measure of similarity between two patches. We consider two patches z_j and z_k to be similar if they satisfy

$$\|\mathbf{z}_j - \mathbf{z}_k\|^{2} \leq \gamma \qquad (22)$$

where γ is a small threshold value. It can be seen that one needs to choose a proper threshold to ensure few false positives and negatives in the similarity detection process. Further, the threshold should also take into account the number of pixels present in each patch. For our experiments, we choose γ such that all patches identified to be similar to z_j differ (on average) by less than a small fraction of the range of intensity values in each pixel location. Assuming this range to be within 0 to 255, a corresponding expression for the threshold γ can be written.
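The similarity test of (22) and the resulting approximate FIM of (21) can be sketched as follows (our own illustration; `count_similar_patches` and the toy 2×2 patches are hypothetical, not from the paper):

```python
import numpy as np

def count_similar_patches(patches, j, gamma):
    """Count the patches radiometrically similar to patch j under the
    squared-distance test of Eq. (22): ||z_j - z_k||^2 <= gamma."""
    d2 = np.sum((patches - patches[:, [j]]) ** 2, axis=0)
    return int(np.sum(d2 <= gamma))        # includes patch j itself

def patch_fim(N_j, sigma, n):
    """Approximate FIM of Eq. (21): J_j = (N_j / sigma^2) I_n, assuming
    N_j identical (nonoverlapping) noisy copies of the patch."""
    return (N_j / sigma ** 2) * np.eye(n)

# Three identical flat patches plus one very different patch (n = 4 pixels).
patches = np.array([[10., 10., 10., 200.],
                    [10., 10., 10., 200.],
                    [10., 10., 10., 200.],
                    [10., 10., 10., 200.]])
N = count_similar_patches(patches, j=0, gamma=1.0)
print(N)                                    # 3
print(patch_fim(N, sigma=25.0, n=4)[0, 0])  # 3/625 = 0.0048
```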
Although the FIM is derived for nonoverlapping patches, to be more realistic, we consider overlapping patches in our calculation of N_j. This leads to a practical estimate of the number of patches that is available to any denoising algorithm. Fig. 4 shows the spatial distribution of N_j values for the house and parrot images calculated with 11×11 patches. As can be expected, N_j takes much larger values for the smoother regions than the edge and textured regions. While the FIM is calculated on a per patch basis, the covariance C_z in (25) is estimated from all the patches in the image, as we describe in the next section.

where z̄ denotes the mean of all the vectors that make up the bootstrap sample set.
4) Compute Q using the covariance estimated from (25).
5) Repeat steps 2 through 4 a number of times.

In each of the iterations, an estimate of the covariance of z and a corresponding estimate of Q are obtained as the bootstrap estimates. Finally, these bootstrap estimates of Q are averaged to obtain the estimated MSE bound (denoted as Q̂). The confidence interval of the MSE bound estimate can be readily calculated from the variance of the bootstrap estimates. The 95% confidence interval is then given by the Normal interval (27).
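The bootstrap loop just described can be mocked up as follows (our own illustration, not the paper's implementation; the resampling details, the number of repetitions, and the regularization constant are assumptions):

```python
import numpy as np

def bootstrap_bound(patches, sigma, N_sim, B=100, rng=None):
    """Bootstrap sketch of the bound estimation loop: resample the patch
    set with replacement, estimate the covariance Cz from the resample,
    evaluate Q = tr[(J + Cz^{-1})^{-1}] with J = (N_sim / sigma^2) I, and
    average the B bootstrap estimates. The standard deviation s of the
    estimates gives the 95% Normal confidence interval Q_hat +/- 1.96 s."""
    rng = np.random.default_rng(rng)
    n, P = patches.shape
    J = (N_sim / sigma ** 2) * np.eye(n)
    Qs = []
    for _ in range(B):
        idx = rng.integers(0, P, size=P)           # resample patch indices
        Cz = np.cov(patches[:, idx]) + 1e-8 * np.eye(n)  # keep invertible
        Qs.append(np.trace(np.linalg.inv(J + np.linalg.inv(Cz))))
    Qs = np.asarray(Qs)
    Q_hat, s = Qs.mean(), Qs.std(ddof=1)
    return Q_hat, (Q_hat - 1.96 * s, Q_hat + 1.96 * s)

rng = np.random.default_rng(2)
patches = rng.normal(scale=5.0, size=(9, 400))     # 400 toy 3x3 patches
Q_hat, ci = bootstrap_bound(patches, sigma=25.0, N_sim=10, B=50, rng=3)
print(ci[0] < Q_hat < ci[1])                       # True
```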
$$\left[\hat{Q} - 1.96\,s,\;\; \hat{Q} + 1.96\,s\right] \qquad (27)$$

where s is the standard error of the bootstrapped estimate Q̂. We now have a formulation for estimating the lower bound and its associated confidence interval for an image (or a cluster) where the patches are structurally similar. In the next section we extend this formulation to more general images.

V. BOUNDS FOR GENERAL IMAGES

Until now, we have assumed that the image patches we deal with are geometrically similar across the entire image (that is, samples from a single p(z)), although the patch intensities may differ. This was necessary only for the sake of clarity of the presentation. To make our formulation applicable to geometrically heterogeneous images (Fig. 5), we need to first cluster the image into geometrically similar regions. Analysis of denoising performance can be considered independently for each such region of similar structure. The performance limits on denoising for a particular image can then be calculated as an aggregate of the MSE bounds for each of the clusters. In our previous work [5], we introduced a scheme of clustering an image into geometrically similar regions for image denoising. Later in this section we give a brief description of our automatic "geometric" clustering. In the present scenario, however, we are chiefly interested in computing a performance bound. Therefore, we can assume an ideal "oracle" clustering method (which may be user-defined) to characterize the various clusters in a given image. Assuming availability of such a clustering, we proceed to calculate the MSE bound for any general image that is composed of such (not necessarily contiguous) clusters.

A. Calculating the Bounds

Clustering the given image decomposes it into multiple segments such that patches of similar geometric structure are captured in the same cluster (see Fig. 7). In such a case, we can assume that patches within each cluster are samples from a single pdf, and aggregate the per-cluster estimates into a whole-image bound as

$$\hat{Q} = \sum_{k} w_k\,\hat{Q}_k = \frac{1}{|\Omega|}\sum_{k}\hat{S}_k, \qquad w_k = \frac{|\Omega_k|}{|\Omega|} \qquad (28)$$

where Q̂ is the estimate of the bound for the entire image, Q̂_k and Ŝ_k are the estimates of the bounds on the MSE and the SSE respectively for the kth cluster, |Ω_k| denotes the cardinality of the set Ω_k of patches in the kth cluster, Ω denotes the set of all patches in the image, and w_k is the weight corresponding to the kth cluster in the averaging process. An expression for the 95% confidence interval can be obtained by calculating the standard deviation of the estimate

$$\sigma_{\hat{Q}} = \sqrt{\sum_{k} w_k^{2}\,\sigma_{\hat{Q}_k}^{2}} \qquad (29)$$

where σ_Q̂ is the standard deviation of the estimate Q̂. The 95% confidence interval,11 as shown before in (27), is the Normal interval

$$\left[\hat{Q} - 1.96\,\sigma_{\hat{Q}},\;\; \hat{Q} + 1.96\,\sigma_{\hat{Q}}\right]. \qquad (30)$$

We now have an expression for the MSE bound and its associated confidence interval computed from an independent analysis of each cluster. Referring to our discussions on the achievability of the bound in Section IV-A, we expect the bound to be theoretically achievable by a linear MMSE estimator in each cluster. However, in addition to perfect knowledge of the first and second order moments of p(z), such an estimator now has to have access to "perfect" clustering as well. Due to all these nuances one can only hope to come up with an affine estimator with performance close to the bound, thus making our formulation a valid lower bound. In the next section, we briefly describe one particular method of performing clustering based on underlying patch geometry.

B. Practical ("Nonoracle") Clustering

Clustering is a much studied research problem that has led to a number of different methods [43], [44]. Image clustering is a subset of this huge field where researchers have devoted considerable attention to the choice of a clustering method as well

11This interval formulation is accurate only if the distribution of Q̂ is close to Normal. Our experiments indicate that the histograms of the bootstrapped Q values for different images indeed closely approximate a Gaussian.
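The per-cluster aggregation described above can be sketched as follows (our own code; the weighting w_k = |Ω_k|/|Ω| and the independent-cluster variance combination are assumptions about the form of (28) and (29), which are not fully reproduced in this excerpt):

```python
import numpy as np

def aggregate_cluster_bounds(Q_k, sigma_k, sizes):
    """Combine per-cluster MSE bound estimates into a whole-image bound.
    Weights are cluster size over total patch count (assumed form of
    Eq. (28)); for independent cluster estimates the variances combine
    as sigma^2 = sum_k w_k^2 sigma_k^2, giving the Normal 95% interval."""
    sizes = np.asarray(sizes, dtype=float)
    w = sizes / sizes.sum()
    Q_hat = float(np.dot(w, Q_k))
    s = float(np.sqrt(np.sum((w * np.asarray(sigma_k)) ** 2)))
    return Q_hat, (Q_hat - 1.96 * s, Q_hat + 1.96 * s)

# Two clusters: a smooth one (low bound) and a textured one (high bound).
Q_hat, ci = aggregate_cluster_bounds([2.0, 50.0], [0.1, 2.0], [3000, 1000])
print(round(Q_hat, 1))   # 0.75 * 2 + 0.25 * 50 = 14.0
```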
Fig. 8. MSE bounds computed on simulated images and compared with the performance of some state-of-the-art methods (BM3D [6], K-SVD [4], SKR [3], K-LLD [5]). The bounds are calculated using 11×11 patches. (a) Stripes image; (b) box image; (c) bounds for stripes image; (d) bounds for box image.
lower than the performance of the state-of-the-art denoising algorithms.

As a next step, we calculate the MSE bounds for another, more interesting, simulated image. Fig. 8(b) shows the box image (of size 200×200) where, as opposed to the stripes image, the edges vary in directionality. Clearly, such an image requires the use of multiple clusters to capture the different geometric structures. As shown earlier in Fig. 1, we make use of four clusters to capture the smooth regions, the horizontal and vertical edges, and the corner regions. Fig. 8(d) shows the calculated MSE bounds for the box image for different noise standard deviations and compares them to the performance of denoising methods. It can be seen that this image is more difficult to denoise than the stripes image, and the predicted MSE bound is considerably lower than the MSE obtained by any of the state-of-the-art denoising methods.

We now present experimental results obtained using images containing relatively uniform natural texture. These images typically contain (stochastic) repetitive patterns. Patches from such images can be considered to be geometrically similar and hence can be grouped together in a single cluster. However, the patches typically contain more natural variability in their structure than the simulated stripes image. It can be seen in Fig. 9(c) that most of the methods perform quite comparably to the predicted bounds for the heavily textured grass image. The bound in Fig. 9(d) for another texture image (cloth) is lower than the best performing method (BM3D), but not significantly so. These results seem to indicate that, theoretically, the performance cannot be vastly improved for such a class of images. Also note that the MSEs for each of the methods (and our bounds prediction) are much higher than those obtained for the simulated images. This is because the (stochastic) variability in the image patches makes them harder to denoise than the simpler simulated images. This fact is captured by our bounds formulation as well.

Next, we show experimental results and bounds calculations for some general images (Fig. 5). Such images are typically made up of regions of both smooth and textured areas. Thus, clustering is needed to group together patches of similar geometric structure. In our experiments, we cluster each image into five clusters using the technique outlined in Section V-B. The MSE bound is then calculated on a per cluster basis, from which a final estimate of the MSE bound is obtained for the entire image, along with a confidence interval for the estimate. Fig. 10 shows the MSE bounds obtained for these images using a patch size of 11×11 and compares them to the performances of some state-of-the-art methods. The bounds for different images also give us an idea of the relative difficulty and the amount of improvement that can be expected to be achieved in denoising the images. This can be seen in Table II, where the images are ranked based on the relative efficiency, which we define as the ratio of the predicted MSE bound and the MSE of the best denoising algorithm for each image. There it can also be seen that images containing a fair amount of texture are denoised quite well as compared to the predicted bounds, and little room for improvement exists. However, at higher signal-to-noise levels, there is still room for improvement even for these images. This can be seen from Fig. 11, where we plot the relative efficiency (in decibels) across different noise levels. The plots there also indicate that better denoising can be expected for much smoother images, even though the absolute MSEs of denoising methods are comparatively lower than those for textured images (Fig. 10).

Until now, the bounds that we have shown were obtained with a fixed patch size (11×11). Not surprisingly, the patch size plays a role in the calculation of the MSE bounds. Too large a patch size might capture regions of widely varying geometric structure
906 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 4, APRIL 2010
Fig. 9. Bounds for texture images compared to denoising performance of some state-of-the-art denoising methods. A single cluster is considered for this experiment with a patch size of 11 × 11. (a) Grass image; (b) cloth image; (c) bounds for grass image; (d) bounds for cloth image.
in a single patch and also result in fewer similar patches being present in the image. On the other hand, too small a patch size can lead to degraded denoising performance resulting from the lack of geometric structure captured by each patch. In practice, noise greatly impairs the search for nearest neighbors when too small a patch size is considered. In our work, the search for similar patches is carried out on the noise-free image, resulting in a larger number of similar patches being found when using smaller patches. But this effect is typically stabilized with patch sizes of 11 × 11 or beyond. Fig. 12(a) illustrates this effect on different images. Note how the bound on the predicted MSE increases at different rates as the patch size grows from 5 × 5 to 19 × 19 for the images. In our comparisons, we calculated the bounds with a fixed patch size of 11 × 11, which is a reasonable choice for denoising as it can capture the underlying patch geometry while offering sufficient robustness in the search for similar patches.
Another parameter that influences our predicted lower bound is the number of clusters K. Clustering ensures that patches of similar geometric structure are grouped together. In Fig. 12(b), we show the effect on the predicted bounds of increasing the number of clusters. Note how, in most cases, the MSE bounds change little once the number of clusters is chosen to be 5 or higher. This may encourage one to think that it might be best to use a much larger number of clusters. However, with a smaller K, we can ensure the presence of enough patches in each cluster so as to obtain a reasonably accurate estimate of the covariance matrix for that cluster. At the same time, we do not compromise on the requirement that patches of similar geometric structure be grouped together in each cluster. On the other hand, choosing too small a value for K results in an erroneous bound, as dissimilar patches may then be clustered together and the covariance matrix learned assuming that all vectors are sampled from a single distribution. This can be seen for the general images of Fig. 5, where a single cluster is clearly not a good choice. As a general rule, choosing a value of K between 5 and 10 leads to a stable estimate of the bound without incurring an unnecessary time penalty in clustering.
The experimental results in this section show that although the current state-of-the-art methods perform denoising quite well, there is still room for improvement in some cases. In that sense, given the tremendous variety of images, denoising is a problem that is still open. In practice, images containing more texture (e.g., the parrot and Barbara images) are harder to denoise than smoother ones (e.g., the house image), and this is reflected in the denoising performance of all the state-of-the-art denoising methods to which we compared. However, our bounds seem to indicate that very little room for improvement exists for images rich in texture. One probable reason for this is that for naturally occurring textures, few similar patches may exist, leading to poor denoising. As an extreme case, consider images where, on average, the number of similar patches is close to 1. Denoising then has to be performed from essentially a single observation of each patch and, hence, not much denoising can be expected. Our formulation also cannot be expected to predict an informative bound for such extreme cases. However, for most general images, our formulation predicts meaningful bounds, as can be seen from the various experiments shown in this section.

VII. CONCLUSION

In this paper, we studied performance bounds on denoising. We formulated a method of calculating lower bounds on the MSE that accounts for the strength of the corrupting noise, the number of observations that are typically available to estimate a denoised patch, as well as the variability of the geometric structures in the image. It was derived taking into account the effect of prior information, without knowledge of the entire prior
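The clustering step described above can be sketched as follows. This is an illustrative k-means grouping of vectorized patches with a per-cluster covariance estimate; the function and parameter names are ours, not the paper's exact procedure:

```python
import numpy as np

def cluster_patches(patches, K=5, iters=20, seed=0):
    """Illustrative sketch (not the authors' exact procedure): group
    vectorized patches into K clusters with plain k-means, then
    estimate a covariance matrix from the members of each cluster."""
    rng = np.random.default_rng(seed)
    centers = patches[rng.choice(len(patches), K, replace=False)].astype(float)
    for _ in range(iters):
        # assign each patch to its nearest center
        d = ((patches[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # update each center; keep the old one if its cluster is empty
        for k in range(K):
            if (labels == k).any():
                centers[k] = patches[labels == k].mean(0)
    # too small a K mixes dissimilar patches into one covariance;
    # too large a K leaves too few patches per cluster for a
    # reliable covariance estimate
    covs = {k: np.cov(patches[labels == k].T)
            for k in range(K) if (labels == k).any()}
    return labels, covs
```

The trade-off in the text shows up directly here: the per-cluster covariance is only as reliable as the number of patches assigned to that cluster.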
Fig. 10. Comparison of some state-of-the-art methods with our bounds formulation for some general images. The patch size is fixed at 11 × 11 and the number of clusters (K) used is 5 in all cases. (a) House image; (b) parrot image; (c) Lena image; (d) Barbara image.
Fig. 12. MSE bounds for noise standard deviation 25 as a function of (a) varying patch size with K = 1 for the grass and cloth images, and K = 5 for the house and parrot images, and (b) varying number of clusters with patch size 11 × 11.
be as accurate, it can still serve as an indicator of the optimal parameters to use for denoising.

APPENDIX A
MATHEMATICAL JUSTIFICATION FOR AFFINE BIAS

Most current state-of-the-art methods perform denoising of any given patch by searching for similar patches in the noisy image. Here, we show that this class of nonlinear denoising methods produces biased estimates and that the bias for such methods can be shown to be an affine function of the underlying patch. In this derivation we assume that for two patches to be similar, their noise-free versions have to be similar, which can be written as

(32)

where the similarity threshold is some small value and the difference between the two patches is a vector. The denoised estimate of the patch is obtained by performing a weighted averaging over all (say N) such similar noisy patches. In general, this can be written as

(33)

where each weight is a (data-dependent) weight matrix that measures the similarity between the two patches. Using the data model of (2), and (32) above, we can express (33) as

(34)
(35)

This allows us to calculate the bias of such nonlinear weighted averaging methods as

(36)

As can be seen from the above derivation, to first order, the bias is an affine function of the underlying patch. While the parameters of the affine bias are different for each patch, we make the simplifying assumption that the same parameters provide an adequate approximation of the bias within members of the same cluster. This assumption is also statistically justified in Section II of the paper.

APPENDIX B
OPTIMAL PARAMETERS FOR AFFINE BIAS FUNCTION

In this section we derive expressions for the bias parameters that minimize the cost function of (13). This can be obtained by solving a system of simultaneous equations [shown in (14)]. To do this, we first solve for

(37)
(38)
(39)

where the matrix above is the covariance of the underlying patches. Thus, we obtain the optimal bias parameters that minimize the function as

(40)
(41)

APPENDIX C
HIGHER ORDER BIAS MODEL

In Section II, we assumed that the bias can be modeled reasonably well by an affine function of the underlying patch. This allows us to derive the corresponding optimal bias function in Section IV-A of the paper and, finally, an expression for the MSE bound. Although we have shown experimentally that the bias from some of the recent denoising methods can be effectively modeled as affine, the question about the effect of higher order models remains. In this section, we briefly study the implications of such a higher order model for the bias. For simplicity, we model the bias function to be a restricted second order model given by

(42)
(43)
(44)

where each indicator vector is a column of the identity matrix, containing all zeros except a one at the corresponding position. We can then write (43) as

(45)

As before, we take the derivatives of the right hand side of (45) with respect to the unknown parameters and solve the equations to get expressions for the optimal parameters that minimize the cost. Differentiating the right hand side of (45) with respect to the three parameters, we get the simultaneous equations

(46)
(47)

and

(49)
(50)
(51)
(52)

where we denote the bracketed quantity to be the higher order moment that is related to the multidimensional skewness of the pdf of the patches. Now, we use the expression as given in (49) and plug it into (50) to obtain

(53)

This allows us to rewrite (51) as

(54)

where the remaining term is related to the multidimensional kurtosis (fourth order moment) of the pdf and is defined as

(55)
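The class of nonlinear weighted-averaging denoisers analyzed in Appendix A can be illustrated with a short sketch. This is a generic NLM-style example with scalar weights and names of our own choosing; (33) in the paper allows general matrix weights:

```python
import numpy as np

def weighted_average_denoise(y_patches, ref, h=0.5):
    """Sketch of the class of nonlinear denoisers discussed in
    Appendix A: a reference noisy patch is estimated by a weighted
    average of similar noisy patches. The Gaussian weight on the
    patch distance is an assumed NLM-style choice for illustration."""
    w = np.array([np.exp(-np.sum((p - ref) ** 2) / (h ** 2))
                  for p in y_patches])
    w = w / w.sum()                       # normalize the weights
    return (w[:, None] * y_patches).sum(0)
```

Because the averaged patches are only approximately equal to the reference patch, the expectation of this estimate differs from the clean patch; this is the bias whose first-order (affine) form Appendix A derives.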
REFERENCES
[1] A. Buades, B. Coll, and J. M. Morel, “A review of image denoising
methods, with a new one,” Multiscale Model. Simul., vol. 4, no. 2, pp.
490–530, 2005.
[2] C. Kervrann and J. Boulanger, “Optimal spatial adaptation for patch-
based image denoising,” IEEE Trans. Image Process., vol. 15, no. 10,
pp. 2866–2878, Oct. 2006.
[3] H. Takeda, S. Farsiu, and P. Milanfar, “Kernel regression for image
processing and reconstruction,” IEEE Trans. Image Process., vol. 16,
no. 2, pp. 349–366, Feb. 2007.
[4] M. Elad and M. Aharon, “Image denoising via sparse and redundant
representations over learned dictionaries,” IEEE Trans. Image Process.,
vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
[5] P. Chatterjee and P. Milanfar, “Clustering-based denoising with locally
learned dictionaries,” IEEE Trans. Image Process., vol. 18, no. 7, pp.
1438–1451, Jul. 2009.
[6] K. Dabov, A. Foi, V. Katkovnik, and K. O. Egiazarian, “Image de-
noising by sparse 3-D transform-domain collaborative filtering,” IEEE
Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, Aug. 2007.
[7] M. D. Robinson and P. Milanfar, “Fundamental performance limits in
image registration,” IEEE Trans. Image Process., vol. 13, no. 9, pp.
1185–1199, Sep. 2004.
[8] M. Xu, H. Chen, and P. K. Varshney, “Ziv-Zakai bounds on image reg-
istration,” IEEE Trans. Signal Process., vol. 57, no. 5, pp. 1745–1755,
May 2009.
[9] M. D. Robinson and P. Milanfar, "Statistical performance analysis of super-resolution," IEEE Trans. Image Process., vol. 15, no. 6, pp. 1413–1428, Jun. 2006.
[10] S. Baker and T. Kanade, "Limits on super-resolution and how to break them," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 9, pp. 1167–1183, Sep. 2002.
[11] Z. Lin and H.-Y. Shum, "Fundamental limits of reconstruction-based superresolution algorithms under local translation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 1, pp. 83–97, Jan. 2004.
[12] Z. Lin, J. He, X. Tang, and C.-K. Tang, "Limits of learning-based super-resolution algorithms," Int. J. Comput. Vis., vol. 80, no. 3, pp. 406–420, Aug. 2008.
[13] T. Treibitz and Y. Y. Schechner, "Recovery limits in pointwise degradation," presented at the IEEE Int. Conf. Computational Photography, San Francisco, CA, Apr. 2009.
[14] S. Voloshynovskiy, O. Koval, and T. Pun, "Image denoising based on the edge-process model," Signal Process., vol. 85, no. 10, pp. 1950–1969, Oct. 2005.
[15] J. Mairal, G. Sapiro, and M. Elad, "Learning multiscale sparse representations for image and video restoration," SIAM Multiscale Model. Simul., vol. 7, no. 1, pp. 214–241, Apr. 2008.
[16] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Non-local sparse models for image restoration," presented at the IEEE Int. Conf. Computer Vision, Tokyo, Japan, Sep. 2009.
[17] S. Kay and Y. C. Eldar, "Rethinking biased estimation," IEEE Signal Process. Mag., vol. 25, no. 3, pp. 133–136, May 2008.
[18] Y. C. Eldar, "Rethinking biased estimation: Improving maximum likelihood and the Cramér–Rao bound," Found. Trends Signal Process., vol. 1, no. 4, pp. 305–449, 2008.
[19] Y. C. Eldar, "MSE bound with affine bias dominating the Cramér–Rao bound," IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3824–3836, Aug. 2008.
[20] N. Draper and H. Smith, Applied Regression Analysis, ser. Probability and Statistics, 3rd ed. Hoboken, NJ: Wiley, Apr. 1998.
[21] L. P. Seidman, "Performance limitations and error calculations for parameter estimation," Proc. IEEE, vol. 58, pp. 644–652, May 1970.
[22] H. Cramér, Mathematical Methods of Statistics. Princeton, NJ: Princeton Univ. Press, 1946.
[23] C. R. Rao, "Information and the accuracy attainable in the estimation of statistical parameters," Bull. Calcutta Math. Soc., vol. 37, pp. 81–89, 1945.
[24] C. R. Rao, "Minimum variance and the estimation of several parameters," in Proc. Cambridge Philosophical Society, 1946, vol. 43, pp. 280–283.
[25] J. Ziv and M. Zakai, "Some lower bounds on signal parameter estimation," IEEE Trans. Inf. Theory, vol. IT-15, no. 3, pp. 386–391, May 1969.
[26] Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking, H. L. van Trees and K. L. Bell, Eds., 1st ed. Piscataway, NJ: IEEE, Aug. 2007.
[27] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, ser. Signal Processing. Upper Saddle River, NJ: Prentice-Hall, 1993, vol. 1.
[28] H. L. van Trees, Detection, Estimation, and Modulation Theory. New York: Wiley, 1968.
[29] J. A. Fessler and A. O. Hero, "Cramér–Rao lower bounds for biased image reconstruction," in Proc. 36th Midwest Symp. Circuits and Systems, Aug. 1993, vol. 1, pp. 253–256.
[30] T. Y. Young and R. A. Westerberg, "Error bounds for stochastic estimation of signal parameters," IEEE Trans. Inf. Theory, vol. IT-17, no. 5, Sep. 1971.
[31] Z. Ben-Haim and Y. C. Eldar, "A lower bound on the Bayesian MSE based on the optimal bias function," IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 5179–5196, Nov. 2009.
[32] J. S. D. Bonet and P. Viola, "A non-parametric multi-scale statistical model for natural images," in Advances in Neural Information Processing. Cambridge, MA: MIT Press, 1997, pp. 773–779.
[33] W. Hong, J. Wright, K. Huang, and Y. Ma, "Multiscale hybrid linear models for lossy image representation," IEEE Trans. Image Process., vol. 15, no. 12, pp. 3655–3671, Dec. 2006.
[34] S.-C. Zhu, "Statistical modeling and conceptualization of visual patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 6, pp. 691–712, Jun. 2003.
[35] E. P. Simoncelli and B. A. Olshausen, "Natural image statistics and neural representation," Annu. Rev. Neurosci., vol. 24, pp. 1193–1216, May 2001.
[36] A. Srivastava, A. B. Lee, E. P. Simoncelli, and S.-C. Zhu, "On advances in statistical modeling of natural images," J. Math. Imag. Vis., vol. 18, pp. 17–33, 2003.
[37] S. Lyu and E. P. Simoncelli, "Modeling multiscale subbands of photographic images with fields of Gaussian scale mixtures," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 4, pp. 693–706, Apr. 2009.
[38] Y. Weiss and W. Freeman, "What makes a good model of natural images?," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Minneapolis, MN, Jun. 2007, pp. 1–8.
[39] M. A. Woodbury, Inverting Modified Matrices, Statistical Research Group, Princeton Univ., Princeton, NJ, Memorandum Rep. 42, 1950.
[40] B. Efron, "Bootstrap methods: Another look at the Jackknife," Ann. Statist., vol. 7, no. 1, pp. 1–26, 1979.
[41] M. R. Chernick, Bootstrap Methods: A Guide for Practitioners and Researchers, ser. Probability and Statistics, 2nd ed. Hoboken, NJ: Wiley, Nov. 2007.
[42] L. Wasserman, All of Statistics: A Concise Course in Statistical Inference, ser. Springer Texts in Statistics, 1st ed. New York: Springer, 2004.
[43] S. Lloyd, "Least squares quantization in PCM," IEEE Trans. Inf. Theory, vol. 28, no. 2, pp. 129–137, Mar. 1982.
[44] Y. Cheng, "Mean shift, mode seeking, and clustering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 8, pp. 790–799, Aug. 1995.
[45] H. J. Seo and P. Milanfar, "Training-free, generic object detection using locally adaptive regression kernels," IEEE Trans. Pattern Anal. Mach. Intell., to be published.
[46] C. M. Stein, "Estimation of the mean of a multivariate normal distribution," Ann. Statist., vol. 9, no. 6, pp. 1135–1151, Nov. 1981.
[47] I. Motoyoshi, S. Nishida, L. Sharan, and E. H. Adelson, "Image statistics and the perception of surface qualities," Nature, vol. 447, pp. 206–209, May 2007.
[48] L. Sharan, Y. Li, I. Motoyoshi, S. Nishida, and E. Adelson, "Image statistics for surface reflectance perception," J. Opt. Soc. Amer. A, vol. 25, no. 4, pp. 846–865, Apr. 2008.
[49] A. Hyvärinen, J. Hurri, and P. O. Hoyer, Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, ser. Computational Imaging and Vision. London, U.K.: Springer, 2009, vol. 39.

Priyam Chatterjee (S'07) received the B.Tech. degree in information technology from the University of Kalyani, India, and the M.Tech. degree in electrical engineering from the Indian Institute of Technology (IIT), Bombay, India, in 2003 and 2006, respectively. He is currently pursuing the Ph.D. degree in electrical engineering at the University of California, Santa Cruz.
His research interests are in image and video processing (denoising, interpolation, deblurring, and super-resolution).

Peyman Milanfar (SM'98–F'10) received the B.S. degree in electrical engineering/mathematics from the University of California, Berkeley, and the Ph.D. degree in electrical engineering from the Massachusetts Institute of Technology, Cambridge.
He is a Professor of electrical engineering at the University of California, Santa Cruz (UCSC). Prior to UCSC, he was at SRI (formerly Stanford Research Institute) and a Consulting Professor of computer science at Stanford University, Stanford, CA. In 2005, he founded MotionDSP, Inc., to bring state-of-the-art video enhancement technology to consumer and forensic markets. His technical interests are in statistical signal, image and video processing, and computational vision.
Dr. Milanfar won a National Science Foundation CAREER award in 2000. He is an Associate Editor for the IEEE TRANSACTIONS ON IMAGE PROCESSING. He is a member of the Signal Processing Society's Image, Video, and Multidimensional Signal Processing (IVMSP) Technical Committee.