
THE IMAGE DEBLURRING PROBLEM: MATRICES, WAVELETS, AND MULTILEVEL METHODS

DAVID AUSTIN1, MALENA I. ESPAÑOL2, AND MIRJETA PASHA2

ABSTRACT. The image deblurring problem consists of reconstructing images from blur- and noise-contaminated available data. In this AMS Notices article, we provide an overview of some well-known numerical linear algebra techniques that are used for solving this problem. In particular, we start by carefully describing how to represent images, the process of blurring an image, and the modeling of different kinds of added noise. Then, we present regularization methods such as Tikhonov (in standard and general form), Total Variation, and other variants with sparsity- and edge-preserving properties. Additionally, we briefly overview some of the main matrix structures for the blurring operator and finish by presenting multilevel methods that preserve such structures. Numerical examples are used to illustrate the techniques described.

1. INTRODUCTION

After the launch of the Hubble Space Telescope in 1990, astronomers were gravely disappointed by the quality of the images as they began to arrive. Due to miscalibrated testing equipment, the telescope's primary mirror had been ground to a shape that differed slightly from the one intended, resulting in the misdirection of incoming light as it moved through the optical system. The blurry images did little to justify the telescope's $1.5 billion price tag.

Three years later, space shuttle astronauts installed a specially designed corrective optical system that essentially fixed the problem and yielded spectacular images (see Figure 1). In the meantime, mathematicians devised several ways to convert the blurry images into high-quality images. The process of mathematical deblurring is the focus of this article.

FIGURE 1. Hubble's view of the M100 galaxy, soon after launch on the left and after corrective optics were installed in 1993. NASA / ESA

Many factors can cause an image to become blurred, such as motion of the imaging device or the target object, errors in focusing, or the presence of atmospheric turbulence [7]. Indeed, the need for image deblurring goes beyond Hubble's story. For instance, image deblurring is widely used in many applications, such as pattern recognition, computer vision, and machine intelligence. Moreover, image deblurring shares the same mathematical formulation as other imaging modalities. For instance, in many cases we have only limited opportunities to capture an image; this is particularly true of medical images, such as computerized tomography (CT), proton computed tomography (pCT), and magnetic resonance imaging (MRI), for which equipment and patient availability are scarce resources. In cases such as these, we need a way to extract meaningful information from noisy images that have been imperfectly collected.

In this article, we will describe a mathematical model of how digital images become blurred, as well as several mathematical issues that arise when we try to undo the blurring. While blurring may be effectively modeled by a linear process, we will see that deblurring is not as simple as inverting that linear process. Indeed, deblurring belongs to an important class of problems known as discrete ill-posed problems, and we will introduce some techniques that have become standard for solving them. In addition, we will describe some additional structure in the linear operators that makes the required computations feasible.

2. DIGITAL IMAGES AND BLURRING

We start by describing digital images and a process by which they become blurred.

As illustrated in Figure 2, the lens of a digital camera directs photons entering the camera onto a charge-coupled device (CCD), which consists of a rectangular p × q array of detectors. Each detector in the CCD converts a count of the photons into an electrical signal that is digitized by an analog-to-digital converter (ADC). The result is a digital image stored as
a p × q matrix whose entries represent the intensities of light recorded by each of the CCD's detectors.

FIGURE 2. A simple model of how a digital image is created.

A grayscale image is represented by a single matrix with integer entries describing the brightness at each location. A color image is represented by three matrices that describe the colors in terms of their red, green, and blue constituents. Of course, we may see these matrices by zooming in on a digital image until we see individual pixels.

Perhaps due to imperfections in the camera's lens or the lens being improperly focused, it is inevitable that photons intended for one pixel bleed over into adjacent pixels, and this leads to blurring of the image. To illustrate, we will consider grayscale images composed of arrays of 64 × 64 pixels. In Figure 3, the image on the left shows a single pixel illuminated, while on the right we see how photons intended for this single pixel have spilled over into adjacent pixels to create a blurred image.

FIGURE 3. The intensity from a single pixel, shown on the left, is spread out across adjacent pixels according to a Gaussian blur, as seen on the right.

There are several models used to describe blurring, but the simple one we choose here has the light intensity contained in a single pixel (i, j) spilling over into an adjacent pixel (k, l) according to the Gaussian

\[
\frac{1}{N}\exp\!\left(-\frac{1}{2}\left(\frac{k-i}{s}\right)^{2}-\frac{1}{2}\left(\frac{l-j}{s}\right)^{2}\right)
=\frac{1}{N}\exp\!\left(-\frac{1}{2}\left(\frac{k-i}{s}\right)^{2}\right)\exp\!\left(-\frac{1}{2}\left(\frac{l-j}{s}\right)^{2}\right),
\tag{1}
\]

where s is a parameter that controls the spread in the intensity and N is a normalization constant so that the total intensity sums to 1. As we will see later, the fact that the two-dimensional Gaussian can be written as a product of one-dimensional Gaussians has important consequences for our ability to efficiently represent the blurring process as a linear operator.
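To make the separability in (1) concrete, here is a minimal Python sketch (not from the authors' code; the grid size, spread s, and pixel location are our own illustrative choices) that builds the two-dimensional Gaussian point spread function as an outer product of two one-dimensional Gaussians:

```python
import numpy as np

p, s = 64, 2.0                 # image size and Gaussian spread (illustrative values)
i0, j0 = 31, 31                # pixel whose light is being spread out

# One-dimensional Gaussians in the row and column directions.
rows = np.exp(-0.5 * ((np.arange(p) - i0) / s) ** 2)
cols = np.exp(-0.5 * ((np.arange(p) - j0) / s) ** 2)

# The 2D point spread function is the outer product of the 1D factors,
# normalized so the total intensity sums to 1 (the constant N in (1)).
psf = np.outer(rows, cols)
psf /= psf.sum()

print(psf.sum())               # 1.0: no intensity is created or lost
```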
Though we visually experience a grayscale image as a p × q matrix of pixels X, we will mathematically represent an image as a pq-dimensional vector x by stacking the columns of X on top of one another. That is, x = vec(X) = (X11, . . . , Xp1, X12, . . . , Xp2, . . . , X1q, . . . , Xpq)ᵀ, with Xij being the intensity value of the pixel at row i and column j. The blurring process is linear, since the number of photons that arrive at one pixel is the sum of the numbers of misdirected photons intended for nearby pixels. Consequently, there is a blurring matrix A that blurs the image x into the image b = Ax. Deblurring refers to the inverse process of recovering the original image x from its blurred counterpart b.
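In NumPy, stacking the columns of X corresponds to flattening in column-major (Fortran) order. A small sketch, with a random X and A standing in for a real image and blurring matrix:

```python
import numpy as np

p, q = 4, 3                          # a tiny 4 x 3 "image" for illustration
rng = np.random.default_rng(0)
X = rng.random((p, q))

x = X.flatten(order="F")             # vec(X): stack columns on top of one another
assert np.allclose(x[:p], X[:, 0])   # the first p entries are the first column

A = rng.random((p * q, p * q))       # stand-in for a pq x pq blurring matrix
b = A @ x                            # the blurred image as a vector
B = b.reshape((p, q), order="F")     # back to a p x q pixel array
```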
Each column of A is the result of blurring a single pixel, which means that each column of A is formed from the image on the right of Figure 3 by translating it to a different center. When that center is near the edge of the image, some photons will necessarily be lost outside the image, and there are a few options for how to incorporate this fact into our model. In real-life settings, it is possible to have knowledge only over a finite region, the so-called Field of View (FOV), which defines the range that a user can see of an object. It is then necessary to make an assumption about what lies outside the FOV by means of boundary conditions. For instance, one option to overcome the loss outside the FOV is to simply accept that loss, in which case we say that the matrix has zero boundary conditions. This has the effect of assuming that the image is black (pixel values are zero) outside the FOV, which can lead to an artificial black border around a deblurred image (see the image on the left of Figure 4).

In some contexts, it can be advantageous to assume reflexive boundary conditions, which assume that the photons are reflected back onto the image. In other scenarios of interest, periodic boundary conditions, which assume the lost photons reappear on the
opposite side of the image as if the image repeats itself indefinitely in all directions outside the FOV, are a suitable fit. Nevertheless, in practical settings, we periodically extend only some pixel values close to the boundary (see the image on the right of Figure 4).

FIGURE 4. Image with assumed zero boundary conditions is shown on the left and one with assumed periodic boundary conditions on the right. The red box represents the FOV.

3. ADDING NOISE

Let us consider the grayscale image xtrue shown on the left of Figure 5 and its blurred version btrue = Axtrue shown on the right.

FIGURE 5. An image xtrue on the left is blurred to obtain btrue on the right.

If we had access to btrue, it would be easy enough to recover xtrue by simply solving the linear system Ax = btrue. However, the conversion of photon counts into an electrical signal by the CCD, and then into a digital reading by the ADC, introduces electrical noise into the image that is ultimately recorded. The recorded image b is therefore a noisy approximation of the true blurred image btrue, so we write b = btrue + e, where e is the noise vector. There are various models used to describe the kind of noise added. For instance, we might assume that the noise is Gaussian white noise, which means that the entries in e are sampled from a normal distribution with mean zero. In our example, we assume that ‖e‖₂ = 0.001‖btrue‖₂; that is, the white noise is about 0.1% of the image btrue. As seen in Figure 6, this level of noise cannot be easily detected.

FIGURE 6. The blurred image btrue is shown on the left with a small amount of Gaussian white noise added to obtain b on the right.

Two other models of noise are illustrated in Figure 7. Under low light intensities, as encountered in astrophotography, the number of photons that arrive on the CCD while the image is exposed may differ from the number expected. A Poisson distribution provides an effective description of the resulting image, as seen on the left. If the digital image is transmitted over a communication channel, some of the transmitted bits may be corrupted in transmission, resulting in "salt and pepper" noise, demonstrated on the right.

FIGURE 7. Poisson noise is added to the blurred image on the left and salt and pepper noise on the right.

Because our recorded image b is a good approximation of btrue, we might naively expect to find a good approximation of xtrue by solving the linear system of equations Ax = b. However, its solution, which we call xLS, turns out to be very different from the original image xtrue, as is seen in Figure 8. As we will soon see, this behavior results from the fact that deblurring is a discrete linear ill-posed problem.
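The blow-up just described is easy to reproduce. The sketch below is a one-dimensional stand-in of our own devising (a Gaussian Toeplitz matrix playing the role of A; all sizes and parameters are illustrative): it adds white noise with ‖e‖₂ = 0.001‖btrue‖₂ and then solves Ax = b directly.

```python
import numpy as np
from scipy.linalg import toeplitz

p = 64
t = np.exp(-0.5 * (np.arange(p) / 2.0) ** 2)
A = toeplitz(t / t.sum())                    # ill-conditioned Gaussian blurring matrix

x_true = np.zeros(p); x_true[20:40] = 1.0    # a piecewise-constant "image"
b_true = A @ x_true

rng = np.random.default_rng(1)
e = rng.standard_normal(p)
e *= 0.001 * np.linalg.norm(b_true) / np.linalg.norm(e)   # 0.1% white noise
b = b_true + e

x_ls = np.linalg.solve(A, b)                 # the naive reconstruction
print(np.linalg.norm(x_ls - x_true) / np.linalg.norm(x_true))  # enormous relative error
```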
FIGURE 8. On the left we see the original image xtrue, while the right shows xLS, the solution to the equation Ax = b, where b is the noisy recorded image.

We are now faced with two questions: how can we reconstruct the original image xtrue more faithfully, and how can we do it in a computationally efficient way? For instance, today's typical phone photos have around 10 million pixels, which means that the number of entries in the blurring matrix A is about 100 trillion. Working with a matrix of that size will require some careful thought.
4. DISCRETE LINEAR ILL-POSED PROBLEMS
The singular value decomposition (SVD) of the matrix A offers insight into why the naively reconstructed image xLS differs so greatly from the original xtrue. Consider both vectors xtrue and b of size pq and define m = pq. Now suppose that A ∈ ℝ^(m×m) has full rank and that its SVD is given by

\[
A = U \Sigma V^{T} = \sum_{\ell=1}^{m} \sigma_{\ell} u_{\ell} v_{\ell}^{T},
\]

where U = (u₁, . . . , u_m) ∈ ℝ^(m×m) and V = (v₁, . . . , v_m) ∈ ℝ^(m×m) are matrices having orthonormal columns so that UᵀU = VᵀV = I, and

\[
\Sigma = \operatorname{diag}(\sigma_{1}, \ldots, \sigma_{m}), \qquad \sigma_{1} \geq \sigma_{2} \geq \ldots \geq \sigma_{m} > 0.
\]

The scalars σℓ are the singular values of A, and the vectors uℓ and vℓ are the left and right singular vectors of A, respectively. The singular value decomposition provides orthonormal bases defined by the columns of U and V so that A acts as scalar multiplication by the singular values: Avℓ = σℓuℓ. Since the singular values form a non-increasing sequence, the decomposition concentrates the most important data in the leading singular vectors, an observation that is key to our work in reconstructing xtrue.
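These objects are easy to inspect in code. A sketch, again with a one-dimensional Gaussian Toeplitz stand-in for A: np.linalg.svd returns the σℓ in non-increasing order, and the defining relation Avℓ = σℓuℓ can be checked directly.

```python
import numpy as np
from scipy.linalg import toeplitz

p = 64
t = np.exp(-0.5 * (np.arange(p) / 2.0) ** 2)
A = toeplitz(t / t.sum())                # toy stand-in for the blurring matrix

U, sigma, Vt = np.linalg.svd(A)          # A = U @ diag(sigma) @ Vt
V = Vt.T

# The defining relation A v_l = sigma_l u_l, checked for every l at once:
assert np.allclose(A @ V, U * sigma)

# The singular values decay over many orders of magnitude, with no natural gap.
print(sigma[0], sigma[-1], sigma[0] / sigma[-1])
```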
Discrete linear ill-posed problems are characterized by three properties, which are satisfied by the deblurring problem. First, the singular values σℓ decrease to zero without a large gap separating a group of large singular values from a group of smaller ones. The plot of the singular values in Figure 9 shows that the difference between the largest and smallest singular values is about thirteen orders of magnitude, which indicates that A is highly ill-conditioned.

FIGURE 9. The singular values σℓ of A.

Second, the singular vectors uℓ and vℓ become more and more oscillatory as ℓ increases. Figure 10 shows images Vℓ representing eight right singular vectors vℓ (vℓ = vec(Vℓ)) of the blurring matrix constructed above and demonstrates how the frequency of the oscillations increases as ℓ increases. Since the blurring matrix A spreads out any peaks in an image, it tends to dampen high-frequency oscillations. Therefore, a right singular vector vℓ representing a high frequency will correspond to a small singular value σℓ, since Avℓ = σℓuℓ.

FIGURE 10. Eight right singular vectors vℓ.

The third property of discrete linear ill-posed problems is known as the discrete Picard condition, which says that the coefficients of btrue expressed in the left singular basis, |uℓᵀ btrue|, decay as fast as the singular values σℓ. This is illustrated on the left of Figure 11, which shows both |uℓᵀ btrue| and the singular values σℓ. Since uℓᵀ btrue = σℓ vℓᵀ xtrue, the discrete Picard condition holds when the original image xtrue is not dominated by high-frequency contributions vℓᵀ xtrue for large ℓ. This is a reasonable assumption for most digital photographs.
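The discrete Picard condition can be inspected numerically: plot |uℓᵀ b| against σℓ and look for the point where the coefficients level off at the noise floor. A sketch under the same one-dimensional toy setup as before (matplotlib is used only for the plot):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import toeplitz

p = 64
t = np.exp(-0.5 * (np.arange(p) / 2.0) ** 2)
A = toeplitz(t / t.sum())
x_true = np.zeros(p); x_true[20:40] = 1.0
b_true = A @ x_true

rng = np.random.default_rng(1)
e = rng.standard_normal(p)
b = b_true + 0.001 * np.linalg.norm(b_true) / np.linalg.norm(e) * e

U, sigma, Vt = np.linalg.svd(A)
coef_true = np.abs(U.T @ b_true)    # decays along with sigma (Picard condition)
coef_noisy = np.abs(U.T @ b)        # flattens at the noise level for large l

plt.semilogy(sigma, label="sigma_l")
plt.semilogy(coef_true, label="|u_l^T b_true|")
plt.semilogy(coef_noisy, label="|u_l^T b|")
plt.legend(); plt.show()
```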
FIGURE 11. The coefficients |uℓᵀ btrue| are seen on the left while the coefficients |uℓᵀ b| are on the right.

By contrast, the coefficients |uℓᵀ b| = |uℓᵀ btrue + uℓᵀ e| appear on the right of Figure 11. The discrete Picard condition means that the coefficients |uℓᵀ btrue| decrease to zero. However, the coefficients of the added white noise |uℓᵀ e| remain relatively constant for large ℓ. At some point, the noise contributed by e overwhelms the information contained in btrue.

Writing both vectors xLS and xtrue as linear combinations of the right singular vectors vℓ, we see that

\[
x^{true} = \sum_{\ell=1}^{m} \frac{u_{\ell}^{T} b^{true}}{\sigma_{\ell}} v_{\ell}
\]

and

\[
x^{LS} = \sum_{\ell=1}^{m} \frac{u_{\ell}^{T} b}{\sigma_{\ell}} v_{\ell}
       = \sum_{\ell=1}^{m} \frac{u_{\ell}^{T}(b^{true} + e)}{\sigma_{\ell}} v_{\ell}
       = \sum_{\ell=1}^{m} \frac{u_{\ell}^{T} b^{true}}{\sigma_{\ell}} v_{\ell}
       + \sum_{\ell=1}^{m} \frac{u_{\ell}^{T} e}{\sigma_{\ell}} v_{\ell}
       = x^{true} + \sum_{\ell=1}^{m} \frac{u_{\ell}^{T} e}{\sigma_{\ell}} v_{\ell}.
\]

Because the singular values approach 0, the coefficients |uℓᵀ e|/σℓ grow extremely large, as seen in Figure 12. Therefore, xLS includes a huge contribution from high-frequency right singular vectors vℓ, which means that xLS is very oscillatory and not at all related to the original image xtrue that we seek to reconstruct.

FIGURE 12. The coefficients of xtrue and xLS.

We also note here that the size of the coefficients shown in Figure 12 causes the norm ‖xLS‖₂ to be extremely large. As we will see shortly, we will consider this norm as a measure of the amount of noise in the reconstructed image.

5. REGULARIZATION

As an alternative to accepting xLS as our reconstructed image, we can compute approximations of xtrue by filtering out the noise e while retaining as much information as possible from the measured data b. This process is known as regularization, and there are several possible approaches we can follow.

A first natural idea is to introduce a set of filtering factors φℓ into the SVD expansion and construct a regularized solution as

\[
x^{reg} = \sum_{\ell=1}^{m} \phi_{\ell} \frac{u_{\ell}^{T} b}{\sigma_{\ell}} v_{\ell},
\]

with the filter factors being φℓ ≈ 0 for large values of ℓ, when the noise dominates, and φℓ ≈ 1 for small values of ℓ, which are the terms in the expansion where the components of b and btrue are the closest.

One option is to define φℓ = 1 for ℓ smaller than some cut-off k < m and φℓ = 0 otherwise. That is, we can simply truncate the expansion of xLS in terms of right singular vectors in an attempt to minimize the contribution from the terms |uℓᵀ e|/σℓ for large ℓ. The regularized solution obtained in this way is

\[
x^{reg} = \sum_{\ell=1}^{k} \frac{u_{\ell}^{T} b}{\sigma_{\ell}} v_{\ell}.
\]

This solution is known as the truncated SVD (TSVD) solution.
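A minimal TSVD sketch, with the cut-off k chosen by hand (same one-dimensional toy setup as in the earlier sketches):

```python
import numpy as np
from scipy.linalg import toeplitz

p = 64
t = np.exp(-0.5 * (np.arange(p) / 2.0) ** 2)
A = toeplitz(t / t.sum())
x_true = np.zeros(p); x_true[20:40] = 1.0
rng = np.random.default_rng(1)
e = rng.standard_normal(p)
b_true = A @ x_true
b = b_true + 0.001 * np.linalg.norm(b_true) / np.linalg.norm(e) * e

U, sigma, Vt = np.linalg.svd(A)
k = 20                                   # keep only the first k terms (hand-picked)
coef = (U.T @ b)[:k] / sigma[:k]
x_tsvd = Vt[:k].T @ coef                 # truncated SVD solution
print(np.linalg.norm(x_tsvd - x_true) / np.linalg.norm(x_true))
```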
Remember, however, that the singular values in a discrete ill-posed problem approach 0 without there being a gap that would form a natural cut-off point. Instead, Tikhonov regularization chooses the filtering factors

\[
\phi_{\ell} = \frac{\sigma_{\ell}^{2}}{\sigma_{\ell}^{2} + \lambda^{2}}
\]

for some parameter λ whose choice will be discussed later. For now, notice that φℓ ≈ 1 when σℓ ≫ λ and φℓ ≈ 0 when σℓ ≪ λ. This has the effect of truncating the singular vector expansion at the point where the singular values pass through λ, but doing so more smoothly. This leads to the regularized solution

\[
x^{reg} = \sum_{\ell=1}^{m} \frac{\sigma_{\ell}}{\sigma_{\ell}^{2} + \lambda^{2}} (u_{\ell}^{T} b) v_{\ell},
\]

which may also be rewritten as

\[
x^{reg} = (A^{T} A + \lambda^{2} I)^{-1} A^{T} b.
\]

This demonstrates that xreg solves the least squares problem

\[
x^{reg} = \operatorname*{argmin}_{x} \{ \|Ax - b\|_{2}^{2} + \lambda^{2} \|x\|_{2}^{2} \}. \tag{2}
\]
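In code, the filtered SVD expansion and the normal-equations form of (2) give the same answer. A sketch with λ fixed by hand (toy setup as before):

```python
import numpy as np
from scipy.linalg import toeplitz

p = 64
t = np.exp(-0.5 * (np.arange(p) / 2.0) ** 2)
A = toeplitz(t / t.sum())
rng = np.random.default_rng(1)
x_true = np.zeros(p); x_true[20:40] = 1.0
b = A @ x_true + 0.0001 * rng.standard_normal(p)

lam = 1e-3
U, sigma, Vt = np.linalg.svd(A)
phi = sigma**2 / (sigma**2 + lam**2)          # Tikhonov filter factors
x_svd = Vt.T @ (phi * (U.T @ b) / sigma)      # filtered SVD expansion

# Equivalent normal-equations form: (A^T A + lam^2 I) x = A^T b
x_ne = np.linalg.solve(A.T @ A + lam**2 * np.eye(p), A.T @ b)
assert np.allclose(x_svd, x_ne)
```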
This is a helpful reformulation of Tikhonov regularization. First, writing xreg as the solution of a least squares problem provides us with efficient computational alternatives to finding the SVD of A. Moreover, the minimization problem provides insight into choosing the optimal value of the regularization parameter λ, as we will soon see.

By the way, this formulation of Tikhonov regularization shows its connection to ridge regression, a data science technique for tuning a linear regression model in the presence of multicollinearity to improve its predictive accuracy.

Let us investigate the meaning of (2). Notice that

\[
\|Ax^{true} - b\|_{2}^{2} = \|b^{true} - b\|_{2}^{2} = \|e\|_{2}^{2}, \tag{3}
\]

which is relatively small. We therefore consider the residual ‖Ax − b‖₂² as a measure of how far away we are from the original image xtrue. Remember that the terms |uℓᵀ e|/σℓ, the contributions to xLS from the added noise, cause ‖xLS‖₂ to be very large. Consequently, we think of the second term in (2) as measuring the amount of noise in x.

The regularization parameter λ allows us to balance these two sources of error. For instance, when λ is small, the regularized solution xreg, which is the minimizer of (2), will have a small residual ‖Axreg − b‖₂ at the expense of a large norm ‖xreg‖₂. In other words, we tolerate a noisy regularized solution in exchange for a small residual. On the other hand, if λ is large, the regularized solution will have a relatively large residual in exchange for filtering out a lot of the noise.

6. CHOOSING THE REGULARIZING PARAMETER λ

When applying Tikhonov regularization to solve discrete ill-posed inverse problems, a question of high interest is: how can we find the best regularization parameter λ? A common and well-known technique is to create a log-log plot of the residual ‖Axreg − b‖₂ against the norm ‖xreg‖₂ as we vary λ. This plot, as shown in Figure 13, is usually called an L-curve due to its characteristic shape [15].

As mentioned earlier, small values of λ lead to noisy regularized solutions while large values of λ produce large residuals. This means that we move across the L-curve from the upper left to the lower right as we increase λ.

FIGURE 13. The L-curve in our sample deblurring problem. The three indicated points correspond to the values λ = λA, λB, and λC.

Since the filtering factors satisfy φℓ ≈ 1 when σℓ ≫ λ and φℓ ≈ 0 when σℓ ≪ λ, we will view the point where σℓ = λ as an indication of where we begin filtering. Let us first consider the value λ = λA, which produces the point on the L-curve indicated in Figure 13. The resulting coefficients |uℓᵀ b| and φℓ|uℓᵀ b| are shown in Figure 14. Filtering begins roughly where the plot of singular values crosses the horizontal line indicating the value of λA.

FIGURE 14. The choice λ = λA leads to under-smoothing.

While we have filtered out some of the noise, it appears that there is still a considerable amount of noise present. This is reflected by the position of the corresponding point on the L-curve, since the norm ‖xreg‖₂ is relatively large. This choice of λ is too small, and we say that the regularized solution is under-smoothed.

Alternatively, let us consider the regularized solution constructed with λ = λC, as indicated in Figure 13. This leads to the coefficients |uℓᵀ b| and φℓ|uℓᵀ b| shown in Figure 15.
FIGURE 15. The choice λ = λC leads to over-smoothing.

In this case, we begin filtering too soon, so that while we have removed the noise, we have also discarded some of the information present in b, which is reflected in the relatively large residual ‖Axreg − b‖₂. This choice of λ is too large, and we say the regularized solution is over-smoothed.

Finally, the case λ = λB gives the regularized solution that appears at the sharp bend of the L-curve in Figure 13. The resulting coefficients, shown in Figure 16, inspire confidence that this is a good choice for λ.

FIGURE 16. The choice λ = λB gives the optimal amount of smoothing.

In this case, decreasing λ causes us to move upward on the L-curve; we are adding noise without improving the residual. Increasing λ causes us to move right on the L-curve; we are losing information as the residual increases without removing any more noise. Therefore, λ = λB is our optimal value.

Figure 17 shows the image xreg obtained using this optimal parameter λ = λB. While it is not a perfect reconstruction of the original image xtrue, it is a significant improvement over the recorded image b. Accurately reproducing the sharp boundaries that occur in xtrue requires high-frequency contributions that we have necessarily filtered out.

FIGURE 17. The reconstructed image xreg using the regularization parameter determined by the L-curve is seen on the right, along with the original image xtrue on the left for comparison.

The L-curve furnishes a practical way to identify the optimal regularization parameter, as there are techniques that allow us to identify the point of maximal curvature by computing the regularized solution for just a few choices of λ. However, this technique should not be applied uncritically, as there are cases in which the optimal regularized solution does not converge to the true image as the added error approaches zero.
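A sketch of how the L-curve is traced in practice: once the SVD is known, the regularized solution is cheap to recompute for many values of λ, and the two norms are plotted on log-log axes (toy setup as before).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import toeplitz

p = 64
t = np.exp(-0.5 * (np.arange(p) / 2.0) ** 2)
A = toeplitz(t / t.sum())
rng = np.random.default_rng(1)
x_true = np.zeros(p); x_true[20:40] = 1.0
b = A @ x_true + 0.0001 * rng.standard_normal(p)

U, sigma, Vt = np.linalg.svd(A)
beta = U.T @ b
res_norms, sol_norms = [], []
for lam in np.logspace(-8, 0, 50):
    phi = sigma**2 / (sigma**2 + lam**2)
    x = Vt.T @ (phi * beta / sigma)
    res_norms.append(np.linalg.norm(A @ x - b))
    sol_norms.append(np.linalg.norm(x))

plt.loglog(res_norms, sol_norms, ".-")
plt.xlabel("residual norm ||Ax - b||")
plt.ylabel("solution norm ||x||")
plt.show()
```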
An alternative technique, known as the Discrepancy Principle [8], relies on an estimate of the size of the error ‖e‖₂. Remember from (3) that we have ‖Axtrue − b‖₂ = ‖e‖₂. Moreover, the SVD description of xreg provides a straightforward explanation for why the residual ‖Axreg − b‖₂ is an increasing function of λ. If we know ‖e‖₂, we simply choose the optimal λ to be the one for which ‖Axreg − b‖₂ = ‖e‖₂.

Other well-known methods for choosing the parameter λ include Generalized Cross Validation (GCV) [10], which chooses λ to maximize the accuracy with which we can predict the value of a pixel that has been omitted, the unbiased predictive risk estimator (UPRE) [19], and, more recently, methods based on learning when training data are available [4, 5].
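Since ‖Axreg − b‖₂ increases monotonically with λ, the discrepancy principle reduces to a one-dimensional root-finding problem. A bisection sketch, assuming ‖e‖₂ is known (or estimated); the bracket endpoints are our own illustrative choices:

```python
import numpy as np
from scipy.linalg import toeplitz

p = 64
t = np.exp(-0.5 * (np.arange(p) / 2.0) ** 2)
A = toeplitz(t / t.sum())
rng = np.random.default_rng(1)
x_true = np.zeros(p); x_true[20:40] = 1.0
e = 0.0001 * rng.standard_normal(p)
b = A @ x_true + e

U, sigma, Vt = np.linalg.svd(A)
beta = U.T @ b

def residual(lam):
    phi = sigma**2 / (sigma**2 + lam**2)
    x = Vt.T @ (phi * beta / sigma)
    return np.linalg.norm(A @ x - b)

target = np.linalg.norm(e)        # the noise level, assumed known
lo, hi = 1e-10, 1.0               # residual(lo) < target < residual(hi)
for _ in range(60):               # bisection (geometric) on the monotone map
    mid = np.sqrt(lo * hi)
    lo, hi = (mid, hi) if residual(mid) < target else (lo, mid)
print("lambda =", hi)
```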
7. OTHER REGULARIZATION TECHNIQUES

Looking at the variational formulation (2) of Tikhonov regularization, it is easy to see how it can be extended to define other regularization methods by, for example, using different regularization terms.

7.1. General Tikhonov Regularization. The Tikhonov regularization formulation (2) can be generalized to

\[
\min_{x} \{ \|Ax - b\|_{2}^{2} + \lambda \|Lx\|_{2}^{2} \}, \tag{4}
\]

by incorporating a matrix L, which is called the regularization matrix. Its choice is problem dependent and can
significantly affect the quality of the reconstructed solution. Several choices of the regularization matrix involve discretizations of derivative operators or framelet and wavelet transformations, depending on the application. The only requirement on L is that it satisfy

\[
\mathcal{N}(A) \cap \mathcal{N}(L) = \{0\},
\]

where 𝒩(M) denotes the null space of the matrix M. The general Tikhonov minimization problem (4) has the unique solution

\[
x_{\lambda} = (A^{T} A + \lambda L^{T} L)^{-1} A^{T} b
\]

for any λ > 0.
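One direct way to compute xλ is to solve these normal equations, or equivalently a stacked least squares problem. A sketch in which L is a one-dimensional first-derivative matrix (our own illustrative choice):

```python
import numpy as np
from scipy.linalg import toeplitz

p = 64
t = np.exp(-0.5 * (np.arange(p) / 2.0) ** 2)
A = toeplitz(t / t.sum())
rng = np.random.default_rng(1)
x_true = np.zeros(p); x_true[20:40] = 1.0
b = A @ x_true + 0.0001 * rng.standard_normal(p)

# First-derivative regularization matrix: (Lx)_i = x_{i+1} - x_i
L = np.diff(np.eye(p), axis=0)

lam = 1e-3
x_lam = np.linalg.solve(A.T @ A + lam * L.T @ L, A.T @ b)

# Equivalent stacked least squares form: min || [A; sqrt(lam) L] x - [b; 0] ||_2
K = np.vstack([A, np.sqrt(lam) * L])
rhs = np.concatenate([b, np.zeros(p - 1)])
x_ls = np.linalg.lstsq(K, rhs, rcond=None)[0]
assert np.allclose(x_lam, x_ls, atol=1e-6)
```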
FIGURE 18. The reconstructed images xreg using the optimal regularization parameter and the discretization of the first derivative operator with the 2-norm regularization on the left, and TV regularization on the right.

In Figure 18, we reconstruct the image by applying the discretization of the two-dimensional first derivative operator for zero boundary conditions; that is, the matrix L takes the form

\[
L = \begin{pmatrix} I \otimes L_{1} \\ L_{1} \otimes I \end{pmatrix}
\qquad\text{with}\qquad
L_{1} = \begin{pmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1
\end{pmatrix}, \tag{5}
\]

where ⊗ is the Kronecker product [11] defined, for matrices B and C, by

\[
B \otimes C = \begin{pmatrix}
b_{11} C & b_{12} C & \ldots & b_{1m} C \\
b_{21} C & b_{22} C & \ldots & b_{2m} C \\
\vdots & & & \vdots \\
b_{m1} C & b_{m2} C & \ldots & b_{mm} C
\end{pmatrix}.
\]
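A sketch of (5) with sparse matrices (SciPy), so that L is never stored densely:

```python
import numpy as np
import scipy.sparse as sp

p = 64                                 # the image is p x p, so x has length p*p
# L1: bidiagonal first-derivative matrix with -1 on the diagonal, 1 above it
L1 = sp.diags([-1, 1], offsets=[0, 1], shape=(p - 1, p), format="csr")

I = sp.identity(p, format="csr")
# Stack differences in the two coordinate directions, as in (5).
L = sp.vstack([sp.kron(I, L1), sp.kron(L1, I)], format="csr")

print(L.shape)                         # (2 * p * (p-1), p*p)
```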
7.2. Total Variation Regularization. In many applications, the image to be reconstructed is known to be piecewise constant with regular and sharp edges, like the one we are using as an example in this article. Total variation (TV) regularization is a popular choice that allows the solution to preserve edges. Such regularization can be formulated as

\[
\min_{x} \{ \|Ax - b\|_{2}^{2} + \lambda \|Lx\|_{1} \},
\]

where L is the TV operator, which, once discretized, is the same as (5). Looking at Figure 18, we can see that even though we are using the same operator L, the norms used in the regularization terms are different, and that makes a huge difference. But there is a higher cost to finding the TV solution, due to the fact that the minimization functional is not differentiable. Still, there are many algorithms to find its minimum. Here, we apply Iteratively Reweighted Least Squares (IRLS), which solves a sequence of general-form Tikhonov problems; so, instead of solving only one Tikhonov problem, we solve many (a minimal sketch of the iteration appears at the end of this subsection).

The TV approach is also very commonly used in compressed sensing, where the signal to be reconstructed is sparse in its original domain or in some transformed domain [2, 16]. Recently it has been used in the context of regularizing large-scale dynamic inverse problems [18] as well as in learning when training data are available [1].
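A compact IRLS sketch for the one-dimensional analogue. Each pass solves a weighted general-form Tikhonov problem; the weights 1/sqrt((Lx)² + β²) are a standard smoothing of the non-differentiable 1-norm, with β a small constant of our choosing (the warm start, iteration count, and parameter values are all illustrative, not prescribed by the article):

```python
import numpy as np
from scipy.linalg import toeplitz

p = 64
t = np.exp(-0.5 * (np.arange(p) / 2.0) ** 2)
A = toeplitz(t / t.sum())
rng = np.random.default_rng(1)
x_true = np.zeros(p); x_true[20:40] = 1.0
b = A @ x_true + 0.0001 * rng.standard_normal(p)

L = np.diff(np.eye(p), axis=0)          # 1D total variation operator
lam, beta = 1e-4, 1e-4
x = np.linalg.solve(A.T @ A + lam * L.T @ L, A.T @ b)   # Tikhonov warm start

for _ in range(30):
    # ||Lx||_1 is approximated by sum_i w_i (Lx)_i^2 with these weights:
    w = 1.0 / np.sqrt((L @ x) ** 2 + beta**2)
    # One general-form Tikhonov solve with the weighted operator L^T diag(w) L:
    x = np.linalg.solve(A.T @ A + lam * L.T @ (w[:, None] * L), A.T @ b)
```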

8. MATRIX STRUCTURES

8.1. BTTB Structure. Because of the large scale of these problems, it is useful to consider the structure of the matrix A. For instance, when considering a spatially invariant blur and assuming that the image has zero boundary conditions, the matrix A has Block-Toeplitz-Toeplitz-Block (BTTB) structure [12]; that is, A is a p by p block-Toeplitz matrix with each block being a p by p Toeplitz matrix,

\[
A = \begin{pmatrix}
A_{0} & A_{-1} & A_{-2} & \ldots & A_{-(p-1)} \\
A_{1} & A_{0} & A_{-1} & \ldots & A_{-(p-2)} \\
A_{2} & A_{1} & A_{0} & \ldots & A_{-(p-3)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A_{p-1} & A_{p-2} & A_{p-3} & \ldots & A_{0}
\end{pmatrix},
\]

where each block A_ℓ is itself a p by p Toeplitz matrix,

\[
A_{\ell} = \begin{pmatrix}
a_{0}^{\ell} & a_{-1}^{\ell} & a_{-2}^{\ell} & \ldots & a_{-(p-1)}^{\ell} \\
a_{1}^{\ell} & a_{0}^{\ell} & a_{-1}^{\ell} & \ldots & a_{-(p-2)}^{\ell} \\
a_{2}^{\ell} & a_{1}^{\ell} & a_{0}^{\ell} & \ldots & a_{-(p-3)}^{\ell} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{p-1}^{\ell} & a_{p-2}^{\ell} & a_{p-3}^{\ell} & \ldots & a_{0}^{\ell}
\end{pmatrix}.
\]

Notice that to generate these matrices, we only need the first row and column of each matrix A_ℓ, which together form what is called the Toeplitz vector a_ℓ = (a^ℓ_{−(p−1)}, . . . , a^ℓ_{p−1}).
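For the separable Gaussian blur of Section 2, the BTTB matrix can be assembled explicitly as a Kronecker product of two Toeplitz matrices (a fact stated again in Section 8.3). A sketch for a small image; the sizes and normalization are illustrative:

```python
import numpy as np
from scipy.linalg import toeplitz

p, s = 16, 2.0
g = np.exp(-0.5 * (np.arange(p) / s) ** 2)   # 1D Gaussian samples
g /= g.sum()                                  # illustrative normalization
T = toeplitz(g)                               # p x p symmetric Toeplitz factor

A = np.kron(T, T)                             # p^2 x p^2 BTTB blurring matrix
# Block (i, j) of A is the Toeplitz matrix g[|i-j|] * T; check one block:
i, j = 3, 1
assert np.allclose(A[i*p:(i+1)*p, j*p:(j+1)*p], g[abs(i - j)] * T)
```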
8.2. BCCB Structure. Another feasible structure arises when still considering a spatially invariant blur but assuming that the image has periodic boundary conditions. In this case, the matrix A has Block-Circulant-Circulant-Block (BCCB) structure; that is, A is a p by p block-circulant matrix with each block being a p by p circulant matrix,

\[
A = \begin{pmatrix}
A_{1} & A_{2} & A_{3} & \ldots & A_{p} \\
A_{p} & A_{1} & A_{2} & \ldots & A_{p-1} \\
A_{p-1} & A_{p} & A_{1} & \ldots & A_{p-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A_{2} & A_{3} & A_{4} & \ldots & A_{1}
\end{pmatrix},
\]

where for ℓ = 1, . . . , p

\[
A_{\ell} = \begin{pmatrix}
a_{1}^{\ell} & a_{2}^{\ell} & a_{3}^{\ell} & \ldots & a_{p}^{\ell} \\
a_{p}^{\ell} & a_{1}^{\ell} & a_{2}^{\ell} & \ldots & a_{p-1}^{\ell} \\
a_{p-1}^{\ell} & a_{p}^{\ell} & a_{1}^{\ell} & \ldots & a_{p-2}^{\ell} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{2}^{\ell} & a_{3}^{\ell} & a_{4}^{\ell} & \ldots & a_{1}^{\ell}
\end{pmatrix}.
\]

Notice that to generate these matrices, we only need the first row of each matrix A_ℓ, namely a_ℓ = (a₁ℓ, . . . , a_pℓ), so we do not need to store every entry of the matrix A, only the vectors that define the matrices A_ℓ. We might never need to build the matrix A at all to get a regularized solution. Furthermore, BCCB matrices are diagonalizable by the two-dimensional Discrete Fourier Transform (DFT), with eigenvalues given by the Fourier transform of the first column of A. For instance, the Tikhonov solution can be written as

\[
X_{\lambda} = \mathrm{IDFT}\!\left( \frac{\mathrm{conj}(\hat{a})}{|\hat{a}|^{2} + \lambda \mathbf{1}} \odot \mathrm{DFT}(B) \right),
\]

where Xλ and B are the matrices representing xλ and b, respectively (i.e., xλ = vec(Xλ) and b = vec(B)), â = m DFT(a_s), conj(·) denotes the componentwise complex conjugate, |â|² is the m × m matrix whose entries are the squared magnitudes of the complex entries of â, 1 is a matrix of all ones, and ⊙ denotes componentwise multiplication between matrices. Here IDFT stands for the Inverse Discrete Fourier Transform. For more details, we direct the reader to [20, Chapter 5].
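A sketch of this FFT-based Tikhonov solution, with np.fft playing the role of the DFT and IDFT. We assume the PSF is given as the blur of a single pixel at position (0, 0) with periodic wrap-around, so that its 2D FFT yields the eigenvalues; the matrix λ1 in the displayed formula becomes a scalar λ added entrywise. All parameters are our own illustrative choices:

```python
import numpy as np

p, s, lam = 64, 2.0, 1e-3
# PSF "image": the blur of a single pixel at (0,0) with periodic wrap-around.
idx = np.minimum(np.arange(p), p - np.arange(p))          # circular distances
g = np.exp(-0.5 * (idx / s) ** 2)
psf = np.outer(g, g); psf /= psf.sum()

rng = np.random.default_rng(1)
X_true = np.zeros((p, p)); X_true[20:40, 20:40] = 1.0
a_hat = np.fft.fft2(psf)                                  # eigenvalues of the BCCB matrix
B = np.real(np.fft.ifft2(a_hat * np.fft.fft2(X_true)))    # periodic blur of the image
B += 0.001 * np.linalg.norm(B) / p * rng.standard_normal((p, p))  # ~0.1% noise

# Tikhonov solution: X = IDFT( conj(a_hat) / (|a_hat|^2 + lam) * DFT(B) )
X_lam = np.real(np.fft.ifft2(np.conj(a_hat) /
                             (np.abs(a_hat) ** 2 + lam) * np.fft.fft2(B)))
```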
8.3. Separable Blur Operator. Another special case is when the blurring operator A is separable, which means that the blurring can be separated in the horizontal and vertical directions, as illustrated in Equation (1); that is, the matrix A can be written as

\[
A = A_{r} \otimes A_{c}.
\]

If we have the SVDs A_r = U_r Σ_r V_rᵀ and A_c = U_c Σ_c V_cᵀ, we basically have the SVD of A by writing

\[
\begin{aligned}
A &= A_{r} \otimes A_{c} \quad (6) \\
  &= (U_{r} \Sigma_{r} V_{r}^{T}) \otimes (U_{c} \Sigma_{c} V_{c}^{T}) \quad (7) \\
  &= (U_{r} \otimes U_{c})(\Sigma_{r} \otimes \Sigma_{c})(V_{r} \otimes V_{c})^{T}, \quad (8)
\end{aligned}
\]

except that the elements of the diagonal matrix Σ_r ⊗ Σ_c might not be in decreasing order, and therefore some reordering might be needed. The Tikhonov solution can then be written as

\[
X_{\lambda} = V_{c} \left( \frac{d_{c} d_{r}^{T}}{(d_{c} d_{r}^{T})^{2} + \lambda \mathbf{1}} \odot (U_{c}^{T} B U_{r}) \right) V_{r}^{T},
\]

where 1 is a matrix of all ones, and d_r and d_c are the diagonals of Σ_r and Σ_c, respectively.

Notice that if A_r and A_c are Toeplitz matrices, then A_r ⊗ A_c has BTTB structure, and if A_r and A_c are circulant matrices, then A_r ⊗ A_c has BCCB structure.

9. MULTILEVEL METHODS

Because dealing with images of large size is difficult, researchers keep working on finding ways to solve these large-scale inverse problems efficiently. One possible way is by developing multilevel methods. The main idea of a multilevel method is to define a sequence of systems of equations decreasing in size,

\[
A^{(n)} x^{(n)} = b^{(n)}, \qquad 0 \leq n \leq L,
\]

where the superscript n denotes the n-th level (n = 0 being the original system), and to compute an approximate solution, a correction, or some other information at each level, where the computational cost is smaller than for solving the original system. At each level, the right-hand side is defined by

\[
b^{(n+1)} = R^{(n)} b^{(n)}
\]

and the matrix by

\[
A^{(n+1)} = R^{(n)} A^{(n)} P^{(n)},
\]

where R(n) is called the restriction operator and P(n) is the interpolation operator.

There are many ways of defining and using this sequence of systems of equations to solve many different mathematical problems. To learn more about multilevel methods for image deblurring applications, we recommend the references [3, 6, 9, 17].

9.1. Wavelet-based approach. Here, we consider the use of wavelet transforms as restriction and interpolation operators. In particular, we will work with the Haar Wavelet Transform (HWT) because, as we will see, it keeps the structures of the matrices involved. The one-dimensional HWT is the p × p orthonormal matrix W defined by

\[
W = \frac{1}{\sqrt{2}}
\begin{pmatrix}
1 & 1 & 0 & 0 & \cdots & \cdots & 0 & 0 \\
0 & 0 & 1 & 1 & \cdots & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & \cdots & 1 & 1 \\
1 & -1 & 0 & 0 & \cdots & \cdots & 0 & 0 \\
0 & 0 & 1 & -1 & \cdots & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & \cdots & 1 & -1
\end{pmatrix}
= \begin{pmatrix} W_{1} \\ W_{2} \end{pmatrix},
\]
where W1 and W2 have dimension p/2 × p. The two-dimensional HWT can be defined in terms of the one-dimensional HWT by W2D = W ⊗ W ∈ ℝ^(m×m) with m = p². So, we define the restriction operator by R = W1 ⊗ W1 ∈ ℝ^((m/4)×m) and the interpolation operator by P = Rᵀ.
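A sketch of W1 and the restriction operator (sparse, so the Kronecker product stays cheap; the image used here is random, purely for illustration):

```python
import numpy as np
import scipy.sparse as sp

p = 64                                        # must be even (here a power of 2)
# W1: the averaging half of the Haar transform, rows (1 1)/sqrt(2), (0 0 1 1)/sqrt(2), ...
W1 = sp.kron(sp.identity(p // 2), np.array([[1.0, 1.0]]) / np.sqrt(2), format="csr")

R = sp.kron(W1, W1, format="csr")             # restriction: (m/4) x m, m = p^2

rng = np.random.default_rng(0)
X = rng.random((p, p))
x = X.flatten(order="F")
x1 = R @ x                                    # coarse version of the image
X1 = x1.reshape((p // 2, p // 2), order="F")  # equals W1 @ X @ W1.T
```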
To motivate the use of the HWT as a restriction operator, on the left of Figure 19 we show the coarser version of xtrue defined by x(1) = Rxtrue. This is a 32 × 32 image that still shows basically the same information as the original 64 × 64 image with the letter H. So, reconstructing a coarse version could be enough for some tasks. Another reason is that we actually do this many times when compressing images to save storage space on a computer. So imagine getting the image b and, before transmitting it, computing its coarser version b(1) = Rb and transferring just b(1) (see the right of Figure 19). The image b(1) might have enough information to recover x(1).

FIGURE 19. The compressed true image x(1) on the left and the blurred and noisy image b(1) on the right.

The question now is what is the right blurring matrix to recover x(1) from b(1). Using our example, we will show that A(1) = RAP does a good job. Figure 20 shows the Tikhonov and TV solutions of the system A(1)x = b(1). Notice that we are solving a system with a 1,024 × 1,024 matrix instead of the original system with a 4,096 × 4,096 matrix.

FIGURE 20. The reconstructed coarse images xreg of size 32 × 32, obtained using the optimal regularization parameter and the discretization of the first derivative operator with the 2-norm regularization on the left, and TV regularization on the right.
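A sketch of this coarse-level solve, assembling A(1) = RAP with P = Rᵀ and then applying standard Tikhonov at the coarse level (a smaller p is used here so the dense fine-level matrix stays manageable; all parameters are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import toeplitz

p, lam = 32, 1e-5
g = np.exp(-0.5 * (np.arange(p) / 2.0) ** 2)
T = toeplitz(g / g.sum())
A = np.kron(T, T)                              # separable BTTB fine-level matrix

W1 = sp.kron(sp.identity(p // 2), np.array([[1.0, 1.0]]) / np.sqrt(2)).toarray()
R = np.kron(W1, W1)                            # restriction; P = R^T

rng = np.random.default_rng(1)
x_true = np.zeros((p, p)); x_true[10:20, 10:20] = 1.0
x_true = x_true.flatten(order="F")
b = A @ x_true + 1e-5 * rng.standard_normal(p * p)

A1 = R @ A @ R.T                               # coarse blurring matrix, 1/4 the size
b1 = R @ b
n1 = A1.shape[0]
x1 = np.linalg.solve(A1.T @ A1 + lam * np.eye(n1), A1.T @ b1)  # coarse Tikhonov solve
```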
In the following theorem and corollary, we show that the HWT preserves the matrix structures discussed above.

Theorem 1. [9] Let T be a p × p matrix with Toeplitz structure and Toeplitz vector t, and let p = 2ˢ. Then the 2^(s−1) × 2^(s−1) matrix T^(1) = W₁TW₁ᵀ is also Toeplitz, with Toeplitz vector t^(1) = t̃(1 : 2 : 2p − 3), where t̃ = T̃t, with T̃ being the Toeplitz matrix whose Toeplitz vector is all zeros except for t₀ = t₋₂ = 1/2 and t₋₁ = 1.

A similar result can be shown for circulant matrices.

Corollary 1. Let C be a p × p circulant matrix and p = 2ˢ. Then the 2^(s−1) × 2^(s−1) matrix C^(1) = W₁CW₁ᵀ is also circulant.

Let us consider the case when A = A_r ⊗ A_c ∈ ℝ^(m×m). Then, by a property of the Kronecker product, we have that

\[
A^{(1)} = (W_{1} \otimes W_{1})(A_{r} \otimes A_{c})(W_{1} \otimes W_{1})^{T}
        = (W_{1} A_{r} W_{1}^{T}) \otimes (W_{1} A_{c} W_{1}^{T})
        = A_{r}^{(1)} \otimes A_{c}^{(1)}.
\]

The matrix A(1) ∈ ℝ^((m/4)×(m/4)) is separable. Furthermore, by Theorem 1, if A_r and A_c are Toeplitz matrices, then A_r^(1) and A_c^(1) are too. Applying this same argument again and again, we obtain that A(n) is separable into two Toeplitz matrices, and therefore BTTB, for all levels n = 0, . . . , L. Similarly, by Corollary 1, A(n) is separable into two circulant matrices, and therefore BCCB, for all levels n = 0, . . . , L. Therefore, the initial structure of the matrix is inherited at all levels.

For the case when we have BCCB structures, we can solve the corresponding Tikhonov systems at all levels using Fourier-based methods. If we are dealing with separable matrices with Toeplitz structure, we can go down several levels until we can compute the SVDs A_r^(n) = U_r^(n) Σ_r^(n) (V_r^(n))ᵀ and A_c^(n) = U_c^(n) Σ_c^(n) (V_c^(n))ᵀ, and use

\[
A^{(n)} = A_{r}^{(n)} \otimes A_{c}^{(n)}
        = (U_{r}^{(n)} \Sigma_{r}^{(n)} (V_{r}^{(n)})^{T}) \otimes (U_{c}^{(n)} \Sigma_{c}^{(n)} (V_{c}^{(n)})^{T})
        = (U_{r}^{(n)} \otimes U_{c}^{(n)})(\Sigma_{r}^{(n)} \otimes \Sigma_{c}^{(n)})(V_{r}^{(n)} \otimes V_{c}^{(n)})^{T}.
\]

This gives us basically the SVD of A(n), except that the elements of the diagonal matrix Σ_r^(n) ⊗ Σ_c^(n) might not
be in decreasing order, and therefore some reordering would be needed.

There are also other efficient methods, such as computing the closest BCCB matrix and using it as a preconditioner (see [20, Chapter 4] or [14] for more details).
10. CONCLUSIONS AND OUTLOOK

Our intention has been to introduce readers to the problem of image deblurring, the mathematical issues that arise, and a few techniques for addressing them. As mentioned in the introduction, these techniques may be applied to a wide range of related problems. Indeed, we outlined a number of alternative strategies, for example, in choosing appropriate boundary conditions or in selecting the best regularization parameter, as some strategies are better suited to a specific range of applications.

We believe this subject is accessible to both undergraduate and graduate students and can serve as a good introduction to inverse problems, working with ill-conditioned linear operators, and large-scale computation. The visual nature of the problem provides compelling motivation for students and allows the efficacy of various techniques to be easily assessed. The references [13, 14, 20] contain excellent introductions to this subject. Furthermore, the code to generate the figures appearing in this article is available at https://github.com/ipiasu/AMS_Notices_AEP.

In addition, this field is an active area of research that aligns with recent developments in machine learning and convolutional neural networks. Many classical techniques that have traditionally been used in image deblurring can serve as tools to speed up the computations of more recent methods that aim to learn models and parameters when training data are available, or to train a network to classify images. More particularly, we mentioned earlier in the manuscript that Tikhonov regularization is related to ridge regression, a fundamental tool in machine learning. The connection to machine learning goes much deeper.

11. ACKNOWLEDGMENTS
This article will appear in the AMS Notices.

R EFERENCES
[1] Harbir Antil, Zichao Wendy Di, and Ratna Khatri. Bilevel optimization, deep learning and fractional Laplacian regularization with applications in tomography. Inverse Problems, 36(6):064001, 2020.
[2] Emmanuel J Candes, Justin K Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
[3] Raymond H Chan and Ke Chen. A multilevel algorithm for simultaneously denoising and deblurring images. SIAM Journal on Scientific
Computing, 32(2):1043–1063, 2010.
[4] Julianne Chung, Matthias Chung, Silvia Gazzola, and Mirjeta Pasha. Efficient learning methods for large-scale optimal inversion design. arXiv
preprint arXiv:2110.02720, 2021.
[5] Julianne Chung and Malena I Español. Learning regularization parameters for general-form Tikhonov. Inverse Problems, 33(7):074004, 2017.
[6] Marco Donatelli and Stefano Serra-Capizzano. On the regularizing power of multigrid-type algorithms. SIAM Journal on Scientific Computing,
27(6):2053–2076, 2006.
[7] Vijayan Ellappan and Vishal Chopra. Reconstruction of noisy and blurred images using blur kernel. In IOP Conference Series: Materials
Science and Engineering, volume 263, page 042024. IOP Publishing, 2017.
[8] Heinz Werner Engl. Discrepancy principles for Tikhonov regularization of ill-posed problems leading to optimal convergence rates. Journal of Optimization Theory and Applications, 52(2):209–215, 1987.
[9] Malena I Español and Misha E Kilmer. Multilevel approach for signal restoration problems with Toeplitz matrices. SIAM Journal on Scientific
Computing, 32(1):299–319, 2010.
[10] Gene H Golub, Michael Heath, and Grace Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Technomet-
rics, 21(2):215–223, 1979.
[11] Gene H Golub and Charles F Van Loan. Matrix Computations, 4th ed. Johns Hopkins University Press, Baltimore, 2013.
[12] Per Christian Hansen. Regularization tools: a MATLAB package for analysis and solution of discrete ill-posed problems. Numerical Algorithms, 6(1):1–35, 1994.
[13] Per Christian Hansen. Discrete inverse problems: insight and algorithms. SIAM, 2010.
[14] Per Christian Hansen, James G Nagy, and Dianne P O'Leary. Deblurring images: matrices, spectra, and filtering. SIAM, 2006.
[15] Per Christian Hansen and Dianne Prost O'Leary. The use of the L-curve in the regularization of discrete ill-posed problems. SIAM Journal on Scientific Computing, 14(6):1487–1503, 1993.
[16] Michael Lustig, David L Donoho, Juan M Santos, and John M Pauly. Compressed sensing MRI. IEEE Signal Processing Magazine, 25(2):72–82, 2008.
[17] Serena Morigi, Lothar Reichel, Fiorella Sgallari, and Andriy Shyshkov. Cascadic multiresolution methods for image deblurring. SIAM Journal
on Imaging Sciences, 1(1):51–74, 2008.
[18] Mirjeta Pasha, Arvind K Saibaba, Silvia Gazzola, Malena I Espanol, and Eric de Sturler. Efficient edge-preserving methods for dynamic
inverse problems. arXiv preprint arXiv:2107.05727, 2021.
[19] Rosemary A Renaut, Saeed Vatankhah, and Vahid E Ardestani. Hybrid and iteratively reweighted regularization by unbiased predictive risk and weighted GCV for projected systems. SIAM Journal on Scientific Computing, 39(2):B221–B243, 2017.
[20] Curtis R Vogel. Computational methods for inverse problems. SIAM, 2002.
1 DEPARTMENT OF MATHEMATICS, GRAND VALLEY STATE UNIVERSITY, ALLENDALE, MI.
2 SCHOOL OF MATHEMATICAL AND STATISTICAL SCIENCES, ARIZONA STATE UNIVERSITY, TEMPE, AZ.
E-mail addresses: austind@gvsu.edu, malena.espanol@asu.edu, mpasha@asu.edu

