
Regularization Methods

An Applied Mathematician’s Perspective


Curt Vogel
Montana State University
Department of Mathematical Sciences
Bozeman, MT 59717-2400

SAMSI Opening Workshop – p.1/33


Outline
Well (and Ill-) Posedness
Regularization
Optimization approach (Tikhonov)
Bayesian connection (MAP)
Filtering approach
Iterative approach
Regularization Parameter Selection
Applied Math Wish List

Focus on linear problems.



Well-Posedness
Definition due to Hadamard, 1915: Given a mapping K : X → Y, the equation

    K x = y

is well-posed provided

(Existence) For each y ∈ Y, there exists x ∈ X such that K x = y;

(Uniqueness) K x₁ = K x₂ implies x₁ = x₂; and

(Stability) K⁻¹ is continuous.

The equation is ill-posed if it is not well-posed.



Linear, Finite-Dimensional Case
K x = y   (K an n × n matrix).

    K⁻¹ exists  ⟺  det(K) ≠ 0  ⟺  K x = y is well-posed.

Existence imposed by considering least squares solutions,

    min_x ||K x − y||².

Uniqueness imposed by taking the minimum norm least squares solution,

    x = K† y,   the least squares minimizer having smallest norm ||x||.
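These two fixes can be checked numerically. A minimal NumPy sketch (the matrix and data are illustrative, not from the talk): both `np.linalg.pinv` and `np.linalg.lstsq` deliver the minimum norm least squares solution, even when K is singular.

```python
import numpy as np

# Singular 2x2 matrix: K maps everything onto the line spanned by (1, 1).
K = np.array([[1.0, 0.0],
              [1.0, 0.0]])
y = np.array([1.0, 2.0])   # not in the range of K -> no exact solution

# Existence fix: least squares. Uniqueness fix: minimum norm least squares
# solution, which is exactly what the pseudo-inverse K^+ delivers.
x_mnls = np.linalg.pinv(K) @ y

# lstsq also returns a minimum norm least squares solution.
x_lstsq = np.linalg.lstsq(K, y, rcond=None)[0]

print(x_mnls)   # -> [1.5 0. ]
```

Here the least squares solutions form the line x₁ = 1.5, x₂ arbitrary; the pseudo-inverse picks the one with x₂ = 0.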


Infinite-Dimensional Example
(Compact) diagonal operator on the (Hilbert) sequence space

    ℓ² = { x = (x₁, x₂, …) : Σⱼ xⱼ² < ∞ }.

Define K : ℓ² → ℓ² by

    K x = (x₁, x₂/2, x₃/3, …),   i.e.,  (K x)ⱼ = xⱼ / j.

Formal (unbounded) inverse is

    K⁻¹ y = (y₁, 2 y₂, 3 y₃, …),

so we have uniqueness (and existence of solutions for certain y).



Don’t have stability!
Take

    y⁽ⁿ⁾ = (0, …, 0, 1/√n, 0, …)   (nonzero entry in position n).

Then y⁽ⁿ⁾ → 0, but

    ||K⁻¹ y⁽ⁿ⁾|| = √n → ∞.

Also don’t have existence of solns to K x = y for all y ∈ ℓ².
E.g., y = (1, 1/2, 1/3, …) ∈ ℓ², but

    K⁻¹ y = (1, 1, 1, …) ∉ ℓ².
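A finite truncation of the diagonal operator shows the instability concretely. In this sketch (dimension and perturbation size are arbitrary), a data perturbation of norm 10⁻⁶ comes back through the naive inverse amplified by the factor n:

```python
import numpy as np

n = 50
j = np.arange(1, n + 1)
K = np.diag(1.0 / j)        # truncated version of the compact diagonal operator

x_true = np.ones(n)
y = K @ x_true

# A tiny perturbation in the most-damped data component ...
delta = 1e-6
y_pert = y.copy()
y_pert[-1] += delta

# ... is amplified by the factor n in the naive inverse solution.
x_naive = np.linalg.solve(K, y_pert)
err = np.linalg.norm(x_naive - x_true)
print(err)   # n * delta = 5e-5
```

Doubling n doubles the amplification: the finer (more accurate) the discretization, the worse the conditioning, exactly as claimed on the next slide.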


Does this matter?
Example was contrived.
Practical computations are discrete, finite dimensional.
Can replace K⁻¹ (finite dimensional) by the pseudo-inverse K†.

But ...

Discrete problems approximate underlying infinite
dimensional problems (discrete problems become
increasingly ill-conditioned as they become more
accurate).
In Inverse Problems applications K is often compact,
and it acts like the diagonal operator in the above
example (compact operators can be diagonalized using
the SVD; diagonal entries decay to zero).
Regularization
Remedy for ill-posedness (or ill-conditioning, in the discrete
case).
Informal Definition: “Imposes stability on an ill-posed
problem in a manner that yields accurate approximate
solutions, often by incorporating prior information”.
More Formal Definition: A parametric family of “approximate
inverse operators” R_α with the following property: if
K x_true = y_true and ||y − y_true|| ≤ δ, we can pick parameters
α = α(δ) such that

    R_{α(δ)} y → x_true   as  δ → 0.


Tikhonov Regularization
Math Interpretation. In the simplest case, assume X, Y are
Hilbert spaces. To obtain a regularized soln to K x = y,
choose x to fit the data in the least-squares sense, but penalize
solutions of large norm. Solve the minimization problem

    min_x  ||K x − y||² + α ||x||²,

whose solution is

    x_α = (K*K + α I)⁻¹ K* y.

α > 0 is called the regularization parameter.
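The Tikhonov minimization can be sketched numerically via the normal equations (K*K + αI) x = K*y; the test matrix, noise level, and α below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ill-conditioned forward matrix with fast-decaying singular values.
n = 40
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** np.linspace(0, -8, n)
K = U @ np.diag(s) @ V.T

x_true = np.ones(n)
y = K @ x_true + 1e-4 * rng.standard_normal(n)

def tikhonov(K, y, alpha):
    """Minimizer of ||Kx - y||^2 + alpha*||x||^2 via the normal equations."""
    m = K.shape[1]
    return np.linalg.solve(K.T @ K + alpha * np.eye(m), K.T @ y)

x_naive = np.linalg.solve(K, y)        # unregularized: wrecked by the noise
x_alpha = tikhonov(K, y, alpha=1e-6)

print(np.linalg.norm(x_naive - x_true),
      np.linalg.norm(x_alpha - x_true))
```

With singular values down to 10⁻⁸ and noise at 10⁻⁴, the unregularized solve amplifies the noise enormously, while the penalized solve stays near x_true.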


Geometry of Linear Least Squares

[Figure: contour plots of ||Ax−y||² (left) and of the perturbed functional ||Ax−(y−η)||² (right) for an ill-conditioned 2 × 2 example. The long, narrow contours show the least squares minimizer moving far under the small data perturbation η.]
Geometry of Tikhonov Regularization

    min_x  ||A x − y||² + α ||x||².

[Figure: contour plots of ||Ax−y||² + α||x||² (left) and ||Ax−(y+η)||² + α||x||² (right). The penalty term rounds out the contours, so the minimizer is nearly unchanged by the perturbation η.]


Bayesian Interpretation (MAP)
Bayes Law: Assume x, y are jointly distributed continuous
random variables. Then

    p(x | y) = p(y | x) p(x) / p(y).

The maximum a posteriori (MAP) estimator is the max of the posterior
pdf. Equivalently, minimize w.r.t. x

    −log p(y | x) − log p(x).

First term on rhs is the “fit-to-data” term; second is the
“regularization” term.


If the conditional pdf is

    p(y | x) ~ Normal(K x, σ² I)

and the prior is

    p(x) ~ Normal(0, ς² I),

then the MAP estimator minimizes the Tikhonov cost functional

    ||K x − y||² + α ||x||²,   with  α = σ² / ς² = 1 / SNR,

where SNR = ς² / σ² is the signal-to-noise ratio.
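The Gaussian MAP / Tikhonov equivalence is easy to verify numerically. In this sketch the noise level σ, the prior level ς (written `vsig`), and the problem data are all made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
K = rng.standard_normal((n, n))
y = rng.standard_normal(n)

sigma = 0.1   # noise standard deviation (likelihood)
vsig = 2.0    # prior standard deviation

# MAP estimate: minimize ||Kx-y||^2/(2 sigma^2) + ||x||^2/(2 vsig^2).
# Setting the gradient to zero gives
#   (K^T K / sigma^2 + I / vsig^2) x = K^T y / sigma^2.
x_map = np.linalg.solve(K.T @ K / sigma**2 + np.eye(n) / vsig**2,
                        K.T @ y / sigma**2)

# Tikhonov with alpha = sigma^2 / vsig^2 gives the identical estimate.
alpha = sigma**2 / vsig**2
x_tik = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ y)

print(np.allclose(x_map, x_tik))   # True
```

Multiplying the MAP normal equations through by σ² shows the two linear systems are literally the same, which is what the agreement confirms.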


Singular Value Decomposition
Important tool for analysis and computation. Gives a
bi-orthogonal diagonalization of a linear operator,

    K = U Σ V*.

In the matrix case, U = [u₁ … uₙ], V = [v₁ … vₙ],
Σ = diag(s₁, …, sₙ) with

    s₁ ≥ s₂ ≥ … ≥ sₙ ≥ 0,   U*U = V*V = I,

and

    K vᵢ = sᵢ uᵢ,   K* uᵢ = sᵢ vᵢ,   i = 1, …, n.
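In NumPy the decomposition and its defining properties look like this (note that `np.linalg.svd` returns V*, i.e. `Vt`, whose rows are the vectors vᵢ):

```python
import numpy as np

rng = np.random.default_rng(2)
K = rng.standard_normal((5, 5))

# Full SVD: K = U diag(s) V^T with orthogonal U, V and s1 >= s2 >= ... >= 0.
U, s, Vt = np.linalg.svd(K)

assert np.all(np.diff(s) <= 0)               # singular values are sorted
assert np.allclose(U.T @ U, np.eye(5))       # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(5))     # V orthogonal
assert np.allclose(K, U @ np.diag(s) @ Vt)   # bi-orthogonal diagonalization

# The defining relations K v_i = s_i u_i, one vector at a time:
for i in range(5):
    assert np.allclose(K @ Vt[i], s[i] * U[:, i])
```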
Tikhonov Filtering
In the case of Tikhonov regularization, using the SVD
(and assuming an n × n matrix K with sᵢ > 0 for
simplicity),

    x_α = (K*K + α I)⁻¹ K* y
        = (V Σ² V* + α I)⁻¹ V Σ U* y
        = V (Σ² + α I)⁻¹ Σ U* y
        = V diag( sᵢ / (sᵢ² + α) ) U* y
        = V diag( w_α(sᵢ²) / sᵢ ) U* y,   where  w_α(s²) = s² / (s² + α).

If α → 0, then w_α(sᵢ²) → 1, so

    x_α → V diag(1/sᵢ) U* y = K⁻¹ y   as  α → 0.
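The filtering identity can be confirmed numerically: the filtered-SVD formula and the normal-equations solve produce the same x_α (random test problem, illustrative α):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
K = rng.standard_normal((n, n))
y = rng.standard_normal(n)
alpha = 1e-2

U, s, Vt = np.linalg.svd(K)

# Filtered SVD solution: x_alpha = V diag(w(s_i^2)/s_i) U^T y,
# with Tikhonov filter w(s^2) = s^2 / (s^2 + alpha).
w = s**2 / (s**2 + alpha)
x_filt = Vt.T @ ((w / s) * (U.T @ y))

# The same thing computed directly from the normal equations.
x_tik = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ y)

print(np.allclose(x_filt, x_tik))   # True
```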
Tikhonov Filtering, Continued
A plot of the Tikhonov filter function w_α(s²) = s²/(s² + α) shows that
Tikhonov regularization filters out singular components sᵢ² that
are small (relative to α) while retaining components that are
large.

[Figure: filter functions w_α(s²) for TSVD and Tikhonov regularization, plotted against s² on a log scale.]


Truncated SVD (TSVD) Regularization
The TSVD filter function is

    w_α(s²) = 1  if s² ≥ α;   0  if s² < α.

Has “sharp cut-off” behavior instead of the “smooth roll-off”
behavior of the Tikhonov filter.
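A sketch of TSVD as a sharp cut-off filter; the problem construction and the cut-off level α are illustrative:

```python
import numpy as np

def tsvd_solution(K, y, alpha):
    """TSVD regularized solution: keep components with s_i^2 >= alpha."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    w = (s**2 >= alpha).astype(float)     # sharp cut-off filter
    return Vt.T @ ((w / s) * (U.T @ y))

rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.standard_normal((20, 20)))
V, _ = np.linalg.qr(rng.standard_normal((20, 20)))
s = 10.0 ** np.linspace(0, -6, 20)
K = U @ np.diag(s) @ V.T

x_true = np.ones(20)
y = K @ x_true + 1e-4 * rng.standard_normal(20)

x_tsvd = tsvd_solution(K, y, alpha=1e-4)   # truncates where s_i < 1e-2
print(np.linalg.norm(x_tsvd - x_true))
```

All singular components below the cut-off are zeroed out exactly, rather than smoothly damped as in the Tikhonov filter.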


Iterative Regularization
Certain iterative methods, e.g., steepest descent, conjugate
gradients, and Richardson-Lucy (EM), have regularizing
effects, with the regularization parameter equal to the
number of iterations. These are useful in applications, like
3-D imaging, with many unknowns.
An example is Landweber iteration, a variant of steepest
descent. Minimize the least squares fit-to-data functional

    J(x) = (1/2) ||K x − y||²

using the gradient descent iteration x_{k+1} = x_k − τ grad J(x_k),
initial guess x₀ = 0, and fixed step length parameter
0 < τ < 2/s₁².


Landweber Iteration

    x_{k+1} = x_k − τ grad J(x_k)
            = x_k + τ K* (y − K x_k),   k = 0, 1, 2, …

With x₀ = 0 and K = U Σ V*, the k-th iterate has the filtered form

    x_k = V diag( w_k(sᵢ²) / sᵢ ) U* y,   where  w_k(s²) = 1 − (1 − τ s²)^k.

[Figure: Landweber filter functions w_k(s²) for k = 10 (solid) and k = 100 (dashed), plotted against s² on a log scale.]
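The equivalence between running the Landweber recursion and applying the filter w_k(s²) = 1 − (1 − τ s²)^k can be checked directly; the test matrix and iteration count here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 15
K = rng.standard_normal((n, n)) / n   # keep ||K|| modest
y = rng.standard_normal(n)

U, s, Vt = np.linalg.svd(K)
tau = 1.0 / s[0]**2                   # fixed step, safely below 2/s_1^2

# Landweber: x_{k+1} = x_k + tau * K^T (y - K x_k), starting from x_0 = 0.
k_iters = 50
x = np.zeros(n)
for _ in range(k_iters):
    x = x + tau * K.T @ (y - K @ x)

# Equivalent filtered-SVD form with w_k(s^2) = 1 - (1 - tau s^2)^k.
w = 1.0 - (1.0 - tau * s**2) ** k_iters
x_filt = Vt.T @ ((w / s) * (U.T @ y))

print(np.allclose(x, x_filt))   # True
```

Increasing `k_iters` pushes the filter cut-off toward smaller s², which is exactly the sense in which the iteration count acts as the regularization parameter.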
Effect of Regularization Parameter
Illustrative Example: 1-D deconvolution with Gaussian kernel

    k(t) = C exp( −t² / (2 γ²) )

and discrete data

    yᵢ = ∫₀¹ k(tᵢ − t′) x_true(t′) dt′ + noiseᵢ,   i = 1, …, n.

[Figure: true solution and discrete noisy data on 0 ≤ t ≤ 1.]
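A test problem of this type can be assembled with a midpoint-rule discretization of the convolution; the grid size, kernel width γ, and noise level here are illustrative (the quadrature weight h stands in for the constant C):

```python
import numpy as np

# Discrete 1-D deconvolution on [0, 1] with a Gaussian kernel
# k(t) = C * exp(-t^2 / (2 * gamma^2)).
n, gamma = 80, 0.05
t = (np.arange(n) + 0.5) / n
h = 1.0 / n
K = h * np.exp(-(t[:, None] - t[None, :])**2 / (2 * gamma**2))  # midpoint rule

# Piecewise-constant "true" source and noisy blurred data.
x_true = ((t > 0.2) & (t < 0.4)).astype(float)
rng = np.random.default_rng(6)
y = K @ x_true + 1e-3 * rng.standard_normal(n)

# The singular values of K decay extremely fast -> severe ill-conditioning.
s = np.linalg.svd(K, compute_uv=False)
print(s[0], s[-1])   # many orders of magnitude apart
```

Gaussian blur is a classic severely ill-posed forward operator: the singular values decay faster than any power of the index, so almost all data components lie below any realistic noise floor.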
Graphical Representation of SVD

[Figure: left, singular values σᵢ of K on a log scale, decaying from about 1 to below 10⁻¹⁶ over indices i = 1, …, 80; right, singular vectors v₁, v₄, v₁₀, and v₂₀, which become increasingly oscillatory as i grows.]


Tikhonov Solutions vs α
Tikhonov regularized solution is x_α = (K*K + α I)⁻¹ K* y.
Solution error is e_α = x_α − x_true.

[Figure: top left, ||e_α|| versus α on a log scale, with an interior minimum; top right, x_α(t) for the near-optimal α = 0.0014384; bottom, x_α(t) for α = 10⁻⁶ (noisy, under-regularized) and for α = 1 (overly smooth, over-regularized).]


Error Indicators
Linear, Additive Noise Data Model:

    y = K x_true + η,   η ~ N(0, σ² I).

Regularized Solution:

    x_α = R_α y,   R_α = (K*K + α I)⁻¹ K* = V diag( sᵢ / (sᵢ² + α) ) U*.

Solution Error:

    e_α = x_α − x_true.

Predictive Error:

    p_α = K x_α − K x_true.
Unbiased Predictive Risk Estimator
The Influence Matrix is

    A_α = K R_α = K (K*K + α I)⁻¹ K*,

so we can write the predictive error as

    p_α = K x_α − K x_true = A_α η − (I − A_α) K x_true.

Residual is

    r_α = K x_α − y = (A_α − I) y = −(I − A_α) η − (I − A_α) K x_true.

Let E denote the expected value operator. Assume x_true is
deterministic (or independent of η), assume E(η) = 0, and
note that A_α is symmetric. Then ...


UPRE, Continued
Expanding and taking expectations (cross terms vanish since
E(η) = 0, and E ||M η||² = σ² trace(M*M) for deterministic M):

    E ||p_α||² = σ² trace(A_α²) + ||(I − A_α) K x_true||²,
    E ||r_α||² = σ² trace((I − A_α)²) + ||(I − A_α) K x_true||².

Subtracting, and using trace(A_α²) − trace((I − A_α)²) = 2 trace(A_α) − n,

    E ||p_α||² = E ||r_α||² + 2 σ² trace(A_α) − n σ².

So, up to a constant, an unbiased estimator for E ||p_α||² is

    U(α) = ||r_α||² + 2 σ² trace(A_α) − n σ².


Comments about UPRE
The UPRE regularization parameter selection method, also
known as Mallows’ method, is to pick α to minimize U(α).

The predictive error norm ||p_α|| and the solution error norm ||e_α||
need not have the same minimizer, but the mins
are often quite close.

There is a variant of UPRE, called generalized cross
validation (GCV), which requires minimization of

    GCV(α) = ||r_α||² / [ trace(I − A_α) ]².

This does not require prior knowledge of the variance σ².
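Using the SVD, both selection curves are cheap to evaluate: the eigenvalues of the influence matrix A_α are the filter factors w_α(sᵢ²), so trace(A_α) = Σ w_α(sᵢ²). A sketch on a synthetic problem (all sizes and levels illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Small synthetic ill-posed problem.
n = 60
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** np.linspace(0, -5, n)
K = U @ np.diag(s) @ V.T
x_true = np.ones(n)
sigma = 1e-3
y = K @ x_true + sigma * rng.standard_normal(n)

def selection_curves(alpha):
    """UPRE U(alpha) and GCV(alpha) evaluated via the SVD of K."""
    w = s**2 / (s**2 + alpha)           # filter factors = eigenvalues of A_alpha
    Uty = U.T @ y
    r2 = np.sum(((1 - w) * Uty)**2)     # ||r_alpha||^2 = ||(A_alpha - I) y||^2
    upre = r2 + 2 * sigma**2 * np.sum(w) - n * sigma**2
    gcv = r2 / np.sum(1 - w)**2
    return upre, gcv

alphas = 10.0 ** np.linspace(-10, 0, 41)
upre_vals, gcv_vals = zip(*(selection_curves(a) for a in alphas))
a_upre = alphas[np.argmin(upre_vals)]
a_gcv = alphas[np.argmin(gcv_vals)]
print(a_upre, a_gcv)   # selected regularization parameters
```

Note that UPRE uses the known σ² while the GCV ratio does not, matching the comment above.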
Illustrative Example of Indicators
2-D image reconstruction problem, noise η ~ N(0, σ² I),
Tikhonov regularization.

[Figure: U(α) (solid line), predictive error norm P(α) (dashed line), GCV(α) (dot-dashed line), and the solution error norm (o-o), plotted against the regularization parameter α on log-log axes.]
Mathematical Summary
There exists a well-developed mathematical theory of
regularization.
There are a number of different approaches to
regularization.
optimization-based (equivalent to MAP)
filtering-based
iteration-based
There are robust schemes for choosing regularization
parameters.

These techniques often work well in practical applications.



But Things Can Get Ugly ...
Astronomical Imaging Application. Light intensity

    g(x, y) = ∫∫ k(x − x′, y − y′) f(x′, y′) dx′ dy′.

This is measured by a ccd array (digital camera), giving
data

    dᵢ = g(xᵢ, yᵢ) + “noise”ᵢ,   i = 1, …, n.

For high contrast imaging (dim object near very bright
object), accurate modeling of noise is critical.
With ordinary (and even weighted) least squares, the dim
object is missed.


Model for Data from CCD Array

    dᵢ = n_obj(i) + n_bkg(i) + εᵢ,   i = 1, …, n.

Photon count for “signal”:

    n_obj(i) ~ Poisson( g(xᵢ, yᵢ) ).

Background photon count:

    n_bkg(i) ~ Poisson(β),   β fixed, known.

Instrument “read noise”:

    εᵢ ~ N(0, σ²),   σ fixed, known.
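A simulation sketch of this three-part noise model (all intensity values are invented for illustration) shows why a constant-variance least squares model fits poorly: the Poisson variance tracks the intensity, so bright pixels are far noisier in absolute terms than dim ones:

```python
import numpy as np

rng = np.random.default_rng(8)

# Simulate the CCD data model d_i = n_obj(i) + n_bkg(i) + eps_i.
n = 1000
g_true = np.full(n, 5.0)           # "signal" photon intensity per pixel
g_true[100:110] = 5000.0           # one very bright region
beta = 10.0                        # known background intensity
sigma = 3.0                        # known read-noise standard deviation

d = (rng.poisson(g_true)           # object photon counts ~ Poisson(g(x_i))
     + rng.poisson(beta, size=n)   # background counts ~ Poisson(beta)
     + rng.normal(0.0, sigma, n))  # Gaussian read noise ~ N(0, sigma^2)

# Empirical variance in a dim region vs the bright region: Poisson noise
# scales with intensity, so the bright pixels dominate an unweighted
# least squares fit and can swamp a nearby dim object.
var_dim = d[200:].var()
var_bright = d[100:110].var()
print(var_dim, var_bright)
```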
Imaging Example, Continued
Log likelihood (−log p(d | f)) is messy: each pixel’s density is the
convolution of the two Poisson densities with the Gaussian
read-noise density, so −log p(d | f) involves, at each pixel, the log
of an infinite sum over photon counts.

Light source (object) intensity f is nonnegative.
Constraint f ≥ 0.

With “pixel” discretization, the dimension is very large;
size(f) and size(d) are on the order of the number of ccd pixels,
or more.

Problem is ill-posed. Need regularization (prior), e.g.,

    α ||f||²,   α > 0.

Regularization parameter (strength of prior) is unknown.
Applied Mathematician’s Wish List
Optimization-based regularization methods (Tikhonov,
MAP) require soln of minimization problems. Need fast,
robust, large-scale, nonlinear constrained numerical
optimization techniques.
When the parameter-to-observation map is nonlinear,
regularization functionals may be non-convex. Need
optimization methods which yield the global minimizer
(not just a local min) and are fast, robust, ....
Need indicators of reliability (e.g., confidence intervals)
for regularized solutions.
Need good priors.
Need fast, robust schemes for choosing regularization
parameters.
Challenge for the Statistics Community
Can MCMC techniques provide fast, robust alternatives
to optimization-based regularization methods?

Relevant Reference: J. Kaipio et al., “Statistical inversion and Monte Carlo sampling methods in electrical impedance tomography”, Inverse Problems, vol. 16 (2000), pp. 1487–1522.

Relevant Caveat: There is no free lunch.
