
Regularization Methods

An Applied Mathematician’s Perspective


Curt Vogel
Montana State University
Department of Mathematical Sciences
Bozeman, MT 59717-2400

SAMSI Opening Workshop – p.1/33


Outline
Well (and Ill-) Posedness
Regularization
Optimization approach (Tikhonov)
Bayesian connection (MAP)
Filtering approach
Iterative approach
Regularization Parameter Selection
Applied Math Wish List

Focus on linear problems.



Well-Posedness
Definition due to Hadamard, 1915: Given a mapping K : X → Y, the equation

    K x = y

is well-posed provided

(Existence) For each y ∈ Y, there exists x ∈ X such that K x = y;

(Uniqueness) K x₁ = K x₂ implies x₁ = x₂; and

(Stability) K⁻¹ is continuous.

The equation is ill-posed if it is not well-posed.



Linear, Finite-Dimensional Case
K x = y   (K an n × n matrix).

    K⁻¹ exists  ⟺  det(K) ≠ 0  ⟺  K x = y is well-posed.

Existence imposed by considering least squares solutions,

    min_x ||K x − y||².

Uniqueness imposed by taking the minimum norm least squares solution,

    x = K† y,   the least squares minimizer having smallest norm ||x||.
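These two fixes can be checked numerically. A minimal NumPy sketch (the matrix and data are illustrative, not from the talk): both `np.linalg.pinv` and `np.linalg.lstsq` deliver the minimum norm least squares solution, even when K is singular.

```python
import numpy as np

# Singular 2x2 matrix: K maps everything onto the line spanned by (1, 1).
K = np.array([[1.0, 0.0],
              [1.0, 0.0]])
y = np.array([1.0, 2.0])   # not in the range of K -> no exact solution

# Existence fix: least squares. Uniqueness fix: minimum norm least squares
# solution, which is exactly what the pseudo-inverse K^+ delivers.
x_mnls = np.linalg.pinv(K) @ y

# lstsq also returns a minimum norm least squares solution.
x_lstsq = np.linalg.lstsq(K, y, rcond=None)[0]

print(x_mnls)   # -> [1.5 0. ]
```

Here the least squares solutions form the line x₁ = 1.5, x₂ arbitrary; the pseudo-inverse picks the one with x₂ = 0.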


Infinite-Dimensional Example
(Compact) diagonal operator on the (Hilbert) sequence space

    ℓ² = { x = (x₁, x₂, …) : Σⱼ xⱼ² < ∞ }.

Define K : ℓ² → ℓ² by

    K x = (x₁, x₂/2, x₃/3, …),   i.e.,  (K x)ⱼ = xⱼ / j.

Formal (unbounded) inverse is

    K⁻¹ y = (y₁, 2 y₂, 3 y₃, …),

so we have uniqueness (and existence of solutions for certain y).



Don’t have stability!
Take

    y⁽ⁿ⁾ = (0, …, 0, 1/√n, 0, …)   (nonzero entry in position n).

Then y⁽ⁿ⁾ → 0, but

    ||K⁻¹ y⁽ⁿ⁾|| = √n → ∞.

Also don’t have existence of solns to K x = y for all y ∈ ℓ².
E.g., y = (1, 1/2, 1/3, …) ∈ ℓ², but

    K⁻¹ y = (1, 1, 1, …) ∉ ℓ².
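A finite truncation of the diagonal operator shows the instability concretely. In this sketch (dimension and perturbation size are arbitrary), a data perturbation of norm 10⁻⁶ comes back through the naive inverse amplified by the factor n:

```python
import numpy as np

n = 50
j = np.arange(1, n + 1)
K = np.diag(1.0 / j)        # truncated version of the compact diagonal operator

x_true = np.ones(n)
y = K @ x_true

# A tiny perturbation in the most-damped data component ...
delta = 1e-6
y_pert = y.copy()
y_pert[-1] += delta

# ... is amplified by the factor n in the naive inverse solution.
x_naive = np.linalg.solve(K, y_pert)
err = np.linalg.norm(x_naive - x_true)
print(err)   # n * delta = 5e-5
```

Doubling n doubles the amplification: the finer (more accurate) the discretization, the worse the conditioning, exactly as claimed on the next slide.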


Does this matter?
Example was contrived.
Practical computations are discrete, finite dimensional.
Can replace K⁻¹ (finite dimensional) by the pseudo-inverse K†.

But ...

Discrete problems approximate underlying infinite
dimensional problems (discrete problems become
increasingly ill-conditioned as they become more
accurate).
In Inverse Problems applications K is often compact,
and it acts like the diagonal operator in the above
example (compact operators can be diagonalized using
the SVD; diagonal entries decay to zero).
Regularization
Remedy for ill-posedness (or ill-conditioning, in the discrete
case).
Informal Definition: “Imposes stability on an ill-posed
problem in a manner that yields accurate approximate
solutions, often by incorporating prior information”.
More Formal Definition: A parametric family of “approximate
inverse operators” R_α with the following property: if
K x_true = y_true and ||y − y_true|| ≤ δ, we can pick parameters
α = α(δ) such that

    R_{α(δ)} y → x_true   as  δ → 0.


Tikhonov Regularization
Math Interpretation. In the simplest case, assume X, Y are
Hilbert spaces. To obtain a regularized soln to K x = y,
choose x to fit the data in the least-squares sense, but penalize
solutions of large norm. Solve the minimization problem

    min_x  ||K x − y||² + α ||x||²,

whose solution is

    x_α = (K*K + α I)⁻¹ K* y.

α > 0 is called the regularization parameter.
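The Tikhonov minimization can be sketched numerically via the normal equations (K*K + αI) x = K*y; the test matrix, noise level, and α below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ill-conditioned forward matrix with fast-decaying singular values.
n = 40
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** np.linspace(0, -8, n)
K = U @ np.diag(s) @ V.T

x_true = np.ones(n)
y = K @ x_true + 1e-4 * rng.standard_normal(n)

def tikhonov(K, y, alpha):
    """Minimizer of ||Kx - y||^2 + alpha*||x||^2 via the normal equations."""
    m = K.shape[1]
    return np.linalg.solve(K.T @ K + alpha * np.eye(m), K.T @ y)

x_naive = np.linalg.solve(K, y)        # unregularized: wrecked by the noise
x_alpha = tikhonov(K, y, alpha=1e-6)

print(np.linalg.norm(x_naive - x_true),
      np.linalg.norm(x_alpha - x_true))
```

With singular values down to 10⁻⁸ and noise at 10⁻⁴, the unregularized solve amplifies the noise enormously, while the penalized solve stays near x_true.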


Geometry of Linear Least Squares

[Figure: contour plots of ||Ax−y||² (left) and of the perturbed functional ||Ax−(y−η)||² (right) for an ill-conditioned 2 × 2 example. The long, narrow contours show the least squares minimizer moving far under the small data perturbation η.]
Geometry of Tikhonov Regularization

    min_x  ||A x − y||² + α ||x||².

[Figure: contour plots of ||Ax−y||² + α||x||² (left) and ||Ax−(y+η)||² + α||x||² (right). The penalty term rounds out the contours, so the minimizer is nearly unchanged by the perturbation η.]


Bayesian Interpretation (MAP)
Bayes Law: Assume x, y are jointly distributed continuous
random variables. Then

    p(x | y) = p(y | x) p(x) / p(y).

The maximum a posteriori (MAP) estimator is the max of the posterior
pdf. Equivalently, minimize w.r.t. x

    −log p(y | x) − log p(x).

First term on rhs is the “fit-to-data” term; second is the
“regularization” term.


If the conditional pdf is

    p(y | x) ~ Normal(K x, σ² I)

and the prior is

    p(x) ~ Normal(0, ς² I),

then the MAP estimator minimizes the Tikhonov cost functional

    ||K x − y||² + α ||x||²,   with  α = σ² / ς² = 1 / SNR,

where SNR = ς² / σ² is the signal-to-noise ratio.
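The Gaussian MAP / Tikhonov equivalence is easy to verify numerically. In this sketch the noise level σ, the prior level ς (written `vsig`), and the problem data are all made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
K = rng.standard_normal((n, n))
y = rng.standard_normal(n)

sigma = 0.1   # noise standard deviation (likelihood)
vsig = 2.0    # prior standard deviation

# MAP estimate: minimize ||Kx-y||^2/(2 sigma^2) + ||x||^2/(2 vsig^2).
# Setting the gradient to zero gives
#   (K^T K / sigma^2 + I / vsig^2) x = K^T y / sigma^2.
x_map = np.linalg.solve(K.T @ K / sigma**2 + np.eye(n) / vsig**2,
                        K.T @ y / sigma**2)

# Tikhonov with alpha = sigma^2 / vsig^2 gives the identical estimate.
alpha = sigma**2 / vsig**2
x_tik = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ y)

print(np.allclose(x_map, x_tik))   # True
```

Multiplying the MAP normal equations through by σ² shows the two linear systems are literally the same, which is what the agreement confirms.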


Singular Value Decomposition
Important tool for analysis and computation. Gives a
bi-orthogonal diagonalization of a linear operator,

    K = U Σ V*.

In the matrix case, U = [u₁ … uₙ], V = [v₁ … vₙ],
Σ = diag(s₁, …, sₙ) with

    s₁ ≥ s₂ ≥ … ≥ sₙ ≥ 0,   U*U = V*V = I,

and

    K vᵢ = sᵢ uᵢ,   K* uᵢ = sᵢ vᵢ,   i = 1, …, n.
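In NumPy the decomposition and its defining properties look like this (note that `np.linalg.svd` returns V*, i.e. `Vt`, whose rows are the vectors vᵢ):

```python
import numpy as np

rng = np.random.default_rng(2)
K = rng.standard_normal((5, 5))

# Full SVD: K = U diag(s) V^T with orthogonal U, V and s1 >= s2 >= ... >= 0.
U, s, Vt = np.linalg.svd(K)

assert np.all(np.diff(s) <= 0)               # singular values are sorted
assert np.allclose(U.T @ U, np.eye(5))       # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(5))     # V orthogonal
assert np.allclose(K, U @ np.diag(s) @ Vt)   # bi-orthogonal diagonalization

# The defining relations K v_i = s_i u_i, one vector at a time:
for i in range(5):
    assert np.allclose(K @ Vt[i], s[i] * U[:, i])
```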
Tikhonov Filtering
In the case of Tikhonov regularization, using the SVD
(and assuming an n × n matrix K with sᵢ > 0 for
simplicity),

    x_α = (K*K + α I)⁻¹ K* y
        = (V Σ² V* + α I)⁻¹ V Σ U* y
        = V (Σ² + α I)⁻¹ Σ U* y
        = V diag( sᵢ / (sᵢ² + α) ) U* y
        = V diag( w_α(sᵢ²) / sᵢ ) U* y,   where  w_α(s²) = s² / (s² + α).

If α → 0, then w_α(sᵢ²) → 1, so

    x_α → V diag(1/sᵢ) U* y = K⁻¹ y   as  α → 0.
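The filtering identity can be confirmed numerically: the filtered-SVD formula and the normal-equations solve produce the same x_α (random test problem, illustrative α):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
K = rng.standard_normal((n, n))
y = rng.standard_normal(n)
alpha = 1e-2

U, s, Vt = np.linalg.svd(K)

# Filtered SVD solution: x_alpha = V diag(w(s_i^2)/s_i) U^T y,
# with Tikhonov filter w(s^2) = s^2 / (s^2 + alpha).
w = s**2 / (s**2 + alpha)
x_filt = Vt.T @ ((w / s) * (U.T @ y))

# The same thing computed directly from the normal equations.
x_tik = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ y)

print(np.allclose(x_filt, x_tik))   # True
```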
Tikhonov Filtering, Continued
A plot of the Tikhonov filter function w_α(s²) = s²/(s² + α) shows that
Tikhonov regularization filters out singular components sᵢ² that
are small (relative to α) while retaining components that are
large.

[Figure: filter functions w_α(s²) for TSVD and Tikhonov regularization, plotted against s² on a log scale.]


Truncated SVD (TSVD) Regularization
The TSVD filter function is

    w_α(s²) = 1  if s² ≥ α;   0  if s² < α.

Has “sharp cut-off” behavior instead of the “smooth roll-off”
behavior of the Tikhonov filter.
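A sketch of TSVD as a sharp cut-off filter; the problem construction and the cut-off level α are illustrative:

```python
import numpy as np

def tsvd_solution(K, y, alpha):
    """TSVD regularized solution: keep components with s_i^2 >= alpha."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    w = (s**2 >= alpha).astype(float)     # sharp cut-off filter
    return Vt.T @ ((w / s) * (U.T @ y))

rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.standard_normal((20, 20)))
V, _ = np.linalg.qr(rng.standard_normal((20, 20)))
s = 10.0 ** np.linspace(0, -6, 20)
K = U @ np.diag(s) @ V.T

x_true = np.ones(20)
y = K @ x_true + 1e-4 * rng.standard_normal(20)

x_tsvd = tsvd_solution(K, y, alpha=1e-4)   # truncates where s_i < 1e-2
print(np.linalg.norm(x_tsvd - x_true))
```

All singular components below the cut-off are zeroed out exactly, rather than smoothly damped as in the Tikhonov filter.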


Iterative Regularization
Certain iterative methods, e.g., steepest descent, conjugate
gradients, and Richardson-Lucy (EM), have regularizing
effects, with the regularization parameter equal to the
number of iterations. These are useful in applications, like
3-D imaging, with many unknowns.
An example is Landweber iteration, a variant of steepest
descent. Minimize the least squares fit-to-data functional

    J(x) = (1/2) ||K x − y||²

using the gradient descent iteration x_{k+1} = x_k − τ grad J(x_k),
initial guess x₀ = 0, and fixed step length parameter
0 < τ < 2/s₁².


Landweber Iteration

    x_{k+1} = x_k − τ grad J(x_k)
            = x_k + τ K* (y − K x_k),   k = 0, 1, 2, …

With x₀ = 0 and K = U Σ V*, the k-th iterate has the filtered form

    x_k = V diag( w_k(sᵢ²) / sᵢ ) U* y,   where  w_k(s²) = 1 − (1 − τ s²)^k.

[Figure: Landweber filter functions w_k(s²) for k = 10 (solid) and k = 100 (dashed), plotted against s² on a log scale.]
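The equivalence between running the Landweber recursion and applying the filter w_k(s²) = 1 − (1 − τ s²)^k can be checked directly; the test matrix and iteration count here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 15
K = rng.standard_normal((n, n)) / n   # keep ||K|| modest
y = rng.standard_normal(n)

U, s, Vt = np.linalg.svd(K)
tau = 1.0 / s[0]**2                   # fixed step, safely below 2/s_1^2

# Landweber: x_{k+1} = x_k + tau * K^T (y - K x_k), starting from x_0 = 0.
k_iters = 50
x = np.zeros(n)
for _ in range(k_iters):
    x = x + tau * K.T @ (y - K @ x)

# Equivalent filtered-SVD form with w_k(s^2) = 1 - (1 - tau s^2)^k.
w = 1.0 - (1.0 - tau * s**2) ** k_iters
x_filt = Vt.T @ ((w / s) * (U.T @ y))

print(np.allclose(x, x_filt))   # True
```

Increasing `k_iters` pushes the filter cut-off toward smaller s², which is exactly the sense in which the iteration count acts as the regularization parameter.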
Effect of Regularization Parameter
Illustrative Example: 1-D deconvolution with Gaussian kernel

    k(t) = C exp( −t² / (2 γ²) )

and discrete data

    yᵢ = ∫₀¹ k(tᵢ − t′) x_true(t′) dt′ + noiseᵢ,   i = 1, …, n.

[Figure: true solution and discrete noisy data on 0 ≤ t ≤ 1.]
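A test problem of this type can be assembled with a midpoint-rule discretization of the convolution; the grid size, kernel width γ, and noise level here are illustrative (the quadrature weight h stands in for the constant C):

```python
import numpy as np

# Discrete 1-D deconvolution on [0, 1] with a Gaussian kernel
# k(t) = C * exp(-t^2 / (2 * gamma^2)).
n, gamma = 80, 0.05
t = (np.arange(n) + 0.5) / n
h = 1.0 / n
K = h * np.exp(-(t[:, None] - t[None, :])**2 / (2 * gamma**2))  # midpoint rule

# Piecewise-constant "true" source and noisy blurred data.
x_true = ((t > 0.2) & (t < 0.4)).astype(float)
rng = np.random.default_rng(6)
y = K @ x_true + 1e-3 * rng.standard_normal(n)

# The singular values of K decay extremely fast -> severe ill-conditioning.
s = np.linalg.svd(K, compute_uv=False)
print(s[0], s[-1])   # many orders of magnitude apart
```

Gaussian blur is a classic severely ill-posed forward operator: the singular values decay faster than any power of the index, so almost all data components lie below any realistic noise floor.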
Graphical Representation of SVD

[Figure: left, singular values σᵢ of K on a log scale, decaying from about 1 to below 10⁻¹⁶ over indices i = 1, …, 80; right, singular vectors v₁, v₄, v₁₀, and v₂₀, which become increasingly oscillatory as i grows.]


Tikhonov Solutions vs α
Tikhonov regularized solution is x_α = (K*K + α I)⁻¹ K* y.
Solution error is e_α = x_α − x_true.

[Figure: top left, ||e_α|| versus α on a log scale, with an interior minimum; top right, x_α(t) for the near-optimal α = 0.0014384; bottom, x_α(t) for α = 10⁻⁶ (noisy, under-regularized) and for α = 1 (overly smooth, over-regularized).]


Error Indicators
Linear, Additive Noise Data Model:

    y = K x_true + η,   η ~ N(0, σ² I).

Regularized Solution:

    x_α = R_α y,   R_α = (K*K + α I)⁻¹ K* = V diag( sᵢ / (sᵢ² + α) ) U*.

Solution Error:

    e_α = x_α − x_true.

Predictive Error:

    p_α = K x_α − K x_true.
Unbiased Predictive Risk Estimator
The Influence Matrix is

    A_α = K R_α = K (K*K + α I)⁻¹ K*,

so we can write the predictive error as

    p_α = K x_α − K x_true = A_α η − (I − A_α) K x_true.

Residual is

    r_α = K x_α − y = (A_α − I) y = −(I − A_α) η − (I − A_α) K x_true.

Let E denote the expected value operator. Assume x_true is
deterministic (or independent of η), assume E(η) = 0, and
note that A_α is symmetric. Then ...


UPRE, Continued
Expanding and taking expectations (cross terms vanish since
E(η) = 0, and E ||M η||² = σ² trace(M*M) for deterministic M):

    E ||p_α||² = σ² trace(A_α²) + ||(I − A_α) K x_true||²,
    E ||r_α||² = σ² trace((I − A_α)²) + ||(I − A_α) K x_true||².

Subtracting, and using trace(A_α²) − trace((I − A_α)²) = 2 trace(A_α) − n,

    E ||p_α||² = E ||r_α||² + 2 σ² trace(A_α) − n σ².

So, up to a constant, an unbiased estimator for E ||p_α||² is

    U(α) = ||r_α||² + 2 σ² trace(A_α) − n σ².


Comments about UPRE
The UPRE regularization parameter selection method, also
known as Mallows’ method, is to pick α to minimize U(α).

The predictive error norm ||p_α|| and the solution error norm ||e_α||
need not have the same minimizer, but the mins
are often quite close.

There is a variant of UPRE, called generalized cross
validation (GCV), which requires minimization of

    GCV(α) = ||r_α||² / [ trace(I − A_α) ]².

This does not require prior knowledge of the variance σ².
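Using the SVD, both selection curves are cheap to evaluate: the eigenvalues of the influence matrix A_α are the filter factors w_α(sᵢ²), so trace(A_α) = Σ w_α(sᵢ²). A sketch on a synthetic problem (all sizes and levels illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Small synthetic ill-posed problem.
n = 60
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** np.linspace(0, -5, n)
K = U @ np.diag(s) @ V.T
x_true = np.ones(n)
sigma = 1e-3
y = K @ x_true + sigma * rng.standard_normal(n)

def selection_curves(alpha):
    """UPRE U(alpha) and GCV(alpha) evaluated via the SVD of K."""
    w = s**2 / (s**2 + alpha)           # filter factors = eigenvalues of A_alpha
    Uty = U.T @ y
    r2 = np.sum(((1 - w) * Uty)**2)     # ||r_alpha||^2 = ||(A_alpha - I) y||^2
    upre = r2 + 2 * sigma**2 * np.sum(w) - n * sigma**2
    gcv = r2 / np.sum(1 - w)**2
    return upre, gcv

alphas = 10.0 ** np.linspace(-10, 0, 41)
upre_vals, gcv_vals = zip(*(selection_curves(a) for a in alphas))
a_upre = alphas[np.argmin(upre_vals)]
a_gcv = alphas[np.argmin(gcv_vals)]
print(a_upre, a_gcv)   # selected regularization parameters
```

Note that UPRE uses the known σ² while the GCV ratio does not, matching the comment above.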
Illustrative Example of Indicators
2-D image reconstruction problem, noise η ~ N(0, σ² I),
Tikhonov regularization.

[Figure: U(α) (solid line), predictive error norm P(α) (dashed line), GCV(α) (dot-dashed line), and the solution error norm (o-o), plotted against the regularization parameter α on log-log axes.]
Mathematical Summary
There exists a well-developed mathematical theory of
regularization.
There are a number of different approaches to
regularization.
optimization-based (equivalent to MAP)
filtering-based
iteration-based
There are robust schemes for choosing regularization
parameters.

These techniques often work well in practical applications.



But Things Can Get Ugly ...
Astronomical Imaging Application. Light intensity

    g(x, y) = ∫∫ k(x − x′, y − y′) f(x′, y′) dx′ dy′.

This is measured by a ccd array (digital camera), giving
data

    dᵢ = g(xᵢ, yᵢ) + “noise”ᵢ,   i = 1, …, n.

For high contrast imaging (dim object near very bright
object), accurate modeling of noise is critical.
With ordinary (and even weighted) least squares, the dim
object is missed.


Model for Data from CCD Array

    dᵢ = n_obj(i) + n_bkg(i) + εᵢ,   i = 1, …, n.

Photon count for “signal”:

    n_obj(i) ~ Poisson( g(xᵢ, yᵢ) ).

Background photon count:

    n_bkg(i) ~ Poisson(β),   β fixed, known.

Instrument “read noise”:

    εᵢ ~ N(0, σ²),   σ fixed, known.
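A simulation sketch of this three-part noise model (all intensity values are invented for illustration) shows why a constant-variance least squares model fits poorly: the Poisson variance tracks the intensity, so bright pixels are far noisier in absolute terms than dim ones:

```python
import numpy as np

rng = np.random.default_rng(8)

# Simulate the CCD data model d_i = n_obj(i) + n_bkg(i) + eps_i.
n = 1000
g_true = np.full(n, 5.0)           # "signal" photon intensity per pixel
g_true[100:110] = 5000.0           # one very bright region
beta = 10.0                        # known background intensity
sigma = 3.0                        # known read-noise standard deviation

d = (rng.poisson(g_true)           # object photon counts ~ Poisson(g(x_i))
     + rng.poisson(beta, size=n)   # background counts ~ Poisson(beta)
     + rng.normal(0.0, sigma, n))  # Gaussian read noise ~ N(0, sigma^2)

# Empirical variance in a dim region vs the bright region: Poisson noise
# scales with intensity, so the bright pixels dominate an unweighted
# least squares fit and can swamp a nearby dim object.
var_dim = d[200:].var()
var_bright = d[100:110].var()
print(var_dim, var_bright)
```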
Imaging Example, Continued
Log likelihood (−log p(d | f)) is messy: each pixel’s density is the
convolution of the two Poisson densities with the Gaussian
read-noise density, so −log p(d | f) involves, at each pixel, the log
of an infinite sum over photon counts.

Light source (object) intensity f is nonnegative.
Constraint f ≥ 0.

With “pixel” discretization, the dimension is very large;
size(f) and size(d) are on the order of the number of ccd pixels,
or more.

Problem is ill-posed. Need regularization (prior), e.g.,

    α ||f||²,   α > 0.

Regularization parameter (strength of prior) is unknown.
Applied Mathematician’s Wish List
Optimization-based regularization methods (Tikhonov,
MAP) require soln of minimization problems. Need fast,
robust, large-scale, nonlinear constrained numerical
optimization techniques.
When the parameter-to-observation map is nonlinear,
regularization functionals may be non-convex. Need
optimization methods which yield the global minimizer
(not just a local min) and are fast, robust, ....
Need indicators of reliability (e.g., confidence intervals)
for regularized solutions.
Need good priors.
Need fast, robust schemes for choosing regularization
parameters.
Challenge for the Statistics Community
Can MCMC techniques provide fast, robust alternatives
to optimization-based regularization methods?

Relevant Reference: J. Kaipio et al., “Statistical inversion and Monte Carlo sampling methods in electrical impedance tomography”, Inverse Problems, vol. 16 (2000), pp. 1487–1522.

Relevant Caveat: There is no free lunch.
