
Linearized inverse problems

(Weakly nonlinear problems)

Using a Taylor expansion, then away we go...


Linearized inverse problems

Nonlinear inverse problem

dobs,i = gi(m)
Choose a reference model mo and perform a Taylor expansion of g(m)

m = mo + δ m
gi(mo + δ m) = gi(mo) + ∇giδ m + . . .
" #T
∂gi ∂gi
∇gi = , ,...
∂m1 ∂m2

Linearized inverse problem

δ d = dobs − g(mo)

δ d = Gδ m
Gi,j = ∂gi/∂mj
Linearized inverse problems

Data prediction error

φ(m) = (d − g(m))T Cd−1(d − g(m))

Linearized problem
δ d = Gδ m
Least squares solution

φ0(δ m) = (δ d − Gδ m)T Cd−1(δ d − Gδ m)

It can be shown that φ0(δ m) is a quadratic approximation to φ(m) about the reference model mo.

Linearized problems need to be solved iteratively

δ m = (GT Cd−1G)−1GT Cd−1δ d

δ mn+1 = (GnT Cd−1Gn)−1GnT Cd−1δ dn
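The iterative update above can be sketched in code. A minimal Python sketch of the linearized iteration (the forward function g, its Jacobian, and the toy two-parameter problem are illustrative assumptions, not from the slides):

```python
import numpy as np

def linearized_inversion(g, jacobian, d_obs, m0, Cd_inv, n_iter=10):
    """Iterate dm_{n+1} = (Gn^T Cd^-1 Gn)^-1 Gn^T Cd^-1 d_dn about m_n."""
    m = np.asarray(m0, dtype=float).copy()
    for _ in range(n_iter):
        G = jacobian(m)                    # Frechet derivatives at m_n
        dd = d_obs - g(m)                  # residual delta d_n = d_obs - g(m_n)
        dm = np.linalg.solve(G.T @ Cd_inv @ G, G.T @ Cd_inv @ dd)
        m = m + dm                         # m_{n+1} = m_n + dm
    return m

# Toy nonlinear forward problem (illustrative): g(m) = [m1^2, m1*m2]
g = lambda m: np.array([m[0] ** 2, m[0] * m[1]])
jac = lambda m: np.array([[2 * m[0], 0.0], [m[1], m[0]]])
m_hat = linearized_inversion(g, jac, d_obs=np.array([4.0, 6.0]),
                             m0=np.array([1.5, 2.5]), Cd_inv=np.eye(2))
# converges toward m = [2, 3], which reproduces d_obs exactly
```

Because the starting model is close to the answer here, the iteration converges quickly; a poor starting model can stall or diverge, as the slides note below.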

Linearized inverse problems

Linearization can succeed...

... and linearization can fail.

The starting point for an iterative procedure can be all important.

Example: Earthquake location
δ mn+1 = (GnT Cd−1Gn)−1GnT Cd−1δ dn

m = [x, y, z, to]T

d = [tarr,1, tarr,2, . . . , tarr,N ]T

tarr,i = to + ∫Ri (1/v(x)) dl

Gi,j = ∂gi/∂mj    Derivative of the i-th arrival time with respect to the j-th hypocentral co-ordinate
Example: Earthquake location

m = [x, y, z, to]T

d = [t1, t2, . . . , tN ]T

tr = ∫R (1/v(x)) dl
What is the relationship between the data and the model parameters?

Assume homogeneous 3-D Earth model

tr = D(m)/v

ti = to + Di(x, y, z)/v
What are the Fréchet derivatives?

Gi,j = ∂di/∂mj = ?

δ mn+1 = (GnT Cd−1Gn)−1GnT Cd−1δ dn
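For the homogeneous-model location problem the Fréchet derivatives have a simple closed form: ∂ti/∂x = (x − xs)/(vDi), likewise for y and z, and ∂ti/∂to = 1. A minimal sketch of building G (the station geometry and velocity below are illustrative assumptions):

```python
import numpy as np

def location_jacobian(m, stations, v):
    """G[i, j] = d t_i / d [x, y, z, t0] for t_i = t0 + D_i / v,
    where D_i is the source-station distance in a homogeneous model."""
    x, y, z, t0 = m
    G = np.zeros((len(stations), 4))
    for i, (xs, ys, zs) in enumerate(stations):
        D = np.sqrt((x - xs) ** 2 + (y - ys) ** 2 + (z - zs) ** 2)
        G[i, 0] = (x - xs) / (v * D)       # d t_i / d x
        G[i, 1] = (y - ys) / (v * D)       # d t_i / d y
        G[i, 2] = (z - zs) / (v * D)       # d t_i / d z
        G[i, 3] = 1.0                      # d t_i / d t0
    return G

# Illustrative geometry: three surface stations, v = 5 km/s
stations = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
G = location_jacobian(np.array([3.0, 4.0, 2.0, 0.1]), stations, v=5.0)
```

Each row of G is then used in the iterative update, with the hypocentre and origin time corrected at every step.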

Example: Linearized inversion

δ mn+1 = (GnT Cd−1Gn)−1GnT Cd−1δ dn
Example: Earthquake location

CM = (GT Cd−1G)−1 ,  Cd = σ2I

Where do the significant trade-offs occur?


Discrete non-unique inverse
problems

Non-uniqueness: When there is no one answer to the question...

Example: Travel time tomography

Seismic travel times are observed at the surface, and we want


to learn about the Earth’s structure at depth. Travel times are
related to the wave speeds of rocks through the expression
t = ∫R (1/v(x)) dl = ∫R s(x) dl

The raypath R also depends on the velocity structure v(x). R can be found using ray-tracing methods.

Is this a continuous or discrete inverse problem?

Is it linear or nonlinear?
Travel time tomography example

We can linearize the problem about a reference model so(x) or vo(x). We get either...

δt = ∫Ro δs(x) dl    or    δt = −∫Ro (1/vo²) δv(x) dl

δm(x) = Σ_{j=1}^{M} δmj φj(x)

φj(x) = 1 if x is in block j, 0 otherwise

δti = Σ_{j=1}^{M} δmj ∫Ro,i φj(x) dl = Σ_{j=1}^{M} δmj Gi,j

How do elements of the matrix Gi,j relate to the rays ?


Travel time tomography example

The element of the matrix Gi,j is the integral of the j-th basis
function along the i-th ray. Hence for our chosen basis
functions it is the length of the i-th ray in the j-th block.

δti = Gi,j δmj

δ d = Gδ m
G = ⎡ l1,1 l1,2 · · · l1,M ⎤
    ⎢ l2,1 l2,2 · · · l2,M ⎥
    ⎢  ⋮    ⋮    ⋱    ⋮   ⎥
    ⎣ lN,1 lN,2 · · · lN,M ⎦

δdi = toi − tci(so)   Travel time residual for i-th path

δmj = sj − so,j   Slowness perturbation in j-th cell

li,j = Length of i-th ray in j-th cell
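Assembling such a G can be sketched numerically. A minimal illustration (the regular grid, the straight rays, and the sub-segment sampling scheme are all illustrative assumptions):

```python
import numpy as np

def ray_lengths(p0, p1, nx, ny, cell=1.0, nstep=2000):
    """Approximate l_{i,j}: length of the straight ray p0 -> p1 in each
    cell of an nx-by-ny grid, by summing short sub-segments."""
    lengths = np.zeros(nx * ny)
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    seg = np.linalg.norm(p1 - p0) / nstep
    for k in range(nstep):
        p = p0 + (k + 0.5) / nstep * (p1 - p0)   # sub-segment midpoint
        ix, iy = int(p[0] // cell), int(p[1] // cell)
        lengths[iy * nx + ix] += seg
    return lengths

# Two rays on a two-cell grid: one crossing both cells, one inside cell 1
rays = [((0.0, 0.5), (2.0, 0.5)), ((0.5, 0.0), (0.5, 1.0))]
G = np.array([ray_lengths(p0, p1, nx=2, ny=1) for p0, p1 in rays])
# each row of G holds that ray's length in each block
```

In practice the rays are traced through the reference model rather than assumed straight, but the structure of G is the same.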
Travel time tomography example

One ray and two blocks

δti = Gi,j δmj

Non-uniqueness

δt1 = l1,1 ∗ δs1 + l1,2 ∗ δs2


Travel time tomography example

Many rays and two blocks

δti = Gi,j δmj

Uniqueness ? NO !

δti = li,1 ∗ δs1 + li,2 ∗ δs2 (i = 1, N )


Travel time tomography example

Can we resolve both slowness perturbations ?

δt1 = l1,1 ∗ δs1 + l1,2 ∗ δs2

δt2 = l2,1 ∗ δs1 + l2,2 ∗ δs2

δ d = Gδ m
l1,1 / l1,2 = l2,1 / l2,2  ⇒  |G| = 0

G has a zero determinant and hence the problem is under-determined.

Zero eigenvalues ⇒ linear dependence between equations ⇒ no unique solution. An infinite number of solutions exist!

The same argument applies to all rays that enter and exit through the same pair of sides.
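This singularity is easy to check numerically. A sketch with illustrative lengths, where the second ray crosses the same pair of sides and so has block lengths proportional to the first:

```python
import numpy as np

G = np.array([[1.0, 2.0],    # ray 1: l_{1,1}, l_{1,2}
              [2.0, 4.0]])   # ray 2: same pair of sides, scaled lengths

det = np.linalg.det(G)                 # zero: the rows are linearly dependent
eigs = np.linalg.eigvalsh(G.T @ G)     # smallest eigenvalue of G^T G is zero
```

A zero eigenvalue of GT G is exactly the condition under which the normal equations cannot be inverted.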
Travel time tomography example

Two rays and two blocks

δti = Gi,j δmj

Uniqueness ? YES

δti = li,1 ∗ δs1 + li,2 ∗ δs2 (i = 1, 2)


Travel time tomography example

Two rays and two blocks

δti = Gi,j δmj CM = (GT Cd−1G)−1

Model variance
is low but cell
size is large

Over-determined Linear Least squares problem

δti = li,1 ∗ δs1 + li,2 ∗ δs2 (i = 1, N )


Travel time tomography example

Many rays and many blocks

δti = Gi,j δmj

Model variance is
higher but cell
size is smaller

Model variance
and resolution
trade off

Simultaneously over and under-determined


Linear Least squares problem
Mix-determined problem
Recap:

In a linear problem, if the number of data is less than the number of unknowns then the problem will be under-determined.

If the number of data is more than the number of unknowns, the system may still not be over-determined. The number of linearly independent data is what matters: this is the true number of pieces of information.

Linear discrete problems can be simultaneously over- and under-determined. This is a mix-determined problem.

There is a trade-off between the variance (of the solution) and the
resolution (of the parametrization).

Discrete ill-posed problems

What does the data misfit function look like in a non-unique problem ?

ψ(m) = (1/2)(d − Gm)T Cd−1(d − Gm)

If Gm1 = 0 then d = G(mo + m1) = Gmo, so mo and mo + m1 fit the data equally well.
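The resulting flat valley in the misfit can be demonstrated directly: any multiple of a null-space model m1 can be added without changing ψ. A sketch using an illustrative one-ray, two-block G:

```python
import numpy as np

G = np.array([[1.0, 1.0]])      # one ray sampling two blocks equally
d = np.array([3.0])
Cd_inv = np.eye(1)

def misfit(m):
    """psi(m) = 1/2 (d - Gm)^T Cd^-1 (d - Gm)."""
    r = d - G @ m
    return 0.5 * r @ Cd_inv @ r

m0 = np.array([1.5, 1.5])       # a model that fits the datum
m1 = np.array([1.0, -1.0])      # null-space direction: G @ m1 = 0
# misfit(m0 + alpha * m1) equals misfit(m0) for any alpha
```

The misfit surface has a perfectly flat direction along m1, which is why no data-fit criterion alone can pick a unique model.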
Discrete non-unique problems

What happens if the normal equations have no solution ?

mLS = (GT Cd−1G)−1GT Cd−1d = G−g d

Recall that the inverse of a matrix is proportional to the reciprocal of the determinant:
" # " #
a b −1 1 d −b
G= |G| = ad − cb G =
c d |G| −c a

The determinant is the product of the eigenvalues. Hence the inverse does not exist if any of the eigenvalues of GT Cd−1G are zero. We have seen examples of this in the tomography problem.

This is an ill-posed or
under-determined problem
with no unique solution

The Minimum Length solution
If the problem is completely under-determined we can minimize
the length of the solution subject to it fitting the data.

Min L(m) = mT m  subject to  d = Gm
Lagrange multipliers says minimize φ(m, λ)
φ(m, λ) = mT m + λT (d − Gm)
...and we get
mM L = GT (GGT )−1d ,  with G = [ l1  l2 ]

We get the same solution from here


Example

T = l1 s1 + l2 s2

φ = s1² + s2² + λ(T − l1 s1 − l2 s2)

⇒ s1/s2 = l1/l2

s1 = l1 T/(l1² + l2²) ,  s2 = l2 T/(l1² + l2²)
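The closed-form answer can be checked against the matrix formula mML = GT(GGT)−1d; the numbers below are illustrative:

```python
import numpy as np

l1, l2, T = 3.0, 4.0, 2.0
G = np.array([[l1, l2]])                      # single-datum problem
m_ml = G.T @ np.linalg.inv(G @ G.T) @ np.array([T])

# Closed form from the Lagrange-multiplier solution
s1 = l1 * T / (l1 ** 2 + l2 ** 2)
s2 = l2 * T / (l1 ** 2 + l2 ** 2)
```

Both routes give the same slownesses, and the minimum-length model fits the single travel time exactly.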
Minimum Length and least squares solutions

mLS = (GT G)−1GT d mM L = GT (GGT )−1d

mest = G−g d
Model resolution matrix
mest = Rmtrue

R = G−g G
Least squares

R = (GT G)−1GT G = I
Minimum length

R = GT (GGT )−1G

Example: Minimum Length resolution matrix

Model resolution matrix

mM L = GT (GGT )−1d

mest = G−g d = G−g Gmtrue


mest = Rmtrue

R = G−g G

R = GT (GGT )−1G
à ! " à !#−1
l1 ³ ´ l1 ³ ´
R= = l1 l2 l1 l2
l2 l2
à !
1 2 l l
l1 Unlike the least squares case the
R= 1 2
2 + l2 ) 2 model resolution matrix
(l1 2
l2l1 l2 is not the identity
à !
If l1 = l2 1 1 1
⇒R=
2 1 1 101
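A quick numerical check of this resolution matrix; for l1 = l2 the result is one half of the all-ones matrix:

```python
import numpy as np

def resolution_ml(G):
    """Minimum-length model resolution matrix R = G^T (G G^T)^-1 G."""
    return G.T @ np.linalg.inv(G @ G.T) @ G

R = resolution_ml(np.array([[1.0, 1.0]]))   # l1 = l2 = 1
# R = 0.5 * [[1, 1], [1, 1]]; note R @ R = R, so R is a projector
```

Each estimated parameter is a smeared average of the true parameters, which is exactly what a non-identity resolution matrix expresses.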
Minimum Length and least squares solutions

mLS = (GT G)−1GT d mM L = GT (GGT )−1d

mest = G−g d
Data resolution matrix
dpre = Ddobs
D = GG−g
Least squares

D = G(GT G)−1GT
Minimum length

D = GGT (GGT )−1 = I


There is a symmetry between the least squares and minimum length solutions. Least squares solves the completely over-determined problem and has perfect model resolution, while minimum length solves the completely under-determined problem and has perfect data resolution. For mix-determined problems all solutions lie between these two extremes.
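The symmetry can be verified numerically (the matrices below are illustrative): a full-column-rank G gives R = I for least squares, and a full-row-rank G gives D = I for minimum length.

```python
import numpy as np

# Over-determined: 3 data, 2 unknowns, full column rank
G_over = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
R_ls = np.linalg.inv(G_over.T @ G_over) @ G_over.T @ G_over   # model resolution

# Under-determined: 1 datum, 3 unknowns, full row rank
G_under = np.array([[1.0, 2.0, 3.0]])
D_ml = G_under @ G_under.T @ np.linalg.inv(G_under @ G_under.T)  # data resolution
```

For a mix-determined G neither product is the identity, and regularized solutions trade the two kinds of resolution against each other.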

