
Registration

Department of Computing

Imperial College London

MEng. Computing

June 2009

Acknowledgements

Without the help of some of the kindest, smartest and most enthusiastic people, I would never have been able to produce work as good as this project.

First of all, I thank my supervisors, Prof. Berc Rustem and Prof. Daniel Rueckert, for the unlimited ideas and support they have given me.

I thank Prof. Duncan Gillies for being my personal tutor and second marker, Dr. Daniel Kuhn for patiently listening to my talks, and Dr. George Tzallas-Regas for discussion and suggestions at any time.

I also owe a debt to Prof. Rasmus Larsen for sending me a very useful

tutorial and Dr. Stefan Klein for his advice and explanation.

Finally, my thanks to all of my friends who have made my time at Imperial exciting.

Abstract

Image registration is a common task in many applications: medical image processing, face recognition, optical flow and tracking. The objective is to minimize the difference between two images and produce the best transformation to match a deformed image to a reference image. To find this transformation, optimization is needed. In this paper, we analyse and present a framework for two optimization approaches: non-linear deterministic optimization and stochastic approximation.

For non-linear deterministic optimization, we examine most of the widely used algorithms, such as Gauss-Newton, Levenberg-Marquardt, Quasi-Newton and Nonlinear Conjugate Gradient, and present some modifications, the Recursive Subsampling technique and the Weighted technique, to enhance the rate of convergence for particular types of applications.

In addition, we propose a novel approach to Stochastic Approximation based on Difference Sampling. This technique avoids bias in the case where there is only a small distortion at local parts of the image; it therefore reduces the variance in approximating the solution compared to Uniform Random Sampling. The stochastic optimization method analysed is Robbins-Monro.

The results show better convergence of the Weighted/Recursive Subsampling techniques and the Difference Sampling technique compared to traditional methods.

Special attention is paid to nonrigid 2D monomodal image registration.

Contents

1 Introduction
2 Image Registration
  2.1
  2.2 Local Deformation
  2.3 Registration framework
    2.3.1 Cost Function F
    2.3.2 Gradient g
    2.3.3 Transformation W(p)
    2.3.4 Optimization
    2.3.5 Pre-condition
3 Deterministic Optimization
  3.1 Gauss-Newton (GN)
  3.2 Levenberg-Marquardt (LM)
  3.3 Quasi-Newton (QN)
  3.4 Nonlinear Conjugate Gradient (NCG)
  3.5 Step-size
  3.6
4 Recursive Subsampling and Weighted approach for Deterministic Optimizations
  4.1
  4.2
  4.3 Experiments and Results
5 Stochastic Approximation
  5.1 Stochastic Approximation
    5.1.1 Robbins-Monro and the derivative
    5.1.2 Decaying sequence
  5.2 Difference Sampling
  5.3 Sampling Strategy
    5.3.1 Deterministic Sampling
    5.3.2 Stochastic Sampling
  5.4
6
  6.1 Introduction
  6.2 Implementation
    6.2.1 Cost Function F
    6.2.2 Image Gradient ∇I
    6.2.3 Transformation W(p) and Jacobian ∂W/∂p
    6.2.4 Other evaluations
  6.3 User Guide
7 Conclusion
References

List of Tables

3.1 Complexity of the registration framework
3.2
4.1

List of Figures

1.1 Pictures of Lena
2.4 Traditional Registration Framework
4.4 Convergence rate of single and subsampling methods
4.8 Knee: Convergence of different grid size
4.12 Lena: Convergence of UnWeight and Weight methods
5.2 … strategy and Stochastic strategy
5.6 Convergence of RLD by Deterministic and Stochastic methods
A.4 Knee: Average Performance of SubSampling methods

Chapter 1

Introduction

Given two images 1.1(a) and 1.1(c). The two images are not similar; we want to find out whether image 1.1(c) represents the person in image 1.1(a). The grid lines in each image show the original pixel alignment. Image 1.1(b) shows the difference of the two images. If we can find a transformation grid (Figure 1.2), we can apply the transformation to image 1.1(c) and see whether the two images represent the same person (within some acceptable threshold).

[Figure 1.1: (a) Original, (b) Difference, (c) Deformed]

We aim to find a transformation W that spatially aligns the two images, such that the deformed image 1.1(c) can be warped back to the reference image 1.1(a). The difference image of the above example is obtained by taking the absolute intensity difference between every pair of pixels of the input images. The difference range 0...1 represents pixel intensities, from black to white. The value of the difference image can be summarised by the Sum of Squared Difference (SSD) value. The SSD value of 1.1(b) is 595; after registration, we find the new difference image 1.3(b) with SSD = 26.9.

[Figure 1.3: (a) Original, (b) Difference, (c) Deformed]
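The difference image and SSD computation just described can be sketched in a few lines (an illustrative Python/NumPy sketch with hypothetical names; the thesis's own implementation, described in Chapter 6, is in MATLAB):

```python
import numpy as np

def difference_image(I, T):
    """Absolute intensity difference between two images with values in [0, 1]."""
    return np.abs(I - T)

def ssd(I, T):
    """Sum of Squared Difference between image I and reference T,
    following the 1/2 * sum((I - T)^2) form of cost function 2.4."""
    return 0.5 * np.sum((I - T) ** 2)
```

On a 2x2 toy image pair this reproduces the measure used throughout the thesis to compare registrations before and after optimization.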

The above process is Image Registration. It is a common task used in various types of applications: object recognition, optical flow, medical image processing. The process of finding an optimal transformation can be very expensive, which is a disadvantage for many applications. Some clinical processes, such as brain shift estimation based on intra-operatively acquired ultrasound (36), require almost real-time registration. Therefore, fast registration is desired. In this paper, we pay attention to intensity-based registration for 2D images.

The availability of various medical imaging modalities allows us to obtain more details of the human body's functioning and anatomy. For instance, Magnetic Resonance Imaging (MRI) systems give a detailed description of brain anatomy, while Positron Emission Tomography (PET) techniques depict its functioning. It is often useful to combine images from different modalities (multimodal image processing (45)) to simultaneously obtain all available information. Some classic image registration techniques (30) include:

Spatial Registration, where all dimensions are spatial. Most applications focus on 3D-3D registration of two images that can be represented by two tomographic datasets, or the registration of a single tomographic image to any spatially defined information, e.g., a vector obtained from EEG data. 2D-2D registration may apply to separate slices from tomographic data, or to intrinsically 2D images like portal images. 2D-3D registration is also possible but more complicated, e.g. a pre-operative CT image to an intra-operative X-ray image.

Time series Registration can be used to monitor growth processes during medical treatment, such as monitoring of tumor growth (medium interval), post-operative monitoring of healing (short interval), or observing the passage of an injected bolus through a vessel tree (ultra-short interval).

Intrinsic/Extrinsic registration methods: Intrinsic methods are based on image-generated content only, such as a set of identified salient points (landmarks) on object surfaces. In contrast, Extrinsic registration is based on foreign objects introduced into the imaged space. A commonly used fiducial object is a stereotactic frame (12) screwed rigidly to the patient's outer skull table. Such frames are used for localization and guidance purposes in neurosurgery.

In this paper, image registration refers to an application for 2D-2D images that come from an identical source (monomodal), where one image is a randomly deformed version of the other. The image is usually displayed by assigning varying levels of brightness, known as gray levels or intensities, to each point in the image space. Our interest in geometrical shapes and their interrelationships requires us to impose a coordinate system on each participating image space. The points in the image space are specified by the usual Cartesian coordinates, i.e. as distances from the orthogonal coordinate system axes. Image registration can now be defined as the process of finding the one-to-one mapping between the coordinates in the image spaces of interest such that the points so transformed correspond to the same anatomical point. We also emphasize the use of the difference image, which represents the error between two images by taking the absolute difference between the two images' pixel intensities at the same coordinates.

In the next Chapter, we briefly describe the Image Registration Framework. We show that the Image Registration problem can be formulated as an Optimization problem, and we then analyse the performance of different Optimization approaches. In fact, there is no Optimization method that produces the best performance for all types of applications; the choice of Optimization method depends on the particular application. We therefore review the Optimization methods based on different types of deformed images.

In Chapter 3, Deterministic Optimizations are listed for randomly uniformly deformed images. The Quasi-Newton (QN) method shows the best convergence rate; however, it suffers from outliers in some test cases. Besides Quasi-Newton, the Gauss-Newton (GN) and Levenberg-Marquardt (LM) methods produce a consistent convergence rate, and GN is slightly better than LM. The Nonlinear Conjugate Gradient (NCG) results show that the method is very dependent on the type of input images.

The interesting part comes in Chapter 4, where we present a framework for Recursive Subsampling registration. Subsampling techniques have been presented before; however, most of those concepts differ from ours. The test results show that the recursive Subsampling technique outperforms all normal deterministic optimization methods. In addition, we introduce a Weighted Subsampling approach, inspired by the Difference Sampling of Chapter 5. Weighted Subsampling methods demonstrate even more attractive results compared to Subsampling methods for locally deformed images.

Finally, the best of this thesis is here. In Chapter 5, we propose a novel approach based on Stochastic Approximation. The type of registration that is suitable for this method is deformation at local parts of images, which is a very common type of application in medical image processing. This approach employs a random non-uniform sampling method that we call Difference Sampling. The results show that for local-part-deformed images, Difference Sampling Stochastic Approximation outperforms Deterministic Optimization methods.

Chapter 6 provides guidance on the implementation materials of the entire thesis using MATLAB. Chapter 7 is the conclusion. More testing results are shown in Appendix A.

Chapter 2

Image Registration

Image Registration has been an active research area over the last few decades. In general, depending on the type of application, we use different registration techniques. Classification of image registration is commonly defined as follows:

Dimension: 2D-2D, 3D-3D, 2D-3D
Nature of Deformation: Rigid, Affine, Local Deformation
Optimization procedure
Modality: monomodal, multimodal
Manual, Semi-automatic, Automatic registration

An overview of classical registration methods can be found in (30). All methods that we discuss in this paper emphasize the 2D-2D monomodal automatic registration application, with a potential to extend to 3D-3D multimodal applications.

The type of deformation (39) of an image determines the complexity of the registration problem and affects the choice of suitable registration methods. Typically, the deformation of an image is categorised by the number of parameters or degrees of freedom: rigid, affine transformation and local deformation.

2.1

A Rigid Transformation consists of translation, rotation and scale. It preserves distances, lines and angles:

\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} =
\begin{pmatrix} p_{11} & p_{12} & 0 \\ p_{21} & p_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}

An Affine Transformation model is a bit more complex, with up to 8 parameters. In addition to the rigid model, the affine model compensates for global size changes and shears. It preserves parallel lines but not angles:

\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} =
\begin{pmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}

Readers should refer to (29) for an intensive review of rigid image registration methods. In (2), S. Baker and I. Matthews provided a very comprehensive analysis and extension of gradient descent image registration methods for rigid and affine transformation. They showed that Gauss-Newton (35) and Levenberg-Marquardt (27; 31) produce better performance compared to Newton, Steepest Descent, and Diagonal Hessian methods.
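To make the homogeneous-coordinate form concrete, here is a small sketch (Python/NumPy, illustrative names only; not taken from the thesis) that applies a 3x3 affine matrix to 2D points:

```python
import numpy as np

def apply_affine(P, xy):
    """Apply a 3x3 homogeneous transformation matrix P to 2D points.

    xy has shape (n, 2); returns the transformed (x', y') coordinates.
    """
    ones = np.ones((xy.shape[0], 1))
    homo = np.hstack([xy, ones])      # rows of (x, y, 1)
    out = homo @ P.T                  # [x'; y'; w'] = P [x; y; 1]
    return out[:, :2] / out[:, 2:3]   # divide out the homogeneous coordinate
```

For a rigid or affine matrix the last row keeps w' = 1, so the division is a no-op; it is kept so the sketch also covers general homogeneous matrices.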

2.2 Local Deformation

In order to account for more complex deformation, for instance the deformation of the heart muscle due to respiration in MRI, higher-order transformations have to be used (46). One of the earliest methods was introduced by Goshtasby (16), which uses a modification of the original least-squares method. Another method, based on fluid registration (5), causes changes in intensity during the registration process. However, the use of radial basis functions as mapping transformations has shown a big advantage over other techniques. Typical radial basis function methods are the thin-plate spline registration method (33) and the B-spline model of local deformation. In this paper, we will use the B-spline transformation model introduced by D. Rueckert et al. (38) because of its robustness and fast convergence for large-scale problems. For the nonrigid deformation model, we define a combined transformation consisting of a global and a local component:

T(x) = T_{local}(T_{global}(x)) \quad (2.1)

where x = (x, y), T_{global} is an affine transformation matrix and T_{local} is a local transformation matrix. In this paper we only examine the local deformation; therefore T_{global} is absorbed in 2.1. Following Rueckert's formulation (38), we derive a 2D model for our problem:

T_{local}(x) = x + \sum_{m=0}^{3} \sum_{n=0}^{3} B_m(u) B_n(v)\, p_{i+m,\,j+n} \quad (2.2)

where p_{i,j} is a control point of the grid p_x \times p_y with uniform spacing, B_m is the m-th cubic B-spline basis function (26), and:

i = \lfloor x/n_x \rfloor - 1, \quad j = \lfloor y/n_y \rfloor - 1
u = x/n_x - \lfloor x/n_x \rfloor, \quad v = y/n_y - \lfloor y/n_y \rfloor
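As a concrete sketch of the blending weights used in 2.2 (Python/NumPy with hypothetical names; the thesis implementation is in MATLAB), the four cubic B-spline basis functions can be evaluated as:

```python
import numpy as np

def bspline_basis(u):
    """Cubic B-spline basis functions B0..B3 evaluated at u in [0, 1)."""
    return np.array([
        (1 - u) ** 3 / 6.0,
        (3 * u ** 3 - 6 * u ** 2 + 4) / 6.0,
        (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6.0,
        u ** 3 / 6.0,
    ])
```

The four weights sum to one for any u (partition of unity), which is why the tensor product in 2.2 interpolates smoothly between the 4x4 neighbourhood of control points.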

One of the attractive features is that the basis functions have local support, i.e. if we change a control point p_{i,j}, it only affects its local neighbourhood. The mesh of control points P acts as the parameter set of the transformation matrix, and its resolution p_x \times p_y decides the degrees of freedom (number of parameters) of the registration problem. A large-spacing mesh grid is less expensive to solve than a fine mesh grid; however, the pay-off is that it cannot model a small local deformation. For example, a mesh grid of 5 \times 5 control points yields a 50-parameter problem that should not produce as good a match as a mesh grid of 9 \times 9 control points (a 162-parameter problem); however, the 50-parameter problem is less expensive than the 162-parameter problem. The choice of mesh grid resolution is up to the application. Another attractive feature of the B-spline blending function is its constant derivative with respect to the objective parameters (section 2.3.3).

2.3 Registration framework

There are various algorithms for image registration, such as difference decomposition (15) or linear regression (7); however, the gradient-based framework first proposed by Lucas-Kanade (28) is still the most widely used technique.

Given a deformed image I and a reference image T, the registration process aims to find a spatial transformation matrix W(p), where p^T = (p_1, \ldots, p_n) is a set of parameters, that matches the two images: I(W(p)) \approx T. The Lucas-Kanade algorithm iteratively generates transformations W(p_k) that reduce the difference between the two images. Generating the set of warp parameters p_k at the k-th iteration requires the gradient of the cost function, g_k, and an appropriate descent parameter, a_k, that ensures the descent property of the cost function:

p_{k+1} = p_k + a_k g_k \quad (2.3)

The gradient of the cost function is derived in the next section. The descent parameter is defined depending on the optimization scheme (Chapters 3 and 5).

2.3.1 Cost Function F

The cost function measures the difference in intensities between images. Succeeding in the applications requires a quantitative measure of the goodness of the registration. There are various intensity difference measures, such as Mutual Information (45), Cross-correlation (13), and Histogram entropy (6). For simplicity, and for the enhanced-optimization focus of monomodal image registration, we use the Sum of Squared Difference (SSD) as the cost function throughout the paper:

F = \frac{1}{2} \sum_x \left[ I(W(p)) - T \right]^2 \quad (2.4)

Registration is a minimization process that minimizes 2.4 with respect to p. Assuming we know the current estimate of p, the problem becomes iteratively solving for the increment \Delta p; thus 2.4 can be written as:

F = \frac{1}{2} \sum_x \left[ I(W(p + \Delta p)) - T \right]^2 \quad (2.5)

and the termination criteria are normally \Delta F \le \epsilon or \|\Delta p\| \le \epsilon. In order to find \Delta p, we perform a first-order Taylor expansion of I(W(p + \Delta p)), transforming 2.5 into:

F = \frac{1}{2} \sum_x \left[ I(W(p)) + \nabla I \frac{\partial W}{\partial p} \Delta p - T \right]^2 \quad (2.6)

where \nabla I = \left( \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right) is the gradient of image I evaluated at W(p) and \frac{\partial W}{\partial p} is the Jacobian of the transformation matrix (2.9). Next, differentiating 2.6 with respect to \Delta p and setting the result to zero yields:

\sum_x \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ I(W(p)) + \nabla I \frac{\partial W}{\partial p} \Delta p - T \right] = 0

\Delta p = -H^{-1} \sum_x \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ I(W(p)) - T \right] \quad (2.7)

where H is the Hessian matrix of the objective function and part of the descent parameter in Equation 2.3. The summation term on the RHS is the gradient of the objective function (2.8). Expression 2.7 is only used when applying Deterministic Optimization methods; for Stochastic methods, we use different techniques (Chapter 5). The choice of Hessian evaluation is one of the main factors that affect the application performance.

2.3.2 Gradient g

g = \sum_x \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ I(W(p)) - T \right] \quad (2.8)

2.3.3 Transformation W(p)

The transformation matrix is defined by the B-spline tensor model 2.2, W(p) = T_{local}. Hence, the derivative of the deformation field with respect to the control points p is:

\frac{\partial W}{\partial p} = \sum_{m=0}^{3} \sum_{n=0}^{3} B_m(u) B_n(v) \quad (2.9)

For any given input images, we can always compute the Jacobian \frac{\partial W}{\partial p} at the beginning of the procedure and do not need to recompute it during the registration process. This is a big advantage in reducing the computational cost.

During the transformation process, an interpolation procedure is essential. Images are discrete, with pixel values at integer coordinates; however, after applying a transformation, the coordinates are likely to be fractional. Therefore we must be able to evaluate the image pixels at arbitrary coordinates, which is achieved by interpolation. Different methods of interpolation, such as linear, bilinear and trilinear, can be used. In this paper, we will use linear interpolation because of its reasonably good quality and lower expense compared to higher-order interpolation.
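As an illustration of evaluating an image at a fractional coordinate (a minimal NumPy sketch with hypothetical names; the thesis code is MATLAB), bilinear interpolation looks like:

```python
import numpy as np

def interp_bilinear(img, x, y):
    """Evaluate image intensity at fractional coordinates (x, y).

    img is indexed as img[row, col] = img[y, x]; the base corner is clamped
    so the four neighbours always exist.
    """
    x0 = int(np.clip(np.floor(x), 0, img.shape[1] - 2))
    y0 = int(np.clip(np.floor(y), 0, img.shape[0] - 2))
    u, v = x - x0, y - y0          # fractional offsets within the cell
    return ((1 - u) * (1 - v) * img[y0, x0]
            + u * (1 - v) * img[y0, x0 + 1]
            + (1 - u) * v * img[y0 + 1, x0]
            + u * v * img[y0 + 1, x0 + 1])
```

The value is a weighted average of the four surrounding pixels, with weights given by the fractional offsets; at integer coordinates it reduces to the pixel value itself.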

2.3.4 Optimization

At each iteration we evaluate the gradient and update the parameters (2.3). This is an Optimization Process. The fact that Image Registration is an Optimization problem means it benefits from a vast amount of literature on one of the most studied subjects in mathematics. Popular optimization methods include gradient descent (35), conjugate gradient (10), Newton-type methods (quasi-Newton (11), Gauss-Newton (35), etc.), stochastic approximation (37) and evolutionary strategies (18). However, every benefit has its pay-off. The wide availability of optimisation methods triggers two problems: the choice of optimisation method and the parameter settings for optimisation problems. Unfortunately, the current literature does not provide definite answers to these problems. There has been much research on this topic, producing some limited guidance on the choice of optimisation methods as well as added constraints.

In this paper, we examine Gauss-Newton, Levenberg-Marquardt, Quasi-Newton and Nonlinear Conjugate Gradient (Chapter 3) on randomly deformed images. In addition, we propose some extensions to these methods in Chapter 4. Chapter 5 presents an extension to the Robbins-Monro Stochastic method for deformation at local parts of images.

2.3.5 Pre-condition

In practice, the use of cost function 2.4 is not reliable unless we include some pre-conditions. The first condition is that the input images must come from the same source, to assure monomodality. The second condition is that the deformation fields must not be folded. One way to relax the second condition is to add a regularisation term to the cost function (17). However, we do not include this; instead we ensure that no folding is possible for randomly generated inputs by ensuring that the Jacobian of the transformation fields is non-negative.

Chapter 3

Deterministic Optimization

A standard formula for deterministic optimization methods follows from the derivation of 2.3 and 2.7:

p_{k+1} = p_k - H_k^{-1} g_k \quad (3.1)

where H_k is the Hessian of the cost function and g_k is defined as in 2.8. All optimization algorithms described in this chapter follow the form of 3.3, except for a slight difference in the NCG method. The complexity of the procedure 3.3 is shown in Table 3.1.

Table 3.1: Complexity of the registration framework, where N is the total number of pixels of an input image and n is the number of control points of the mesh grid. The recovered per-step costs for procedure 3.3 include N^2, N n^2, N^2, N^2, N^2 n, N^2 n, O(H) for the Hessian evaluation, the step-size cost, and n^3.

The following sections show that the choice of Optimization affects the performance, because each method contributes a different complexity to evaluating \Delta p. The step-size complexity is often negligible because, in medical image processing applications, the initial deformation is often not very large (no folding allowed); therefore the approximate descent direction itself often satisfies the descent property.

\Delta p = \arg\min_p F = \frac{1}{2} \sum_x \left[ I(W(p)) - T \right]^2 \quad (3.2)

Precompute:
(1) Evaluate the Jacobian \frac{\partial W}{\partial p} by 2.9
Iterate:
(2) Evaluate W(p) by 2.2
(3) Warp I with W(p) to obtain I(W(p))
(4) Evaluate the image gradient \nabla I
(5) Evaluate \nabla I \frac{\partial W}{\partial p}
(6) Evaluate g = \sum_x \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ I(W(p)) - T \right] (2.8)
(7) Evaluate (or approximate) the Hessian H
(8) Compute the step size a_k
(9) Compute \Delta p = -a_k H^{-1} g
(10) Update p \leftarrow p + \Delta p
Until:
i. \|\Delta p\| \le \epsilon_p
ii. \|F\| \le \epsilon_F
iii. \|g\| \le \epsilon_g
iv.
\quad (3.3)

For the NCG method: (7) is not used, and in (9) \Delta p is computed using 3.8.

Steps (1)-(6) and (10) are essential for every method, and step (8) is negligible for a good approximation. Therefore the comparable complexity relies on steps (7) and (9).

(3.3)

3.1 Gauss-Newton (GN)

The objective function 2.4 can be formulated as a least-squares problem:

F = \frac{1}{2} \sum_x f^2 \quad (3.4)

Gauss-Newton approximates the Hessian \nabla^2 F by:

H \approx \sum_x \left( \frac{\partial f}{\partial p} \right)^T \frac{\partial f}{\partial p} = \sum_x \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ \nabla I \frac{\partial W}{\partial p} \right] \quad (3.5)

The convergence of Gauss-Newton to a local minimum can be rapid if the approximated Hessian term dominates the value of the full Hessian evaluated by the Newton method (35, p256-257), e.g. when the approximation is good and close to a local minimum, and can approach a quadratic rate (4, p341-342). The cost of evaluating 3.5 is O(N^2 n^2); thus the comparable complexity is O(N^2 n^2 + n^3).
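Under the least-squares form 3.4-3.5, one Gauss-Newton update solves the normal equations built from the residual Jacobian. A minimal NumPy sketch (illustrative names, generic least squares; the thesis implementation is in MATLAB):

```python
import numpy as np

def gauss_newton_step(J, r):
    """One Gauss-Newton update for a least-squares cost 1/2 * sum(r^2).

    J is the (num_residuals x num_params) Jacobian of the residuals r;
    the step solves (J^T J) dp = -J^T r, i.e. dp = -H^{-1} g with H ~ J^T J.
    """
    H = J.T @ J              # Gauss-Newton Hessian approximation (3.5)
    g = J.T @ r              # gradient of the cost
    return np.linalg.solve(H, -g)
```

For residuals that are exactly linear in the parameters, a single step lands on the minimizer, which is the sense in which GN approaches a quadratic rate near a good solution.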

3.2 Levenberg-Marquardt (LM)

The Levenberg-Marquardt method takes into account whether the error in approximating the Hessian gets better or worse after each iteration. The implementation of Levenberg-Marquardt for the registration problem is as follows (2, p39-40):

H \approx \sum_x \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ \nabla I \frac{\partial W}{\partial p} \right] + \lambda \operatorname{diag}\!\left( \sum_x \left( \nabla I \frac{\partial W}{\partial p_1} \right)^2, \ldots, \sum_x \left( \nabla I \frac{\partial W}{\partial p_n} \right)^2 \right) \quad (3.6)

LM uses a trust-region approach, the second term in 3.6, to guarantee the descent property based on the error in approximating the Hessian. For instance, if the error decreases, \lambda is decreased (e.g. by a factor of 10), and the LM method becomes approximately the GN method. If the error increases, \lambda is increased (e.g. by a factor of 10). The rate of convergence of the LM method can be expressed similarly to that of the GN method, and the comparable cost is the same: O(N^2 n^2 + n^3).
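The damped step and the accept/reject schedule for \lambda can be sketched as follows (Python/NumPy, hypothetical names; a generic LM scheme under the factor-of-10 schedule described above, not the thesis's exact code):

```python
import numpy as np

def lm_step(J, r, lam):
    """Levenberg-Marquardt step: damp the diagonal of the GN Hessian (cf. 3.6)."""
    H = J.T @ J
    H_damped = H + lam * np.diag(np.diag(H))
    return np.linalg.solve(H_damped, -(J.T @ r))

def lm_update_lambda(lam, error_decreased):
    """Shrink lambda towards Gauss-Newton on success, grow it towards
    steepest descent on failure (factor-of-10 schedule)."""
    return lam / 10.0 if error_decreased else lam * 10.0
```

With lam = 0 the step reduces exactly to the Gauss-Newton step, which is the limit LM approaches when the error keeps decreasing.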

3.3 Quasi-Newton (QN)

The Quasi-Newton method approximates the Hessian of the objective function at each iteration. Furthermore, its evaluation does not include the image gradient and the Jacobian of the warp matrix, which contain expensive calculations of dimension N \times N. Indeed, the Hessian evaluation in the Quasi-Newton method only requires calculations with matrices of dimension n \times n, much smaller than N \times N. There are various ways to construct H^{-1}, such as SR1, DFP, BFGS (35); however, numerical experiments have shown that the Broyden class (BFGS) is more efficient (22; 35). We use a Broyden update approximation to H (40):

H_k \approx H_{k-1} + \frac{\left[ (g_k - g_{k-1}) - H_{k-1}(p_k - p_{k-1}) \right] (p_k - p_{k-1})^T}{\| p_k - p_{k-1} \|_2^2} \quad (3.7)

which yields a superlinear rate of convergence:

\lim_{k \to \infty} \frac{\| p_{k+1} - p^* \|}{\| p_k - p^* \|} = 0

The computational cost of computing 3.7 is O(n^3); thus the comparable complexity is reduced to O(n^3).
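The rank-one secant update can be sketched in a few lines (Python/NumPy, hypothetical names; this is the generic Broyden update satisfying the secant condition, which may differ in detail from the thesis's exact formula 3.7):

```python
import numpy as np

def broyden_update(H_prev, s, y):
    """Broyden rank-one update of the Hessian approximation.

    s = p_k - p_{k-1} (parameter step), y = g_k - g_{k-1} (gradient change);
    the updated H satisfies the secant condition H @ s = y.
    """
    return H_prev + np.outer(y - H_prev @ s, s) / (s @ s)
```

The update costs O(n^2) per iteration and only touches n x n matrices, which is where the O(n^3) overall complexity (dominated by solving with H) comes from.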

3.4 Nonlinear Conjugate Gradient (NCG)

Starting from the Linear Conjugate Gradient method for solving a convex quadratic function (35, p102), Fletcher and Reeves (14) extended the method to nonlinear problems. The new search direction combines the current gradient g_k and the previous search direction d_{k-1} = \Delta p_{k-1} = p_k - p_{k-1}:

\Delta p = d_k = -g_k + \beta_k d_{k-1} \quad (3.8)

Dai-Yuan: \quad \beta_k^{DY} = \frac{g_k^T g_k}{d_{k-1}^T (g_k - g_{k-1})} \quad (3.9)

Hestenes-Stiefel: \quad \beta_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})} \quad (3.10)

The choice of \beta_k has a big influence on the convergence properties of the method. In this study, we adopt a hybrid version (9) as in (22):

\beta_k = \max(0, \min(\beta_k^{DY}, \beta_k^{HS})) \quad (3.11)

One practical implementation of NCG methods is to restart the iteration every m steps by setting \beta = 0; this property is handled heuristically in the hybrid method above. Readers can refer to (9) and (8) for an extensive review of the convergence properties of NCG. The comparable cost to evaluate the search direction is O(n^2).
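The hybrid parameter and the resulting search direction can be sketched as (Python/NumPy, hypothetical names; a generic DY/HS hybrid of the form 3.9-3.11):

```python
import numpy as np

def hybrid_beta(g_new, g_old, d_old):
    """Hybrid DY/HS conjugate-gradient parameter (cf. 3.9-3.11)."""
    denom = d_old @ (g_new - g_old)
    beta_dy = (g_new @ g_new) / denom
    beta_hs = (g_new @ (g_new - g_old)) / denom
    return max(0.0, min(beta_dy, beta_hs))

def ncg_direction(g_new, g_old, d_old):
    """New search direction d_k = -g_k + beta_k * d_{k-1} (cf. 3.8)."""
    return -g_new + hybrid_beta(g_new, g_old, d_old) * d_old
```

Clamping at zero (the max with 0) is what gives the built-in restart behaviour: whenever the formula would produce a negative beta, the direction falls back to plain steepest descent.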

3.5 Step-size

Many step-size strategies are available to date (35, Chapter 3), and the choice of an optimal step-size strategy remains one of the most difficult problems in optimization. A traditional line search method, the Armijo rule (1), guarantees global convergence, and for certain types of problems we can apply the Modified Armijo method (42), which shows better performance. For our comparative study, we implement an Armijo-like method with a certain maximum number of iterations: the step size a_k = \beta^{m_k} is shrunk until the sufficient-decrease condition

F(p_k + a_k \Delta p_k) \le F(p_k) + \sigma a_k g_k^T \Delta p_k \quad (3.12)

is satisfied, where \beta \in (0, 1), \sigma \in (0, \tfrac{1}{2}], and m_k = 0, 1, 2, \ldots is the smallest such exponent.
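A textbook Armijo backtracking rule with a capped number of shrink steps can be sketched as follows (plain-Python sketch with hypothetical names and default constants; the thesis's exact variant may differ):

```python
def armijo_step(F, p, dp, g, sigma=1e-4, beta=0.5, max_iter=20):
    """Backtracking line search with the Armijo sufficient-decrease rule.

    Shrinks a = beta^m until F(p + a*dp) <= F(p) + sigma * a * (g . dp),
    or until the maximum number of backtracking iterations is reached.
    """
    a = 1.0
    f0 = F(p)
    slope = sum(gi * di for gi, di in zip(g, dp))  # g^T dp, negative for descent
    for _ in range(max_iter):
        trial = [pi + a * di for pi, di in zip(p, dp)]
        if F(trial) <= f0 + sigma * a * slope:
            break
        a *= beta
    return a
```

Capping the iterations matches the comparative-study setup above: the search gives up after a fixed number of shrinks rather than looping indefinitely on a flat cost.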

3.6

The input data are generated by randomly displacing the control points according to the normal distribution N(0, 10) to obtain the deformed images. We generate 100 such random deformations. The average initial SSD from the Lena images is 606.26, and from the Ankle images 1562.15. Figure 3.1 shows the convergence rate of 3 particular random tests (the graph is plotted after the first iteration).

We also look at the average performance over 100 random inputs by examining the boxplots of the SSD and time measures in Figure 3.2.

It seems that GN is more consistent and performs better than Levenberg-Marquardt; it produces a good convergence result within a shorter time. This pattern is expected because the initial deformation is reasonably small, so the approximate Hessian without the diagonal matrix calculation is sufficient. The NCG convergence rate depends on how different the original images are and how good the approximate descent direction is at each iteration. According to the convergence figure 3.1, NCG converges very slowly when it approaches the optimal solution. In the tests, I use the same stopping criteria for all methods for fairness; however, the average time for NCG could be reduced by altering its termination criteria so that it stops as it gets close to a local minimum. Intensive study of the test results shows that although QN produces a worse final result on average over the 100 tests, its rate of convergence is always faster than the others'. The fact that QN produces a slightly higher optimal value is due to occasional premature termination at local minima. However, since the time taken by QN is very small compared to the other methods, one could reapply QN registration using the resulting warped parameters to obtain a better optimal value while not exceeding the time of the other methods. We will see this in the next Chapter: the Recursive Subsampling Technique.

Algorithm             Comparable       Convergence   Lena Test       Ankle Test
                      Complexity       Rate          SSD    Time     SSD    Time
Gauss-Newton          N^2 n^2 + n^3    Fast          47.2   26.7     20.8   29
Levenberg-Marquardt   N^2 n^2 + n^3    Medium        40.8   39.5     20.8   44.8
Quasi-Newton          n^3              Very Fast     53.7   7.8      22.3   11.6
NCG                   n^2              Depends       39.7   18.4     21.4   78.2

Table 3.2: Comparison of deterministic optimization methods, where n is the number of control points.

Figure 3.1: Convergence rate for 3 random tests on (a) Lena and (b) Ankle. Each panel in the figure shows one random test; F is the SSD measure, T is the CPU runtime measure. GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).

Figure 3.2: Average performance on 100 random tests on (a) Lena and (b) Ankle. Difference is the SSD measure, Time is the CPU runtime measure. GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).

Chapter 4

Recursive Subsampling and Weighted approach for Deterministic Optimizations

4.1 Images

In much of the literature, subsampling techniques refer either to a random subset of pixels (22; 45) or to a single subset of pixels (24; 32). The random subsampling approach is reviewed in Chapter 5, and the use of a single subset of pixels cannot be guaranteed to converge. Our approach is more or less similar to the paper by Sun and Guo (44); however, our proposed method is more general.

As we know, for our gradient-descent optimizers, a better initial estimate of the objective parameters improves the rate of convergence. This suggests a way to find a good estimate of the warp parameters p to supply to the full-resolution registration. For instance, given images of resolution Nx x Ny with N pixels in total, we could shrink them by half to Nx/2 x Ny/2 with N/4 pixels, and downscale again by half to Nx/4 x Ny/4 with N/16 pixels. Registering N/16-pixel images is around 4 times faster than N/4-pixel images and 16 times faster than the full images. The idea of the recursive subsampling framework is to use the resulting warp parameters of a smaller-resolution registration phase as the initial parameter estimate for the next larger-resolution phase. To do this, we add a recursive subsampling mechanism to the traditional registration framework 2.4. The number of recursive phases indicates how many times we downsample the images; stopping the shrinking when the resolution falls below 50 x 50 is a good heuristic criterion.
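As an illustration, the coarse-to-fine structure of this framework can be sketched as follows. This is a Python/NumPy sketch under our own naming (the thesis implementation is in MATLAB); `register_at_resolution` is a hypothetical stand-in for one full registration pass at a fixed resolution.

```python
import numpy as np

def downscale(img):
    """Halve each dimension by averaging 2x2 blocks (simple subsampling)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def recursive_register(T, I, register_at_resolution, min_size=50):
    """Coarse-to-fine registration: the warp parameters found at each coarse
    level seed the next, finer level, as in the recursive framework above."""
    pyramid = [(T, I)]
    while min(pyramid[-1][0].shape) // 2 >= min_size:   # stop below 50 x 50
        Tc, Ic = pyramid[-1]
        pyramid.append((downscale(Tc), downscale(Ic)))
    p = None                                 # no initial estimate at the coarsest level
    for Tc, Ic in reversed(pyramid):         # coarsest -> finest
        p = register_at_resolution(Tc, Ic, init_params=p)
    return p

# Demo with a stub registration pass that records the resolutions visited.
visited = []
def register_at_resolution(T, I, init_params=None):
    visited.append(T.shape)
    return (0 if init_params is None else init_params) + 1  # dummy parameters

p = recursive_register(np.zeros((200, 200)), np.zeros((200, 200)),
                       register_at_resolution)
# visited goes coarse-to-fine: (50, 50), (100, 100), (200, 200)
```

The stub makes the phase ordering explicit: each call receives the parameters returned by the previous, coarser phase as its starting point.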


The use of the Recursive Subsampling methods also reinforces the good performance of the Quasi-Newton method compared to the others: with a good initial estimate, Quasi-Newton is less likely to terminate prematurely at a local minimum.

4.2

From this section on, we refer to Random Deformation as random uniform deformation of the whole image, and to Local Deformation as random deformation of local parts of the image.

The idea of associating weights with the registration process comes from the Difference Sampling approach (Chapter 5). For locally deformed inputs, which are very likely to occur in medical image processing, one can manually identify the deformed parts before starting the registration process. If we could automatically identify the locally deformed parts and then emphasize these regions during the transformation, performance could be enhanced. We propose to implement this idea with a weighted transformation that is proportional to the current difference image. The difference image is defined as:

\delta_x = \| I_x - T_x \|   (4.1)

where x is a pixel of the image and \delta_x \in [0, 1]. We want to assign weights proportional to \delta_x. Let us define the weight corresponding to pixel x as:

\rho(x) = \lambda + \log(1 + \delta_x)   (4.2)

We know for certain that \log(1 + \delta_x) \in [0, 0.7). We suggest setting \lambda small when the local deformation is large, and large when the local deformation is small; this heuristic prevents both slow convergence and premature termination. The transformation matrix associated with the Weighted Difference is therefore:

\widetilde{W}_x = \rho(x) W_x
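As a concrete sketch of (4.1)-(4.2), the weight map can be computed directly from the two images. This is a Python/NumPy illustration rather than the thesis's MATLAB code; the function name `weight_map` and the assumption that intensities are already scaled to [0, 1] are ours.

```python
import numpy as np

def weight_map(I, T, lam=0.75):
    """Per-pixel weight rho(x) = lam + log(1 + delta_x) as in (4.2), with
    delta_x = |I_x - T_x| assuming intensities already scaled to [0, 1]."""
    delta = np.abs(I - T)           # difference image (4.1), values in [0, 1]
    return lam + np.log1p(delta)    # log(1 + delta) lies in [0, 0.7)

T = np.array([[0.2, 0.2], [0.2, 0.2]])
I = np.array([[0.2, 0.2], [0.2, 1.0]])    # one locally deformed pixel
rho = weight_map(I, T, lam=0.1)           # small lam: large local deformation
# Only the deformed pixel receives a noticeably larger weight.
```

Pixels that already match keep the baseline weight lam, while pixels in the misaligned region are emphasised during the transformation.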

(4.3)

4.3

All difference measures are Sum of Squared Differences (SSD); all time measures are CPU runtime in seconds.

Recursive Subsampling methods.

First, we examine the rate of convergence of the Recursive Subsampling methods compared to the traditional methods. In our tests, we downscale the images by 3 levels, so there are 4 phases with numbers of pixels equal to N/64, N/16, N/4 and N. For a random test, we use the same set of random input images of the Lena and Ankle pictures, apply both methods, and compare the rate of convergence. Figure 4.4 shows the test results.

The figure shows that the subsampling technique is generally faster and produces similar or better results compared to the single (normal) methods. We now compare the rate of convergence of the different subsampling optimizations. Figure 4.5 shows the convergence rate of four subsampling optimizations: Subsampling Gauss-Newton (SGN), Subsampling Levenberg-Marquardt (SLM), Subsampling Quasi-Newton (SQN) and Subsampling Nonlinear Conjugate Gradient (SNCG). Each panel demonstrates the convergence rate of one recursive phase: panels 1, 2, 3 and 4 correspond to the N/64, N/16, N/4 and N resolution phases. The small-resolution phases cost a fraction of the time of the large-resolution phases, so the overall registration is faster while giving good results.

In the next experiments, we will see that SQN is the best choice. Let us look at the overall average performance over 100 randomly generated tests on the Lena picture (Figure 4.6) and the Ankle picture (Figure 4.7). Recall from Chapter 3 that the single QN method sometimes suffers from outliers; the SQN method reduces this problem dramatically, since it uses a better starting estimate of the warp parameters at every recursion. The test results illustrate that the Subsampling Quasi-Newton method produces results as good as the other methods in much less time. See Appendix A for the average performance over 100 tests on different types of pictures: Lena, Ankle, Brain, Knee and Lung; Table 4.1 summarizes the results of the Subsampling methods on all image types. From now on, we use only Subsampling methods for testing purposes.

Next, we look at the effect of using different resolutions of the control-point grid. As we know, the performance of the registration process is influenced by the number of control points (Table 3.2). It is expected that a finer grid will reduce the cost function further, but take longer, because the number of parameters increases. The graph in Figure 4.8 compares the rate of convergence of a 5 x 5 grid and a 7 x 7 grid on the Knee image; the average initial difference (SSD) between the two input Knee images is 2526.72. Figure 4.9 shows the average performance over 100 random tests. It confirms that the finer the grid, the better the convergence, but the higher the computational cost.

Finally, we examine what happens when the moving image is deformed more heavily. We generate random test images by increasing the variance of the deformation from 10 to 15 and then 20. We predict that the larger the deformation, the more likely the optimizers are to stop at a local minimum, and hence the worse the convergence to the optimal solution. The graph in Figure 4.10 shows the convergence at the different deformation levels for the Lena pictures, and the boxplot in Figure 4.11 confirms the pattern over 100 random tests at each deformation level.

Weighted Recursive Subsampling methods

To generate a local deformation image, we randomly choose one control point and displace it by a Normally distributed random amount. We then apply the Recursive Subsampling methods with and without weights. Figure 4.12 compares the convergence rate of the two approaches. Clearly, for local deformation, the Weighted Transformation converges faster than the Unweighted Transformation. One disadvantage of the Weighted approach is that it tends to stop prematurely when it gets close to the local minimum. However, as with the argument for the single Quasi-Newton method, we benefit from the fast convergence, and can therefore reapply the method starting from the current result. In general, the optimal values obtained by the Weighted methods are almost the same as those of the Unweighted methods, while consuming much less time (Figure 4.13).

Image Type      Initial SSD   SSD                          Time (s)
                              SGN   SLM   SQN   SNCG       SGN   SLM   SQN   SNCG
Ankle 300x336   1562.15       20.7  20.7  21.6  22.1       20.8  26.7  9.5   75.0
Brain 354x353   3315.5        18.9  18.9  18.9  19.1       25.1  31.6  9.5   82.0
Knee 353x343    2526.71       38.5  38.6  38.2  39.2       30.1  33.4  12.3  64.2
Lena 256x256    1025.37       38.3  37.6  36.8  39.1       25.8  34.3  9.8   20
Lung 394x378    1327.49       8.1   8.1   8.3   8.2        19.7  27.8  10.1  98.4

Table 4.1: Summary of results for Subsampling methods on Random Deformation (100 tests per image). The suffix after an image name is its resolution; the SSD and Time columns are the final SSD and CPU runtime in seconds. The S- prefix denotes Subsampling: GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).


(a) Lena

(b) Ankle

Figure 4.4: Convergence rate of single and subsampling methods. F is the SSD measure, T is CPU runtime. The lower red dots in the subsampling methods are the convergence of the shrunken phases. GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).


(a) Lena

(b) Ankle

Figure 4.5: Subsampling convergence rate. F is the SSD value, T is runtime. Each panel shows the convergence rate of one recursive phase. The S- prefix denotes Subsampling: GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).


Figure 4.6: Lena: Performance of single and subsampling methods. Difference is the SSD value, Time is runtime. Single is the traditional method. GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).


Figure 4.7: Ankle: Performance of single and subsampling methods. Difference is the SSD value, Time is runtime. Single is the traditional method. GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).

Figure 4.8: Knee: Convergence with 5 x 5 and 7 x 7 grids of control points. F is the SSD value, T is runtime. The data is drawn from the final recursive phase at full resolution. Subsampling GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).


Figure 4.9: Knee: Average performance with different grid sizes. The difference measure is SSD, Time is runtime. Subsampling GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).


Figure 4.10: Lena: Convergence at different deformation levels. F is SSD, T is runtime. The var- suffix indicates the level of deformation: the higher the suffix, the more deformation. Data is drawn from the final recursive phase. SGN, SQN, SLM, SNCG are the Subsampling optimization methods.


Figure 4.11: Lena: Average performance at different deformation levels. The difference measure is SSD, Time is runtime. The var- suffix indicates the level of deformation: the higher the suffix, the more deformation. SGN, SQN, SLM, SNCG are the Subsampling optimization methods.


Figure 4.12: Lena: Convergence of Unweighted and Weighted methods for local deformation. Difference is the SSD measure, Time is CPU runtime. Subsampling GN (Gauss-Newton), LM (Levenberg-Marquardt), QN (Quasi-Newton), NCG (Nonlinear Conjugate Gradient).


Figure 4.13: Lena: Average performance over 100 tests of Unweighted and Weighted methods for local deformation. Difference is SSD, Time is runtime. SGN, SQN, SLM, SNCG are the Subsampling optimization methods.


Chapter 5

Stochastic Approximation

5.1 Stochastic Approximation

Image registration is a large-scale optimization problem. In addition to the deterministic optimization algorithms reviewed in Chapter 3, stochastic gradient descent methods (23) are also widely investigated in current research. The approximation framework follows the same scheme as 2.3, where:

p_{k+1} = p_k + a_k \tilde{g}_k   (5.1)

The distinctions in the Stochastic Method (SM) are that the derivative of the cost function, g(p_k), is replaced by an approximation \tilde{g}_k, and that a decaying sequence {a_k} is used. SM aims to find the unknown solution by successively reducing the inaccuracy of its estimates. Such methods have been successfully applied in many applications and have been evaluated in the image registration field (22).

The speed and accuracy of SM depend on the quality of the gradient estimate obtained by random sampling. In general, Random Uniform Sampling (RUS) is used for both monomodal and multimodal stochastic image registration (21). In this chapter, we present a novel approach, random Difference Sampling (DS), which uses either a Deterministic sampling strategy or a Stochastic sampling strategy. We argue that when the input images are deformed in a very localised way, RUS yields too few samples at that specific location. One solution is to allow more iterations, ensuring that in the end enough samples have been drawn from the locally deformed regions. However, the immediate effect of using more iterations is more computational time. If we can reliably detect the misaligned regions, we can greatly accelerate the registration. Our Difference Sampling stochastic method aims to detect the deformed regions based on the difference image \delta_x = \| I_x - T_x \|, and to randomly pick a subset of pixels from those regions according to a defined non-uniform probability distribution.

When the image is largely deformed, the difference image is no longer a reliable indicator of misalignment. Ideally, Difference Sampling then converges to Random Uniform Sampling, as its non-uniform probabilities become almost uniform.

The registration procedure for the Stochastic Approximation methods is similar to 3.3, except that steps (7), (8), (9), (10) are replaced by 5.1 and the termination criterion is replaced by convergence of {p_k}:

Precompute: (1) Evaluate \partial W / \partial p by 2.9

Iterate: (2) Evaluate W(p) by 2.2

(4) Evaluate I at x \in \Omega

(5) Evaluate \nabla I \, \partial W / \partial p at x \in \Omega

(6) Compute g = \sum_{x \in \Omega} [\nabla I \, \partial W / \partial p]^T [I(W(p)) - T] (2.8)

(7) Compute the decaying sequence a_k

(8) Update p_{k+1} = p_k + a_k g

Until: E\{p_{k+1}\} \approx E\{p_k\}   (5.2)

where \Omega is a subset of random pixels. The study of complexity for the Deterministic optimizations (3.2) shows that steps (5), (6), (7), (9) are the most expensive, costing at least N^2 n + n^3 (SQN and SNCG). In the Stochastic methods, steps (5) and (6) cost S^2 n instead of N^2 n, where S << N is the size of \Omega, and steps (7) and (8) cost n. This gives a total cost of S^2 n + n, much smaller than N^2 n + n^3.
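The loop above can be sketched with a generic linear residual model standing in for the warped-image terms (a Python/NumPy sketch; the function name and the synthetic least-squares problem are our own illustration, not the thesis implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_register(A, b, n_iter=2000, sample=0.02, gamma=0.1, A0=15.0):
    """Robbins-Monro style descent for F(p) = 0.5 * ||A p - b||^2: at each
    iteration the gradient is approximated on a random pixel subset Omega,
    and the step size a_k = gamma / (A0 + k) decays over time."""
    N, n = A.shape
    S = max(1, int(sample * N))                        # |Omega|, with S << N
    p = np.zeros(n)
    for k in range(n_iter):
        omega = rng.choice(N, size=S, replace=False)   # random pixel subset
        r = A[omega] @ p - b[omega]                    # residuals on Omega only
        g = A[omega].T @ r                             # approximate gradient
        p = p - (gamma / (A0 + k)) * g                 # decaying-step update
    return p

# Synthetic "registration": recover p_true from residuals over 5000 pixels,
# touching only 2% of them per iteration.
A = rng.normal(size=(5000, 4))
p_true = np.array([0.5, -1.0, 2.0, 0.3])
b = A @ p_true
p_hat = stochastic_register(A, b)
```

The per-iteration cost scales with the subset size S rather than N, which is the source of the cost reduction noted above; gamma and A0 are problem-dependent tuning constants.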


In the next sections we describe how to pick the random pixels and how to approximate the gradient using this random data.

5.1.1 Robbins-Monro

The first stochastic approximation method was proposed by Robbins and Monro (RM) (37). The method assumes that an approximation of the derivative of the cost function is available:

\tilde{g}_k = g(p_k) + \epsilon_k   (5.3)

The estimate converges if the bias of the approximation error \epsilon_k goes to zero:

E(\tilde{g}_k) \to g(p_k), as k \to \infty   (5.4)

However, \tilde{g}_k does not necessarily vanish close to the solution p^*, which satisfies g(p^*) = 0; therefore the convergence of p_k (5.1) must be forced by ensuring a_k \to 0 as k \to \infty. This leads to a study of the decaying sequence a_k.

5.1.2 Decaying sequence

The decaying sequence a_k, designed to guarantee convergence of the optimizer, is a non-increasing, non-zero sequence a_k, k \in \mathbb{N}, such that \sum_{k=1}^{\infty} a_k = \infty and \sum_{k=1}^{\infty} a_k^2 < \infty. Clearly, many sequences satisfy these conditions. In practice, for medical image processing problems, the following expression is often used (43):

a_k = \gamma / (A + k)^{\alpha}   (5.5)

Different adaptive step sizes are described in (20) and (21). For simplicity, and following (3), we employ the step-size implementation of (20). The algorithm observes that the more rapidly p_k oscillates about the stationary point p^*, the closer p_k is to its optimum; at the same time, the decaying sequence a_k should approach zero.


Consider the components p_k^i, i = 1, 2, ..., n. The oscillation of p_k can be formulated as the rate of change of the sign of (p_k^i - p^{*i}) - (p_{k-1}^i - p^{*i}) = p_k^i - p_{k-1}^i. The step size a_k^i associated with p_k^i is therefore made inversely proportional to the number of sign changes of p_k^i - p_{k-1}^i. Our modification of 5.5 for the i-th component of a_k is:

a_k^i = \gamma / (A + Q_k^i)^{\alpha}, with \alpha = 1   (5.6)

where Q_k^i is the number of sign changes in p_m^i - p_{m-1}^i, m = 2, ..., k, and Q_1^i = 0. A and \gamma are chosen heuristically depending on the application; in our experiments, we set \gamma = 150 and A = 15.
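The sign-change rule can be sketched per component as follows (Python; the class name is our own, and the recipe follows the Kesten-style update described above):

```python
import numpy as np

class KestenStepSize:
    """Adaptive step size a_k^i = gamma / (A + Q_k^i), where Q_k^i counts the
    sign changes of the successive updates of component i (Kesten, 1958)."""
    def __init__(self, n, gamma=150.0, A=15.0):
        self.gamma, self.A = gamma, A
        self.Q = np.zeros(n)            # Q_1^i = 0 for every component
        self.prev_step = None           # p_k^i - p_{k-1}^i from last iteration

    def step(self, delta_p):
        """delta_p = p_k - p_{k-1}; returns the per-component a_k^i."""
        if self.prev_step is not None:
            # A sign flip in delta_p signals oscillation about p*.
            self.Q += (np.sign(delta_p) * np.sign(self.prev_step) < 0)
        self.prev_step = delta_p.copy()
        return self.gamma / (self.A + self.Q)

ss = KestenStepSize(2, gamma=150.0, A=15.0)
a1 = ss.step(np.array([1.0, 1.0]))      # no history yet: both get 150/15 = 10
a2 = ss.step(np.array([-1.0, 1.0]))     # first component flips sign, so
                                        # its step shrinks to 150/16
```

Oscillating components accumulate sign changes and their steps decay, while components still moving monotonically keep larger steps.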

5.2 Difference Sampling

Given two input images with very localised misalignment, a Random Uniform Sampling (RUS) method may bias the difference estimate, as in Figure 5.1, because it does not provide sufficient samples at the deformed parts. Difference Sampling (DS), in contrast, is a non-uniform sampling approach which can reduce the variance of the approximation, as in Figure 5.2. DS uses a probability distribution based on the current difference between the two images.

The idea of DS follows common sense: if the deformed image differs from the reference image only in small local parts, we need not pay much attention to the parts that are already identical, only to the small parts that differ. In addition, the larger the error between a pair of pixels (at the same coordinates in the two images), the more likely those pixels are to be picked. Interestingly, a few non-uniform sampling approaches have been proposed before by Bhagalia (3) and Sabuncu (41). However, their sampling methods emphasize image edges, which is quite different from our approach and also causes bias in the case of localised deformation.

In order to study the variance reduction achieved by DS, we briefly explain how a non-uniform random distribution brings advantages in certain problems. We want to sample a subset of pixels that indicates the current misalignment. Recall that the error (2.4) can be written as:

F = \frac{1}{2} \sum_x [I(W(p)) - T]^2   (5.7)

Using the RUS method, with X drawn from the uniform distribution P_U, the approximation of F is:

\hat{F}_{uni} = \sum_{x \in \Omega} [I_x(W(p)) - T_x]^2 = f(X), \quad X \sim P_U   (5.8)

Another approximation of F, using the DS method, is:

\hat{F}_{dif} = \sum_{x \in \Omega} [I_x(W(p)) - T_x]^2 \frac{P_U}{P_D(x)} = \frac{f(X)}{w(X)}, \quad X \sim P_D, \text{ where } P_D(x) = w(x) P_U   (5.9)

The expectations and variances of the above estimators satisfy:

E(\hat{F}_{uni}) = E(\hat{F}_{dif})   (5.10, 5.11)

var(\hat{F}_{dif}) = var\!\left( \frac{f(X)}{w(X)} \right)   (5.12)

The above equations show that the expectations of the RUS and DS estimators are the same. Therefore, the use of DS is only advantageous if we can formulate a distribution X \sim P_D that ensures var(f(X)/w(X)) < var(f(X)). This is possible by setting a larger weight w(x) at pixels that have more influence on f(X). How to set up the difference distribution and how to sample the data are discussed next.
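A small numerical check of this variance argument can be run directly (Python/NumPy; the per-pixel error image f here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic per-pixel error image f: most pixels already match (f = 0),
# while a small deformed region carries all of the error.
f = np.zeros(10000)
f[:100] = rng.uniform(5.0, 10.0, size=100)    # locally deformed pixels
F = f.sum()                                   # exact total error

S = 500                                       # samples per estimate

def estimate(P):
    """Unbiased estimate of F = sum_x f(x): for X ~ P, E[f(X)/P(X)] = F."""
    idx = rng.choice(f.size, size=S, p=P)
    return np.mean(f[idx] / P[idx])

P_uni = np.full(f.size, 1.0 / f.size)         # RUS: uniform distribution
P_dif = f + 1e-3                              # DS: proportional to difference
P_dif = P_dif / P_dif.sum()

est_uni = [estimate(P_uni) for _ in range(200)]
est_dif = [estimate(P_dif) for _ in range(200)]
# Both estimators target F, but the difference-weighted one has far smaller
# variance because its samples concentrate on the deformed region.
```

The small additive constant 1e-3 keeps every pixel reachable, which mirrors the requirement that DS fall back towards uniform sampling when the difference image is uninformative.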


5.3 Sampling Strategy

The sampling technique takes into account the intensities of the difference image. The simplest way to define the probability of pixel i is:

P(i) = \| I_i - T_i \| / \kappa + \epsilon_i   (5.13)

We explain the setting of these constants for each strategy below. The idea of our sampling strategies is inspired by similar techniques in (41).

5.3.1 Deterministic Sampling

Here we set \kappa = 1 and \epsilon_i = 0 for all i, so the probability P(i) is simply the intensity value of pixel i in the difference image. We classify the pixels into levels according to their probability; a group at a higher level has larger probability, and its pixels are more likely to be sampled. Figure 5.3 illustrates an example of this probability classification.

5.3.2 Stochastic Sampling

We walk through every pixel of the difference image and, at pixel i, decide whether to sample it based on whether its probability (5.13) exceeds a threshold \tau, where \tau \in (0, 1) and \epsilon_i = rand(-1, 1). The value of \tau is inversely proportional to the number of samples: larger deformations (which need more samples) require a smaller \tau. A good guideline is to choose \tau between 0.2 (large local deformation) and 0.8 (small local deformation), while the first component of (5.13), \| I_i - T_i \|, steers the samples towards the currently misaligned pixels.
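The two strategies can be sketched side by side (a Python/NumPy illustration; the thesis's level classification for the deterministic strategy is simplified here to a rank-based selection, and all names are our own):

```python
import numpy as np

rng = np.random.default_rng(2)

def deterministic_sampling(diff, n_samples):
    """5.3.1 with kappa = 1, eps_i = 0: P(i) is the difference intensity.
    Simplified here: take the n_samples pixels with the largest difference."""
    order = np.argsort(diff.ravel())[::-1]        # largest difference first
    return order[:n_samples]

def stochastic_sampling(diff, tau):
    """5.3.2: keep pixel i iff P(i) = diff_i + eps_i > tau, with
    eps_i ~ Uniform(-1, 1); a smaller tau yields more samples."""
    eps = rng.uniform(-1.0, 1.0, size=diff.size)
    return np.flatnonzero(diff.ravel() + eps > tau)

diff = np.zeros((100, 100))
diff[40:50, 40:50] = 0.9                          # small locally deformed patch
det_idx = deterministic_sampling(diff, 100)
sto_idx = stochastic_sampling(diff, tau=0.75)

sampled = np.zeros(diff.size, dtype=bool)
sampled[sto_idx] = True
hit_rate_deformed = sampled[diff.ravel() == 0.9].mean()
hit_rate_background = sampled[diff.ravel() == 0.0].mean()
```

The deterministic variant concentrates entirely on the deformed patch, while the stochastic variant samples it preferentially but still admits some background pixels through the random term eps_i.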


5.4

We examine the rate of convergence of the Subsampling Deterministic optimization algorithms and of Robbins-Monro Stochastic optimization using two types of sampling technique (Difference Sampling with the Deterministic strategy and with the Stochastic strategy; and Random Uniform Sampling). For fairness of comparison, we take a subset of 2% of the total number of pixels for all stochastic methods. Registration is applied to MRI pictures of a Knee at 512x512 pixels. Since the main part of the MRI picture lies at the centre and the side parts have uniform intensity, Random-Uniform Deformation (RUD) of the image results in large misalignment at the centre of the image, while Random-Local Deformation (RLD) results in a small change in one part of the centre (Figure 5.4).

The Random Uniform Sampling Stochastic method (RSS) is re-sampled at every iteration, to avoid bias and to obtain sufficient samples from every region of the picture. The Deterministic Sampling Stochastic method (DSS) samples only once, since it concentrates on the regions of difference and also takes neighbouring regions into account. The Stochastic Sampling Stochastic method (SSS) samples pixels based on the current difference between the two images and therefore, in theory, needs re-sampling at every iteration. However, stochastic re-sampling at every iteration is very costly, and the difference changes little after a single iteration, so we recommend re-sampling with the Stochastic Sampling approach about 5-20 times during the whole registration process. In our test cases, the maximum number of iterations is set to 100, so we compare the rate of convergence of SSS with re-sampling every 20, 15, 10 and 5 iterations, denoted SSS-20, SSS-15, SSS-10 and SSS-5 respectively.

Figure 5.4: MRI Knee: We generate 100 random uniform deformation (RUD) images by randomly perturbing every control point, and 100 random local deformation (RLD) images by randomly perturbing one control point.

Figures 5.5 and 5.6 compare the convergence rate of the Subsampling Deterministic optimizations and Stochastic Approximation with different sampling methods for the RUD and RLD input types. The initial average differences (SSD) of the RUD and RLD inputs are 2060 and 780 respectively. Because of the large image size, we use a 7x7 control-point grid for all methods. For the SSS method, we re-sample every 5 iterations, setting \tau = 0.25 for RUD inputs and \tau = 0.75 for RLD inputs. The Stochastic methods always outperform the Deterministic optimization methods: the Deterministic methods take, on average, more than 200 seconds to converge for RUD and more than 75 seconds for RLD, compared to just over 80 seconds for RUD and just over 40 seconds for RLD for the Stochastic methods, and the final SSD of the Deterministic methods is always higher than that of the Stochastic methods. We will not examine the Deterministic methods further here. Within the Stochastic Approximation class, RSS is more likely to stop at a local minimum, and its convergence rate is worse than that of the Difference Sampling methods. Within the Difference Sampling methods, DSS converges more slowly than SSS-5. This pattern is expected, because the Stochastic Sampling method draws a new set of samples fitting the currently different parts during registration, and so follows the misaligned regions better.

In the next experiment, we review the average performance of the Stochastic methods and also show that the more frequently SSS re-samples, the better its convergence rate. We examine 100 random tests of the RUD input type and of the RLD input type; Figure 5.7 shows the results. In general, RSS often stops at a local minimum with a higher SSD than the Difference Sampling methods. DSS performs better than RSS, but not as consistently as SSS. Among the SSS variants, SSS-5 produces the best result in the lowest time; the experiment illustrates that the more frequent the re-sampling, the better the result SSS returns.

The final experiment was carried out in the hope of finding a better approach than any of the above methods; however, it turned out not to be successful, and we include it for completeness. We want to analyse the effect of applying Deterministic optimization after a good approximation by the Stochastic methods. We call this the Combined method, with two phases: in the first 100 iterations, SSS-5 is used; in the second phase, we apply Deterministic methods starting from the current approximation for a maximum of 100 iterations. For comparison, we let SSS-5 run for 200 iterations. The experimental results are shown in Figure 5.8. As we can see, the performance of the Combined methods using Deterministic optimization in the second phase shows no advantage over the Stochastic methods alone. Perhaps the right strategy is to let the Stochastic method run for more iterations, e.g. 200-300, until it cannot produce any better result, and only then apply Deterministic optimization for further reduction.


(a) Random Uniform Deformation: convergence by Unweighted Deterministic methods. Weighted methods are not applicable here because of the large deformation.

Figure 5.5: Convergence of RUD by Deterministic and Stochastic methods. Data for the Deterministic optimizations is drawn from the last phase of Recursive Subsampling. SGN, SLM, SQN, SNCG are the Subsampling Deterministic methods. DSS, RSS, SSS-5 are the Deterministic, Random and Stochastic (re-sample every 5 iterations) Sampling Stochastic methods.

Figure 5.6: Convergence of RLD by Deterministic and Stochastic methods. Data for the Deterministic optimizations is drawn from the last phase of Recursive Subsampling. SGN, SLM, SQN, SNCG are the Subsampling Deterministic methods. DSS, RSS, SSS-5 are the Deterministic, Random and Stochastic (re-sample every 5 iterations) Sampling Stochastic methods.

Figure 5.7: MRI Knee: Average performance of the Stochastic Approximation methods for two types of deformation: RUD and RLD. DSS, RSS and SSS-(re-sample frequency) are the Deterministic, Random and Stochastic Sampling Stochastic methods.

(a) Convergence Rate of Combined and Stochastic methods. F is SSD, Time is runtime.

(b) Average Performance on 100 tests of Combined and Stochastic methods. F is SSD, Time is

runtime.

Figure 5.8: Comparison between the Combined Stochastic-Deterministic methods and the pure Stochastic methods. DSS-SQN, RSS-SQN, SSS-SQN are the Combined methods; DSS, RSS, SSS-5 are the Stochastic methods.


Chapter 6

MATLAB Implementation: Vreg

6.1 Introduction

The code implementing all the algorithms described in this paper was written by hand in MATLAB version 7 R14 SP3 (www.mathworks.com) with the Spline Toolbox. Some coding conventions are borrowed from (2), and the B-spline implementation follows a tutorial by R. Larsen (25). We outline some important evaluations in the registration framework 3.3.

6.2 Implementation

6.2.1 Cost Function F

Writing the residual vector as e = I(W(p)) - T, the cost function is F = 0.5*e'*e.

6.2.2 Image Gradient \nabla I

We compute the gradient of the image with respect to x and y:

[dIx,dIy] = gradient(I)


6.2.3 The Jacobian \partial W / \partial p

We construct the transformation based on the B-spline model (2.2). The tensor B-spline is defined by two sets of control points (knots) with respect to the row and column directions. The knots are placed at uniform spacing. To handle the displacement of the image boundaries, we add 3 extra knots beyond the boundary knots. For instance, a set of knots for one row can be constructed by:

k = augknt(0:space:rowlength,3)

Each 2D basis function is the tensor product of a row and a column B-spline basis function. Let the row functions be b_i^x(x), i = 1...m, and the column functions be b_j^y(y), j = 1...n. Then the displacement W = (W_x, W_y) becomes:

W_x = \sum_{i=1}^{m} \sum_{j=1}^{n} b_i^x(x) b_j^y(y) p_{ij}^x

W_y = \sum_{i=1}^{m} \sum_{j=1}^{n} b_i^x(x) b_j^y(y) p_{ij}^y

We use the MATLAB spline-construction function spmak to build the basis functions from the knot sequences, and collect the row and column basis values into two matrices:

Qx(:,i) = b_i^x(x),   Qy(:,j) = b_j^y(y)

Using the Kronecker product Q = kron(speye(2),kron(Qx,Qy)) we obtain:

W = (I_2 \otimes Q_x \otimes Q_y) p = Q p

where I_2 is the sparse 2 x 2 identity matrix. It is easy to see that \partial W / \partial p = Q.

6.2.4 Other evaluations

The warped image I(W(p)) is evaluated by bilinear interpolation:

interp2(I,Wx,Wy,'linear',0)

Other expressions can be computed using simple matrix operations.
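For readers outside MATLAB, the Kronecker identity can be checked in NumPy (illustrative only: the basis matrices below are arbitrary small examples rather than spmak-evaluated B-spline values):

```python
import numpy as np

rng = np.random.default_rng(3)

# Qx: row-basis values (pixels x m), Qy: column-basis values (pixels x n).
Qx = rng.normal(size=(4, 3))
Qy = rng.normal(size=(5, 2))
px = rng.normal(size=(3, 2))          # control-point displacements, x-direction
py = rng.normal(size=(3, 2))          # control-point displacements, y-direction

Q = np.kron(np.eye(2), np.kron(Qx, Qy))
p = np.concatenate([px.ravel(), py.ravel()])
W = Q @ p                             # stacked (Wx, Wy)

# The same displacement computed directly from the tensor-product sums:
Wx = np.einsum('ai,bj,ij->ab', Qx, Qy, px).ravel()
Wy = np.einsum('ai,bj,ij->ab', Qx, Qy, py).ravel()
# Since W = Q p is linear in p, dW/dp is exactly Q.
```

The check confirms that the Kronecker-assembled matrix reproduces the double sums defining W_x and W_y, which is why the Jacobian is simply Q.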

6.3 User Guide

Using Vreg is easy. The user runs MATLAB and sets the paths to the VReg package and all of its subfolders. Once this is done, call the register function with the desired parameters and let the machine work it out. The registration process for inputs of resolution less than 512 x 512 should take no more than one minute with an appropriate algorithm.

Given any two 2D images T and I, where I is a deformed version of T, we start the registration process by calling:

[F,p] = register(T,I,p1,p2,[algo],[recur.phase],[init.warp],[max.iter],[show],[fig])

Only the first four parameters are always essential, although the user is encouraged to specify algo. The rest are optional; however, you must put the empty notation [] at any parameter you do not want to include. List of parameters:

T,I are the filenames of the input images. I is the deformed image.

p1,p2 indicate the grid of control points, p1 x p2, e.g. 7 x 7.

algo is the choice of algorithm:

GaussNewton, LevenbergMarquardt, QuasiNewton, NonlinearConjugateGradient or, alternatively, GN, LM, QN, NCG. Users who want the Weighted methods must add the arguments weight,[\lambda] at the end of the call, i.e. after [fig]. Default \lambda = 0.75.

RandomSamplingStochastic, DeterministicSamplingStochastic or, alternatively, RSS, DSS. Users of these methods can add the arguments [%],[\gamma],[A] after [fig] to indicate how many pixels should be sampled, e.g. 0.02 indicates 2% of the total pixels. Default values: % = 0.02, \gamma = 150, A = 15.

StochasticSamplingStochastic[-resample] or SSS[-resample], where [-resample] indicates the frequency of re-sampling, e.g. SSS-5 means re-sampling every 5 iterations. Users of this method can add the arguments [\tau],[\gamma],[A] at the end. Note: \tau \in (0.2, 0.8); a larger \tau means fewer samples (5.3.2). Default values: \tau = 0.7, \gamma = 150, A = 15.

recur.phase is the number of recursive phases. A single registration has 0 recursive phases; a Subsampling method has 1 or more. Note: do not shrink the image too far or it cannot be registered; recur.phase should be less than 4. Default values: 3 for Deterministic methods and always 0 for Stochastic methods.

init.warp is the initial estimate of the warp parameters; it should be the result of a previous registration with this application. Default: zeros.

max.iter is the maximum number of iterations allowed. Default: 100.

show, fig: show=1 displays the registration process in figure(fig); this is suitable for demos but slows down the registration. Default: show=0.


Chapter 7

Conclusion

In this paper, we have discussed the performance of different types of optimization methods through theoretical review and practical experiments. The choice of optimization method mainly depends on the size of the input images and on the type of deformation (random deformation or very localised deformation). We have shown that the newly proposed approaches, based on detecting the misaligned parts of the input images, accelerate registration and produce better results.

We classify the optimizations into a Deterministic approach and a Stochastic approach, and construct a unified registration framework for each. The Deterministic approach is suitable for small and medium-sized images, while the Stochastic approach is more suitable for large images.

For the Deterministic approach, our extensive study shows that Quasi-Newton is the better choice compared to the Gauss-Newton, Levenberg-Marquardt and Nonlinear Conjugate Gradient methods. In addition, the Recursive Subsampling methods always outperform the methods without subsampling. We also examined the effect of applying weights (based on the difference image) to the transformation matrix; the results show that the Weighted Deterministic methods converge faster than the Unweighted methods for very localised deformed input images.

For the Stochastic approach, we have demonstrated that, for localised deformation, Difference Sampling with either the Deterministic or the Stochastic strategy produces a better convergence rate than Random Uniform Sampling. In addition, the Stochastic Sampling strategy performs slightly better than the Deterministic strategy in most localised-deformation experiments.

The main limitations of this project are its restriction to 2D monomodal images and the absence of regularisation. The proposed methods are all based on the difference image, which is an obstacle for multimodal input images. Extending to 3D input images and adding regularisation should not be difficult, but extending to multimodal input images would require additional functions to compensate for bias. Future work could also investigate better parameter settings for the Stochastic methods to improve convergence speed.


Appendix A

Performance of Subsampling methods on different images

Figure A.1 shows the images used for registration. 'Lena 256x256' indicates that the Lena image has a resolution of 256x256 pixels. The box-plots below show the average performance over 100 tests.


Figure A.1: Different types of images used for the Subsampling methods test.


References

[1] Larry Armijo. Minimization of functions having Lipschitz continuous first partial derivatives. 1966.

[2] Simon Baker and Iain Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56:221-255, 2004.

[3] Roshni R. Bhagalia. University of Michigan, 2008.

[4] Åke Björck. Numerical methods for least squares problems. SIAM, 1996.

[5] M. Bro-Nielsen and C. Gramkow. Fast fluid registration of medical images. In Proceedings Visualization in Biomedical Computing, 1996.

[6] T. Buzug and J. Weese. Improving DSA images with an automatic algorithm based on template matching and an entropy measure. Computer assisted radiology of Excerpta Medica - international congress series, 1124:145-150, 1996.

[7] T.F. Cootes, G.J. Edwards, and C.J. Taylor. Active appearance models. In Proceedings of the European Conference on Computer Vision, 1998.

[8] Y.H. Dai. Convergence of nonlinear conjugate gradient methods. Journal of Computational Mathematics, 19(5):539-548, 2001.


[9] Y. H. Dai. An efficient hybrid conjugate gradient method for unconstrained optimization. Ann. Oper. Res., 103:33-47, 2001. 18

[10] Y. H. Dai. A family of hybrid conjugate gradient methods for unconstrained optimization. Math. Comput., 72:1317-1328, 2003. 12, 18

[11] J. E. Dennis, Jr. and J. J. Moré. Quasi-Newton methods, motivation and theory. SIAM Review, 19:46-89, 1977. 12, 17

[12] D. Vandermeulen et al. Multi-modality image registration within COVIRA. In Medical Imaging: Analysis of Multimodality 2D/3D Images, Vol. 19 of Studies in Health, Technology and Informatics, pp. 29-42, 1995. 3

[13] A. C. Evans, D. L. Collins, P. Neelin, and T. S. Marrett. Correlative analysis of three-dimensional brain images. Computer-Integrated Surgery: Technology and Clinical Applications, 1996. 10

[14] R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. Computer Journal, 7, 1964. 17

[15] M. Gleicher. Projective registration with difference decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997. 10

[16] A. Goshtasby. Image registration by local approximation methods. Image and Vision Computing, 6, 1988. 8

[17] Eldad Haber and Jan Modersitzki. Image registration with guaranteed displacement regularity. Int. J. Comput. Vision, 71(3):361-372, 2007. 13

[18] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evol. Comput., 9(2):159-195, 2001. 12

[19] Neville Hunt and Sidney Tyrrell. http://www.coventry.ac.uk/ec/ nhunt/meths/strati.html. 44

[20] H. Kesten. Accelerated stochastic approximation. Ann. Math. Stat., 29:41-59, 1958. 41


[21] Stefan Klein, Josien P. W. Pluim, Marius Staring, and Max A. Viergever. Adaptive stochastic gradient descent optimisation for image registration. International Journal of Computer Vision, 81:227-239, 2009. 39, 41

[22] Stefan Klein, Marius Staring, and Josien P. W. Pluim. Evaluation of optimization methods for nonrigid medical image registration using mutual information and B-splines. IEEE Transactions on Image Processing, 2007. 17, 18, 23, 39

[23] H. J. Kushner and G. G. Yin. Stochastic Approximation and Recursive Algorithms and Applications. Springer-Verlag, New York, 2003. 39

[24] J. Kybic and M. Unser. Fast parametric elastic image registration. IEEE Trans. Image Process., 12(11):1427-1442, 2003. 23

[25] Rasmus Larsen. Medical image analysis: non-linear B-spline based image registration, part 2. Tutorial, 2007. 54

[26] Seungyong Lee, George Wolberg, and Sung Yong Shin. Scattered data interpolation with multilevel B-splines. IEEE Transactions on Visualization and Computer Graphics, 3:228-244, 1997. 9

[27] Kenneth Levenberg. A method for the solution of certain non-linear problems in least squares. The Quarterly of Applied Mathematics, 2:164-168, 1944. 7

[28] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, 1981. 10

[29] F. Maes, D. Vandermeulen, and P. Suetens. Comparative evaluation of multiresolution optimization strategies for multimodality image registration by maximization of mutual information. Med. Image Anal., 3:373-386, 1999. 7

[30] J. B. Antoine Maintz and Max A. Viergever. A survey of medical image registration, 1997. 3, 6

[31] Donald Marquardt. An algorithm for least-squares estimation of nonlinear parameters. SIAM Journal on Applied Mathematics, 11:431-441, 1963. 7


[32] D. Mattes, D. R. Haynor, H. Vesselle, T. K. Lewellen, and W. Eubank. PET-CT image registration in the chest using free-form deformations. IEEE Trans. Med. Imag., 22(1):120-128, 2003. 23

[33] C. R. Meyer et al. Demonstration of accuracy and clinical versatility of mutual information for automatic multimodality image fusion using affine and thin-plate spline warped geometric deformations. Medical Image Analysis, 1997. 8

[34] Bryan S. Morse. Image registration, Lucas-Kanade algorithm. CS 650: Computer Vision lecture notes. vii, 7

[35] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 2006. 7, 11, 12, 16, 17, 18

[36] X. Pennec, P. Cachier, and N. Ayache. Tracking brain deformations in time sequences of 3D US images. Pattern Recognit. Lett., 24:801-813, 2003. 2

[37] H. Robbins and S. Monro. A stochastic approximation method. Ann. Math. Statist., 22:400-407, 1951. 12, 41

[38] D. Rueckert et al. Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging, 18, 1999. 8

[39] Daniel Rueckert. Tutorial on image registration. Tutorial. 6

[40] Berc Rustem. Algorithms for equilibria, games and systems of nonlinear equations. Lecture Notes, 2005. 17

[41] M. R. Sabuncu and P. J. Ramadge. Gradient based nonuniform sampling for information theoretic alignment methods. Proc. Intl. Conf. IEEE Engr. in Med. and Biol. Soc., 3:1683-1686, 2004. 42, 44

[42] Z. J. Shi and J. Shen. New inexact line search method for unconstrained optimization. 2005. 18


[43] J. C. Spall. Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Trans. Aerosp. Electron. Syst., 34:817-823, 1998. 41

[44] Shaoyan Sun and Chonghui Guo. Medical image registration by maximizing a hybrid normalized mutual information. Bioinformatics and Biomedical Engineering, 2007. 24

[45] P. Viola and W. M. Wells III. Alignment by maximization of mutual information. International Conference on Computer Vision, 1995. 3, 10, 23

[46] Barbara Zitova and Jan Flusser. Image registration methods: a survey. Image and Vision Computing, 21, 2003. 8
