1 Center for Computational Imaging and Simulation Technologies in Biomedicine,
School of Computing, University of Leeds, Leeds, UK
2 Biomedical Imaging Department, Leeds Institute for Cardiovascular and Metabolic
Medicine, School of Medicine, University of Leeds, Leeds, UK
3 Department of Cardiovascular Sciences, KU Leuven, Leuven, Belgium
4 Department of Electrical Engineering, KU Leuven, Leuven, Belgium
5 Alan Turing Institute, London, UK
1 Introduction
Image registration involves establishing spatial correspondence between a given
pair or group of images, which is fundamental for many downstream medical
imaging applications (e.g. image fusion, atlas-based segmentation, image-guided
interventions, organ motion tracking and strain analysis, amongst others). Re-
cently, deep learning-based methods have found widespread use in medical image
registration, achieving comparable or better performance than traditional regis-
tration methods, and yielding substantial speed-ups in execution relative to the
latter. Among them, unsupervised and weakly-supervised methods are the most
popular as they do not require ground-truth deformation fields to be available.
Unsupervised methods [45,3,13] do not need any ground-truth annotations (e.g.
segmentation, landmarks and ground-truth deformation fields) for training and
rely just on the information available in the pair/group of images to be registered.
These approaches have been shown to achieve similar or better registration per-
formance than traditional registration methods [8], at a fraction of the execution
time. Weakly-supervised methods [4,14], however, require some annotations (e.g.
landmarks, segmentation masks) for training, but have been shown to improve
registration performance relative to their unsupervised counterparts.
Currently, most deep learning-based registration methods assume globally
smooth and continuous deformation fields throughout the image domain, using
regularisation such as the L2 norm of the deformation field to enforce this. However, this
assumption is not appropriate for all medical image registration applications,
especially when there are physical discontinuities resulting in sliding motion be-
tween organs/soft tissues that must be estimated to register the input images.
For example, respiratory motion resulting from inflation and deflation of the
lungs during breathing contains discontinuities between the lungs, the pleural
sac encompassing the lungs and the surrounding rib cage. The pleural sac itself
contains two layers that slide over one another as the lung inflates and deflates
resulting in what is perceived as a sliding motion at the boundaries of the lungs.
Enforcing deformation fields to be completely smooth when registering thoracic
images of any given individual to recover breathing motion would result in phys-
ically unrealistic deformation fields and artefacts near lung boundaries.
Generally, the sliding of organs and different material properties of different
sub-regions in the input images may cause the deformation fields to be locally
smooth but globally discontinuous [18]. In previous research, many traditional
methods have been proposed to achieve discontinuity-preserving image regis-
tration [18,46,39,33,19,15,44,32,25,49]. The fundamental goal of discontinuity-
preserving registration is to predict deformation fields which are locally smooth,
i.e. within each sub-region, while being globally discontinuous, such as at the
interface between different regions/organs. A simple solution to this problem is to regis-
ter the corresponding sub-organs in the input images independently and then
compose them to obtain the final deformation field [46]. Another approach is to
reformulate the regularisation constraint enforced on the estimated deformation
field/functions, to allow for discontinuities at points near the interface between
different tissues/organs [39,33]. However, for those methods, the label informa-
objects and the rest of the images like background) as additional supervision.
Additionally, co-attention has also been used for the segmentation of the same
object in different time frames, in a video sequence for example. Ahn et al. [1]
proposed a Multi-frame Attention Network to learn highly correlated spatio-
temporal features in a sequence. Experiments demonstrated that their method
significantly outperformed other competing deep learning-based methods. Fur-
thermore, Yang et al. [48] proposed a zero-shot video object segmentation approach,
using co-attention to learn motion patterns. They empirically
demonstrated that their approach outperformed previous zero-shot video object
segmentation approaches, while requiring fewer training data.
In this study, we tackle the problem of intra-subject registration of cardiac
cine-MR images, acquired at different phases/time points in the cardiac cy-
cle. In intra-subject cardiac image registration the moving and fixed images
are different/deformed representations of the same heart, acquired at different
time points in the cardiac cycle (e.g. at end-diastole (ED) to end-systole (ES)).
Hence, we hypothesise that joint segmentation of both the fixed and moving
images using co-attention can yield more consistent segmentations of the car-
diac regions/structures of interest, and in turn improve the overall registration
performance of the proposed approach.
Most deep learning-based image registration methods assume the desired de-
formation fields to be globally smooth and continuous, and do not consider the
presence or relevance of discontinuities at structure/organ boundaries and their
impact on the image registration task. To the best of our knowledge, only two previous
studies have attempted to preserve discontinuities at object/structure bound-
aries in deep learning-based image registration [30,9]. Ng et al. [30] addressed
this issue in an unsupervised manner. They proposed a discontinuity-preserving
regularisation term that compares each local displacement vector with its
neighbouring displacement vectors individually, allowing specific behaviours to
be captured in discontinuous deformation fields. However, without ground-truth
information specifying the locations of discontinuous boundaries, their registration
performance did not improve significantly over traditional approaches. In contrast,
we previously [9] proposed a deep discontinuity-preserving image registration
(DDIR) approach, to generate locally smooth sub-deformation fields for each im-
age sub-region, which were then composed to obtain locally smooth and globally
discontinuous deformation fields. Although [9] was shown to outperform the state
of the art, its need for segmentation masks delineating the objects/structures of
8 X. Chen et al.
2 Method
In this paper, we focus on pair-wise image registration, aiming to establish spatial
correspondence between the moving image IM and fixed image IF . This task can
be formulated as,
φ(x) = x + u(x), (1)
where x is the coordinate of a voxel/pixel in the moving image IM , and u(x) and
φ(·) represent the displacement field and the deformation function, respectively.
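As a concrete illustration of Eq. (1), the sketch below warps a 2-D image with a displacement field using bilinear interpolation. This is our own minimal NumPy illustration (function name and discretisation are assumptions, not taken from the paper); in practice such warping is performed by a spatial transformer layer.

```python
import numpy as np

def warp_image(moving, u):
    """Apply phi(x) = x + u(x) to a 2-D moving image with bilinear interpolation.

    moving : (H, W) array, the moving image I_M
    u      : (2, H, W) array, displacement field (row, col components)
    """
    H, W = moving.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # phi(x) = x + u(x): coordinates at which the moving image is sampled
    py = np.clip(ys + u[0], 0, H - 1)
    px = np.clip(xs + u[1], 0, W - 1)
    y0, x0 = np.floor(py).astype(int), np.floor(px).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = py - y0, px - x0
    # bilinear interpolation of the four neighbouring pixels
    top = (1 - wx) * moving[y0, x0] + wx * moving[y0, x1]
    bot = (1 - wx) * moving[y1, x0] + wx * moving[y1, x1]
    return (1 - wy) * top + wy * bot
```

A zero displacement field leaves the image unchanged, while a constant field shifts its content, which makes the formulation easy to sanity-check.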
To preserve discontinuities during image registration, similar to our previ-
ous study [9], we decompose the fixed and moving images into corresponding
pairs of image sub-regions, register each pair and combine the obtained sub-
deformation fields to obtain the final deformation field used to warp the moving
image. Different from [9], in this study we propose a joint segmentation and
discontinuity-preserving image registration (SDDIR) approach, which includes
an additional segmentation sub-network in DDIR [9], to jointly segment the
fixed and moving images. The primary motivation for the approach proposed in
this study is to ameliorate the need for separately sourcing segmentation masks
(either manually or automatically) for the images to be registered, as required
by DDIR. As shown in Fig. 1, our SDDIR includes a segmentation sub-network
and a registration sub-network. The input fixed and moving images are first fed
into the segmentation branch and tissue/organ specific segmentation masks are
predicted for each image. For example, the focus of this study is on intra-subject
cardiac cine-MR image registration, and given input cine-MR images, four-class
segmentation masks are predicted, delineating the following regions - left ventric-
ular myocardium, left ventricular blood pool, right ventricular blood pool and
background tissue. Subsequently, the fixed and moving images, and their pre-
dicted segmentation masks are fed into the discontinuity-preserving image regis-
tration branch, which predicts region/structure-specific sub-deformation fields,
and composes them into a final deformation field used to warp the moving
image. The segmentation and registration branches are trained jointly end-to-
end, as a single network, using a combined composite loss function. In subse-
quent sections, we describe the co-attention based segmentation sub-network,
discontinuity-preserving image registration sub-network, and the composite loss
function used in the proposed approach.
Fig. 1. Overview of SDDIR. A co-attention based segmentation sub-network (U-Net) predicts segmentations Sx and Sy for the moving and fixed images Ix and Iy. Pairs of LVBP, LVM, RV and background sub-regions are registered to produce sub-deformation fields (φz0, φz1, φz2, ...), which are composed into the final deformation field φz and applied via a spatial transform to obtain the warped image and warped segmentations (e.g. SLVM, SRV). Ix, Iy: moving and fixed images; Sx, Sy: predicted moving and fixed segmentations.
Fig. 2. Schema of the co-attention based segmentation. x and y denote the features of
the moving and fixed images, respectively. The cardiac MR images were reproduced by
kind permission of UK Biobank ©.
S = f(Fmov) × g(Ffix)ᵀ,
ATTmov = Softmax(S) × h2(Ffix),
ATTfix = Softmax(Sᵀ) × h1(Fmov),    (2)
Omov = Concat(Fmov, σ(ATTmov) · ATTmov),
Ofix = Concat(Ffix, σ(ATTfix) · ATTfix),
where Omov and Ofix are the output feature maps (learned new representations)
of Fmov and Ffix, following application of their estimated co-attention maps,
respectively. Softmax(·) is the Softmax function, applied to the last channel of
the similarity matrix S. σ(·) denotes the gating function, a 1 × 1 × 1 convolution
layer followed by a Sigmoid activation. Concat(·) concatenates the input features
with the corresponding attention maps, comprising a concatenation, a 1 × 1 × 1
convolution layer, batch normalisation and an activation layer.
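The computation in Eq. (2) can be sketched as follows in NumPy, on flattened feature maps of shape (positions, channels). This is an illustrative re-implementation only: the randomly drawn channel-mixing matrices stand in for the learned 1 × 1 × 1 convolutions, and all names are ours.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(F_mov, F_fix, seed=0):
    """Eq. (2) on flattened feature maps of shape (positions, channels).
    The 1x1x1 convolutions f, g, h1, h2 and the gating projection act per
    position, so they reduce to channel-mixing matrices here (random stand-ins
    for learned weights)."""
    rng = np.random.default_rng(seed)
    C = F_mov.shape[1]
    Wf, Wg, Wh1, Wh2, Wsig = (rng.standard_normal((C, C)) / np.sqrt(C)
                              for _ in range(5))
    S = (F_mov @ Wf) @ (F_fix @ Wg).T                    # similarity matrix
    ATT_mov = softmax(S) @ (F_fix @ Wh2)                 # fixed -> moving attention
    ATT_fix = softmax(S.T) @ (F_mov @ Wh1)               # moving -> fixed attention
    gate = lambda A: 1.0 / (1.0 + np.exp(-(A @ Wsig)))   # sigma(.): projection + sigmoid
    O_mov = np.concatenate([F_mov, gate(ATT_mov) * ATT_mov], axis=-1)
    O_fix = np.concatenate([F_fix, gate(ATT_fix) * ATT_fix], axis=-1)
    return O_mov, O_fix
```

Note that each output concatenates the original features with the gated attention features, doubling the channel dimension, as in the Concat(·) operation above.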
Following the two upsampling blocks in the decoder, the co-attention feature
maps of the fixed and moving images are recovered to the original size and
resolution of the input images. A 3 × 3 × 3 convolution followed by a Softmax
activation function is used to predict the segmentation masks of the moving
and fixed images. The focus in this study is on intra-subject spatio-temporal
registration of cine-MR image sequences, i.e. pair-wise registration of images
acquired at different time points in the cardiac cycle. We train and evaluate
the performance of SDDIR on cardiac cine-MR images available from the UK
The segmentation masks predicted for the input fixed and moving images (by the
segmentation sub-network) are passed as inputs along with their corresponding
original images, to the registration sub-network. To estimate the desired locally
smooth and globally discontinuous deformation field mapping the moving im-
age to the fixed image, we first predict four different smooth sub-deformation
fields for each of the four sub-regions of interest in the cardiac MR images (i.e.
LVBP, LVM, RV and background), and then compose them to obtain the final
deformation field. Finally, a spatial transform network (STN) is used to warp the moving im-
age and predicted segmentation using the composed discontinuous deformation
field.
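The composition step can be sketched as below. This is a hypothetical NumPy illustration assuming one-hot region masks (names are ours); each voxel simply inherits the displacement of the region it belongs to, which yields a field that is smooth within regions but discontinuous at their interfaces.

```python
import numpy as np

def compose_deformation(sub_fields, masks):
    """Compose K region-specific sub-deformation fields into one field that is
    smooth within each region but discontinuous across region boundaries.

    sub_fields : (K, 2, H, W) displacement fields, one per region
    masks      : (K, H, W) one-hot region masks (e.g. LVBP, LVM, RV, background)
    """
    # each voxel inherits the displacement of the region its mask assigns it to
    return (masks[:, None] * sub_fields).sum(axis=0)
```

With two constant sub-fields of opposite sign, the composed field jumps at the mask boundary, which is exactly the sliding-motion behaviour a globally smooth regulariser would suppress.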
where S_pre^mov, S_gt^mov, S_pre^fix, S_gt^fix are the predicted and ground-truth
segmentations of the moving and fixed images, respectively.
The image similarity loss evaluates the dissimilarity between the warped
moving image and the fixed image. In this paper, we use the mean squared error
to evaluate the distance between them, formulated as,
LMSE = (1/(W × H × D)) Σᵢ (Imov ◦ φ − Ifix)² ,    (5)
R(φ) = ||∇u||²,
LR = (1/4)(RLVBP + RLVM + RRV + Rbackground),    (7)
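The regulariser of Eq. (7) can be sketched as follows, using forward finite differences to approximate ∇u (the discretisation and function names are our own, for illustration).

```python
import numpy as np

def grad_norm_sq(u):
    """R(phi) = ||grad u||^2 for a displacement field u of shape (2, H, W),
    approximated with forward finite differences and averaged over voxels."""
    dy = np.diff(u, axis=1)          # differences along rows
    dx = np.diff(u, axis=2)          # differences along columns
    return (dy ** 2).mean() + (dx ** 2).mean()

def smoothness_loss(sub_fields):
    """L_R: mean of R over the sub-deformation fields of the regions."""
    return sum(grad_norm_sq(u) for u in sub_fields) / len(sub_fields)
```

Because the penalty is applied to each sub-deformation field separately, smoothness is enforced only within regions; the composed field remains free to be discontinuous at region boundaries.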
where LR denotes the combined regularisation terms for each sub-region.
The complete loss function used to train the network is,
were used for training, validation and testing. The ground-truth segmentation
masks for the UKBB dataset were manually annotated by experts, as part of
a previous study [34]. To verify the generalisation and robustness of the pro-
posed approach, we also apply the trained model (i.e. trained on UKBB data)
to images from the ACDC and M&M datasets. Similarly, we choose the ED and
ES SAX images from 100 subjects (totally 200 samples) in the ACDC dataset,
whose ground-truth segmentation masks are available. In the M&M dataset, 300
samples (registration from ED to ES and from ES to ED) are extracted from
150 subjects. Each image volume in ACDC and M&M is pre-processed similarly
to the UKBB data, resulting in images of size 128 × 128 × 16 using resampling,
cropping and padding. Note that, to reduce the domain gap between different
datasets, histogram-matching is applied to the ACDC and M&M images, using
a random image volume from UKBB as the reference.
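The histogram-matching step might look like the quantile-mapping sketch below: a simplified stand-in for library routines such as scikit-image's match_histograms (the function name and implementation are our own assumptions).

```python
import numpy as np

def match_histograms(source, reference):
    """Map source intensities onto the reference intensity distribution by
    matching empirical quantiles, preserving the ordering of source voxels."""
    src = source.ravel()
    # quantile (rank in [0, 1]) of every source voxel
    quantiles = np.argsort(np.argsort(src)) / max(src.size - 1, 1)
    ref_sorted = np.sort(reference.ravel())
    # intensity the reference image has at the same quantile
    matched = np.interp(quantiles, np.linspace(0, 1, ref_sorted.size), ref_sorted)
    return matched.reshape(source.shape)
```

Matching each external volume to a randomly chosen UKBB reference in this way narrows the intensity-distribution gap before the images are passed to the network.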
The SDDIR was implemented in Python using PyTorch, on a Tesla M60 GPU
machine. The Adam optimiser with a learning rate of 1e-3 was used to optimise
the network. We set the batch-size to 3, due to limitations in GPU memory
available. The hyper-parameters λ0, λ1, λ2 and λ3 in the total loss Ltotal were
tuned empirically and set to 0.1, 1, 0.1 and 0.01, respectively, throughout all
experiments presented in this study. The number of integration steps in the
integration layer is 7, following [14,4]. The source code will be publicly available
on Github (https://github.com/cistib/DDIR).
Table 1. Quantitative comparison on UKBB between SDDIR and state of the art
methods. Statistically significant improvements in registration and segmentation accu-
racy (DS and HD95) are highlighted in bold. Besides, LVEDV and LVMM indices with
no significant difference from the reference (before Reg) are also highlighted in bold.
Methods Avg. DS (%) LVBP DS (%) LVM DS (%) RV DS (%) HD95 (mm) Seg DS (%) LVEDV (ml) LVMM (g)
before Reg 43.75 ± 5.47 63.14 ± 14.17 52.93 ± 13.14 68.12 ± 15.36 15.44 ± 4.56 - 157.53 ± 32.78 99.50 ± 27.89
B-spline 66.23 ± 6.70 74.26 ± 9.54 62.12 ± 8.08 62.31 ± 7.88 15.92 ± 4.92 - 140.34 ± 41.13 100.27 ± 30.25
Demons 68.29 ± 6.03 77.51 ± 8.32 63.79 ± 7.17 63.56 ± 7.58 14.44 ± 4.58 - 139.64 ± 37.30 102.53 ± 30.26
SyN 54.78 ± 5.60 65.15 ± 6.38 41.09 ± 8.34 58.11 ± 6.89 16.10 ± 4.82 - 150.19 ± 33.11 93.44 ± 28.22
VM 74.18 ± 4.96 85.50 ± 6.38 69.43 ± 6.55 67.61 ± 7.44 12.74 ± 4.60 - 151.85 ± 33.76 97.08 ± 29.25
VM-Dice 79.80 ± 4.42 86.96 ± 5.80 71.72 ± 6.77 80.71 ± 5.90 8.67 ± 4.56 - 162.09 ± 34.40 100.36 ± 28.57
Baseline 79.95 ± 4.38 88.33 ± 5.08 73.61 ± 6.28 77.91 ± 6.58 9.51 ± 4.25 87.31 ± 2.95 156.93 ± 33.64 98.46 ± 28.53
VM(compose) 69.77 ± 5.72 75.77 ± 5.57 57.50 ± 9.59 76.03 ± 6.38 10.06 ± 2.88 - 128.89 ± 41.42 100.54 ± 30.34
VM-Dice(compose) 65.95 ± 6.93 74.36 ± 7.26 52.66 ± 11.13 70.84 ± 6.62 11.10 ± 2.98 - 124.06 ± 44.59 102.38 ± 29.35
DDIR 86.54 ± 4.30 90.47 ± 4.88 80.62 ± 6.20 88.53 ± 5.14 6.59 ± 4.37 - 155.65 ± 32.92 101.65 ± 27.87
SDDIR 84.46 ± 3.64 90.75 ± 3.84 78.07 ± 6.48 84.56 ± 5.21 7.54 ± 4.83 88.60 ± 2.06 156.02 ± 32.66 101.28 ± 27.90
SDDIR(-DC) 80.41 ± 4.18 88.41 ± 5.03 74.24 ± 6.29 78.58 ± 6.46 9.44 ± 4.10 88.07 ± 2.44 155.53 ± 33.50 98.43 ± 28.44
Fig. 3. Visual comparison of results on UKBB estimated using SDDIR and state of
the art methods. Left column: moving and fixed images; Right column: corresponding
warped moving image (first row), deformation fields (second row). The cardiac MR
images were reproduced by kind permission of UK Biobank ©.
Quantitative registration results obtained for the unseen test set from UKBB are
summarised in Table 1. The Baseline joint segmentation and registration network
achieves higher Dice scores on the registration results than those solely designed
for image registration (e.g. Demons, B-spline and VM). The composition meth-
ods VM(compose) and VM-Dice(compose) do not show any improvements over
VM and VM-Dice, and perform consistently worse than VM-Dice across all met-
rics evaluated. While the Baseline network significantly outperforms VM across
all metrics, its average Dice score (computed across all three cardiac structures,
LVBP, LVM and RV), only marginally outperforms VM-Dice, and it performs
worse than VM-Dice in terms of HD95. Both the Baseline and VM-Dice networks
use the Dice loss to guide the training of their constituent registration networks.
The architectures of the constituent registration networks are almost identical,
leading to similar performance of the Baseline and VM-Dice networks. SDDIR
significantly outperforms the Baseline network in terms of both the Dice score
and HD95, highlighting the superior registration performance of the proposed
approach. The DDIR approach, which uses manually annotated ground truth
segmentation masks for decomposing the input images into regions of interest
during inference, achieves the best registration performance of all methods
investigated. DDIR achieves an average Dice score 5% higher than SDDIR and an
HD95 score that is on average 2 mm lower than SDDIR. In terms of the clinical
indices derived from the registered/warped moving image following registration,
both DDIR and SDDIR show no significant differences with respect to the ref-
erence (with indices derived for SDDIR being closer to the reference). While
DDIR shows some improvements over SDDIR in terms of registration accuracy,
the latter does not require segmentation masks to be provided as inputs during
inference (as required by DDIR). SDDIR thus has more flexibility in its util-
To assess the ability of the proposed approach to generalise to unseen data repre-
sentative of real-world data acquired routinely in clinical examinations, we apply
the SDDIR model pre-trained on UKBB data, to other external cardiac MR data
sets, namely, ACDC and M&M. Data available in ACDC and M&M were ac-
quired at multiple different imaging centres distributed across different countries,
using different types of MR scanners, and from patients diagnosed with differ-
ent types of cardiac diseases/abnormalities (e.g. myocardial infarction, dilated
cardiomyopathy, hypertrophic cardiomyopathy, abnormal right ventricle). Gen-
eralising to such unseen data is challenging due to domain shifts in the acquired
images, relative to the UKBB data used for training SDDIR.
Table 2. Quantitative comparison on ACDC between SDDIR and the state of the
art methods. Statistically significant improvements in registration accuracy (DS and
HD95) are highlighted in bold. Besides, LVEDV and LVMM indices with no significant
difference from the reference are also highlighted in bold.
Methods Avg. DS (%) LVBP DS (%) LVM DS (%) RV DS (%) HD95 (mm) Seg DS (%) LVEDV (ml) LVMM (g)
before Reg 60.24 ± 11.19 65.70 ± 16.22 51.90 ± 14.50 63.14 ± 14.17 10.60 ± 3.88 - 165.13 ± 73.57 130.09 ± 50.58
B-spline 74.96 ± 9.55 80.40 ± 14.16 75.72 ± 7.84 68.75 ± 15.50 10.96 ± 4.66 - 152.52 ± 82.11 134.36 ± 51.37
Demons 73.54 ± 8.83 78.42 ± 12.58 72.34 ± 8.90 69.87 ± 13.79 10.52 ± 3.92 - 147.02 ± 81.01 137.06 ± 53.38
SyN 70.03 ± 7.58 79.66 ± 10.56 66.08 ± 8.59 64.35 ± 14.69 10.56 ± 3.47 - 155.95 ± 75.71 133.03 ± 51.74
VM 76.02 ± 7.84 83.85 ± 10.42 74.15 ± 8.25 70.07 ± 13.99 9.85 ± 4.36 - 156.62 ± 75.93 130.54 ± 51.92
VM-Dice 77.75 ± 7.40 84.67 ± 9.92 73.86 ± 8.24 74.71 ± 12.39 8.03 ± 3.91 - 164.70 ± 77.20 134.75 ± 54.01
Baseline 78.64 ± 6.76 86.31 ± 8.64 76.60 ± 7.42 73.02 ± 12.55 9.19 ± 4.00 66.40 ± 18.39 159.99 ± 74.93 132.48 ± 52.60
VM(compose) 72.57 ± 10.13 76.85 ± 14.08 64.03 ± 13.67 76.84 ± 12.45 9.73 ± 2.99 - 137.96 ± 82.08 129.18 ± 54.00
VM-Dice(compose) 71.01 ± 9.22 77.29 ± 11.92 62.54 ± 12.63 73.21 ± 12.71 10.03 ± 3.24 - 140.07 ± 81.06 130.72 ± 53.44
DDIR 87.79 ± 5.56 91.53 ± 6.62 85.49 ± 7.16 86.34 ± 9.13 5.86 ± 4.54 - 161.03 ± 74.98 134.96 ± 53.08
SDDIR 76.13 ± 8.66 83.75 ± 12.59 73.22 ± 10.38 71.42 ± 13.10 9.38 ± 3.50 72.64 ± 15.17 155.40 ± 73.17 134.31 ± 57.69
SDDIR(-DC) 79.61 ± 6.67 86.97 ± 8.52 77.73 ± 6.91 74.12 ± 12.49 8.84 ± 3.90 70.10 ± 13.86 160.57 ± 75.90 132.40 ± 52.52
The quantitative and qualitative results obtained for data from ACDC are
shown in Table 2 and Fig. 4, respectively. For the results on ACDC we ob-
served that B-spline and Demons achieve better registration performance than
VM, while the SyN algorithm is still worse than VM (by ∼ 2% in terms of the
average Dice score). The VM-Dice and Baseline networks are significantly bet-
ter than VM and two composition-based image registration approaches, namely,
Fig. 4. Visual comparison of results on ACDC estimated using SDDIR and state of
the art methods. Left column: moving and fixed images; Right column: corresponding
warped moving image (first row), deformation fields (second row).
Fig. 5. Visual comparison of segmentations estimated using SDDIR and the Baseline. The
first two rows are results on UKBB (green dashed border) and the bottom two rows (red
dashed border) are the results on ACDC. Left column: moving and fixed images with
the corresponding segmentation; Right column: corresponding moving segmentation,
fixed segmentation, warped moving image, deformation fields. The cardiac MR images
were reproduced by kind permission of UK Biobank ©.
(e.g. the segmentation of the right ventricle, plotted in blue in Fig. 5). In the
UKBB dataset, the results of SDDIR are more similar to the ground-truth, but
there is over-segmentation in the results of Baseline (see the region of the right
ventricle in the moving segmentation). There is a domain gap from the UKBB to
ACDC dataset, which leads to the decreased segmentation Dice for both meth-
ods. However, SDDIR offers some improvements over the Baseline, for example,
by capturing the right ventricle in cases where it is entirely missed by the latter.
In summary, due to the co-attention block, SDDIR is more robust (than the
Baseline) for segmenting the input pairs of images in the presence of domain
shifts.
segmentation masks are seldom available, and has the added benefit of produc-
ing good quality segmentation masks for both input images to be registered, as
auxiliary outputs. Although the globally smooth version of SDDIR (i.e. without
discontinuity composition), SDDIR(-DC), incurs significantly higher registration
errors than SDDIR for the UKBB data set, it consistently outperforms other
state-of-the-art approaches in terms of registration accuracy and joint registra-
tion and segmentation performance, across all three datasets. Considering those
scenarios where globally smooth registration is required/appropriate, SDDIR(-
DC) can be employed in place of SDDIR to reduce dependency of registration
accuracy on segmentation quality, and improve overall registration performance.
For example, results reported for ACDC and M&M data sets in Tables 2, 3 show
that SDDIR(-DC) outperforms SDDIR in terms of registration accuracy, at the
cost of enforcing global smoothness on the estimated deformation fields.
The proposed approach, SDDIR, is versatile and can be employed in various clin-
ical applications requiring pair-wise image registration. For example, the ability
to simultaneously segment and register cardiac MR images means that SDDIR
can facilitate real-time quantification of cardiac clinical indices across the car-
diac cycle and quantitative analysis of cardiac motion. In addition, as SDDIR
can predict both the fixed segmentation and the warped moving segmentation, a
more anatomically plausible segmentation can be obtained by jointly considering
these two predictions. SDDIR is agnostic to image modality and the
organs/structures visible within the field of view of the images being registered.
Hence, SDDIR may also be used to jointly segment and register thoracic or ab-
dominal CT images, where strong discontinuities exist between organ structures
due to their relative motion (i.e. sliding at organ boundaries) resulting from
respiration.
Although the proposed approach is demonstrated to jointly segment and reg-
ister input pairs of images accurately, outperforming several state-of-the-art ap-
proaches, one main limitation remains. The registration performance of SDDIR
is highly dependent on the performance of the segmentation sub-network, i.e. on
the quality of the segmentation masks predicted for the input pair of images to be
registered. As the registration sub-network requires the predicted segmentation
masks to split the original MR images into pairs of corresponding regions, the
registration sub-network performs poorly when the quality of the segmentation
masks is poor. Thus, SDDIR performs well on the UKBB dataset as accurate
segmentation masks are predicted for the input pairs of images and used to ef-
fectively guide the discontinuity-preserving registration. As SDDIR was trained
using UKBB data, the segmentation sub-network was able to generalise well to
unseen data from UKBB due to homogeneity/consistency in appearance across
images from different subjects. Conversely, SDDIR’s performance significantly
degraded when the trained model (on UKBB data) was used to register images
from other datasets (e.g. ACDC and M&M). This is due to domain shifts resulting
from variations in the imaging scanners and protocols used to acquire images
in different datasets. The presented results indicate that SDDIR outperforms
the Baseline network in terms of registration accuracy only when good quality
segmentation masks are predicted by its constituent segmentation sub-network.
Specifically, we found that segmentation accuracy (in terms of Dice) on the fixed
and moving images had to be over 76% for ACDC and 70% for M&M, for the sub-
sequent registration accuracy of SDDIR to be better than the Baseline network.
Therefore, future work in the field should look to improve the robustness of
discontinuity-preserving image registration methods to domain shifts that are
commonly found in medical images. This may be achieved by imbuing SDDIR
with recent approaches to domain generalisation, for example, to mitigate the
drop in segmentation and registration performance resulting from domain
shifts (relative to the training data). Additionally, the over-dependence of reg-
istration quality on the quality of the segmentation masks predicted by SDDIR
could be relaxed by modelling object/tissue boundaries as weak discontinuities
(as opposed to strong discontinuities used currently in SDDIR) that are incor-
porated into the regularisation of the deformation field to ensure locally smooth
and globally discontinuous deformation fields.
In this paper, we propose a novel weakly-supervised discontinuity-preserving
registration network, SDDIR. The proposed approach is applied to the task of
intra-patient spatio-temporal CMR registration, to jointly segment the input pair
of images to be registered and predict a locally smooth but globally discontin-
uous deformation field that warps the source/moving image to the fixed/target
image. Compared with previous discontinuity-preserving registration methods,
SDDIR provides improvements in terms of execution speed (relative to tradi-
tional iterative approaches), and does not require segmentation masks to be
available prior to registering images (unlike some state-of-the-art deep learning
based registration approaches such as DDIR). We demonstrate the registration
performance of SDDIR on three cardiac MR datasets, and show that it sig-
nificantly outperforms both traditional and deep learning-based state-of-the-art
registration methods. Future work will explore domain generalisation techniques
to mitigate the drop in performance observed with SDDIR due to domain
shifts, and will look to weaken the dependency on segmentation quality to ensure
accurate image registration.
Acknowledgements
This research was conducted using the UKBB resource under access applica-
tion 11350 and was supported by the Royal Academy of Engineering under the
RAEng Chair in Emerging Technologies (CiET1919/19) scheme.
References
1. Ahn, S.S., Ta, K., Thorn, S., Langdon, J., Sinusas, A.J., Duncan, J.S.: Multi-frame
Attention Network for Left Ventricle Segmentation in 3D Echocardiography. In:
16. Dong, P., Wang, L., Lin, W., Shen, D., Wu, G.: Scalable Joint Segmentation
and Registration Framework for Infant Brain Images. Neurocomputing 229, 54–62
(2017)
17. Droske, M., Rumpf, M.: Multiscale Joint Segmentation and Registration of Image
Morphology. IEEE Transactions on Pattern Analysis and Machine Intelligence
29(12), 2181–2194 (2007)
18. Hua, R.: Non-rigid Medical Image Registration with Extended Free Form Deforma-
tions: Modelling General Tissue Transitions. Ph.D. thesis, University of Sheffield
(2016)
19. Hua, R., Pozo, J.M., Taylor, Z.A., Frangi, A.F.: Multiresolution eXtended Free-
Form Deformations (XFFD) for Non-rigid Registration with Discontinuous Trans-
forms. Medical Image Analysis 36, 113–122 (2017)
20. Jud, C., Sandkühler, R., Möri, N., Cattin, P.C.: Directional Averages for Motion
Segmentation in Discontinuity Preserving Image Registration. In: International
Conference on Medical Image Computing and Computer-Assisted Intervention.
pp. 249–256. Springer (2017)
21. Krebs, J., Delingette, H., Mailhé, B., Ayache, N., Mansi, T.: Learning A Probabilis-
tic Model for Diffeomorphic Registration. IEEE Transactions on Medical Imaging
38(9), 2165–2176 (2019)
22. Lei, B., Huang, S., Li, H., Li, R., Bian, C., Chou, Y.H., Qin, J., Zhou, P., Gong, X.,
Cheng, J.Z.: Self-co-attention neural network for anatomy segmentation in whole
breast ultrasound. Medical Image Analysis 64, 101753 (2020)
23. Li, B., Niessen, W.J., Klein, S., de Groot, M., Ikram, M.A., Vernooij, M.W., Bron,
E.E.: A Hybrid Deep Learning Framework for Integrated Segmentation and Regis-
tration: Evaluation on Longitudinal White Matter Tract Changes. In: International
Conference on Medical Image Computing and Computer-Assisted Intervention. pp.
645–653. Springer (2019)
24. Li, B., Sun, Z., Li, Q., Wu, Y., Hu, A.: Group-wise Deep Object Co-segmentation
with Co-attention Recurrent Neural Network. In: Proceedings of the IEEE/CVF
International Conference on Computer Vision. pp. 8519–8528 (2019)
25. Li, D., Zhong, W., Deh, K.M., Nguyen, T.D., Prince, M.R., Wang, Y., Spince-
maille, P.: Discontinuity Preserving Liver MR Registration with Three-dimensional
Active Contour Motion Segmentation. IEEE Transactions on Biomedical Engineer-
ing 66(7), 1884–1897 (2018)
26. Lu, X., Wang, W., Ma, C., Shen, J., Shao, L., Porikli, F.: See More, Know More:
Unsupervised Video Object Segmentation with Co-attention Siamese Networks. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
pp. 3623–3632 (2019)
27. Marstal, K., Berendsen, F., Staring, M., Klein, S.: SimpleElastix: A User-friendly,
Multi-lingual Library for Medical Image Registration. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition Workshops. pp. 134–142
(2016)
28. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully Convolutional Neural Net-
works for Volumetric Medical Image Segmentation. In: 2016 Fourth International
Conference on 3D Vision (3DV). pp. 565–571. IEEE (2016)
29. Mo, S., Cai, M., Lin, L., Tong, R., Chen, Q., Wang, F., Hu, H., Iwamoto, Y.,
Han, X.H., Chen, Y.W.: Mutual Information-Based Graph Co-Attention Networks
for Multimodal Prior-Guided Magnetic Resonance Imaging Segmentation. IEEE
Transactions on Circuits and Systems for Video Technology (2021)
46. Wu, Z., Rietzel, E., Boldea, V., Sarrut, D., Sharp, G.C.: Evaluation of Deformable
Registration of Patient Lung 4DCT with Subanatomical Region Segmentations.
Medical Physics 35(2), 775–781 (2008)
47. Xu, Z., Niethammer, M.: DeepAtlas: Joint Semi-supervised Learning of Image
Registration and Segmentation. In: International Conference on Medical Image
Computing and Computer-Assisted Intervention. pp. 420–429. Springer (2019)
48. Yang, S., Zhang, L., Qi, J., Lu, H., Wang, S., Zhang, X.: Learning Motion-
Appearance Co-Attention for Zero-Shot Video Object Segmentation. In: Proceed-
ings of the IEEE/CVF International Conference on Computer Vision. pp. 1564–
1573 (2021)
49. Zhang, J., Chen, K., Yu, B.: An Improved Discontinuity-preserving Image Regis-
tration Model and Its Fast Algorithm. Applied Mathematical Modelling 40(23-24),
10740–10759 (2016)