
University of Groningen
Deep learning-based cone beam CT correction for adaptive proton therapy

Thummerer, Adrian
DOI:
10.33612/diss.589889430
IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from
it. Please check the document version below.
Document Version
Publisher's PDF, also known as Version of record
Publication date:
2023
Link to publication in University of Groningen/UMCG research database
Citation for published version (APA):

Thummerer, A. (2023). Deep learning-based cone beam CT correction for adaptive proton therapy. [Thesis
fully internal (DIV), University of Groningen]. University of Groningen.
https://doi.org/10.33612/diss.589889430
Copyright
Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the
author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).
The publication may also be distributed here under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license.
More information can be found on the University of Groningen website: https://www.rug.nl/library/open-access/self-archiving-pure/taverne-amendment.

Deep learning-based cone beam CT correction for adaptive proton therapy
Adrian Thummerer
Adrian Thummerer
Deep learning-based cone beam CT correction for adaptive proton therapy
Thesis, University of Groningen, The Netherlands
The studies presented in this thesis were financially supported by a grant from the Dutch Cancer Society (KWF research project 11518), called 'INCONTROL - Clinical Control Infrastructure for Proton Therapy Treatments'.
Printing of this thesis was financially supported by Elekta.
Cover: The cover on the front of this thesis was created using the pre-trained deep learning
model DALL-E 2 (openai.com/dall-e-2/) using "oil panting of a neural network" as input text.
Layout: office manager
© Copyright, A. Thummerer, Groningen, 2023.

All rights reserved. No part of this thesis may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without written permission of the author. The copyright of previously published chapters of this thesis remains with the publisher of the journal.
Deep learning-based cone beam CT correction for adaptive proton therapy
PhD thesis
to obtain the degree of PhD at the

University of Groningen
on the authority of the
Rector Magnificus Prof. C. Wijmenga
and in accordance with
the decision by the College of Deans.
This thesis will be defended in public on
Monday 27 March 2023 at 09:00 hours
by
Adrian Thummerer
born on 10 September 1993

in Amstetten, Austria
Supervisors
Prof. Dr. S. Both
Prof. Dr. J. A. Langendijk
Assessment Committee
Prof. S. Korreman
Prof. P.M.A. van Ooijen
Prof. C.A.T. van den Berg
Contents
Chapter 1 General Introduction
Chapter 2 Comparison of CBCT based synthetic CT methods suitable for proton dose calculations in adaptive proton therapy
Chapter 3 Comparison of the suitability of CBCT- and MR-based synthetic CTs for daily adaptive proton therapy in head and neck patients
Chapter 4 Range probing as a quality control tool for CBCT-based synthetic CTs: In vivo application for head and neck cancer patients
Chapter 5 Clinical suitability of deep learning based synthetic CTs for adaptive proton therapy of lung cancer
Chapter 6 Deep learning based 4D-synthetic CTs from sparse-view CBCTs for dose calculations in adaptive proton therapy
Chapter 7 Summarizing Discussion
Appendices
Acronyms
Summary
Samenvatting
Acknowledgements
Curriculum Vitae
List of publications
Conference records
Chapter 1
Introduction
1 Introduction
Proton therapy (PT) is an advanced form of external beam radiotherapy (RT) that exploits the physical interaction of charged particles with matter to deliver radiation dose to tumors [1,2]. In contrast to the more conventionally used photon radiation, proton beams have a finite range and deliver their maximum dose in a pristine peak at the end of their range, the so-called Bragg peak, after which they rapidly stop [3]. The depth of the Bragg peak depends on the proton energy. Clinically used proton accelerators produce energies of approximately 70 to 230 MeV, which roughly translates to ranges of 41 mm to 330 mm in water. Combining multiple proton energies extends the pristine Bragg peak into a spread-out Bragg peak (SOBP) and enables highly conformal dose distributions that maximize the dose in the target volume and spare surrounding healthy tissues, especially beyond the tumor [4]. The finite range and localized dose delivery substantiate the theoretical advantage of PT over photon beam radiotherapy [3]. Figure 1a compares the depth-dose characteristics of photons, single mono-energetic proton beams and poly-energetic proton beams creating a spread-out Bragg peak.
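The quoted correspondence between energy and range can be checked with the Bragg-Kleeman rule, R = αE^p, an empirical approximation for the proton range in water. The sketch below is illustrative only: the parameter values (α ≈ 0.0022 cm/MeV^p, p ≈ 1.77) are commonly quoted fit values and are an assumption here, not taken from this thesis.

```python
# Illustrative sketch: Bragg-Kleeman rule R = alpha * E**p for protons in water.
# ALPHA_CM and P are commonly quoted fit values (assumption, not from this thesis).
ALPHA_CM = 0.0022  # cm / MeV**P
P = 1.77


def range_in_water_mm(energy_mev: float) -> float:
    """Approximate proton range in water (mm) for a kinetic energy in MeV."""
    if energy_mev <= 0:
        raise ValueError("energy must be positive")
    return 10.0 * ALPHA_CM * energy_mev ** P


if __name__ == "__main__":
    # Print the approximate range for the lower and upper clinical energies.
    for energy in (70, 230):
        print(f"{energy} MeV -> {range_in_water_mm(energy):.0f} mm")
```

With these parameters the rule gives roughly 41 mm at 70 MeV and roughly 330 mm at 230 MeV, in line with the values stated above.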
The beneficial dose delivery characteristics of proton beams come with a high sensitivity to uncertainties affecting the proton range, such as density variations along the beam path [5]. Figure 1b illustrates the dosimetric impact of such a density change on the depth-dose profiles of photon and proton beams (single energy and SOBP). Due to this sensitivity to density variations, proton therapy requires highly accurate imaging for treatment planning. Furthermore, proton therapy is particularly sensitive to anatomical changes, meaning that frequent and accurate imaging is also required during the treatment course [6]. This can be provided within adaptive proton therapy (APT) [7]. APT aims to account for patient related variations (e.g., anatomical changes) occurring throughout the treatment by monitoring the patient anatomy and, if necessary, adjusting the treatment plan to ensure consistent target coverage and organ-at-risk (OAR) sparing [8].
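The sensitivity to density variations can be illustrated with the water-equivalent path length (WEPL): a beam stops where the accumulated product of geometric step length and relative stopping power (RSP) equals its range in water. The following sketch uses a hypothetical 1D line of voxels and made-up RSP values; it is a simplified illustration and not the dose engine used in this work.

```python
def stopping_depth_mm(rsp_per_voxel, voxel_mm, range_in_water_mm):
    """Geometric depth (mm) at which the accumulated WEPL equals the beam range.

    rsp_per_voxel: relative stopping powers along the beam path (hypothetical values).
    Returns None if the beam leaves the voxel line before stopping.
    """
    remaining = range_in_water_mm  # water-equivalent range still available
    depth = 0.0
    for rsp in rsp_per_voxel:
        wepl_step = rsp * voxel_mm
        if rsp > 0 and wepl_step >= remaining:
            # The beam stops inside this voxel.
            return depth + remaining / rsp
        remaining -= wepl_step
        depth += voxel_mm
    return None


# Nominal anatomy: 100 water-like voxels of 1 mm each.
nominal = [1.0] * 100
# Changed anatomy: 20 mm of tissue replaced by an air-like cavity (assumed RSP 0.001).
changed = [1.0] * 20 + [0.001] * 20 + [1.0] * 60

nominal_depth = stopping_depth_mm(nominal, 1.0, 80.0)
changed_depth = stopping_depth_mm(changed, 1.0, 80.0)
```

In this toy example the same beam stops about 20 mm deeper once the cavity is present, which is the kind of range shift sketched in Figure 1b.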
[Figure 1: depth-dose plots (dose versus depth along the beam direction, target volume marked) for a) the nominal situation and b) the "uncertain" situation, with panels for photons, protons (Bragg peak) and protons (SOBP).]
Figure 1 a) Schematic comparison of typical depth-dose profiles of photon beams, mono-energetic proton beams (Bragg peak), and poly-energetic spread-out Bragg peaks (SOBP). b) Schematic illustration of the effect of density variations and uncertainties on different beam types. The red ruler indicates the dose difference observed at the distal edge of the target volume. Inspired by Knopf and Lomax (2013) [38].

1.1 The radiotherapy workflow
The conventional radiotherapy workflow can be grouped into the following stages: imaging, planning, quality assurance (QA), patient setup and treatment delivery (see Figure 2a). In non-adaptive RT workflows, imaging is usually performed only once, for treatment planning, and it is assumed that the patient anatomy and geometry remain unchanged throughout the treatment course. However, for many indications this assumption does not hold, as relevant geometrical and anatomical changes occur on various timescales ranging from weeks (e.g., weight loss, tumor regression), to days (e.g., atelectasis, prostate rotation), to minutes (e.g., bladder filling) and seconds (e.g., cardiac and respiratory motion) [9]. Adaptive radiotherapy, on the other hand, addresses these patient related variations by frequently imaging the patient throughout the treatment course and verifying whether anatomical changes warrant a change of the treatment plan (see Figure 2b). For proton beam therapy, with its inherent sensitivity to density changes, adaptive workflows are even more relevant than for photon radiotherapy, since even small and localized variations can have a significant impact on the dose distribution [7]. The treatment delivery stage in APT workflows is more complex than in traditional RT workflows because, in addition to patient setup and dose delivery, additional imaging, dose recalculation, plan adaptation and recontouring might be required. The following subsections focus on the role of imaging in different stages of the APT workflow.
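[Figure 2: workflow diagrams. a) Conventional RT: imaging (CT), planning, QA, then per treatment fraction patient setup and delivery. b) Adaptive RT: imaging (CT), planning, QA, then per treatment fraction patient setup and a "plan OK?" check; if yes, delivery; if no, re-planning, QA and delivery.]
Figure 2 a) conventional vs. b) adaptive treatment workflow.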
1.1.1 Imaging modalities
Volumetric imaging is the backbone of radiotherapy and is essential to visualize the patient anatomy and to localize tumors and organs at risk (OARs) [10]. Depending on the indication, the imaging stage comprises up to three imaging modalities: computed tomography (CT), magnetic resonance imaging (MR) and positron emission tomography (PET).
CT imaging is the standard imaging modality used in radiotherapy. Images are created by a fan-beam shaped x-ray source and a multi-slice detector array that are rotated around the patient on a helical trajectory [11]. Images are reconstructed slice-by-slice and stacked to form a full 3D image. In both photon and proton radiotherapy, CT imaging is fundamental for dose calculations, which are necessary to generate and optimize treatment plans. CT is also commonly used to delineate target volumes and OARs for treatment planning. During the treatment course, especially in proton therapy, CT is often also used to monitor the patient's anatomy. For anatomical locations that are significantly affected by motion (e.g., respiratory motion in the thorax), time-resolved 4D imaging can be performed [12]. 4D images are created by sorting the acquired detector data into motion amplitude or phase bins and reconstructing a 3D image for each motion bin. In radiotherapy, 4D imaging makes it possible to assess tumor motion and its influence on treatment plans. Furthermore, 4D images allow 4D robust optimization and evaluation of treatment plans to mitigate the adverse effects of motion on the radiotherapy treatment.
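The phase-sorting step described above can be sketched as follows. The example assumes that each projection already carries a respiratory phase in [0, 1) derived from a breathing signal; the function name and the choice of 10 bins are illustrative assumptions, and the reconstruction of each bin is omitted.

```python
from collections import defaultdict


def sort_into_phase_bins(phases, n_bins=10):
    """Group projection indices by respiratory phase bin.

    phases: one phase value per projection, each in [0, 1) (assumed to be given).
    Returns a dict mapping bin index -> list of projection indices.
    """
    bins = defaultdict(list)
    for index, phase in enumerate(phases):
        if not 0.0 <= phase < 1.0:
            raise ValueError(f"phase {phase} outside [0, 1)")
        # Clamp guards against rounding up to n_bins for phases very close to 1.
        bin_index = min(int(phase * n_bins), n_bins - 1)
        bins[bin_index].append(index)
    return dict(bins)


example_bins = sort_into_phase_bins([0.0, 0.05, 0.12, 0.95, 0.5], n_bins=10)
```

Each bin is then reconstructed separately, so with 10 bins and evenly distributed phases every 3D image is based on about one tenth of the acquired projections.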
MR imaging utilizes magnetic fields, radio waves and radiofrequency receivers for image generation. Unlike CT imaging, MR does not rely on ionizing radiation and shows superior soft tissue contrast. In radiotherapy, MR images are used to aid delineation in indications where the soft tissue contrast of CT is not sufficient to accurately delineate the target volumes and OARs (e.g., head and neck, pelvis, brain) [13]. Contours delineated on MR images must be transferred to CT images using deformable image registration methods, since MR images are not directly suitable for dose calculations. For photon RT, recent technological advancements have allowed MR imaging and linear accelerators (LINACs) to be combined in a single machine [14]. These systems enable daily onboard and real-time MR imaging and provide the foundation for MR-only radiotherapy, which eliminates the need for CT imaging since MR images are converted into synthetic CTs (see section 1.4). In proton therapy, however, the combination of MR imaging and treatment gantry is still at an experimental stage and not yet clinically available [15].
Figure 3 Imaging modalities commonly used in radiotherapy (from left to right): computed tomography (CT), magnetic resonance imaging (MR) and positron emission tomography (PET).
PET uses radiotracers to visualize the metabolic characteristics of tumors and aids tumor delineation and the assessment of treatment response during or after radiotherapy [16]. PET images are based on the β+ decay of the radiotracer: the emitted positron quickly annihilates with an electron, producing two high-energy photons. These two photons are emitted in opposite directions and can be detected by a detector ring outside of the patient. PET imaging is often combined with CT imaging in a single scanner (PET-CT). Recently, LINACs have also been equipped with onboard PET-CT scanners that allow real-time PET imaging and so-called biology-guided radiotherapy (BgRT) [17]. On-board PET systems for proton therapy have not yet been developed.
1.1.2 Patient setup
After imaging the patient and generating a treatment plan, the treatment is usually
delivered in individual fractions (e.g., 35 x 2 Gy for head and neck cancer patients).
When highly conformal treatment techniques, such as intensity modulated proton
therapy (IMPT), are used, accurate patient alignment with the treatment isocenter
is essential.
Cone-beam computed tomography (CBCT) is an imaging modality that can be used within the treatment room to align the patient. CBCT is compact enough to be combined with treatment gantries and has become the standard imaging modality for image-guided radiotherapy (IGRT) [18]. To acquire a CBCT, a cone-beam shaped x-ray source and a flat panel x-ray detector are rotated around the patient to acquire a large number of 2D projections. 3D images are then reconstructed from the stack of 2D projections. Due to cone-beam specific imaging artifacts (e.g., increased scatter) [19], CBCTs show lower image quality than conventional fan-beam CTs and are not directly suitable for segmentation or dose calculations, in particular in proton therapy. CBCTs are usually acquired for each treatment fraction and represent the anatomy of the day. This makes CBCT an ideal imaging modality for APT, which relies on up-to-date patient images. Figure 4 shows an image quality comparison of CBCT and CT for a head and neck cancer patient and a lung cancer patient.
Figure 4 Image quality comparison of CBCTs and CTs for head and neck, and lung cancer patients.
1.1.3 Adaptation stage
Adaptive proton therapy can be classified into online and offline strategies [7]. Offline adaptive proton therapy workflows separate the evaluation of anatomical changes and the potential treatment plan adaptation from the actual treatment delivery. If the need for an adaptation is detected, an adapted treatment plan is prepared and used from one of the following fractions onwards. Online adaptive proton therapy, on the other hand, includes the adaptation process in the treatment delivery (see Figure 2b). This leads to much stricter time constraints and requires a higher level of automation [20]. Adaptive radiotherapy can be performed either daily (for each treatment fraction) or at a certain frequency (e.g., weekly).
The current clinical practice for adaptive proton therapy in our proton center is to acquire conventional fan-beam CTs, also referred to as repeat CTs, for weekly rescans during the treatment course. Repeat CTs are used to evaluate the suitability of the original treatment plan and, if required, to create a new treatment plan. Repeat CT acquisitions provide the same image quality as planning CT scans, but the acquisition frequency during treatment is limited by the additional imaging dose and the increased workload for clinical staff. More frequent imaging, ideally on a daily basis, is desirable to detect changes earlier and to capture changes occurring on shorter time scales (daily, hourly).
As mentioned before, CBCTs are routinely acquired for daily patient alignment and represent the daily anatomy of the patient. Re-using these images for adaptive treatment strategies comes at no additional cost in terms of acquisition time or imaging dose. However, CBCTs show significant imaging artifacts that degrade the CT-number accuracy and thereby also the dose calculation accuracy. Dose calculations are an essential part of adaptive proton therapy workflows: they make it possible to evaluate the impact of anatomical changes on the dose distributions in target volumes and organs at risk. Correcting CBCT image deficiencies enables accurate proton dose calculations and thereby facilitates the transition to CBCT-based adaptive proton therapy workflows.
1.2 CBCT correction techniques
Multiple CBCT correction techniques and their ability to enable (proton) dose calculations have been investigated in the literature. This section provides a short overview of some of these approaches:
1.2.1 Density overrides
In density override techniques, the main tissue categories (e.g., adipose, muscle, bone and air) are segmented on the CBCT and each category is overwritten with a uniform Hounsfield unit (HU) value [21,22]. The replacement value can be determined patient-specifically using a previous CT scan. Density override approaches are simple to implement but lack the accuracy required for proton therapy applications.
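A minimal sketch of a density override is shown below. The HU thresholds and replacement values are placeholders chosen for illustration only; in practice they would be derived, for example, from a prior CT of the patient as described above.

```python
# Hypothetical tissue classes: (upper CBCT threshold, uniform replacement HU).
# All numbers are illustrative placeholders, not values from the cited studies.
TISSUE_CLASSES = [
    (-400.0, -1000.0),      # air
    (-20.0, -100.0),        # adipose
    (200.0, 40.0),          # muscle / soft tissue
    (float("inf"), 700.0),  # bone
]


def override_value(cbct_value):
    """Return the uniform HU assigned to the tissue class of one CBCT voxel."""
    for upper_threshold, replacement_hu in TISSUE_CLASSES:
        if cbct_value < upper_threshold:
            return replacement_hu
    raise ValueError("value not covered by any tissue class")


def density_override(cbct_slice):
    """Apply the override to a 2D slice given as a list of rows."""
    return [[override_value(value) for value in row] for row in cbct_slice]


overridden = density_override([[-950, -80], [30, 1200]])
```

Because every voxel within a class receives the same value, tissue heterogeneity inside a class is lost, which is one reason for the limited accuracy noted above.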
1.2.2 Deformable image registration-based corrections
Deformable image registration (DIR) can be used to deform a high quality fan-beam CT image to the daily CBCT image. The deformed CT, also referred to as virtual CT, ideally represents the anatomy of the daily CBCT but with the accurate HUs of the fan-beam CT [23]. DIR-based methods are also simple to implement but, depending on the DIR algorithm, the deformation can be relatively time-consuming. Furthermore, large anatomical variations occurring between CT and CBCT acquisition (e.g., cavity filling or atelectasis of the lung) are difficult to model with DIR. DIR-based CBCT correction methods have been investigated for photon and proton dose calculations and have shown promising results [24,25].
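Once a displacement field has been estimated, creating the virtual CT amounts to resampling the CT at displaced positions. The sketch below shows only this resampling step, in 1D and with linear interpolation. The displacement field is assumed to be given (estimating it is the actual registration problem and is not shown), and all names are illustrative.

```python
def warp_1d(ct_profile, displacement):
    """Resample a 1D CT profile at positions x + displacement[x] (in voxels).

    Uses linear interpolation and clamps sample positions to the profile edges.
    """
    if len(ct_profile) != len(displacement):
        raise ValueError("profile and displacement must have equal length")
    last = len(ct_profile) - 1
    warped = []
    for x, shift in enumerate(displacement):
        pos = min(max(x + shift, 0.0), float(last))
        lower = int(pos)
        upper = min(lower + 1, last)
        weight = pos - lower
        warped.append((1.0 - weight) * ct_profile[lower] + weight * ct_profile[upper])
    return warped


# Shift every sample position by half a voxel (hypothetical displacement field).
virtual_ct = warp_1d([0.0, 10.0, 20.0, 30.0], [0.5, 0.5, 0.5, 0.5])
```

A smooth field of this kind can move and stretch existing structures, but it cannot create tissue that is absent from the prior CT, which is consistent with the difficulty of modelling large anatomical changes.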
1.2.3 Prior-based scatter estimation
This method aims at correcting CBCT projections instead of the reconstructed image. It requires a prior image, such as a planning CT, to estimate the scatter. Based on the prior CT, artificial CBCT projections, assumed to be free of scatter, are generated and compared to the actual CBCT projections. The difference between actual and artificial CBCT projections is heavily smoothed and used to correct the original CBCT projections. Afterwards, the corrected projections are used to reconstruct the CBCT. This method has been shown to be highly accurate for both proton and photon radiotherapy applications, but is time-consuming [23,26–28].
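The projection-domain correction described above can be summarized in a few lines. The sketch below works on a single 1D detector row and uses a moving average as the smoothing filter. The filter choice, window size and function names are assumptions made for illustration, and the forward projection of the prior CT is taken as given.

```python
def moving_average(values, half_width):
    """Smooth a 1D signal with a box filter that is truncated at the edges."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def scatter_correct_row(measured, prior_projection, half_width=2):
    """Correct one measured CBCT detector row using a scatter-free prior projection.

    measured: measured CBCT intensities (primary signal plus scatter).
    prior_projection: intensities simulated from the prior CT, assumed scatter-free.
    """
    difference = [m - p for m, p in zip(measured, prior_projection)]
    scatter_estimate = moving_average(difference, half_width)
    return [m - s for m, s in zip(measured, scatter_estimate)]


# Toy data: a constant scatter offset of 0.5 added to the scatter-free signal.
prior_row = [1.0, 2.0, 3.0, 2.0, 1.0]
measured_row = [p + 0.5 for p in prior_row]
corrected_row = scatter_correct_row(measured_row, prior_row)
```

The smoothing is intended to restrict the correction to slowly varying components, so that sharp differences caused by anatomical changes between prior CT and CBCT are not subtracted along with the scatter.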
1.2.4 Monte Carlo-based scatter correction
Monte Carlo simulations that take the cone beam x-ray source and detector setup into account can be performed to calculate the scatter contribution based on prior CT images. The resulting scatter distribution is used to correct the CBCT image. Monte Carlo methods can produce high image quality but are usually computationally expensive [29].
1.2.5 Analytical image-based correction
In the research version of RayStation (RaySearch, Sweden), the clinically used treatment planning system at our institution, two iterative methods for CBCT correction using prior images are available. Method 1 deformably registers the prior CT and the CBCT to create a HU look-up table for CBCT HU correction. Afterwards, a difference map that excludes anatomical differences is created and heavily smoothed to estimate low-frequency scatter on the CBCT image. This scatter estimate is then used to correct the original CBCT. Method 2 combines DIR-based CBCT correction with method 1. After deforming the prior CT to the CBCT, the result is compared with the corrected CBCT from method 1 to detect differences in low density regions such as air or lung tissue. If large differences are detected, the affected area is copied from the corrected CBCT to the deformed prior CT to ensure that the more recent anatomical information from the CBCT is used.
1.2.6 Neural network-based correction
Recently, artificial intelligence (AI), and specifically deep learning (DL), has been applied to many research tasks across the entire field of medical image processing [30,31]. In radiotherapy, too, deep learning has found its way into research and clinical applications due to its data-driven nature. Deep convolutional neural networks (DCNNs), a type of deep learning architecture, have shown excellent results in image classification [33], segmentation [34] and synthesis tasks [35]. DCNNs consist of multiple stacked convolutional layers that filter images and extract features depending on the task they are trained for. DCNNs can learn CBCT correction when provided with an extensive dataset of CBCTs and artifact-free CT images. The aim of the neural network is to generate a corrected CBCT, also referred to as synthetic CT, that resembles CT images in terms of image quality and HU accuracy. DCNNs learn a mapping of CBCT-to-CT intensities while also correcting image artifacts that are present on CBCTs but not on CTs (e.g., scatter and dental artifacts). Large amounts of imaging data are required to train such neural networks and, depending on the neural network architecture, paired (registered) imaging data (e.g., U-net) or unpaired data (e.g., generative adversarial networks, GANs) are used [36].
1.3 Thesis outline
APT requires frequent imaging and accurate proton dose calculations to support plan adaptation decisions. Currently, CBCT is the most suitable imaging modality to provide daily images. However, shortcomings in image quality and CT-number accuracy prevent the direct use of CBCTs for proton dose calculations. The aim of this thesis is to establish a fast CBCT correction method that enables CBCT-based dose calculations for adaptive proton therapy workflows in head and neck and lung cancer patients.
Chapter 2, “Comparison of CBCT based synthetic CT methods suitable for proton dose calculations in adaptive proton therapy”, compares several methods to correct CBCTs and generate synthetic CTs for head and neck cancer patients. The comparison includes a DIR-based technique, an analytical image-based method utilizing histogram matching and DIR, and a deep learning-based method. The evaluation includes image quality metrics and simplified proton treatment plans with artificial target volumes.
The lack of electron density information obstructs the use of MR images for proton dose calculations. Similar to the CBCT-to-CT conversion, deep convolutional neural networks can be trained to convert MRs into CTs. This makes it possible to combine the high soft tissue contrast of MR scans with the ability to perform dose calculations without an actual CT acquisition. It can also eliminate the need for deformable image registration, usually used to transfer contours from MR to CT, since the synthetic CTs are generated directly from the MR images. Furthermore, MR imaging does not require ionizing radiation and can therefore be acquired frequently throughout the treatment without increasing the patient’s exposure to radiation. In the future, MR-guided proton therapy systems might enable adaptive proton therapy workflows without the need for CBCT or CT imaging [15]. For the head and neck patients investigated in Chapter 2, the clinical imaging protocol also included MR scans. This allowed a direct comparison of CBCT- and MR-based synthetic CTs generated using two deep convolutional neural networks. Chapter 3, “Comparison of the suitability of CBCT- and MR-based synthetic CTs for daily adaptive proton therapy in head and neck patients”, presents these results, including an image quality and dosimetric evaluation comparing CBCT- and MR-based synthetic CTs. Compared to Chapter 2, the dosimetric evaluation was refined by using the clinically applied treatment plans.
Chapters 2 and 3 propose deep learning-based methods to generate accurate synthetic CTs for proton dose calculations. The ‘black-box’ nature of deep learning and its sensitivity to out-of-distribution data, however, require stringent quality assurance procedures, especially in a clinical environment. Chapter 4, “Range probing as a quality control tool for CBCT-based synthetic CTs: In vivo application for head and neck cancer patients”, proposes proton radiography measurements as a quality control tool to verify the accuracy of deep learning based synthetic CTs. Proton radiography is an imaging modality that uses transmitted proton beams to image an object [37]. Image contrast is formed by the variation in residual range of proton beams traversing different tissues. In the context of proton therapy, imaging with protons has the benefit that the same radiation type is used for imaging and treatment. In Chapter 4, previously acquired proton radiography data from seven head and neck cancer patients were retrospectively analyzed to evaluate the HU accuracy of synthetic CTs.
Chapter 5, “Clinical suitability of deep learning based synthetic CTs for adaptive proton therapy of lung cancer”, extends deep learning based synthetic CT (sCT) generation, presented for head and neck cancer patients in Chapters 2, 3 and 4, to the thorax region. Respiratory motion and tissue heterogeneities make the thorax more challenging than the head and neck region. The evaluation of synthetic CTs for thoracic cancer patients focused on clinically relevant metrics, including dose-volume histogram parameters, proton radiography simulations and normal tissue complication probabilities (NTCP).
Respiratory motion strongly affects the thorax region, which justifies the use of time-resolved 4D imaging for treatment planning and evaluation. In Chapter 6, “Deep learning-based 4D-synthetic CTs from sparse view CBCTs for dose calculations in adaptive proton therapy”, we propose a deep learning-based method to generate 4D-synthetic CTs from sparse view 4D-CBCTs. 4D dose accumulation, utilizing patient-specific breathing signals acquired during treatment delivery, was used to evaluate a clinical use case of 4D-synthetic CTs.
Chapter 7 provides a summarizing discussion that highlights the main findings, places the presented work in a clinically relevant perspective, and provides a future outlook for the application of CBCT-based synthetic CTs in adaptive proton therapy.
References
[1] Wilson RR. Radiological use of fast protons. Radiology. 1946;47(5):487-491. doi:10.1148/47.5.487
[2] Mohan R, Grosshans D. Proton therapy – Present and future. Adv Drug Deliv Rev. 2017;109:26-44. doi:10.1016/J.ADDR.2016.11.006
[3] Newhauser WD, Zhang R. The physics of proton therapy. Phys Med Biol. 2015;60(8):R155-R209. doi:10.1088/0031-9155/60/8/r155
[4] Jette D, Chen W. Creating a spread-out Bragg peak in proton beams. Phys Med Biol. 2011;56:131-138. doi:10.1088/0031-9155/56/11/N01
[5] Paganetti H. Range uncertainties in proton therapy and the role of Monte Carlo simulations. Phys Med Biol. 2012;57(11):99-117. doi:10.1088/0031-9155/57/11/R99
[6] Wohlfahrt P, Richter C. Status and innovations in pre-treatment CT imaging for proton therapy. British Journal of Radiology. 2020;93(1107). doi:10.1259/BJR.20190590
[7] Paganetti H, Botas P, Sharp GC, Winey B. Adaptive proton therapy. Phys Med Biol. 2021;66(22). doi:10.1088/1361-6560/ac344f
[8] Albertini F, Matter M, Nenoff L, Zhang Y, Lomax A. Online daily adaptive proton therapy. Br J Radiol. 2019;(July):20190594. doi:10.1259/bjr.20190594
[9] Sonke JJ, Aznar M, Rasch C. Adaptive Radiotherapy for Anatomical Changes. Semin Radiat Oncol. 2019;29(3):245-257. doi:10.1016/J.SEMRADONC.2019.02.007
[10] Lecchi M, Fossati P, Elisei F, Orecchia R, Lucignani G. Current concepts on imaging in radiotherapy. European Journal of Nuclear Medicine and Molecular Imaging. 2007;35(4):821-837. doi:10.1007/S00259-007-0631-Y
[11] Kalender WA. X-ray computed tomography. Phys Med Biol. 2006;51(13):R29. doi:10.1088/0031-9155/51/13/R03
[12] Hugo GD, Rosu M. Advances in 4D radiation therapy for managing respiration: Part I – 4D imaging. Z Med Phys. 2012;22(4):258-271. doi:10.1016/J.ZEMEDI.2012.06.009
[13] Lagendijk JJW, Raaymakers BW, van den Berg CAT, Moerland MA, Philippens ME, van Vulpen M. MR guidance in radiotherapy. Phys Med Biol. 2014;59(21):R349. doi:10.1088/0031-9155/59/21/R349
[14] Winkel D, Bol GH, Kroon PS, et al. Adaptive radiotherapy: The Elekta Unity MR-linac concept. Clin Transl Radiat Oncol. 2019;18:54-59. doi:10.1016/J.CTRO.2019.04.001
[15] Hoffmann A, Oborn B, Moteabbed M, et al. MR-guided proton therapy: A review and a preview. Radiation Oncology. 2020;15(1):1-13. doi:10.1186/S13014-020-01571-X
[16] Jelercic S, Rajer M. The role of PET-CT in radiotherapy planning of solid tumours. Radiol Oncol. 2015;49(1):1. doi:10.2478/RAON-2013-0071
[17] Oderinde OM, Shirvani SM, Olcott PD, Kuduvalli G, Mazin S, Larkin D. The technical design and concept of a PET/CT linac for biology-guided radiotherapy. Clin Transl Radiat Oncol. 2021;29:106-112. doi:10.1016/J.CTRO.2021.04.003
[18] Jaffray DA, Siewerdsen JH, Wong JW, Martinez AA. Flat-panel cone-beam computed tomography for image-guided radiation therapy. International Journal of Radiation Oncology*Biology*Physics. 2002;53(5):1337-1349. doi:10.1016/S0360-3016(02)02884-5
[19] Schulze R, Heil U, Groß D, et al. Artefacts in CBCT: a review. Dentomaxillofacial Radiology. 2011;40(5):265. doi:10.1259/DMFR/30642039
[20] Lim-Reinders S, Keller BM, Al-Ward S, Sahgal A, Kim A. Online Adaptive Radiation Therapy. Int J Radiat Oncol Biol Phys. 2017;99(4):994-1003. doi:10.1016/j.ijrobp.2017.04.023
[21] Fotina I, Hopfgartner J, Stock M, Steininger T, Lütgendorf-Caucig C, Georg D. Feasibility of CBCT-based dose calculation: Comparative analysis of HU adjustment techniques. Radiotherapy and Oncology. 2012;104(2):249-256. doi:10.1016/J.RADONC.2012.06.007
[22] Dunlop A, McQuaid D, Nill S, et al. Comparison of CT number calibration techniques for CBCT-based dose calculation. Strahlenther Onkol. 2015;191:970-978. doi:10.1007/s00066-015-0890-7
[23] Kurz C, Kamp F, Park YK, et al. Investigating deformable image registration and scatter correction for CBCT-based dose calculation in adaptive IMPT. Med Phys. 2016;43(10):5635-5646. doi:10.1118/1.4962933
[24] Veiga C, Alshaikhi J, Amos R, et al. Cone-Beam Computed Tomography and Deformable Registration-Based “Dose of the Day” Calculations for Adaptive Proton Therapy. Int J Part Ther. 2015;2(2):404-414. doi:10.14338/IJPT-14-00024.1
[25] Veiga C, McClelland J, Moinuddin S, et al. Toward adaptive radiotherapy for head and neck patients: Feasibility study on using CT-to-CBCT deformable registration for “dose of the day” calculations. Med Phys. 2014;41(3):31703. doi:10.1118/1.4864240
[26] Park YK, Sharp GC, Phillips J, Winey BA. Proton dose calculation on scatter-corrected CBCT image: Feasibility study for adaptive proton therapy. Med Phys. 2015;42(8):4449-4459. doi:10.1118/1.4923179
[27] Andersen AG, Park YK, Elstrøm UV, et al. Evaluation of an a priori scatter correction algorithm for cone-beam computed tomography based range and dose calculations in proton therapy. Phys Imaging Radiat Oncol. 2020;16:89-94. doi:10.1016/J.PHRO.2020.09.014
[28] Niu T, Sun M, Star-Lack J, Gao H, Fan Q, Zhu L. Shading correction for on-board cone-beam CT in radiation therapy using planning MDCT images. Med Phys. 2010;37(10):5395-5406. doi:10.1118/1.3483260
[29] Thing RS, Bernchou U, Mainegra-Hing E, Brink C. Patient-specific scatter correction in clinical cone beam computed tomography imaging made possible by the combination of Monte Carlo simulations and a ray tracing algorithm. Acta Oncol (Madr). 2013;52(7):1477-1483. doi:10.3109/0284186x.2013.813641
[30] Barragán-Montero A, Javaid U, Valdés G, et al. Artificial intelligence and machine learning for medical imaging: A technology review. Physica Medica. 2021;83:242-256. doi:10.1016/J.EJMP.2021.04.016
[31] Wang S, Cao G, Wang Y, et al. Review and Prospect: Artificial Intelligence in Advanced Medical Imaging. Frontiers in Radiology. 2021;0:15. doi:10.3389/FRADI.2021.781868
[32] Francolini G, Desideri I, Stocchi G, et al. Artificial Intelligence in radiotherapy: state of the art and future directions. Medical Oncology. 2020;37(6):1-9. doi:10.1007/S12032-020-01374-W
[33] Rajan Jeyaraj P, Rajan Samuel Nadar E. Computer-assisted medical image classification for early diagnosis of oral cancer employing deep learning algorithm. J Cancer Res Clin Oncol. 2019;145(3):829-837. doi:10.1007/s00432-018-02834-7
[34] Liang S, Tang F, Huang X, et al. Deep-learning-based detection and segmentation of organs at risk in nasopharyngeal carcinoma computed tomographic images for radiotherapy planning. Eur Radiol. 2019;29(4):1961-1967. doi:10.1007/s00330-018-5748-9
[35] Spadea MF, Pileggi G, Zaffino P, et al. Deep Convolution Neural Network (DCNN) Multiplane Approach to Synthetic CT Generation From MR images – Application in Brain Proton Therapy. International Journal of Radiation Oncology*Biology*Physics. 2019;105(3):495-503. doi:10.1016/j.ijrobp.2019.06.2535
[36] Spadea MF, Maspero M, Zaffino P, Seco J. Deep learning based synthetic-CT generation in radiotherapy and PET: A review. Published online 2021. doi:10.1002/mp.15150
[37] Poludniowski G, Allinson NM, Evans PM. Proton radiography and tomography with application to proton therapy. British Journal of Radiology. 2015;88(1053). doi:10.1259/BJR.20150134
[38] Knopf AC, Lomax A. In vivo proton range verification: a review. Phys Med Biol. 2013;58(15):131-160. doi:10.1088/0031-9155/58/15/R131
Chapter 2
Comparison of CBCT based synthetic CT methods suitable for
proton dose calculations in adaptive proton therapy
Adrian Thummerer1, Paolo Zaffino2, Arturs Meijers1, Gabriel Guterres Marmitt1,
Joao Seco3,4, Roel JHM Steenbakkers1, Johannes A Langendijk1, Stefan Both1,
Maria F Spadea2,6, Antje C Knopf1,5,6
1 Department of Radiation Oncology, University Medical Center Groningen,
University of Groningen, Groningen, The Netherlands
2 Department of Experimental and Clinical Medicine,
Magna Graecia University, Catanzaro, Italy
3 Department of Biomedical Physics in Radiation Oncology,
German Cancer Research Centre (DKFZ), Heidelberg, Germany
4 Department of Physics and Astronomy, Heidelberg University,
Heidelberg, Germany
5 Division for Medical Radiation Physics,
Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
6 Both authors contributed equally to this work
Published in:
Physics in Medicine and Biology
April 2020, Volume 65, Issue 9
DOI: 10.1088/1361-6560/ab7d54
Abstract
In-room imaging is a prerequisite for adaptive proton therapy. The use of onboard
cone-beam computed tomography (CBCT) imaging, which is routinely acquired
for patient position verification, can enable daily dose reconstructions and plan
adaptation decisions. Image quality deficiencies, however, hamper dose calculation
accuracy and make corrections of CBCTs a necessity.
This study compared three methods to correct CBCTs and create synthetic CTs that
are suitable for proton dose calculations. CBCTs, planning CTs and repeated CTs
(rCT) from 33 H&N cancer patients were used to compare a deep convolutional
neural network (DCNN), deformable image registration (DIR) and an analytical
image-based correction method (AIC) for synthetic CT (sCT) generation. Image
quality of sCTs was evaluated by comparison with a same-day rCT, using mean
absolute error (MAE), mean error (ME), Dice similarity coefficient (DSC), spatial
non-uniformity (SNU) and signal/contrast-to-noise ratios (SNR/CNR) as metrics.
Dosimetric accuracy was investigated in an intracranial setting by performing
gamma analysis and calculating range shifts.
Neural network-based sCTs resulted in the lowest MAE and ME (37/2 HU) and the
highest DSC (0.96), while DIR and AIC generated images with a MAE of 44/77 HU,
a ME of −8/1 HU and a DSC of 0.94/0.90. Gamma and range shift analysis showed
almost no dosimetric difference between DCNN and DIR based sCTs. The lower
image quality of AIC based sCTs affected dosimetric accuracy and resulted in
lower pass ratios and higher range shifts. Patient-specific differences highlighted the
advantages and disadvantages of each method. For the set of patients, the DCNN
created synthetic CTs with the highest image quality. Accurate proton dose
calculations were achieved by both DCNN and DIR based sCTs. The AIC method resulted
in lower image quality and dose calculation accuracy was reduced compared to the
other methods.
1. Introduction
Adaptive radiotherapy (ART) intends to improve radiation treatments by monitoring
changes in patient anatomy, assessing the actual delivered dose and subsequently
modifying treatment plans to achieve the best possible target coverage
and organs at risk sparing (Yan et al 1997, Lim-Reinders et al 2017, Sonke et al 2019).
Repeated imaging throughout the treatment course plays an essential role in adaptive
treatment strategies, since dose recalculations, based on these images, indicate
the necessity for treatment plan modifications (Hvid et al 2018, Posiewnik and
Piotrowski 2019).
In photon therapy centers, and in recent years also in proton therapy centers, onboard
cone-beam computed tomography (CBCT) systems are available for accurate
pre-treatment patient alignment (Hua et al 2017, Stock et al 2018). Besides patient
position, CBCT images can also provide information about interfractional changes
of the patient anatomy.
Accurate stopping power ratios (SPR), which are typically derived from CT numbers,
are a prerequisite for precise proton dose calculations. The relationship between
CT-number and SPR is ambiguous, since tissues with similar
CT-numbers can have different SPRs and vice versa (Yang et al 2012). Any underlying
CT-number error will enlarge the uncertainty of the SPR conversion and eventually
affect dosimetric accuracy. This makes proton dose calculations particularly
sensitive to CT-number uncertainties. In contrast to conventional fan-beam CT,
cone-beam CT images suffer from various imaging artifacts that impair the image
quality and lead to such CT-number uncertainties (Schulze et al 2011, Nagarajappa
et al 2015). Due to the high requirements on the accuracy of CT-numbers, clinical
proton dose calculations cannot be performed directly on CBCT images and corrections
have to be applied to the CBCTs first.
With a rise in CBCT equipped proton therapy centers and the increased interest in
adaptive proton therapy in recent years, different correction approaches and their
suitability for proton dose calculations have been reported in literature. These include,
among others, look-up table (LUT) based approaches (Kurz et al 2015), histogram
matching (Arai et al 2017), deformation of planning CTs (Peroni et al 2012, Veiga
et al 2015, 2016, 2017, Landry et al 2015b), projection-based correction
methods (Niu et al 2010, Park et al 2015, Kurz et al 2016a) and deep learning techniques
(Kida et al 2018, Hansen et al 2018).
Kurz et al investigated a LUT based CBCT correction method (2015). This relatively
simple technique was found not sufficiently accurate for proton dose calculations
and was outperformed by a deformable image registration method. Results from
a histogram matching algorithm were reported by Arai et al (2017). In this study,
dose calculation accuracy in phantoms and head and neck cancer patients was improved
compared to dose calculation on raw CBCTs. Another approach, which was
comprehensively investigated in the context of adaptive photon radiotherapy, is
the deformation of planning CTs onto the geometry of daily CBCTs, which results
in a synthetic CT (often also referred to as virtual CT or pseudo CT). In the scope
of proton therapy of lung malignancies, Veiga et al reported findings from such a
deformable image registration method (2015, 2016, 2017). They concluded that for
lung cancer similar conclusions for plan adaptations can be drawn on a synthetic
CT as on a repeat (fan-beam) CT scan. Their method also incorporated additional
correction steps for areas in which the deformable image registration was not able
to reconstruct the anatomy correctly. For head and neck cancer patients, Landry et al reported
good agreement between dose calculated based on a synthetic CT resulting from
a deformed planning CT and on a repeat CT (2015b). Kurz et al compared the deformable
image registration method to a projection-based correction method in terms
of its suitability for proton dose calculations (2016a). The projection-based method
was initially described by Niu et al (2010) and Park et al (2015). It uses the synthetic
CT, created by deformable image registration, as prior for scatter correction and
was found highly accurate in terms of proton dose calculations. Kurz et al concluded
that for head and neck patients, the synthetic CTs from the deformable image
registration method were as suitable for proton dose calculations as those
from the projection-based correction method. For prostate cancer patients, the
projection-based method performed better.
With the recent progress in the field of artificial intelligence (AI), such techniques
are increasingly applied to problems in radiology and radiotherapy. In the subfield
of medical image synthesis a lot of progress has been recently made using AI to
convert magnetic resonance (MR) images into synthetic CT images (Han 2017, Chen
et al 2018). These techniques have also been translated to CBCT scatter correction.
AI-based CBCT correction methods include conventional machine learning techniques
such as random forest based methods (Li et al 2019) but also deep learning
based methods like deep convolutional neural networks (Kida et al 2018) and generative
adversarial networks (Liang et al 2019, Harms et al 2019). In random forest-based
methods decision trees are trained to predict CT intensities based on aligned
CT and CBCT or MRI patches. Deep convolutional neural networks, on the other
hand, learn a direct non-linear mapping of image intensities from one imaging
modality to another using paired CBCT and CT images. Generative adversarial networks
have the ability to learn from unpaired CBCT and CT datasets. A common
advantage of all these AI-based techniques is that they do not require a planning
CT for sCT generation once the model is trained.
Kida et al showed one of the first applications of deep learning approaches to convert
CBCT images into synthetic CTs (2018), but did not include a dosimetric evaluation.
Li et al reported high image quality and potential for accurate dose calculations
of DCNN based synthetic CTs for photon radiotherapy (2019). Hansen et
al evaluated the accuracy of proton dose calculations on CBCTs corrected with a
U-net deep convolutional neural network which was trained on raw and scatter-free
CBCT projections (2018). For patients with pelvic tumors, their results showed
insufficient proton dose calculation accuracy. Landry et al used a similar convolutional
neural network and compared different training data types (2019). This included
raw and scatter-free projections, reconstructed CBCTs, DIR-synthetic CTs
and reconstructed CBCTs, based on raw and corrected projections. For proton dose
calculations in prostate cancer patients the method using CBCTs, from raw and
corrected projections, performed best. Table 1 summarizes the methods for synthetic
CT creation, the anatomical site for which they were evaluated and their suitability
for proton dose calculations.
Method                            | Literature                                                                | Anatomical site             | Suitability for proton dose calculations
LUT based correction              | Kurz et al 2015                                                           | head and neck               | -
Histogram matching                | Arai et al 2017                                                           | phantoms, head and neck     | -
DIR                               | Veiga et al 2015, 2016, 2017, Kurz et al 2015, 2016a, Landry et al 2015b  | lung, head and neck, pelvis | ++ (H&N), + (pelvis), + (lung)
Projection-based correction       | Park et al 2015, Kurz et al 2016a                                         | head and neck, pelvis       | ++
Deep convolutional neural network | Hansen et al 2018, Landry et al 2019                                      | pelvis                      | +
- stands for insufficient, + for acceptable, and ++ for high proton dose calculation accuracy
Table 1. Overview of previously investigated CBCT correction/conversion methods in the context of
proton dose calculations.
A comparison of the above-mentioned methods is challenging since many investigations
were carried out for different anatomical regions, were based on different
input data and used different metrics for the dosimetric evaluation. In this work,
we compare three methods to correct CBCTs (acquired in a proton treatment room)
and create synthetic CTs using a large head and neck data set. Method 1 uses a deep
convolutional neural network derived from the work of Spadea et al (2019). Method
2 is based on deformable image registration and uses a similar DIR-algorithm as
Veiga et al, Kurz et al and Landry et al (Landry et al 2015b, Veiga et al 2016, Kurz et al
2016a). Method 3 uses an iterative approach where the joint histogram between the
planning CT (pCT) and the CBCT is used to create an intensity conversion function. Shading
artifacts are also corrected by using a correction map constructed utilizing the planning
CT. Results from the various methods are compared in terms of image quality
and proton dose calculation accuracy.
In a clinical context, raw performance and accuracy are not the only factors
that determine whether a method is suitable for clinical implementation. Stability, time
consumption, and labor efficiency should also be considered. Our work aims at
identifying a clinically optimal method to create synthetic CTs suitable for an automated
adaptive proton therapy workflow. In contrast to previous works, multiple
sCT methods are tested on exactly the same extensive dataset. This facilitates an
in-depth comparison of these methods for application in adaptive proton therapy.
2. Material and methods

2.1. Patient data
CT and CBCT imaging data from 33 head and neck cancer patients, treated with
PBS proton therapy at the University Medical Center Groningen (UMCG, Netherlands),
were used to create synthetic CTs (sCT). Images were acquired between
January 2018 and February 2019 and patients were aged between 27 and 80 years
(mean: 62 years). Out of the 33 patients, 23 were male. A planning CT
(pCT), acquired approximately three weeks before treatment, weekly repeated CTs
(rCT) and daily CBCTs (used for patient position verification) were available for
each patient. For synthetic CT generation and validation, CBCT-CT imaging pairs
from the day of the first rCT acquisition were chosen. A Siemens SOMATOM Definition
AS Open scanner (Siemens Healthineers, Germany) with a resolution of 0.98
mm × 0.98 mm, a slice thickness of 2 mm and a FOV of 500 mm was used for the
acquisition of the pCTs. For rCTs, a Siemens SOMATOM Confidence scanner with
similar settings (resolution: 0.98 mm × 0.98 mm, slice thickness: 2 mm, FOV: 500
mm) was utilized. CBCT images were acquired using the onboard imaging device
of an IBA Proteus®PLUS gantry (IBA, Belgium). CBCTs were reconstructed on a
0.50 mm × 0.50 mm × 2.50 mm grid with a FOV of 260 mm. Generally, CBCTs consisted
of 140 and CTs of 226 axial slices, both with an axial resolution of 512 × 512 pixels.
A facial mask was used during CBCT and CT acquisition to fix the patient's head in
a reproducible position.
CT and CBCT images did not cover the same field-of-view (FOV) and were
therefore cropped to an equal FOV during data preparation. Furthermore, the low
dose imaging protocol for CBCT acquisition led to artifacts and heavily impaired
image quality below the shoulders. To avoid any influence on the image and dose
comparison, we cropped this region on all images. Examples for the extent of image
cropping can be found in section S4 of the supplementary materials
(stacks.iop.org/PMB/65/095002/mmedia).
2.2. sCT methods
2.2.1. Neural network method (NN)

Paired CBCT and rCT images, acquired on the same day with the same immobilization,
were used to train a deep convolutional neural network (DCNN) to convert
CBCT image intensities into CT intensities, generating sCTs. The employed network
was originally designed for MRI to sCT conversion but was left unchanged
for our purpose (Spadea et al 2019). It is based on a U-net proposed by Han (2017).
This U-net consists of an encoding path to extract representative features from the
CBCT using convolutional layers. This encoding path is followed by a decoding
path, also based on convolutional layers, to reconstruct these features with the corrected
HUs. Based on this architecture, Spadea et al further introduced a multi-planar
approach to train an individual network for axial, sagittal and coronal images
which are afterwards combined into a final sCT image. A detailed description of the
applied neural network in the context of MRI based sCT synthesis was reported by
Spadea et al (2019).
Before training the neural network, various data preparation steps were necessary.
On the CBCT and rCT, the treatment couch, facial mask, and background were
removed by segmenting the patient outline using an automatic segmentation algorithm
included in the image processing software Plastimatch (www.plastimatch.org;
Zaffino et al 2016). The resulting masks were manually corrected to ensure
complete coverage of ears, nose and lungs. All voxels outside these masks were set
to −1000 HU. As a next step, a rigid registration of the CBCT to the rCT was performed
using the registration algorithm included in Plastimatch. In the last step,
the CBCT was deformably registered to match the rCT using a diffeomorphic morphons
DIR algorithm (Janssens et al 2011) included in the openREGGUI MATLAB
package (www.openreggui.org). This resulted in aligned images to train the neural
network. To allow a meaningful comparison, the resulting pre-processed CBCT
was also used as the starting point for the other two methods. For simplification, we
will still refer to it as CBCT, although it is not the original CBCT anymore. Synthetic
CTs resulting from the DCNN training will be referred to as sCTNN in the following
sections.
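The masking step described above (setting every voxel outside the patient outline to −1000 HU) can be sketched in a few lines. This is only an illustration: the function name and the assumption that image and mask are already available as equally shaped NumPy arrays are not taken from the study, where segmentation was done with Plastimatch.

```python
import numpy as np

AIR_HU = -1000  # HU value assigned outside the patient outline


def apply_patient_mask(image_hu, patient_mask, fill_value=AIR_HU):
    """Return a copy of image_hu with voxels outside patient_mask set to fill_value.

    image_hu and patient_mask are assumed (hypothetically) to be arrays of
    identical shape; patient_mask is True inside the patient outline.
    """
    image_hu = np.asarray(image_hu)
    patient_mask = np.asarray(patient_mask, dtype=bool)
    if image_hu.shape != patient_mask.shape:
        raise ValueError("image and mask must have the same shape")
    # Keep the original HU inside the outline, overwrite everything else.
    return np.where(patient_mask, image_hu, fill_value)
```

Applying the same fill value to CBCT, rCT and pCT keeps couch, mask and background from contributing to training or to the later image comparison.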
As described by Spadea et al (2019), three individual sets of weights were trained
using slices from axial, sagittal or coronal views, respectively. The resulting images
from each trained network were combined afterwards. The training was performed
on an NVIDIA 1080 Ti graphics processing unit (GPU). In total, including slices
from all views and data augmentation, 240 thousand slices were used to optimize
32 million learnable parameters of the DCNN. The data augmentation included
small translations and mirroring of entire slices. The training was stopped when
five consecutive epochs did not improve the validation loss.
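The stopping rule (five consecutive epochs without an improved validation loss) can be illustrated with a framework-independent sketch. The function name and the idea of replaying a recorded list of validation losses are assumptions made for illustration; the actual training code is not described in the text.

```python
def epochs_until_stop(val_losses, patience=5):
    """Return the number of epochs run before early stopping triggers.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss seen so far. If that never happens, all epochs run.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)
```

In practice the weights belonging to the best validation loss would typically be kept, which this sketch does not model.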
A 3-fold cross-validation approach was followed by randomly splitting our dataset
into three subsets of 11 patients each. Two subsets were used for training/validation,
while the third was used for testing of the trained network. This was repeated
two more times, so every subset was used for testing once. This procedure allowed the
utilization of all 33 patients as testing cases and increased the number of available
patients for image comparison and dosimetric evaluations.
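A minimal sketch of such a patient-level 3-fold split is given below. The function name, the fixed random seed and the use of integer patient IDs are illustrative assumptions; how the 22 remaining patients were divided between training and validation is not stated and is therefore not modelled.

```python
import random


def three_fold_split(patient_ids, n_folds=3, seed=0):
    """Randomly split patients into n_folds equal subsets.

    Returns a list of (train_val_ids, test_ids) tuples, one per fold, so that
    every patient appears in exactly one test subset.
    """
    ids = list(patient_ids)
    if len(ids) % n_folds != 0:
        raise ValueError("number of patients must be divisible by n_folds")
    random.Random(seed).shuffle(ids)  # seed is a hypothetical choice
    fold_size = len(ids) // n_folds
    folds = [ids[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        train_val = [p for j, fold in enumerate(folds) if j != k for p in fold]
        splits.append((train_val, test))
    return splits
```

Splitting by patient rather than by slice keeps all slices of a test patient out of training, which is what allows every patient to serve as an independent test case.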
2.2.2. Deformable image registration method (DIR)

As a second method to generate synthetic CTs, a diffeomorphic morphons DIR algorithm,
implemented in the MATLAB package openREGGUI, was used to deform
the pCT onto the geometry of the CBCT. This algorithm is particularly suited for
CBCT-CT image registration since it uses a local phase metric instead of solely
focusing on image intensities (Kurz et al 2016a). The algorithm was previously applied
for CBCT-CT image registration and showed suitability for H&N applications
(Landry et al 2015a, Kurz et al 2016b). Before DIR, the patient outline was automatically
segmented on CBCT and pCT using Plastimatch. Similar to the neural network
data preparation, masks were checked manually to ensure full coverage and
values outside the patient outline were set to −1000 HU. Afterwards, the pCT was
rigidly registered to the CBCT and the FOV was cropped to be equal to the FOV of
the CBCT. Then the actual CBCT conversion was performed by deforming the pCT
to match the CBCT. We refer to the resulting image as sCTDIR.
2.2.3. Analytical image-based correction method (AIC)

A first version of an analytical correction and conversion method was available in
a research version of the clinical treatment planning system (TPS) RayStation (version
7.99; RaySearch, Sweden). This third investigated method uses an iterative approach
to convert and correct CBCTs. First, the pCT is utilized to find a conversion
from CBCT to CT intensity scale by creating a joint histogram in which points corresponding
to different tissues are found and a piecewise linear conversion function
is created. Then, image artifacts are reduced by creating a correction map, not
influenced by anatomical differences, that tries to eliminate low-frequency variations
that are present on the CBCT but not on the pCT. As data preparation, the pCT
was rigidly registered to the CBCT and masked to cover the same FOV as the CBCT.
For optimal results, the recommendation to perform a DIR within the TPS was followed.
After that, the script was executed and resulted in sCTAIC. An overview of the
data preparation steps for all methods is provided in figure 1.
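The idea of a piecewise linear intensity conversion derived from paired CBCT and pCT voxels can be illustrated as follows. This is not the RayStation algorithm, whose details are not given here: the tissue classes (HU intervals on the pCT), the use of medians as control points and both function names are assumptions made purely for illustration.

```python
import numpy as np


def fit_piecewise_linear(cbct, ct, ct_bins):
    """Derive control points (cbct_value, ct_value) from aligned voxel pairs.

    ct_bins is a list of (low, high) HU intervals on the CT, one per tissue
    class (hypothetical choice). For each class, the median CBCT and median CT
    intensity of its voxels form one control point.
    """
    cbct = np.asarray(cbct, dtype=float).ravel()
    ct = np.asarray(ct, dtype=float).ravel()
    points = []
    for low, high in ct_bins:
        in_class = (ct >= low) & (ct < high)
        if in_class.any():
            points.append((np.median(cbct[in_class]), np.median(ct[in_class])))
    if not points:
        raise ValueError("no voxels fall into any of the given CT bins")
    points.sort()  # np.interp needs increasing x-coordinates
    xs, ys = zip(*points)
    return np.array(xs), np.array(ys)


def convert(cbct, xs, ys):
    """Map CBCT intensities to CT numbers by linear interpolation.

    Values outside the range of control points are clamped to the end values.
    """
    return np.interp(cbct, xs, ys)
```

A global mapping of this kind only rescales intensities; it cannot remove spatially varying shading, which is why the method described above adds a separate low-frequency correction map.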
2.3. Image evaluation
The rCT was used as 'ground truth' image to evaluate the performance of the various
sCT conversion methods. The image quality and accuracy of the CBCT-conversion
methods were quantified by mean absolute error (MAE) and mean error (ME),
defined in equations (1) and (2).
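In their usual voxel-wise form, the two metrics can be written as below. The notation is an assumption: n denotes the number of voxels inside the patient outline, and the sign convention (sCT minus rCT) is chosen so that a negative ME corresponds to an sCT that is shifted towards lower HU values, in line with how the ME is interpreted in the results.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{sCT}_i - \mathrm{rCT}_i\right|
\qquad
\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{sCT}_i - \mathrm{rCT}_i\right)
```

The MAE captures the overall magnitude of HU deviations, while the ME reveals a systematic offset that can cancel out in regions with errors of opposite sign.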
Figure 1. Overview of data preprocessing, sCT conversion and data evaluation. sCT = synthetic CT, rCT
= repeated CT, DIR = deformable image registration, DCNN = deep convolutional neural network, AIC =
analytical image-based correction, MAE = mean absolute error, ME = mean error, DSC = dice similarity
coefficient, SNU = spatial non-uniformity index.
Only voxels that lay within the automatically created patient skin contour were
considered for MAE and ME calculations. A distribution of the MAE in varying HU
regions was investigated by calculating a MAE-spectrum for bins of 20 HU from
−1000 to 1500 HU. To assess the geometric accuracy of each conversion method,
the Dice similarity coefficient (DSC) was calculated for bone according to equation
(3). For this purpose, a segmentation of bone tissue was performed by thresholding
the images to only include voxels with HU-values above a certain limit. Threshold
values from 100 to 1000 HU (in steps of 100 HU) were used to calculate a DSC spectrum.
This allowed an investigation of the similarity of bone tissues with increasing
density.
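The masked MAE/ME and the MAE-spectrum can be sketched as follows. Two points are assumptions rather than statements from the text: the error is taken as sCT minus rCT, and voxels are assigned to 20 HU bins according to their HU value on the reference rCT. Function names are illustrative.

```python
import numpy as np


def mae_me(sct, rct, mask):
    """Return (MAE, ME) in HU over the voxels where mask is True."""
    mask = np.asarray(mask, dtype=bool)
    diff = np.asarray(sct, dtype=float)[mask] - np.asarray(rct, dtype=float)[mask]
    return np.abs(diff).mean(), diff.mean()


def mae_spectrum(sct, rct, mask, hu_min=-1000, hu_max=1500, bin_width=20):
    """Return (bin_edges, MAE per bin); bins without voxels are NaN.

    Voxels are binned by their rCT value (assumption). Values outside
    [hu_min, hu_max) are ignored.
    """
    mask = np.asarray(mask, dtype=bool)
    ref = np.asarray(rct, dtype=float)[mask]
    abs_err = np.abs(np.asarray(sct, dtype=float)[mask] - ref)
    edges = np.arange(hu_min, hu_max + bin_width, bin_width)
    idx = np.digitize(ref, edges) - 1  # bin index of every voxel
    n_bins = len(edges) - 1
    spectrum = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            spectrum[b] = abs_err[in_bin].mean()
    return edges, spectrum
```

Averaging the per-patient spectra bin by bin then gives a population curve of the kind reported in the results.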
Furthermore, we investigated the DSC of air cavities within the patient outline. To
segment the air cavities on the rCT and sCTs we used a threshold of −465 HU
(Nakano et al 2013) and calculated the DSC according to equation (3).
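A threshold-based DSC for bone and air can be sketched as below, using the common definition DSC = 2|A ∩ B| / (|A| + |B|) for two binary segmentations A and B. The function names are illustrative, and treating "above the threshold" as a strict inequality is an assumption.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient of two boolean masks (NaN if both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")
    return 2.0 * np.logical_and(a, b).sum() / denom


def bone_dsc_spectrum(sct, rct, thresholds=range(100, 1001, 100)):
    """DSC of the voxels above each HU threshold (100 to 1000 HU in steps of 100)."""
    sct = np.asarray(sct, dtype=float)
    rct = np.asarray(rct, dtype=float)
    return {t: dice(sct > t, rct > t) for t in thresholds}


def air_dsc(sct, rct, patient_mask, threshold=-465):
    """DSC of air cavities: voxels below the threshold inside the patient outline."""
    patient_mask = np.asarray(patient_mask, dtype=bool)
    sct_air = (np.asarray(sct, dtype=float) < threshold) & patient_mask
    rct_air = (np.asarray(rct, dtype=float) < threshold) & patient_mask
    return dice(sct_air, rct_air)
```

Restricting the air segmentation to the patient outline matters because the background is also set to −1000 HU and would otherwise dominate the overlap.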
The spatial uniformity of CBCT images is impaired by scatter artifacts. We therefore
calculated the spatial non-uniformity index (Shi et al 2017, Kida et al 2018) for the
original CBCT, the reference rCT and the various sCTs. The spatial non-uniformity
(SNU) is defined as
$\mathrm{SNU} = \overline{HU}_{\max} - \overline{HU}_{\min}$
where $\overline{HU}_{\max}$/$\overline{HU}_{\min}$ is the maximum/minimum of the mean pixel value out of multiple
ROIs located in regions with similar density. In this work, six ROIs were equally
distributed across the soft tissue in the brain. An example for the ROI-positioning
can be found in the supplementary materials subsection S2. A low SNU index
indicates high uniformity across the image.
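Assuming the index is the difference between the largest and the smallest ROI mean (which is consistent with the SNU being reported in HU), the calculation can be sketched as follows. The function name and the representation of ROIs as boolean masks are illustrative assumptions.

```python
import numpy as np


def spatial_non_uniformity(image_hu, roi_masks):
    """Return max(ROI mean) - min(ROI mean) in HU.

    roi_masks is a sequence of boolean masks, each selecting one ROI placed in
    tissue of similar density (six soft-tissue ROIs in the brain in this work).
    """
    image_hu = np.asarray(image_hu, dtype=float)
    means = [image_hu[np.asarray(roi, dtype=bool)].mean() for roi in roi_masks]
    return max(means) - min(means)
```

Because all ROIs sample nominally identical tissue, any spread between their means reflects shading or cupping rather than anatomy.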
2.4. Dosimetric evaluation
To evaluate the suitability of the sCTs for proton dose calculations, two pencil
beam scanning (PBS) proton plans with artificial target volumes were created in
the research version of RayStation TPS (version 7.99). For plan 1, a cylindrical target
was positioned in the region of the brainstem and was treated with a beam
incoming from a 45° gantry angle (see figure 2(b)). This direction was chosen to
avoid airways and oral cavity, where movements and anatomical variations occur
frequently and on short timescales. Nonetheless, the beam still traverses a challenging
path with small bone structures and multiple bone-soft tissue interfaces. For
plan 2, a target with a larger volume was positioned above the first one in a central
location in the brain. It was irradiated from a 180° gantry angle, so that the beam
only passes through the skull and then stops in brain tissue (see figure 2(c)).
Four patients had to be excluded from the dosimetric
evaluation using plan 2. This was caused by the limited FOV of the CBCTs at the
posterior part of the skull.
Figure 2. (a) Exemplary targets used in plan 1 (blue, inferior) and plan 2 (yellow, superior), (b) beam
direction for plan 1, (c) beam direction for plan 2.
For both targets, a uniform dose of 10 GyRBE in five fractions (5 × 2 GyRBE) was planned
on a 1 mm × 1 mm × 1 mm dose grid. A constant RBE value of 1.1 was used for
all dose calculations. Figure 2 depicts the target positions and beam directions. The
average target volumes for plans 1 and 2 were 20 and 50 cm3, respectively.
Initially, the dose was calculated on the rCT using the RayStation Monte Carlo dose
engine with an uncertainty of 1.0%. For the dosimetric evaluation, the dose was
then recalculated on sCTNN, sCTDIR and sCTAIC. Gamma pass ratios, including only
voxels above 10% of the prescribed dose, were computed for 2%/2 mm and 3%/3
mm criteria. Furthermore, HU-profiles of rCT, sCTNN, sCTDIR and
sCTAIC and the resulting dose profiles were plotted along the beam direction at a
central line of plan 1 and 2. To further investigate the proton dose calculation accuracy
of the synthetic CTs, range shifts were computed by comparing depth-dose
profiles from dose calculations based on the rCTs and the synthetic CTs as in Pileggi
et al (2018). Range shifts were determined by shifting the depth-dose profile until
the sum of differences between the two curves was minimal. This was performed
for all dose profiles that were exposed to at least 80% of the planned dose. Based on
these results, the mean and standard deviation of the relative range shift for each
patient and the entire patient population were calculated.
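The range-shift search described above can be sketched as a brute-force scan over candidate shifts. Several details are assumptions not given in the text: the sum of absolute differences as the cost, linear interpolation of the shifted profile, and the search window and step size (hypothetical defaults). The function name is illustrative.

```python
import numpy as np


def range_shift(depth_mm, dose_ref, dose_test, max_shift_mm=10.0, step_mm=0.1):
    """Return the shift (mm) that best aligns dose_test with dose_ref.

    The test profile is evaluated at depth + s and compared with the reference
    profile; the s with the smallest sum of absolute differences is returned.
    A positive value means the test profile reaches deeper than the reference.
    """
    depth_mm = np.asarray(depth_mm, dtype=float)
    dose_ref = np.asarray(dose_ref, dtype=float)
    dose_test = np.asarray(dose_test, dtype=float)
    shifts = np.arange(-max_shift_mm, max_shift_mm + step_mm / 2, step_mm)
    costs = [
        # np.interp holds the edge values outside the sampled depth range.
        np.abs(dose_ref - np.interp(depth_mm + s, depth_mm, dose_test)).sum()
        for s in shifts
    ]
    return float(shifts[int(np.argmin(costs))])
```

Dividing the result by the range of the reference profile would give a relative range shift; how the range itself was defined is not specified here, so that step is left out.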
3. Results
3.1. Image quality and time efficiency
3.1.1. Neural network (NN)

On average, the training of the DCNN was stopped after 29 epochs. Table 2 shows
the number of epochs that each network was trained for and the resulting MAE and
ME for the individual subsets of the 3-fold cross-validation. Training of one epoch
took up to 4 h. With the hardware used during this study, converting a CBCT into
a sCT took about 3 min (1 min per view). Time for combining the views was negligible.
sCTNN resulted in an average MAE and an average ME of 36.3 ± 6.2 HU and 1.5
± 7.0 HU, respectively.
         | Nr. of epochs (axial) | Nr. of epochs (coronal) | Nr. of epochs (sagittal) | MAE [HU]   | ME [HU]
Subset 1 | 35                    | 32                      | 31                       | 36.4 ± 5.6 | 0.9 ± 5.1
Subset 2 | 29                    | 25                      | 23                       | 34.9 ± 3.7 | 2.0 ± 7.1
Subset 3 | 31                    | 27                      | 26                       | 37.8 ± 8.4 | 1.4 ± 8.2
Table 2. Overview of training epochs for subsets 1–3 in axial, coronal and sagittal views. The MAE
and ME for the individual subsets are also listed.
3.1.2. Deformable image registration (DIR)

Generating synthetic CTs by deforming the pCT onto the CBCT took about 20 min
per patient and resulted in an average MAE of 44.3 ± 6.1 HU and an average ME of
−7.6 ± 4.9 HU.
3.1.3. Analytical image-based correction (AIC)

For synthetic CTs created with the AIC method, MAE and ME averages of 76.2 ±
13.3 HU and 1.4 ± 11.5 HU were observed. Conversion times, including the deformable
registration in the TPS, lay between 1 and 2 min. After registration, the actual
conversion from CBCT to sCTAIC was very fast and only required 5–10 s of computational
time.
3.2. Image comparison
Figure 3 provides an overview of the CBCT, the created synthetic CTs and the reference
rCT for patient 15 in axial, sagittal and coronal views. A Hounsfield-unit
window of 2000/0 was applied to all images. sCTAIC images were noticeably blurrier
than sCTNN and sCTDIR images. This was partially caused by resampling and
image registration of the CBCT during preprocessing. Image noise was comparable
to the rCT in sCTDIR, lower in sCTNN and higher in sCTAIC. A quantitative analysis of
imaging noise in different tissues and animated figures showing the entire imaging
volumes can be found in subsections S2 and S3 of the supplementary materials.
Figure 3. Comparison of CBCT, the various sCTs (sCTNN, sCTDIR and sCTAIC) and the reference rCT
images in axial, sagittal and coronal views for patient 15. The same HU-windowing settings [WL = 250,
WW = 1250] were applied to all images except for the CBCT, where a window of [1900,−100] was used.
The MAE values for this patient are: sCTNN = 35.8 HU, sCTDIR = 43.4 HU and sCTAIC = 78.2 HU. Gamma
pass ratios (2%/2 mm) are: sCTNN = 99.2%, sCTDIR = 99.5% and sCTAIC = 98.4%.
Figure 4 presents the difference between rCT and the various synthetic CTs for patient
15. The higher average MAE of sCTAIC is clearly visible in all views. Especially in
bone, the error is larger than for the other two methods. In sCTNN the error is more
randomly distributed than in sCTDIR and sCTAIC, where the error is mostly distributed
around bone/soft-tissue interfaces. This is partially caused by the use of deformable
image registration to create synthetic CTs. High error areas can also be seen
in the oral and nasal cavity. However, these errors are caused by actual anatomical
differences between the CBCT and the ground truth rCT and not by the synthetic
CT conversion itself.
Figures 5(a) and (b) present the MAE and ME of the three conversion methods for
the entire dataset. All datasets were visually checked for any impeding factors. Due
to major imaging artifacts, the CBCT of patient 4 was found not suitable for further
Figure 4. Difference images between rCT and the various sCT methods in coronal, sagittal and axial
view of patient 15. The color bar indicates the difference in HU.
Figure 5. (a) Mean absolute error (MAE) and (b) mean error (ME) in HU of sCTNN, sCTDIR and
sCTAIC for all 33 patients. Only voxels within the patient outline were considered for calculation of
MAE and ME.
investigation and was therefore excluded from all further analysis. sCTNN produced
images with the lowest MAE for all cases. On average, the MAE was 8 HU higher for
sCTDIR and 40 HU higher for sCTAIC. For sCTNN (1.5 HU) and sCTAIC (1.4 HU), the ME
was evenly distributed around zero, while for sCTDIR (−7.6 HU) the ME was noticeably
shifted towards lower HU values. Due to a specific neck position of patient 18,
the conversion using the AIC method partially failed and increased MAE and ME
values were observed.
Figure 6 reports the average MAE spectrum for sCTNN, sCTDIR and sCTAIC. The shaded
areas indicate the standard deviation within the entire patient dataset. For voxels
with HUs in the interval from −1000 to 0, sCTNN resulted in the lowest average
MAE, though the standard deviation overlaps with sCTDIR and sCTAIC. For HUs above
0, sCTAIC shows double the error of sCTNN and sCTDIR. The trend of sCTDIR and
sCTAIC with increasing HU is similar, while sCTNN exhibits a different behavior with
a decreasing MAE above 1000 HU. An average image histogram is overlaid on the
MAE-spectrum in figure 6. It shows that most of the voxels
have CT-numbers around 0 HU and hence the global MAE is also mainly determined
by these voxels.
Figure 6. Mean absolute error spectrum for sCTNN, sCTDIR and sCTAIC. The continuous lines were
created by binning the voxels (bin size 20 HU), calculating the MAE for every bin and averaging over
all patients (excluding patient 4). Shaded areas indicate one standard deviation. The dashed black
line shows an average image histogram.
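The binning described in the caption of figure 6 can be sketched for a single patient as follows. This is a minimal illustration under stated assumptions: voxels are assigned to bins according to their HU value in the reference rCT, and the HU range of −1000 to 1500 is chosen for the example only. The function name and parameters are hypothetical.

```python
import numpy as np

def mae_spectrum(sct, rct, mask, hu_min=-1000, hu_max=1500, bin_size=20):
    """Per-bin MAE for one patient.

    Assumption: voxels are binned by their reference (rCT) HU value.
    Returns bin centers, MAE per bin (NaN for empty bins) and voxel counts.
    """
    edges = np.arange(hu_min, hu_max + bin_size, bin_size)
    ref = rct[mask].astype(np.float64)
    err = np.abs(sct[mask].astype(np.float64) - ref)

    idx = np.digitize(ref, edges) - 1          # bin index of every voxel
    n_bins = len(edges) - 1
    valid = (idx >= 0) & (idx < n_bins)        # drop voxels outside the HU range

    sums = np.bincount(idx[valid], weights=err[valid], minlength=n_bins)
    counts = np.bincount(idx[valid], minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        spectrum = np.where(counts > 0, sums / counts, np.nan)

    centers = edges[:-1] + bin_size / 2.0
    return centers, spectrum, counts

# Toy example with four voxels (hypothetical values)
rct = np.array([-990.0, -985.0, 10.0, 15.0])
sct = rct + np.array([10.0, 30.0, -5.0, 5.0])
centers, spectrum, counts = mae_spectrum(sct, rct, np.ones(4, dtype=bool))
```

The `counts` array corresponds to the image histogram. Averaging `spectrum` over patients with `np.nanmean` would give a curve comparable to the continuous lines, and `np.nanstd` a band comparable to the shaded areas.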
Figure 7 shows the DSC of bone tissue for a range of threshold values. At a threshold
of 200 HU, sCTNN results in a DSC of 0.96, sCTDIR in 0.95 and sCTAIC in 0.90. With
increasing threshold values, representing denser bone tissue, the DSC decreases.
For air cavities an average DSC of 0.90, 0.81 and 0.80 was observed for sCTNN,
sCTDIR and sCTAIC, respectively. These values are significantly lower than the ones
observed for bone. This is caused by variations in the oral cavity that occur between
scans even if acquired on the same day.
Figure 7. DSC for threshold values between 100 and 1000 HU. Averaged over all patients (excluding
patient 4). Error bars indicate 1 SD.
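A threshold-based DSC of the kind shown in figure 7 can be sketched as follows. The sketch assumes that bone is segmented by a simple "greater than or equal to threshold" rule on both images. The function name and the toy values are hypothetical.

```python
import numpy as np

def dice_at_threshold(sct, rct, threshold_hu):
    """Dice similarity coefficient of the voxel sets with HU >= threshold_hu."""
    a = sct >= threshold_hu
    b = rct >= threshold_hu
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")   # neither image contains voxels above the threshold
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy example (hypothetical HU values)
rct = np.array([0.0, 250.0, 300.0, 1200.0])
sct = np.array([0.0, 150.0, 320.0, 1100.0])

dsc_200 = dice_at_threshold(sct, rct, 200.0)    # 2 of 3 reference bone voxels are matched
dsc_1150 = dice_at_threshold(sct, rct, 1150.0)  # sCT has no voxel above this threshold
```

Evaluating the function over a list of thresholds between 100 and 1000 HU would produce one curve per patient. Fewer voxels exceed higher thresholds, so a fixed number of mismatched voxels weighs more heavily there.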
For rCT, CBCT, sCTNN, sCTDIR and sCTAIC we observed an average SNU of 14.0 ± 7.0
HU, 118.0 ± 45.0 HU, 8.5 ± 4.2 HU, 12.4 ± 6.1 HU and 21.6 ± 16.3 HU, respectively.
These results show that all three sCT methods significantly reduce the non-uniformity
caused by scatter on the CBCT. As expected, the lowest difference to the ground
truth rCT was observed for sCTDIR since it deforms a planning CT scan with similar
uniformity as the rCT. sCTAIC resulted in images with lower uniformity than the
rCT scan, while the smoothing and the loss of detail in the soft tissue of the brain
lead to a higher uniformity for sCTNN when compared to the rCT and all other sCT
methods.
3.3. Dosimetric evaluation

Figure 8(a) shows HU profiles and corresponding line doses of dose distributions
based on the rCT, sCTNN, sCTDIR and sCTAIC along the beam direction of plan 1. sCTNN
and sCTDIR profiles closely follow the HU profile of the rCT but show some locally
confined differences. Larger deviations could be observed for the HU profile of
sCTAIC, which resulted in higher range shifts for dose distributions based on the
sCTAIC. It should be noted, however, that in the selected profile the range shift of
sCTAIC is larger than the overall average range shift of sCTAIC. Hence, it is not
representative of the entire dataset.
Figure 8(b) presents a similar plot for profiles in the beam direction of plan 2. In
this case, all three methods can reproduce the HU values of the rCT in the homogeneous
soft tissue areas. Higher noise was observed for sCTAIC. sCTNN achieves the
highest agreement with the rCT. sCTDIR is not able to reconstruct the first peak
correctly. This is partially connected to the deformation of the pCT, which also causes
a deformation of the patient outline. sCTAIC underestimates the HU of the first
peak but also overestimates the HU of the second peak. Overall, this results in no
range shift for the dose recalculated on sCTAIC.
Figure 8. Comparison of Hounsfield unit (HU) profiles of rCT, sCTNN, sCTDIR and sCTAIC along (a)
beam direction 1 and (b) beam direction 2 for patient 15. Dashed lines represent dose profiles of the
corresponding CTs. The presented profiles are exemplary and are not representative of the entire dataset.
Table 3 presents the gamma analysis results using 2%/2 mm and 3%/3 mm acceptance
criteria for both plans. Boxplots in figure 9 indicate the distribution of the
pass ratios for the entire dataset. While sCTNN showed a similar distribution of pass
ratios for plans 1 and 2, sCTDIR and sCTAIC have a noticeably wider distribution in
plan 2.
          Plan 1                             Plan 2
          2%/2 mm          3%/3 mm           2%/2 mm          3%/3 mm
          [min, max]       [min, max]        [min, max]       [min, max]
sCTNN     99.43            99.98             99.18            99.91
          [98.08, 99.75]   [99.75, 100]      [93.75, 99.57]   [99.70, 100]
sCTDIR    99.58            99.96             98.35            99.84
          [97.59, 99.84]   [99.10, 100]      [87.52, 99.62]   [93.60, 99.99]
sCTAIC    98.05            99.66             96.64            99.23
          [95.39, 99.62]   [98.48, 100]      [84.87, 99.44]   [93.48, 99.98]
Table 3. Overview of average gamma pass ratios (%) for 2%/2 mm and 3%/3 mm criteria. Minimum and
maximum values for each method and criterion are listed in brackets.
Figure 9. Boxplots of gamma pass ratios (2%/2 mm) of sCTNN, sCTDIR and sCTAIC for (a) plan 1 and (b)
plan 2.
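To make the acceptance criteria concrete, the following sketch computes a gamma pass ratio for a one-dimensional dose profile. It is a deliberately simplified illustration and not the analysis used in this work: it assumes a global dose criterion (percentage of the maximum reference dose), a brute-force search over all sample points without interpolation, and a low-dose cutoff of 10% of the maximum. The function name and all parameters are hypothetical.

```python
import numpy as np

def gamma_pass_ratio_1d(x_mm, dose_ref, dose_eval,
                        dd_percent=2.0, dta_mm=2.0, cutoff=0.1):
    """Percentage of reference points with gamma <= 1 (1D, global, brute force)."""
    x = np.asarray(x_mm, dtype=float)
    ref = np.asarray(dose_ref, dtype=float)
    ev = np.asarray(dose_eval, dtype=float)

    dose_tol = dd_percent / 100.0 * ref.max()    # global dose-difference criterion
    dist = (x[None, :] - x[:, None]) / dta_mm    # normalized distance, shape (ref, eval)
    diff = (ev[None, :] - ref[:, None]) / dose_tol
    gamma = np.sqrt(dist ** 2 + diff ** 2).min(axis=1)

    evaluated = ref >= cutoff * ref.max()        # ignore the low-dose region
    return 100.0 * float(np.mean(gamma[evaluated] <= 1.0))

# Toy profiles (hypothetical): a smooth dose bump sampled every 1 mm
x = np.arange(0.0, 50.0, 1.0)
ref = np.exp(-((x - 25.0) / 8.0) ** 2)

pass_small_error = gamma_pass_ratio_1d(x, ref, 1.01 * ref)               # 1% dose error
pass_large_error = gamma_pass_ratio_1d(np.arange(20.0), np.ones(20),
                                       1.1 * np.ones(20))                # 10% error, flat dose
```

A 1% scaling error stays within the 2% dose criterion at every point, whereas a uniform 10% error on a flat profile cannot be compensated by any spatial shift, so no point passes.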
The range shift analysis for plan 1 resulted in a mean relative range shift of 0.0 ±
0.7%, 0.1 ± 0.4% and −0.8 ± 1% for sCTNN, sCTDIR and sCTAIC, respectively. For plan 2,
a mean relative range shift of −0.1 ± 0.3%, −0.6 ± 0.7% and −0.3 ± 1.6% was observed
for sCTNN, sCTDIR and sCTAIC. In both plans, sCTNN resulted in the lowest mean range
error. The lowest standard deviation in plan 1 was observed for sCTDIR, in plan 2 for
sCTNN. Figure 10 presents boxplots of range errors for individual patients. It
visualizes the higher standard deviations of sCTNN and sCTAIC when compared to sCTDIR in
plan 1. In contrast to sCTDIR and sCTAIC, sCTNN showed lower range shifts in plan 2.
sCTDIR and sCTAIC showed increased inter-patient variability in plan 2.
Figure 10. Boxplots of relative range errors for plan 1 (a) and plan 2 (b) for every patient individually. In figure 10(b) whiskers are out of the image frame
for patients 3 (value of −6.5%), 17 (−6.9%), 19 (−5.5%) and 32 (−8.2%).
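A relative range shift of the kind reported above can be sketched for a single depth-dose profile. The sketch assumes that the range is taken as the distal depth at which the dose falls to 80% of its maximum (R80) and that the shift is expressed relative to the range on the reference rCT. Both choices, as well as the function names, are assumptions made for this illustration.

```python
import numpy as np

def distal_r80(depth_mm, dose):
    """Depth distal to the dose maximum where the dose drops to 80% of the maximum."""
    depth = np.asarray(depth_mm, dtype=float)
    dose = np.asarray(dose, dtype=float)
    level = 0.8 * dose.max()
    i_max = int(dose.argmax())

    below = np.nonzero(dose[i_max:] < level)[0]
    if below.size == 0:
        raise ValueError("dose never falls below 80% distal to the maximum")
    j = i_max + below[0]                       # dose[j] < level <= dose[j - 1]

    # Linear interpolation between the two samples enclosing the 80% level
    d0, d1 = dose[j - 1], dose[j]
    frac = (d0 - level) / (d0 - d1)
    return depth[j - 1] + frac * (depth[j] - depth[j - 1])

def relative_range_shift(depth_mm, dose_rct, dose_sct):
    """Range shift of the sCT-based profile in percent of the rCT-based range."""
    r_ref = distal_r80(depth_mm, dose_rct)
    r_sct = distal_r80(depth_mm, dose_sct)
    return 100.0 * (r_sct - r_ref) / r_ref

# Toy profiles (hypothetical): flat dose with a sharp distal edge, sampled every 1 mm
depth = np.arange(0.0, 11.0, 1.0)
dose_rct = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
dose_sct = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0], dtype=float)

shift_percent = relative_range_shift(depth, dose_rct, dose_sct)
```

In the toy example the distal edge moves 1 mm deeper, which gives a positive relative shift. A negative value would indicate that the sCT-based dose stops short of the reference.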
3.4. Patient-specific differences
Besides global differences between the various synthetic CT methods, patient-specific
variations were also observed. Figure 11 presents a selection of these variations
to further highlight differences between the synthetic CT methods. Figure 11(a)
shows minor dental artifacts of patient 19. On the ground-truth rCT (and
also the pCT) these artifacts are not visible because built-in iterative metal artifact
correction was used. This technique was not available on the CBCT scanner used.
sCTNN and sCTDIR were able to completely compensate for the dental artifacts, while
sCTAIC still suffered from these artifacts in a similar magnitude as the CBCT. Figure
11(b) depicts more pronounced dental artifacts on the CBCT of patient 26. These
artifacts are also visible on the rCT, at reduced intensity. On sCTNN the artifact was
reduced almost to the rCT level. Similar to patient 19, sCTAIC shows artifacts in the same
magnitude as on the CBCT. In total, 42% of the patients used in our work showed
some kind of dental artifact.
Figure 11. Patient-specific differences of the various sCT methods. (a), (b) Dental artifacts, (c) artifact
from nasogastric tube, (d) filling of sphenoid sinus on sCTDIR.
In figure 11(c), a nasogastric tube is visible on the CBCT and the rCT of patient
29. This tube led to an artifact that is present on the CBCT but not on the rCT.
On sCTDIR neither the artifact nor the nasogastric tube is visible, because no such
tube was in place during acquisition of the pCT, which was deformed to create
sCTDIR. On sCTNN and sCTAIC these artifacts are still visible and uncorrected. Contrary
to the dental artifact, which was present in multiple patient cases, the artifact
emerging from the nasogastric tube was only seen in a single patient case.
Patient 21, shown in figure 11(d), has a filled sphenoid sinus (indicated by yellow
box) on sCTDIR. On the two other synthetic images, but also the CBCT and the rCT,
the sphenoid sinus is empty. Similar to the nasogastric tube of patient 29, the sinus
was filled during image acquisition of the pCT but not on the treatment day on
which the rCT and the CBCT were taken.
4. Discussion
This work presented a comparison of three methods to create synthetic CTs from
CBCTs in the context of adaptive proton therapy: a DCNN-based, a DIR-based and
an AIC-based method. Our aim was not only to investigate image quality and proton
dose calculation accuracy but also to address practical aspects of synthetic CT
generation in a clinical workflow.
We used an H&N cancer patient dataset with pCT, rCT and CBCT images from 33
patients. A cohort of this size was primarily selected to have sufficient
training data for the DCNN and is comparable to other studies in the field
(Hansen et al 2018, Kida et al 2018, Landry et al 2019). Since we used a 3-fold
cross-validation approach, the number of patients available for evaluation was higher
than in previous studies (Hansen et al 2018, Kida et al 2018, Landry et al 2019, Liang
et al 2019).
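The effect of cross-validation on the number of evaluation cases can be illustrated with a short sketch. It uses a hypothetical list of 33 patient IDs and a simple shuffled split; the actual assignment of patients to folds in this work is not reproduced here.

```python
import random

def k_fold_split(patient_ids, k=3, seed=0):
    """Return k (train, test) pairs; every patient is in exactly one test set."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]      # k nearly equal-sized folds
    return [(sorted(set(ids) - set(fold)), sorted(fold)) for fold in folds]

# Hypothetical patient IDs 1..33
splits = k_fold_split(range(1, 34), k=3)
for train_ids, test_ids in splits:
    print(len(train_ids), "training patients,", len(test_ids), "test patients")
```

Each model is trained on two thirds of the cohort and applied to the remaining third, so that across the three folds every patient receives an sCT from a network that has not seen that patient during training.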
In terms of image quality, the DCNN-based method resulted in the lowest average MAE
(37 HU) and the highest DSC (0.96, 200 HU threshold). Average gamma pass ratios
of 99.95% and 99.30% (both plans averaged) were observed for 3%/3 mm and 2%/2
mm criteria, respectively. To our knowledge, this work presented the first dosimetric
evaluation of the DCNN method with an H&N dataset in the context of adaptive
proton therapy. Our results showed high dosimetric accuracy and potentially
support its implementation in adaptive workflows. Compared to Hansen et al (2018),
who investigated a DCNN trained with pelvic CBCT projections, and Landry et al
(2019), who used, among others, reconstructed CBCTs of the pelvis, we achieved
higher and more consistent gamma pass ratios using our neural network strategy
for intracranial dose calculations. However, relative to the intracranial area that we
investigated, the pelvic region can be considered more complex in terms of movement
and anatomical change.
An MAE of 44 HU and average gamma pass ratios of 99.90% and 98.65% were observed
for sCTDIR with 3%/3 mm and 2%/2 mm criteria, respectively. The lower image
quality compared to sCTNN (MAE of 44 vs. 37 HU) does not have an impact of similar
magnitude on the dose calculations. Since MAE is a voxel-wise image quality
metric, even small misalignments during DIR can lead to a notable increase in
MAE. For proton dose calculations, though, the dose distribution and range are
defined by all voxels along the beam path. Therefore, a slight registration error will
have no significant impact on the dose calculation accuracy but will still show up in
the MAE analysis. This explains why, although sCTDIR has a higher MAE, dose
calculations are still on a comparable level of accuracy as for sCTNN. The previously
reported good accuracy of proton dose calculations using sCTDIR in the H&N area (Kurz et
al 2016a) was confirmed by our intracranial results. In the case of sCTAIC, the lower
image quality noticeably impacted the proton dose distributions. sCTAIC also showed
more variation in gamma pass ratios and range shifts when compared to the
other methods. The underlying algorithm of sCTAIC is currently still under development
and only an initial version was available for this work.
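The argument that a small misregistration raises the MAE without changing the proton range can be illustrated with a one-dimensional toy example. The sketch shifts a bone slab by one voxel along the beam direction and compares the voxel-wise MAE with the water-equivalent path length (WEPL) integrated along the ray. The linear HU-to-stopping-power mapping `toy_spr` is a hypothetical placeholder and not a clinical calibration curve.

```python
import numpy as np

def toy_spr(hu):
    """Hypothetical linear HU-to-relative-stopping-power mapping (illustration only)."""
    return 1.0 + np.asarray(hu, dtype=float) / 1000.0

voxel_mm = 1.0
ref = np.zeros(100)          # 100 mm of soft tissue at 0 HU along the beam
ref[40:45] = 1000.0          # 5 mm bone slab
shifted = np.roll(ref, 1)    # same anatomy, misaligned by one voxel

mae = np.abs(shifted - ref).mean()              # voxel-wise metric
wepl_ref = toy_spr(ref).sum() * voxel_mm        # path-integrated quantity
wepl_shifted = toy_spr(shifted).sum() * voxel_mm

print(f"MAE = {mae:.1f} HU, WEPL difference = {wepl_shifted - wepl_ref:.3f} mm")
```

The one-voxel shift produces two mismatched voxels of 1000 HU each and therefore a nonzero MAE, while the integrated WEPL behind the slab is identical. In a real patient the cancellation is not exact, because misalignments are not purely along the beam and the HU-to-stopping-power relation is not linear.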
Deformable image registration played a key role in this work. The DIR method used
was extensively tested and showed its suitability in principle for CT to
CBCT registration in previous works (Janssens et al 2011, Landry et al 2015a, 2015b,
Kurz et al 2016a). In this work, the registration accuracy was visually evaluated by
the authors, mainly by checking the alignment of bony anatomy. A figure showing
the improved alignment using DIR can be found in the supplementary materials
(subsection S5).
In adaptive radiotherapy workflows, time is also an important factor to consider,
especially if online adaptive radiotherapy is of interest. With an average duration
of about 20 min, DIR is the slowest method we investigated. For sCTNN, about 3 min
were necessary for the conversion. sCT generation was quickest when using
AIC, with actual conversion times of only a few seconds. The DCNN method in
particular still has the potential for further acceleration. This can be achieved
by either using more powerful hardware or optimizing the DCNN code. Recent
reports show conversion times for neural network-based sCTs on the timescale of a
few seconds (Han 2017, Hansen et al 2018, Liang et al 2019). This would be an
important step towards online adaptive proton therapy.
Patient-specific differences showed that sCTNN had a conceptual advantage over
sCTDIR and sCTAIC. It solely requires the CBCT image to generate a synthetic image,
while sCTDIR and sCTAIC also rely on the pCT. This, especially for sCTDIR, causes
problems whenever there is a change between CBCT and pCT acquisition that cannot
be modeled by DIR alone. In our dataset, we observed cavity filling and usage of a
nasogastric tube that led to such conversion errors. We expect that these issues
occur even more frequently in regions where anatomical changes and interfractional
movement are more prevalent (e.g. thorax, abdomen). Occurrence of such issues
with sCTDIR and strategies to account for them in lung proton therapy have been
reported in the literature (Veiga et al 2015, 2016). A downside of the proposed correction
solution is that it requires manual intervention for proper sCT generation. This
obstructs full automation, which would be a desirable characteristic for a clinical
implementation in the context of online adaptive proton therapy.
On the other hand, including the pCT in sCT creation also has advantages. It
reduces the dependence on the stability and consistency of the CBCT image acquisition.
In case of a change in CBCT scanner or imaging parameters, sCTNN would
require new training data acquired with the updated settings. This training data is
usually not available immediately and has to be acquired over a typical timeframe
of months. Furthermore, preparing the data for repeated training of the DCNN is
laborious and time-consuming. It should be mentioned, however, that once clinical
imaging protocols have been established, changes in these settings are rare.
To overcome the dependence on imaging parameters, the DCNN could be trained with a
more heterogeneous CBCT dataset, including images with different acquisition
settings or even from different scanners. This would require a
much bigger dataset than the one we used for this study. sCTDIR and sCTAIC do not
depend on image acquisition parameters in the same way as sCTNN. The HU information
stems exclusively from the pCT. Changes in CBCT imaging parameters would
not disturb or invalidate the procedure of creating sCTDIR or sCTAIC.
Dental artifacts in the CBCT further highlighted differences between the sCT methods.
For sCTNN, the DCNN learned during training to correct for dental artifacts.
This was possible since the rCT was corrected for metal artifacts and this type of
artifact occurred frequently in the training data. Other artifacts, such as those from a
nasogastric tube, were not corrected by sCTNN because they were only present in one
patient case. sCTDIR used the metal artifact corrected pCT for image synthesis and
hence showed no dental artifacts. On sCTAIC we did not observe any artifact correction,
and the artifacts were present in a similar magnitude as on the CBCT. Overall, the data
suggest that manual editing of the synthetic CT can be avoided if the planning and
repeat CTs were edited and the training cohort is large enough to be representative
of all clinical situations, increasing the feasibility of online adaptation.
For a clinical implementation of adaptive proton therapy workflows, highly
automated synthetic CT generation is desirable. Direct integration of the discussed
sCT methods within the treatment planning system (TPS) would allow for
seamless automation of the image generation and subsequent dose calculation
in a clinical workflow. In our work, only the AIC method was implemented
directly in the TPS, but DIR- and NN-based sCTs could also be implemented and
used within the TPS and thereby support automation. For sCTDIR, automated
workflows have already been described in the literature (Veiga et al 2015, Qin
et al 2018). Proper quality assurance procedures are necessary to detect failures of
the DIR algorithm. Veiga et al (2015, 2016) used a semi-automatic correction procedure
to correct problematic regions. This manual intervention might hinder full
automation. In the case of sCTNN, images are converted outside of the TPS, but
the conversion could easily be integrated into clinical workflows. Additional AI-based
algorithms might be able to further assist the generation of sCTs, for example by using
deep learning-based segmentation of the patient outline to speed up the process of
preparing training data or by using AI for quality assurance of synthetic CTs (e.g.
detecting artifacts or conversion errors, checking patient position).
For the dosimetric evaluation, we used artificial targets instead of the actual tumor
volume of the patient. This was necessary due to the limited FOV of CBCTs after
cropping the images below the shoulders, prohibiting the use of the original target
structures. However, the creation of artificial target volumes allowed for a more
systematic analysis. A study looking at the consistency of original clinical dose
distributions and recalculated clinical plans based on synthetic CTs is currently in
preparation.
Our investigations were limited to H&N cancer patients. In anatomical sites with
more interfractional movement and more pronounced anatomical changes, we
would not only expect bigger differences between sCT and pCT, but also between
the various sCT generation methods. For sites with significant intrafractional
changes the use of 4D CBCTs and consecutively also 4D sCTs could further benefit
dose calculation accuracy. We aim to extend our work in this direction.
5. Conclusion
In this study, we compared a DCNN-based, a DIR-based and an analytical image-based
correction method to generate sCTs from CBCTs and investigated their
suitability for use in proton dose calculations. Using a DCNN created synthetic CTs
with the highest image quality. In terms of intracranial proton dose calculations,
both the DCNN and DIR resulted in high dosimetric accuracy. The analytical
image-based method suffered from lower image quality, which in turn led to lower
dose calculation precision. Future work is warranted for the translation of DCNNs
into automated adaptive proton therapy processes.
Acknowledgment
The authors would like to thank Sebastian Andersson from RaySearch for his support.
Furthermore, the release of open-source software tools by the developer
teams of openREGGUI (www.openreggui.org) and Plastimatch (www.plastimatch.org)
and a travel grant for Paolo Zaffino from the European Association for Cancer
Research (EACR) are gratefully acknowledged. This study was financially supported
by a grant from the Dutch Cancer Society (KWF research project 11518).
References
Arai K et al 2017 Feasibility of CBCT-based proton dose calculation using a histogram-matching algorithm in proton beam therapy Phys. Med. 33 68–76
Chen S, Qin A, Zhou D and Yan D 2018 Technical Note: U-net-generated synthetic CT images for magnetic resonance imaging only prostate intensity-modulated radiation therapy treatment planning Med. Phys. 45 5659–65
Han X 2017 MR-based synthetic CT generation using a deep convolutional neural network method Med. Phys. 44 1408–19
Hansen D C, Landry G, Kamp F, Li M, Belka C, Parodi K and Kurz C 2018 ScatterNet: a convolutional neural network for cone-beam CT intensity correction Med. Phys. 45 4916–26
Harms J, Lei Y, Wang T, Zhang R, Zhou J, Tang X, Curran W J, Liu T and Yang X 2019 Paired cycle-GAN-based image correction for quantitative cone-beam computed tomography Med. Phys. 46 3998–4009
Hua C, Yao W, Kidani T, Tomida K, Ozawa S, Nishimura T, Fujisawa T, Shinagawa R and Merchant T E 2017 A robotic C-arm cone beam CT system for image-guided proton therapy: design and performance Br. J. Radiol. 90 20170266
Hvid C, Elstrom U, Jensen K and Grau C 2018 Cone-beam computed tomography (CBCT) for adaptive image guided head and neck radiation therapy Acta Oncol. 57 552–6
Janssens G, Jacques L, Orban de Xivry J, Geets X and Macq B 2011 Diffeomorphic registration of images with variable contrast enhancement Int. J. Biomed. Imaging 2011 891585
Kida S, Nakamoto T, Nakano M, Nawa K, Haga A, Kotoku J, Yamashita H and Nakagawa K 2018 Cone beam computed tomography image quality improvement using a deep convolutional neural network Cureus 10 e2548
Kurz C et al 2016a Investigating deformable image registration and scatter correction for CBCT-based dose calculation in adaptive IMPT Med. Phys. 43 5635–46
Kurz C, Dedes G, Resch A, Reiner M, Ganswindt U, Nijhuis R, Thieke C, Belka C, Parodi K and Landry G 2015 Comparing cone-beam CT intensity correction methods for dose recalculation in adaptive intensity-modulated photon and proton therapy for head and neck cancer Acta Oncol. 54 1651–7
Kurz C, Nijhuis R, Reiner M, Ganswindt U, Thieke C, Belka C, Parodi K and Landry G 2016b Feasibility of automated proton therapy plan adaptation for head and neck Radiat. Oncol. 11 64
Landry G et al 2015a Phantom based evaluation of CT to CBCT image registration for proton therapy dose recalculation Phys. Med. Biol. 60 595–613
Landry G et al 2015b Investigating CT to CBCT image registration for head and neck proton therapy as a tool for daily dose recalculation Med. Phys. 42 1354–66
Landry G, Hansen D, Kamp F, Li M, Hoyle B, Weller J, Parodi K, Belka C and Kurz C 2019 Comparing Unet training with three different datasets to correct CBCT images for prostate radiotherapy dose calculations Phys. Med. Biol. 64 035011
Li Y, Zhu J, Liu Z, Teng J, Xie Q, Zhang L, Liu X, Shi J and Chen L 2019 A preliminary study of using a deep convolution neural network to generate synthesized CT images based on CBCT for adaptive radiotherapy of nasopharyngeal carcinoma Phys. Med. Biol. 64 145010
Liang X, Chen L, Nguyen D, Zhou Z, Gu X, Yang M, Wang J and Jiang S 2019 Generating synthesized computed tomography (CT) from cone-beam computed tomography (CBCT) using CycleGAN for adaptive radiation therapy Phys. Med. Biol. 64 125002
Lim-Reinders S, Keller B, Al-Ward S, Sahgal A and Kim A 2017 Online adaptive radiation therapy Int. J. Radiat. Oncol. Biol. Phys. 99 994–1003
Nagarajappa A K, Dwivedi N and Tiwari R 2015 Artifacts: the downturn of CBCT image J. Int. Soc. Prev. Community Dent. 5 440–5
Nakano H, Mishima K, Ueda Y, Matsushita A, Suga H, Miyawaki Y, Mano T, Mori Y and Ueyama Y 2013 A new method for determining the optimal CT threshold for extracting the upper airway Dentomaxillofac. Radiol. 42 26397438
Niu T, Sun M, Star-Lack J, Gao H, Fan Q and Zhu L 2010 Shading correction for on-board cone-beam CT in radiation therapy using planning MDCT images Med. Phys. 37 5395–406
Park Y K, Sharp G C, Phillips J and Winey B A 2015 Proton dose calculation on scatter-corrected CBCT image: feasibility study for adaptive proton therapy Med. Phys. 42 4449–59
Peroni M, Ciardo D, Spadea M F, Riboldi M, Comi S, Alterio D, Baroni G and Orecchia R 2012 Automatic segmentation and online virtual CT in head-and-neck adaptive radiation therapy Int. J. Radiat. Oncol. Biol. Phys. 84 e427–e433
Pileggi G, Speier C, Sharp G C, Izquierdo Garcia D, Catana C, Pursley J, Amato F, Seco J and Spadea M F 2018 Proton range shift analysis on brain pseudo-CT generated from T1 and T2 MR Acta Oncol. 57 1521–31
Posiewnik M and Piotrowski T 2019 A review of cone-beam CT applications for adaptive radiotherapy of prostate cancer Phys. Med. 59 13–21
Qin A, Gersten D, Liang J, Liu Q, Grill I, Guerrero T, Stevens C and Yan D 2018 A clinical 3D/4D CBCT-based treatment dose monitoring system J. Appl. Clin. Med. Phys. 19 166–76
Schulze R, Heil U, Gross D, Bruellmann D D, Dranischnikow E, Schwanecke U and Schoemer E 2011 Artefacts in CBCT: a review Dentomaxillofac. Radiol. 40 265–73
Shi L, Tsui T, Wei J and Zhu L 2017 Fast shading correction for cone beam CT in radiation therapy via sparse sampling on planning CT Med. Phys. 44 1796–808
Sonke J J, Aznar M and Rasch C 2019 Adaptive radiotherapy for anatomical changes Semin. Radiat. Oncol. 29 245–57
Spadea M F, Pileggi G, Zaffino P, Salome P, Catana C, Izquierdo-Garcia D, Amato F and Seco J 2019 Deep convolution neural network (DCNN) multi-plane approach to synthetic CT generation from MR images—application in brain proton therapy Int. J. Radiat. Oncol. Biol. Phys. 105 495–503
Stock M et al 2018 The technological basis for adaptive ion beam therapy at MedAustron: status and outlook Z. Med. Phys. 28 196–210
Veiga C et al 2016 First clinical investigation of cone beam computed tomography and deformable registration for adaptive proton therapy for lung cancer Int. J. Radiat. Oncol. Biol. Phys. 95 549–59
Veiga C, Alshaikhi J, Amos R, Lourenco A M, Modat M, Ourselin S, Royle G and Mcclelland J R 2015 Cone-beam computed tomography and deformable registration-based 'dose of the day' calculations for adaptive proton therapy Int. J. Part. Ther. 2 404–14
Veiga C, Janssens G, Baudier T, Hotoiu L, Brousmiche S, Mcclelland J, Teng C L, Yin L, Royle G and Teo B K K 2017 A comprehensive evaluation of the accuracy of CBCT and deformable registration based dose calculation in lung proton therapy Biomed. Phys. Eng. Express 3 015003
Yan D, Vicini F, Wong J and Martinez A 1997 Adaptive radiation therapy Phys. Med. Biol. 42 123
Yang M, Zhu X R, Park P C, Titt U, Mohan R, Virshup G, Clayton J and Dong L 2012 Comprehensive
analysis of proton range uncertainties related to patient stopping-power-ratio estimation
using the stoichiometric calibration Phys. Med. Biol. 57 4095–115
Zaffino P, Raudaschl P, Fritscher K, Sharp G C and Spadea M F 2016 Plastimatch mabs, an open
source tool for automatic image segmentation Med. Phys. 43 5155–60
Supplementary materials
S1 List of acronyms
AIC Analytical image-based correction

ART Adaptive radiotherapy
CBCT Cone-beam computed tomography
CNR Contrast-to-noise ratio
DCNN Deep convolutional neural network
DIR Deformable image registration
DSC Dice similarity coefficient
FOV Field-of-view
GPU Graphical processing unit
H&N Head and neck
HU Hounsfield unit
LUT Look-up-table
MAE Mean absolute error
ME Mean error
MRI Magnetic resonance imaging
PBS Pencil-beam scanning
pCT planning CT
RBE Relative biological effectiveness
ROI Region of interest
rCT repeat CT
sCT synthetic CT
sCTAIC synthetic CT from analytical image based correction
sCTDIR Deformable image registration based synthetic CT
sCTNN Neural-network based synthetic CT
SNR Signal-to-noise ratio
SNU Structural non-uniformity
SPR Stopping power ratio
TPS Treatment planning system
S2 Analysis of image noise

The different noise levels in the synthetic images were assessed by calculating
signal-to-noise ratios for soft tissue in the brain, fat and muscles and by calculating the
contrast-to-noise ratio between fat and muscle tissue in the neck. For brain tissue
the SNR was calculated for each patient, while for muscles and fat it was only possible
to use a subgroup of 13 patients. The reason for this reduced patient number was
the limited FOV and the low amount of fat in the neck for a large number of patients.
For calculating the SNR in brain tissue we created a single ROI, with an area of
15 by 15 pixels, on a slice containing large amounts of brain tissue (see Figure S2a).
In this ROI we can assume a relatively homogeneous HU signal and quantify the
noise by calculating the signal-to-noise ratio (SNR). SNR is defined as
$$\mathrm{SNR} = \frac{\mu}{\sigma} \qquad \text{(Eq. S1)}$$
where μ is the mean pixel value and σ the standard deviation of the pixels within
the ROI. Low noise levels correspond to a high SNR. For fat and muscles we
positioned six smaller ROIs (three for fat, three for muscles), with an area of 7 by 7
pixels each, in the back of the neck and calculated SNRs and the CNR according to
Eq. S2
where μ_muscle/μ_fat is the mean pixel value in the three muscle/fat ROIs and
σ_i the standard deviation of the pixels within these ROIs. Table S1 shows SNRs of
brain tissue for rCT and the synthetic CT methods for each patient. For the ground
truth rCT we observed an average SNR of 3.1 ± 0.9. Synthetic CTs created with the
neural network show a significantly lower noise level, with an average SNR of 14.1 ±
4.0. For sCTDIR we also observed an increased SNR of 5.9 ± 1.5, which indicates that the
DIR reduces the noise slightly. The SNR of sCTAIC, 2.2 ± 0.9, is lower than the SNR of
the reference CT scan, indicating higher noise levels. The different noise levels are
also visible in Figure 3.
Table S2 presents CNRs and SNRs for muscle and fat tissue. For the reference rCT
we observed an average CNR of 16.1 ± 3.4. Since the noise is significantly lower on
sCTNN, the CNR is higher for sCTNN (40.2 ± 6.4). For sCTDIR we observe a slightly
increased CNR (24.8 ± 4.6) when compared to the rCT. This can be attributed to the
noise reduction occurring during DIR. With a value of 10.9 ± 2.2, sCTAIC results in a
lower CNR than all the other methods. SNRs in fat and muscle tissues are higher
than in brain tissue, which might be caused by the smaller ROIs and the more homogeneous
tissue composition in fat and muscle than in brain tissue.
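The ROI-based noise metrics can be sketched as follows. The SNR is computed as mean divided by standard deviation within a square ROI. For the CNR, the exact form of Eq. S2 is not reproduced here, so the sketch assumes one common definition: the absolute difference of the mean muscle and fat values divided by the root mean square of the standard deviations σ_i of all six ROIs. Function names and toy values are hypothetical.

```python
import numpy as np

def roi_snr(image, row, col, size):
    """SNR = mean / standard deviation inside a square ROI of size x size pixels."""
    roi = image[row:row + size, col:col + size].astype(np.float64)
    return roi.mean() / roi.std()

def cnr(muscle_rois, fat_rois):
    """Contrast-to-noise ratio between muscle and fat ROIs.

    Assumed definition (not taken from Eq. S2):
    |mean_muscle - mean_fat| / sqrt(mean(sigma_i ** 2)) over all ROIs.
    """
    mu_muscle = np.mean([roi.mean() for roi in muscle_rois])
    mu_fat = np.mean([roi.mean() for roi in fat_rois])
    sigmas = np.array([roi.std() for roi in list(muscle_rois) + list(fat_rois)])
    return abs(mu_muscle - mu_fat) / np.sqrt(np.mean(sigmas ** 2))

# Toy data (hypothetical HU values): checkerboard "noise" of +/- 5 HU around a mean
noise = np.array([[5.0, -5.0], [-5.0, 5.0]])
brain = np.tile(noise, (2, 2)) + 35.0           # 4 x 4 image, mean 35 HU, std 5 HU
muscle_rois = [noise + 50.0 for _ in range(3)]  # mean 50 HU, std 5 HU
fat_rois = [noise - 100.0 for _ in range(3)]    # mean -100 HU, std 5 HU

snr_brain = roi_snr(brain, 0, 0, 4)
cnr_neck = cnr(muscle_rois, fat_rois)
```

With these definitions, halving the noise standard deviation doubles both metrics, which is why smoothing tends to raise the SNR and CNR of an image.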
SNRbrain
Patient   rCT    sCTNN   sCTDIR   sCTAIC
1         1.9    18.7    5.9      2.4
2         2.3    12.1    5.9      2.2
3         3.9    8.2     6.6      2.3
4         3.4    19.4    7.8      2.6
5         3.2    13.4    6.7      1.5
6         3.7    10.9    4.9      1.4
7         3.4    23.3    8.6      3.2
8         2.6    18.9    5.7      1.6
9         2.2    13.5    3.8      1.5
10        3.8    10.9    5.9      2.3
11        3.8    12.1    5.2      2.9
12        4.0    28.0    5.6      1.0
13        4.2    19.0    5.1      2.4
14        4.1    10.7    6.0      2.5
15        2.5    13.8    2.4      0.8
16        4.3    12.8    7.0      3.4
17        1.8    11.3    3.0      1.9
18        4.2    12.1    6.9      2.2
19        4.0    12.3    8.5      6.0
20        1.9    9.9     3.1      2.0
21        3.4    15.3    8.3      2.6
22        3.0    12.3    5.1      1.8
23        2.2    15.9    3.4      1.7
24        5.1    10.5    6.8      3.0
25        4.2    14.0    7.9      1.9
26        4.1    11.6    7.0      1.7
27        3.4    12.0    6.0      1.9
28        1.8    15.5    6.1      2.1
29        2.2    11.4    4.6      1.8
30        2.3    14.1    6.1      2.2
31        1.9    12.6    5.9      2.0
32        2.3    13.9    6.6      1.9
33        2.3    14.6    5.3      2.0
MEAN      3.1    14.1    5.9      2.2
STD       0.9    4.0     1.5      0.9
Table S1. Signal-to-noise ratio (SNR) of rCT, sCTNN, sCTDIR and sCTAIC for the entire dataset.
          CNRmuscle,fat                    SNRfat                           SNRmuscle
Patient   rCT    sCTNN  sCTDIR  sCTAIC     rCT    sCTNN  sCTDIR  sCTAIC     rCT    sCTNN  sCTDIR  sCTAIC
1         14.8   25.7   31.5    9.7        8.5    9.0    15.5    4.6        5.9    36.7   16.4    5.5
2         13.1   49.2   28.2    9.2        7.5    28.4   16.2    5.7        5.3    19.5   11.3    3.3
6         16.4   36.3   17.9    9.9        7.6    17.7   7.2     4.9        9.3    19.0   15.0    5.0
7         20.0   47.1   22.8    13.0       11.3   19.6   11.6    7.6        8.2    36.0   11.1    5.1
8         15.6   49.1   21.3    10.3       7.4    30.3   9.8     5.5        8.5    18.7   12.6    4.6
10        16.4   37.8   33.3    9.7        10.8   37.7   22.2    8.0        5.5    9.3    11.1    2.7
11        16.8   36.2   22.8    14.2       7.7    15.1   10.7    7.3        10.1   28.0   12.9    6.7
15        24.2   43.0   27.1    13.9       15.8   26.9   22.2    9.3        8.1    16.0   7.2     4.5
21        13.4   39.0   21.8    11.7       7.1    35.8   9.6     4.0        6.0    9.9    14.5    8.5
25        15.7   45.5   20.6    14.2       7.4    22.2   9.7     7.6        8.8    23.8   11.6    6.3
26        15.3   38.6   23.0    10.9       8.1    17.7   11.0    6.7        6.9    23.1   12.5    3.7
30        18.3   33.6   30.9    7.6        11.5   15.3   18.9    4.3        6.6    20.3   11.4    2.9
32        9.5    41.9   21.6    7.6        6.4    24.9   11.3    3.9        3.1    15.5   10.0    3.7
MEAN      16.1   40.2   24.8    10.9       9.0    23.1   13.5    6.1        7.1    21.2   12.1    4.8
STD       3.4    6.4    4.6     2.2        2.5    8.2    4.8     1.7        1.9    8.1    2.3     1.6
Table S2 Contrast- (CNR) and signal-to-noise ratios (SNR) for muscle and fat tissue for a subgroup of patients.
Figure S2 Examples of ROI positioning for calculating a) signal-to-noise ratio (SNR) of brain tissue, b)
SNR and contrast-to-noise ratio (CNR) of muscle (yellow) and fat (red) tissue and c) structural
non-uniformity (SNU).
S3 Animated CBCT, rCT and synthetic CTs (GIFs)

Animated GIF images can be found online.
Figure S3 Animated images for patients 15, 26 and 29 cycling through all slices from CBCT, sCTNN,
sCTDIR, sCTAIC and rCT used for this work. Window settings for the CBCT are [WL=-100 HU, WW=1200
HU]. For all other images a window level of 250 HU and a window width of 1250 HU was chosen.
Imaging artifacts of patients 26 (dental artifacts) and 29 (inserted nasogastric tube) are discussed in
more detail in section 3.4.
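The window settings quoted in the caption translate to displayed HU ranges as sketched below. The mapping assumes the usual convention that the window spans from WL − WW/2 to WL + WW/2 and that values outside this interval are clipped. The function name is hypothetical.

```python
import numpy as np

def apply_window(hu, level, width):
    """Map HU values to display intensities in [0, 1] for a given window level/width."""
    lower = level - width / 2.0
    return np.clip((np.asarray(hu, dtype=float) - lower) / width, 0.0, 1.0)

# Window used for the CT and sCT images: WL = 250 HU, WW = 1250 HU -> [-375, 875] HU
ct_display = apply_window([-375.0, 250.0, 875.0, 2000.0], level=250.0, width=1250.0)

# Window used for the CBCT: WL = -100 HU, WW = 1200 HU -> [-700, 500] HU
cbct_display = apply_window([-1000.0, -700.0, -100.0, 500.0], level=-100.0, width=1200.0)
```

The different window for the CBCT reflects that its intensity values are not calibrated HU, so a direct visual comparison with the CT-based images requires separate display settings.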
S4 Cropping examples
Figure S4 a) Repeat CT of patient 15 before cropping the area used for analysis in this work. b) Repeat
CT after cropping (patient 15). c) More extreme case of cropping with reduced superior and inferior
FOV (patient 32). d) Example case where the FOV was also cropped on the dorsal side (patient 31).
Chapter 2
S5 Deformable image registration
Figure S5 Colored overlay of CBCT (red) and repeat CT scans (green) to visualize misalignment of the images a) after rigid registration and b) after deformable image registration. After rigid registration the skull is very well aligned, but the mandible and spinal cord show misalignments due to movement between the scan acquisitions. After deformable image registration, the mandible and spinal cord are also properly aligned. Misalignments in the oral cavity due to changing throat and tongue positions remain visible.
Chapter 3
Comparison of the suitability of CBCT- and MR-based synthetic CTs for daily adaptive proton therapy in head and neck patients
Adrian Thummerer1,6, Bas de Jong1,6, Paolo Zaffino2, Arturs Meijers1, Gabriel Guterres Marmitt1, Joao Seco3,4, Roel JHM Steenbakkers1, Johannes A. Langendijk1, Stefan Both1, Maria F. Spadea2,6, Antje C. Knopf1,6
1 Department of Radiation Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
2 Department of Experimental and Clinical Medicine, Magna Graecia University, Catanzaro, Italy

4 Department of Physics and Astronomy, Heidelberg University, Heidelberg, Germany

Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany

Published in:
Physics in Medicine and Biology
November 2020, Volume 65, Issue 23
DOI 10.1088/1361-6560/abb1d6
Abstract
Cone-beam computed tomography (CBCT) and magnetic resonance (MR) images allow a daily observation of patient anatomy but are not directly suited for accurate proton dose calculations. This can be overcome by creating synthetic CTs (sCT) using deep convolutional neural networks. In this study, we compared sCTs based on CBCTs and MRs for head and neck (H&N) cancer patients in terms of image quality and proton dose calculation accuracy.
A dataset of 27 H&N patients treated with proton therapy (PT), containing planning CTs (pCTs), repeat CTs, CBCTs and MRs, was used to train two neural networks to convert either CBCTs or MRs into sCTs. Image quality was quantified by calculating mean absolute error (MAE), mean error (ME) and Dice similarity coefficient (DSC) for bones. The dose evaluation consisted of a systematic non-clinical analysis and a clinical recalculation of actually used proton treatment plans. Gamma analysis was performed for non-clinical and clinical treatment plans. For clinical treatment plans, dose to targets and organs at risk (OARs) and normal tissue complication probabilities (NTCP) were also compared.
CBCT-based sCTs resulted in higher image quality, with an average MAE of 40 ± 4 HU and a DSC of 0.95, while for MR-based sCTs an MAE of 65 ± 4 HU and a DSC of 0.89 were observed. In clinical proton dose calculations, sCTCBCT also achieved higher average gamma pass ratios (2%/2 mm criteria) than sCTMR (96.1% vs. 93.3%). Dose-volume histograms for selected OARs and NTCP values showed a very small difference between sCTCBCT and sCTMR and a high agreement with the reference pCT.
CBCT- and MR-based sCTs have the potential to enable accurate proton dose calculations valuable for daily adaptive PT. Significant image quality differences were observed but did not affect proton dose calculation accuracy in a similar manner. Especially the recalculation of clinical treatment plans showed high agreement with the pCT for both sCTCBCT and sCTMR.
1. Introduction
Adaptive proton therapy (PT) attempts to spare healthy tissue and simultaneously increase the dose to tumor cells by reacting to interfractional anatomical changes with treatment plan adaptations (Lim-Reinders et al 2017, Albertini et al 2019). To monitor these anatomical changes and deploy adaptive workflows, repeated imaging throughout the treatment course is a necessity. In current clinical practice, it is feasible to acquire conventional fan-beam computed tomography (CT) images on a weekly basis to observe the patient anatomy. However, these weekly CT acquisitions require a strong clinical motivation, since they come at the cost of additional imaging dose and increase the clinical workload. Recent literature suggests that PT plans should be adapted as soon as unusual anatomical variations occur (Hoffmann et al 2017, Nenoff et al 2019). In the future, online adaptive PT might be worth striving for (Albertini et al 2019). That would imply the necessity of daily or online repeated imaging.
As an alternative to using conventional fan-beam CTs for repeated image acquisition, cone-beam computed tomography (CBCT) or magnetic resonance (MR) imaging could be employed. In some PT centers, daily CBCT scans are already routinely acquired for accurate patient alignment (Hua et al 2017, Stock et al 2018). CBCT images provide the patient anatomy of the day and can be used as a basis for daily adaptive workflows. With MR, volumetric images can be acquired without ionizing radiation and with superior soft tissue contrast. In current PT practice, MRs are acquired in the planning stage to aid delineations of target volumes and organs at risk (OAR, Karlsson et al 2009, Kupelian and Sonke 2014). Daily (online) in-room MR-image acquisition for PT is not yet clinically available. However, in light of the rapid adoption of online MR-guided adaptive photon therapy, simple prototype systems for PT will likely exist within a few years, and in a decade we could envisage coupled MR PT systems with integrated gantries (Oborn et al 2017, Hoffmann et al 2020).
The distinct advantages of CBCT and MR systems make both favorable imaging modalities for daily or online adaptive treatment strategies. However, neither imaging modality is directly suited for accurate proton dose calculations. Because of the imaging geometry, CBCT images suffer from severe scatter artifacts that impair the CT-number accuracy and, as a consequence, the conversion into proton stopping power ratios (SPR), which are required for proton dose calculations. MR-image intensities correlate with magnetic relaxation properties of hydrogen atoms and thereby do not allow a derivation of electron densities and SPRs. The deficiencies of CBCTs and MRs can be overcome by creating so-called synthetic CTs (sCTs), often also referred to as virtual CTs or pseudo CTs. They act as a surrogate for CT images, containing accurate electron density information (HU-intensities), and enable proton dose calculations.
To enable dose calculations, various techniques to generate sCTs based on CBCT- and MR-images have been proposed in literature. Only a few have been assessed in the context of proton dose calculation accuracy. For CBCTs, projection- and deformable image registration (DIR)-based techniques have shown promising proton dose calculation accuracy. This includes anatomical locations such as lung (Veiga et al 2015, 2016), pelvis (Park et al 2015, Kurz et al 2016) and head and neck (Landry et al 2015, Kurz et al 2015). A downside of these methods is that they require a patient-specific planning CT (pCT) to generate sCTs. For MR-to-sCT conversion, the investigated anatomical locations include brain (Koivula et al 2016, Pileggi et al 2018, Spadea et al 2019), prostate (Maspero et al 2018, Liu et al 2019), head and neck (Guerreiro et al 2017) and pediatric patients with abdominal tumours (Guerreiro et al 2019).
In recent years, technological development has led to significant progress in the field of artificial intelligence and deep learning. These developments have been translated to the field of medical physics and radiotherapy (Meyer et al 2018, Shen et al 2020). Deep learning techniques, such as deep convolutional neural networks (DCNNs) and generative adversarial networks (GANs), have shown their potential for sCT generation based on CBCTs and MRs (Han 2017, Kida et al 2018, Maspero et al 2018, Wang et al 2019, Liang et al 2019). DCNNs are trained with paired MR/CT or CBCT/CT images and learn a nonlinear mapping of intensities from the original imaging modality, CBCT or MR, to CT. sCTs based on deep learning approaches have recently been discussed for adaptive PT (Hansen et al 2018, Liu et al 2019, Landry et al 2019) and have shown promising performance when compared to previous techniques in various anatomical locations (Arabi et al 2018, Thummerer et al 2020).
Previous studies have only looked into either CBCT- or MR-based sCTs, and only a limited number of studies investigated the suitability of the resulting sCTs for proton dose calculations. Our aim here is to perform a direct comparison of MR- and CBCT-based sCTs, generated using the same DCNN architecture, for a comprehensive head and neck patient cohort. By simultaneously assessing the dosimetric suitability of CBCT- and MR-based sCTs for PT, we will identify differences relevant for their employment in daily or online adaptive workflows.
2. Materials and methods

2.1. Patient dataset
To evaluate the performance of CBCT- and MR-based sCTs, imaging data from 27 head and neck cancer patients who received a PT treatment at the University Medical Center Groningen (UMCG) were used. The included patients were aged between 27 and 79 years (mean age: 62) and two thirds were male. Out of the 27 patients, 26 received primary RT (14 patients chemoradiation, 8 conventional RT, 2 RT + cetuximab and 2 accelerated RT) and one postoperative RT. For 24 patients the tumor was located in the pharynx, for two in the oral cavity and for one in the larynx. Tumors had varying extent (T-stage 1–4) and spread to regional lymph nodes (N-stage 0–3). The datasets included pCTs, repeated CTs (rCT), CBCTs and MR-images. pCTs were acquired on a Siemens SOMATOM Definition AS Open scanner (Siemens Healthineers, Germany) and rCT scans on a Siemens SOMATOM Confidence scanner. Similar imaging protocols were used for pCT and rCT and, despite being acquired on different scanners, pCT and rCT can be considered equal in image quality. CBCTs were acquired with the onboard imaging device of an IBA Proteus®PLUS gantry (IBA, Belgium), using a 190° arc with a rotation speed of 4.9°/s, a total projection number of 258 and an acquisition time of 39 s. Detailed imaging parameters for pCT, rCT and CBCT are presented in table 1. MR scans were performed on a 3 T Siemens MAGNETOM Skyra system after administration of a single dose of gadoterate meglumine (0.2 ml kg−1) contrast agent. A 3D spoiled gradient recalled echo (SPGRE) sequence was used to generate MR-images. MR-imaging parameters were: echo time = 2.46 ms, repetition time = 5.5 ms, flip angle = 9°, FOV = 229 mm × 237 mm × 229 mm, bandwidth = 455 Hz px−1 and acquisition time = 42 s. Additional parameters are provided in table 1. In the case of pCT and MR, which were usually acquired for planning purposes several weeks before the start of treatment, only one instance was available. rCT scans and CBCTs, on the other hand, were acquired periodically (rCTs weekly, CBCTs daily) during treatment progression. Radiotherapeutic immobilization devices were used during all image acquisitions to assure consistency of patient immobilization.
2.2. Neural network training
Both CBCT- and MR-based sCTs were created utilizing the same DCNN architecture, initially described in the work of Spadea et al (2019). The DCNN consists of an encoding and a decoding path to convert either CBCT-HU or MR intensities into CT numbers with an accuracy comparable to pCT scans. For training of the networks, the mean absolute error (MAE) between sCT and ground truth 'pCT' (in case of MR) or 'rCT' (CBCT) was used as similarity metric in the loss function. Following the approach of Spadea et al (2019), three individual networks were trained with axial, coronal or sagittal slices exclusively. After training, images from each view were combined into the final sCT. A three-fold cross validation approach was chosen: patients were randomly divided into three equal sets of nine patients. Two sets were then used for training and one for evaluation. This was repeated so that each set was used for evaluation once. Additionally, two cases from each training set were withheld as validation cases during training. When no improvement in validation loss was observed within five consecutive epochs, the network training was stopped.
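The splitting and stopping rules described above can be sketched in a few lines. This is only an illustrative sketch, not the code used in the study: the function names, the random seed and the way validation cases are picked from the training pool are assumptions.

```python
import random


def three_fold_splits(patient_ids, n_folds=3, n_val=2, seed=0):
    """Return one (train, val, test) tuple of patient lists per fold.

    Assumption: the validation cases are simply the first `n_val`
    patients of the shuffled training pool.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    fold_size = len(ids) // n_folds
    folds = [ids[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        pool = [p for j, fold in enumerate(folds) if j != k for p in fold]
        splits.append((pool[n_val:], pool[:n_val], test))
    return splits


def should_stop(val_losses, patience=5):
    """True once the best validation loss is at least `patience` epochs old."""
    if len(val_losses) <= patience:
        return False
    best_epoch = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience
```

With 27 patients this gives, per fold, 16 training, 2 validation and 9 evaluation cases, and `should_stop` would be checked after every epoch with the list of validation losses recorded so far.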
Parameter          CBCT                        pCT                          rCT                          MR
Scanner            IBA Proteus PLUS Gantry     Siemens SOMATOM Definition   Siemens SOMATOM Confidence   Siemens MAGNETOM Skyra 3T
Voltage [kVp]      100                         120                          120                          -
Current [mA]       160                         61-165 (min-max)             61-165                       -
Exposure           12.5                        1106                         1106                         -
Acq. matrix        512×512×140                 512×512×(181-262)            512×512×(181-262)            256×264×256
Resolution [mm]    0.5×0.5×2.5                 1.0×1.0×2.0                  1.0×1.0×2.0                  0.9×0.9×0.9
FOV [mm]           RL: 260, AP: 260, IS: 350   500, 500, 362–524            500, 500, 362–524            229, 237, 229
Table 1. Summary of imaging parameters of CBCT, pCT, rCT and MR. For tube current, acquisition matrix and FOV, min-max ranges are reported for some modalities.
2.3. CBCT data preparation
The DCNN requires paired image sets of CBCTs and CTs to successfully learn a conversion of image intensities. For each patient, the first acquired rCT and a CBCT from the same day were selected to minimize anatomical differences. Plastimatch, an image processing toolbox (www.plastimatch.org, Zaffino et al 2016), was used to automatically segment the patient outline on CBCT and rCT. The resulting masks were manually edited to assure full patient coverage. Voxels outside these masks were set to a HU value of −1000. An additional crop was performed to remove the area below the shoulders. This was necessary because a low dose imaging protocol was used for CBCT acquisition, which led to scatter artifacts and very poor image quality in this area. Afterwards, a rigid registration utilizing Plastimatch and a DIR were performed. For DIR, a diffeomorphic morphons algorithm with a 4-level resolution pyramid was used. This algorithm is implemented in the open-source MATLAB toolbox openREGGUI (www.openreggui.org). The principal suitability of this algorithm for CBCT to CT image registration has been demonstrated previously (Landry et al 2015, Kurz et al 2016). As a last step, masks from CBCT and CT were combined to only include voxels present in both CBCT and rCT. The deformed and masked CBCT-rCT image pairs were then used to train the CBCT-network.
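The final masking step can be illustrated with a short NumPy sketch. It assumes that the registered CBCT and rCT already share one voxel grid and that the patient outlines are available as boolean arrays; the function and variable names are hypothetical and do not correspond to Plastimatch or openREGGUI calls.

```python
import numpy as np

AIR_HU = -1000.0  # value assigned to voxels outside the patient


def apply_common_mask(cbct, rct, cbct_mask, rct_mask, fill=AIR_HU):
    """Keep only voxels inside both patient outlines; set the rest to air."""
    common = np.logical_and(cbct_mask, rct_mask)
    cbct_out = np.where(common, cbct, fill)
    rct_out = np.where(common, rct, fill)
    return cbct_out, rct_out, common
```

Using the intersection of the two outlines means that a voxel contributes to training only if both modalities contain patient tissue at that location.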
2.4. MR data preparation
In contrast to the CBCT-network, the MR-network was trained with pCT-MR image pairs. This was advantageous since both images were acquired either on the same day or with a maximum of one day in-between. Similarly to CBCT and rCT, the patient outline was segmented on MR and pCT. For the pCT, voxels outside the patient were set to −1000 HU, while for MRs a value of 0 was assigned to this area. An initial rigid registration was performed using Plastimatch. For DIR of pCT and MR, the Elastix (Klein et al 2010, elastix.lumc.nl/) registration toolbox with a three-level resolution pyramid was used. Because of the multimodal imaging data, registration of pCT and MR was more challenging than the CBCT-rCT registration. Mutual information was used as similarity metric and a penalty, with a weighting of 600:1, was introduced to suppress non-anatomical deformations (Staring et al 2007). Afterwards, masks were combined and the training of the MR-DCNN was performed.
2.5. Evaluation of image quality
Anatomical differences between pCT, CBCT and MR had to be minimized to allow a meaningful comparison of conversion characteristics. MR-images were already deformably registered to the pCT during data preparation. CBCTs, on the other hand, were registered to the first available repeat CT for training of the DCNNCBCT. Therefore, prior to the conversion into sCTs, CBCTs were also deformed to the pCT using openREGGUI. This further minimized anatomical differences, so that the evaluation could focus almost entirely on the conversion characteristics of the DCNNs rather than on anatomical differences. The pCT was used as 'ground truth' for image quality and the dosimetric evaluation. MAE (Equation 1) and mean error (ME, Equation 2) were used to evaluate the similarity between the pCT and the sCT in terms of Hounsfield units.
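$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\mathrm{sCT}_i - \mathrm{pCT}_i\right|$$ Eq. 1
$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{sCT}_i - \mathrm{pCT}_i\right)$$ Eq. 2
Here N is the number of evaluated voxels and sCT_i and pCT_i are the HU values of voxel i.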
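As a minimal sketch, both metrics can be computed voxel-wise inside an evaluation mask. The sign convention (sCT minus pCT) is an assumption chosen to match the difference images shown in figure 2, and the function name is hypothetical.

```python
import numpy as np


def mae_me(sct, pct, mask):
    """Mean absolute error and mean error (sCT - pCT) in HU inside `mask`."""
    diff = sct[mask].astype(np.float64) - pct[mask].astype(np.float64)
    return float(np.mean(np.abs(diff))), float(np.mean(diff))
```

Because positive and negative deviations cancel in the ME, a small ME together with a large MAE indicates errors without a systematic HU offset.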
Furthermore, average MAE spectra for CBCT and MR were calculated by binning voxels in HU intervals of 20 HU and calculating the MAE for each bin. Error bars were added to visualize the standard deviation within the dataset. To analyze the similarity in bones, the Dice similarity coefficient (DSC, Equation 3) was calculated for various threshold levels between 100 and 1000 HU. All image quality metrics were calculated within the union of patient contours of pCT, CBCT and MR to enable a meaningful comparison.
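$$\mathrm{DSC} = \frac{2\,\left|A \cap B\right|}{\left|A\right| + \left|B\right|}$$ Eq. 3
Here A and B are the sets of voxels above the HU threshold in the sCT and the pCT, respectively.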
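The thresholded DSC and the binned MAE spectrum can be sketched as follows. Two points are assumptions that the text does not specify: voxels are assigned to HU bins according to their pCT (reference) value, and "bone" is taken as all voxels at or above the threshold. The default HU range of −1000 to 1500 follows the range reported for the MAE spectrum; the function names are hypothetical.

```python
import numpy as np


def dice_bone(sct, pct, threshold_hu):
    """Dice coefficient of the voxel sets at or above `threshold_hu`."""
    a = sct >= threshold_hu
    b = pct >= threshold_hu
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # no bone voxels in either image
    return 2.0 * np.logical_and(a, b).sum() / denom


def mae_spectrum(sct, pct, mask, lo=-1000, hi=1500, width=20):
    """MAE per HU bin of `width` HU; bins are defined on the pCT values."""
    ref = pct[mask].astype(np.float64)
    err = np.abs(sct[mask].astype(np.float64) - ref)
    edges = np.arange(lo, hi + width, width)
    idx = np.digitize(ref, edges) - 1  # bin index; out-of-range values fall outside 0..n_bins-1
    n_bins = len(edges) - 1
    spectrum = np.full(n_bins, np.nan)  # NaN marks empty bins
    for b in range(n_bins):
        sel = idx == b
        if sel.any():
            spectrum[b] = err[sel].mean()
    return edges, spectrum
```

Averaging the per-patient spectra and taking their standard deviation per bin would then give the kind of curve with error bars described in the text.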
2.6. Evaluation of proton dose calculation accuracy
To determine the proton dose calculation accuracy of CBCT- and MR-based sCTs, we performed two types of dosimetric analysis. Firstly, non-clinical single-beam proton treatment plans were used to systematically investigate proton dose accuracy and range errors introduced by sCT conversion. Secondly, clinically used treatment plans were recalculated on the sCTs to show the accuracy in clinical conditions.
For the systematic evaluation, an intracranial target was defined in the brainstem region using the treatment planning system RayStation (RaySearch, Sweden). This target was irradiated with a single field from a 45-degree gantry angle and a homogeneous dose of 2 GyRBE (constant RBE of 1.1). The dose was calculated using the RayStation Monte Carlo dose engine on a 1 mm isotropic dose grid. For comparison between the sCT dose distributions and the reference dose, which was calculated on the pCT, we performed a gamma analysis with 2%/2 mm and 3%/3 mm passing criteria. Furthermore, range uncertainties introduced by the sCT conversion were investigated by calculating range shifts. Range shifts were determined by shifting depth-dose profiles to minimize the sum of squared differences between sCT and pCT profiles. Only profiles with a maximum dose of at least 80% of the planned dose were included for this range error assessment.
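The range-shift search can be sketched as a brute-force scan over candidate shifts. This is a simplified illustration under stated assumptions: both profiles are sampled on the same depth axis, the search window (±10 mm) and step (0.1 mm) are arbitrary choices, and the 80% inclusion criterion is applied to the pCT profile. None of these details are given in the text, and the function name is hypothetical.

```python
import numpy as np


def range_shift(depth_mm, dose_pct, dose_sct, planned_dose=2.0,
                min_fraction=0.8, max_shift_mm=10.0, step_mm=0.1):
    """Shift (mm) of the sCT depth-dose profile that best matches the pCT one.

    A positive value means the sCT profile had to be moved deeper.
    Returns None if the profile does not reach the inclusion threshold.
    """
    if dose_pct.max() < min_fraction * planned_dose:
        return None
    shifts = np.arange(-max_shift_mm, max_shift_mm + step_mm / 2, step_mm)
    best_shift, best_ssd = 0.0, np.inf
    for s in shifts:
        # Resample the sCT profile after moving it by s along the beam axis.
        shifted = np.interp(depth_mm, depth_mm + s, dose_sct)
        ssd = np.sum((shifted - dose_pct) ** 2)
        if ssd < best_ssd:
            best_shift, best_ssd = s, ssd
    return float(best_shift)
```

Dividing the resulting shift by the proton range of the profile would give a relative range error in percent.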
Clinically used proton treatment plans, based on the pCT, were recalculated on the CBCT- and MR-based sCTs. This allowed a comparison of the resulting dose distributions and thereby an assessment of the clinical suitability of the sCTs. Since CBCT and MR were deformably registered to the pCT, OARs and target volumes could be transferred from the pCT to both sCTs. Because of the cropping of CBCTs during data preparation, sCTs were not always covering the entire low-risk CTVs of the clinical treatment plans. To still allow a clinical plan recalculation using original target volumes, parts of the pCT (e.g. headrest, couch and shoulder area) were stitched to the sCT. A visualization of the cropping and stitching procedure is available in the supplementary materials (available online at stacks.iop.org/PMB/65/235036/mmedia). The clinical treatment plans consisted of two CTVs. The first one targeted the primary tumor and pathological lymph nodes and was irradiated with 70 GyRBE. The second one was used to irradiate the elective lymph node areas with 54.25 GyRBE. In most cases, the primary tumor was within the region covered by the sCT, while the elective area, extending towards the lower neck, also had substantial parts of its volume on the stitched pCT.
Similar to the systematic dose analysis using single-beam plans, we calculated gamma pass ratios for 2%/2 mm and 3%/3 mm criteria. To eliminate the influence of the pCT on the clinical gamma analysis, a mask corresponding to the synthetic part of the image was applied to the dose volume. We also compared the mean dose, calculated on the pCT, sCTCBCT and sCTMR, for target volumes (CTV) and selected OARs (brainstem, mandible, parotid glands, submandibular glands and inferior, middle and superior pharyngeal constrictor muscles). The selected OARs were almost always fully covered by the synthetic part of the stitched image. Exceptions included nine patients where a minor part of the CTV also extended towards the lower part of the neck and five patients where the inferior pharyngeal constrictor muscle (PCM) was not entirely covered by the sCTs.
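To make the masked pass ratio concrete, the sketch below computes a global gamma index on a 1D dose profile and then evaluates the pass ratio only inside a mask. It is a deliberately simplified stand-in for the 3D analysis: it searches grid points only (no sub-voxel interpolation), normalizes the dose criterion to the maximum reference dose and applies no low-dose cut-off. All of these choices and the function names are assumptions, not details taken from the study.

```python
import numpy as np


def gamma_1d(x_mm, dose_ref, dose_eval, dose_crit=0.02, dist_crit_mm=2.0):
    """Global 1D gamma index of `dose_eval` at every reference point."""
    dd_norm = dose_crit * dose_ref.max()  # global dose-difference criterion
    # Pairwise terms: rows are reference points, columns are evaluated points.
    dist = (x_mm[None, :] - x_mm[:, None]) / dist_crit_mm
    diff = (dose_eval[None, :] - dose_ref[:, None]) / dd_norm
    return np.sqrt(dist ** 2 + diff ** 2).min(axis=1)


def pass_ratio(gamma, mask):
    """Percentage of points inside `mask` with gamma <= 1."""
    return 100.0 * float(np.mean(gamma[mask] <= 1.0))
```

Restricting `mask` to the synthetic part of a stitched image keeps regions that are identical by construction (the stitched pCT) from inflating the pass ratio.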
In addition, we used normal tissue complication probability (NTCP) models for xerostomia (dry mouth) and dysphagia (swallowing difficulties) to investigate differences between pCT and sCTs. NTCP models establish a relation between the dose to certain OARs and the probability of radiation induced side effects. Clinically, NTCP models are used in the so-called 'model-based approach' for treatment selection (e.g. photon vs. proton radiotherapy) (Langendijk et al 2013, Widder et al 2016). We made use of models for xerostomia (≥ grade 2 and ≥ grade 3) and for dysphagia (≥ grade 2 and ≥ grade 3) which have recently been defined in the Dutch nationwide indication protocol for PT (LIPPv2.2) of head and neck cancer patients. All included patients qualified for PT based on such NTCP models.
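As an illustration of how such a model turns OAR doses into a probability, the sketch below assumes a multivariable logistic form, which is common for NTCP models of this kind. The text does not state the model form, and the coefficients and predictor names used in the usage note are hypothetical placeholders, not the LIPPv2.2 values.

```python
import math


def ntcp_logistic(intercept, coefficients, predictors):
    """Logistic NTCP: 1 / (1 + exp(-S)), S = intercept + sum(beta_k * x_k)."""
    s = intercept + sum(beta * predictors[name]
                        for name, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-s))


def delta_ntcp(ntcp_sct, ntcp_pct):
    """Difference between the NTCP calculated on the sCT and on the pCT."""
    return ntcp_sct - ntcp_pct
```

For example, with placeholder coefficients `{"mean_dose_oral_cavity_gy": 0.05}` and an intercept of `-3.0`, evaluating the model once with the mean dose from the pCT and once with the mean dose from an sCT and passing both results to `delta_ntcp` gives the NTCP difference attributable to the sCT.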
3. Results
3.1. DCNN training
Neural network training was stopped if the validation loss did not decrease within five consecutive epochs. This condition was reached after 9–26 epochs. Detailed numbers for each fold and anatomical view can be found in the supplementary materials. The neural network was implemented using the Python framework Theano (The Theano Development Team et al 2016). An Nvidia GeForce 1080 Ti was used for training and validation purposes. With this configuration, the training duration of a single epoch was approximately two hours for axial trainings and four hours for sagittal/coronal trainings. This variation is caused by the difference in slices available for each view. After training, the conversion of an entire CBCT or MR-image took approximately three minutes (axial, sagittal and coronal view combined).
3.2. Evaluation of image quality
Central slices of axial, sagittal and coronal views of CBCT, MR, sCTCBCT, sCTMR
and the reference pCT are presented for patient 20 in figure 1. A Hounsfield-unit
window of 1250/250 was applied to all images (except CBCT). In figure 2(a) slices
from sCTCBCT and sCTMR have been subtracted from the corresponding pCT slices
to create difference images. This reveals that MR-based sCTs have higher errors in
bone tissue and at tissue boundaries. This can be a result of geometric distortions
of the MR-images and the more difficult image registration between MRI and CT
compared to CBCT and CT. In soft tissue, sCTMR and sCTCBCT show a comparable
error magnitude. Figure 2(b) shows selected details of pCT, sCTCBCT and sCTMR to
highlight the differences in bone structures. The loss of bone-details in sCTMR is
clearly visible.
The mismatch is quantified in figure 3, which shows the MAE of sCTCBCT and sCTMR for all patients. On average, CBCT-based sCTs resulted in an MAE of 40.2 ± 3.9 HU and an ME of −1.7 ± 7.4 HU. For sCTMR a significantly higher MAE of 65.4 ± 3.6 HU and a comparable ME of 2.9 ± 9.4 HU were observed. These results confirm the visual impression of figures 1 and 2. Additional image metrics (PSNR and SSIM) are presented in the supplementary materials (Section C).
Figure 4 depicts the average DSC for various thresholds between 100 and 1000 HU. The highest DSC, with a value of 0.95 for sCTCBCT and 0.89 for sCTMR, was observed for a threshold of 200 HU. With increasing threshold values, which correspond to increasing bone density, the DSC decreases to 0.91 for sCTCBCT and 0.81 for sCTMR at a threshold of 1000 HU.
Figure 1. Overview showing axial, sagittal and coronal view of CBCT, MR, sCTCBCT, sCTMR and the
reference pCT. A Hounsfield-unit window of 1250/250 was used (except CBCT).
Figure 2. (a) Difference image sCTCBCT-pCT and sCTMR-pCT. (b) Image details that show the difference
in bone structures of pCT, sCTCBCT and sCTMR.
An average MAE spectrum for sCTCBCT and sCTMR is reported in figure 5. The standard deviation among all patients is indicated by the shaded areas. CBCT- and MR-based sCTs follow a similar trend although, as expected from findings presented in the previous figures, the CBCT spectrum shows lower MAE over the entire HU range (−1000 HU to 1500 HU). The grey area indicates the HU range where partial volume artifacts are partially responsible for increased MAE. Overlapping with the MAE spectrum, an average image histogram is presented. This shows that the overall MAE is mainly determined by soft tissue and that the MAE is higher for bone structures and for air cavities.
Figure 3. MAE for sCTCBCT and sCTMR for all patients individually.
Figure 4. DSC for sCTCBCT and sCTMR for bone thresholds between 100 and 1000 HU.
3.3. Evaluation of proton dose calculation accuracy
3.3.1. Gamma analysis

Performing gamma analysis of dose distributions based on the single-beam plan and using 2%/2 mm criteria resulted in mean pass ratios of 99.31% with a standard deviation (SD) of 0.80% for sCTCBCT and 98.22% with an SD of 1.88% for sCTMR. Average pass ratios for the 3%/3 mm acceptance criteria were 99.97% (sCTCBCT, SD: 0.08%) and 99.42% (sCTMR, SD: 1.25%). For the clinically used treatment plans, lower average pass ratios were observed. The 2%/2 mm passing criteria resulted in mean pass ratios of 96.57% for sCTCBCT (SD: 3.26%) and 93.45% for sCTMR (SD: 3.42%). The less strict 3%/3 mm criteria led to mean pass ratios of 98.77% (SD: 1.17%) and 97.04% (SD: 1.75%) for sCTCBCT and sCTMR, respectively. Figure 6(a) presents pass ratios of the single beam plan for the stricter 2%/2 mm acceptance criteria for the entire dataset. Figure 6(b) shows a similar plot for the clinically used treatment plans.
Figure 5. MAE spectrum of sCTCBCT and sCTMR. The dashed black line shows an average image histogram. The grey area indicates the HU area of partial volume effects that are responsible for a large error contribution. The error bars indicate 1 SD.
Figure 6. (a) Gamma pass ratios for single beam plans using 2%/2 mm passing criteria for sCTCBCT and sCTMR. (b) Gamma pass ratios for clinical treatment plans using 2%/2 mm passing criteria.
3.3.2. Range error

The single beam plan was used to assess the range error between pCT and sCTs and results are shown in figure 7. For sCTCBCT the median range error is always within ± 2%, and only for a few patients are the whiskers, indicating maximum/minimum values, above or below ± 2%, indicating good agreement between sCTCBCT and pCT. For sCTMR larger range errors were observed. Although median range errors and the 25th to 75th percentile range are comparable to sCTCBCT, sCTMR shows significantly larger maximum range deviations (indicated by the whiskers). This might be caused by the higher reconstruction errors in small bone structures on sCTMR.
Figure 7. Range shifts for sCTCBCT (top) and sCTMR (bottom) calculated using the single beam plans.
The dotted line indicates ± 3% range error.
3.3.3. Dose-volume parameters

Figures 8(a) and (b) compare the absolute and relative difference in mean dose of selected OARs between pCT and sCTCBCT/MR. The highest absolute and relative dose differences were observed for the inferior PCM on sCTMR. Together with the superior and middle PCM and the oral cavity, this structure is relatively close to the upper airways and is influenced by the inconsistent positioning caused by swallowing and breathing motions between and during image acquisitions of CBCT, MR and pCT. Therefore, these larger errors are not solely caused by conversion errors of the sCTs but are also influenced by anatomical differences.
Figure 8. (a) Absolute dose difference of selected organs at risk (OARs) for sCTCBCT and sCTMR, (b)
relative dose difference for OARs, (c) absolute dose difference for target volumes and (d) relative dose
difference for target volumes. Whiskers extend to 1.5 IQR and outliers are marked by red and blue dots.
A difference between sCTCBCT and sCTMR is mainly present in the PCMs; for other OARs, sCTCBCT and sCTMR show similar absolute and relative dose differences. In a similar manner, relative and absolute differences in mean dose to CTV-targets of sCTCBCT and sCTMR were compared to the pCT (figures 8(c) and (d)). For sCTCBCT, absolute dose differences for CTVs were within ± 0.1 Gy. sCTMR also resulted in low dose errors for CTVs, with all values between −0.2 and +0.1 Gy.
Figure 9. Dose-volume histograms of target volumes (70 Gy) and selected organs at risk (OARs) for (a)
best-case scenario, (b) worst-case scenario. The solid line represents the pCT, the dotted line sCTCBCT
and the dashed line sCTMR.
In figure 9, DVHs for OARs and targets are presented for 'worst' and 'best' case scenarios. These scenarios were defined based on the gamma analysis of clinical treatment plans. With 98.5%, patient 11 resulted in the highest 2%/2 mm pass ratio (sCTMR) and was selected for the 'best-case' scenario. Patient 24 showed the lowest pass ratio (87.7%) on sCTMR and was therefore used to illustrate the worst-case scenario. Excellent agreement of DVH-curves between pCT, sCTCBCT and sCTMR was observed for the 'best case'. The 'worst-case' scenario reveals some deviations in OARs, especially in the PCMinf and the oral cavity. One must consider that these OARs are close to moving structures which can have a significant influence on the dose distribution. The worst-case scenario shows that even if there is a significant difference in the global dose distribution, indicated by the low gamma pass ratio, the dose to the target volumes and OARs is not disturbed in a similar manner. The worst-case scenario does not contain the worst case for each OAR. As seen in figure 8(b), relative mean dose differences of up to 8% were observed for some OARs in some patients.
Figure 10. Normal tissue complication probability for (a) dysphagia grade 2 or higher and (b) xerostomia grade 2 or higher calculated on pCT, sCTCBCT and sCTMR.
3.3.4. NTCP evaluation

Figure 10 compares the NTCP for dysphagia (figure 10(a)) and xerostomia (figure 10(b)) of grade two or higher, calculated on pCT, sCTMR and sCTCBCT. The data in figure 10 show that there is a very good agreement between NTCP calculated on the reference pCT and both sCTs. For dysphagia, grade 2 or higher, the maximum ΔNTCP, defined as NTCPCBCT/MRI − NTCPpCT, was 2.0% for sCTMR (patient 8) and 1.4% for sCTCBCT (patient 18). The mean ΔNTCP value for the entire patient cohort was −0.1 ± 0.7% for sCTMR and −0.1 ± 0.5% for sCTCBCT. For xerostomia (grade 2 or higher) maximum ΔNTCP values were 0.5% for sCTMR (patient 18) and −0.68% for sCTCBCT (patient 12). The mean ΔNTCP value for xerostomia was 0.0 ± 0.2% for both sCTMR and sCTCBCT. For dysphagia and xerostomia grade 3 or higher, similar results were observed. Figures for grade 3 are presented in the supplementary materials. Due to the low ΔNTCP values, all investigated patients would also have qualified for PT if the planning comparison had been performed on sCTCBCT or sCTMR.
4. Discussion
The necessity of accurate volumetric images for daily or online adaptive PT is unquestioned. Various image modalities are potentially suited to provide an up-to-date representation of the patient anatomy. In a daily adaptive workflow both CBCTs and MRs could be deployed, but MRs might be more suited due to the absence of additional imaging dose. However, it is not clear which imaging modality results in sCTs with the best image quality and subsequently the most accurate proton dose calculations. For both CBCTs and MRs, various methods to generate sCTs have been proposed. This work aimed at comparing CBCT- and MR-based sCTs for a common set of patients. sCTs were generated using a DCNN and evaluated in terms of image quality and proton dose calculation accuracy. Thereby we could identify characteristics relevant for daily adaptive PT.
Visual comparison of sCTCBCT, sCTMR and the ground-truth pCT images revealed higher image fidelity for sCTCBCT than for sCTMR. Especially in areas with fine bone structures, sCTCBCT showed more detail than sCTMR. This was confirmed by quantitative image similarity metrics, such as MAE (sCTCBCT: 40.2 HU vs. sCTMR: 65.4 HU), ME (sCTCBCT: −1.7 HU vs. sCTMR: 2.9 HU) and the DSC of bony anatomy (sCTCBCT: 0.95 vs. sCTMR: 0.89). This clear difference in image quality can be explained by two main reasons. Firstly, CBCT and the reference pCT are both based on the same physical principle to generate a volumetric image: the interaction of x-rays with tissues of different electron density. For MR-imaging, the underlying physical mechanism is fundamentally different. Image intensities do not correlate with electron density and show a different contrast than CT images. As a consequence, sCT generation based on MR-images is more challenging for the DCNN than a conversion based on CBCTs. Secondly, and this is also connected to the image intensities, image registration between MR and CT images is more challenging than registration between CBCTs and CTs. This has a direct influence on the training of the DCNN, which depends on paired CBCT-CT and MR-CT image sets. Furthermore, we assume that CBCT or MR and the reference pCT are perfectly aligned when we calculate image similarity metrics in a voxel-by-voxel manner. This means that a slight registration error can lead to increased error metrics (MAE, ME) and a decreased overlap metric (DSC) during image evaluation. However, the slight misalignment of MR and CT should have only a minor influence on our results, since we visually observed clear differences between the images and carefully optimized the registration between MR and CT.
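To make the voxel-by-voxel nature of these metrics concrete, the following minimal sketch computes MAE, ME and a bone DSC for a pair of aligned images. The function name, the sign convention of the ME (sCT minus pCT), the restriction to a body mask and the bone threshold of 250 HU are assumptions made for this illustration and are not necessarily the choices made in this study.

```python
import numpy as np

def image_similarity(sct, pct, body_mask, bone_threshold=250.0):
    """Return (MAE, ME, bone DSC) for two aligned HU images."""
    sct = np.asarray(sct, dtype=float)
    pct = np.asarray(pct, dtype=float)
    mask = np.asarray(body_mask, dtype=bool)

    # Voxel-wise HU differences inside the mask (assumed convention: sCT - pCT).
    diff = sct[mask] - pct[mask]
    mae = np.abs(diff).mean()
    me = diff.mean()

    # Bone segmentation by simple thresholding (assumed threshold).
    bone_sct = (sct >= bone_threshold) & mask
    bone_pct = (pct >= bone_threshold) & mask
    denom = bone_sct.sum() + bone_pct.sum()
    dsc = 2.0 * (bone_sct & bone_pct).sum() / denom if denom else 1.0
    return mae, me, dsc

# Hypothetical 2 x 2 example in HU (not study data).
pct_img = np.array([[0.0, 300.0], [300.0, -1000.0]])
sct_img = np.array([[10.0, 280.0], [200.0, -1000.0]])
mask = np.array([[True, True], [True, False]])
mae, me, dsc = image_similarity(sct_img, pct_img, mask)
print(f"MAE = {mae:.1f} HU, ME = {me:.1f} HU, DSC = {dsc:.2f}")
```

Because every term is evaluated at identical voxel positions, any residual misregistration between the two images enters these numbers directly, which is the effect described above.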
For sCTCBCT, the obtained MAE is comparable to that of Maspero et al (2020), who achieved an MAE of 51 ± 12 HU for head and neck patients using a cycle-consistent GAN. Chen et al (2020) achieved a considerably lower MAE of 19 HU for head and neck cancer patients, but also used a registered pCT, together with the CBCT, as input images for a U-net neural network. For sCTMR, good agreement with the patch-based 3D convolutional network of Dinkla et al (2019) was achieved. For head and neck patients they reported an MAE of 75 ± 9 HU. For brain tumor patients, Spadea et al (2019) reported a slightly lower MAE of 54 ± 7 HU using the same DCNN architecture as in this work.
Results from proton dose calculations using single-beam plans confirm the findings from the image quality analysis. On average, sCTCBCT resulted in a 2%/2 mm gamma pass ratio of 99.3% (SD: 0.8%), which is slightly higher than the mean pass ratio of 98.2% (SD: 1.9%) observed for sCTMR. The dosimetric differences between sCTCBCT and sCTMR for the single-beam proton plans seem to be less pronounced than the image quality differences. The recalculation of clinical treatment plans led to overall lower pass ratios of 96.6% (SD: 3.3%) for sCTCBCT and 93.5% (SD: 3.4%) for sCTMR (2%/2 mm criteria). The clinical treatment plans for head and neck cancer patients usually consisted of four beam angles and covered a much larger area than the single-beam plans with the artificially created target volume. The target area of the clinical plans involved the entire neck, while the artificial target was positioned intracranially. Thus, the clinical target area can be considered more challenging than the intracranial target, since the neck is more susceptible to anatomical changes and positioning variations.
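The gamma pass ratios discussed above combine a dose-difference criterion with a distance-to-agreement criterion. The following minimal sketch shows the idea on a one-dimensional dose profile with a global 2%/2 mm criterion. It is a simplified illustration only: clinical gamma analysis is typically performed on 3D dose grids, and the brute-force search, the low-dose cutoff of 10% and the test profile are assumptions made for this example.

```python
import numpy as np

def gamma_pass_ratio_1d(x_mm, dose_ref, dose_eval,
                        dose_crit=0.02, dta_mm=2.0, low_dose_cutoff=0.10):
    """Global 1D gamma pass ratio in percent (brute-force search)."""
    x_mm = np.asarray(x_mm, dtype=float)
    dose_ref = np.asarray(dose_ref, dtype=float)
    dose_eval = np.asarray(dose_eval, dtype=float)

    # Global dose criterion: a fraction of the maximum reference dose.
    dose_tol = dose_crit * dose_ref.max()

    # Row i = reference point, column j = evaluated point.
    dist_term = ((x_mm[None, :] - x_mm[:, None]) / dta_mm) ** 2
    dose_term = ((dose_eval[None, :] - dose_ref[:, None]) / dose_tol) ** 2
    gamma = np.sqrt((dist_term + dose_term).min(axis=1))

    # Assumption: ignore reference points below the low-dose cutoff.
    evaluated = dose_ref >= low_dose_cutoff * dose_ref.max()
    return 100.0 * np.mean(gamma[evaluated] <= 1.0)

# Hypothetical Gaussian-shaped dose profile on a 1 mm grid (not study data).
x = np.arange(0.0, 50.0, 1.0)
reference = 60.0 * np.exp(-((x - 25.0) / 8.0) ** 2)
shifted = np.roll(reference, 1)   # profile shifted by 1 mm
scaled = 1.10 * reference         # 10% higher dose everywhere

print(gamma_pass_ratio_1d(x, reference, reference))
print(gamma_pass_ratio_1d(x, reference, shifted))
print(gamma_pass_ratio_1d(x, reference, scaled))
```

In this toy example a pure 1 mm shift stays within the 2 mm distance criterion and therefore passes everywhere, whereas a uniform 10% dose scaling fails around the dose maximum, where no nearby point offers a matching dose.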
The analysis of the mean dose to selected OARs and target volumes and of the dose-volume histograms revealed that the lower image quality of sCTMR seems to have a measurable effect on the global dose distribution (gamma pass ratio) but a very small effect on the actual dose to the anatomical structures (target volumes and OARs) relevant for treatment planning and dose calculations. For target volumes (PTV and CTV) a maximum absolute dose deviation of 0.2 Gy was observed. The clinical relevance of sCTs was further confirmed by the very high agreement of NTCP values calculated on pCT and sCTs. Since pCT and sCTs were deformably registered during data preparation, target and OAR delineations could be transferred from the pCT to the sCTs. However, especially in soft tissues, DIR is challenging and can lead to errors. This could be overcome by experts delineating OARs and targets individually on each image, although this was not feasible for the size of the dataset we used. In the future, deep learning auto-segmentation might also enable delineation on the sCTs.
Training the networks with image pairs acquired on the same day (rCT-CBCT and pCT-MR) ensured equal learning conditions for MR- and CBCT-based networks. For evaluation and comparison of sCTs, however, CBCTs had to be registered to the pCT as well. Since the time between pCT and first CBCT is around three weeks, this could have introduced a small bias towards MR-based sCTs, because the MR-images were acquired on the same day as the reference pCT. DIR between CBCT and pCT was used to minimize this effect.
In contrast to CBCTs, various acquisition sequences and techniques that alter the appearance and tissue contrast exist for MR. In the literature, a variety of sequences has been used for sCT generation based on neural networks. Since we used retrospective clinical data, we had no influence on the acquired MR-sequences. The sequences used are routinely acquired for head and neck cancer patients; they are therefore clinically relevant and represent data that is already available. We chose the in-phase image of a 3D SPGRE sequence since it resulted in images with the highest resolution and visual quality. The chosen sequence was also used for the creation of sCTs in previous works reported in the literature (Maspero et al 2018, Florkow et al 2020). Contrast agents were used for MR-image acquisition and could in principle interfere with the neural network's ability to learn a correct translation of MR to CT intensities. In our resulting sCTs we did not observe any visual impairment caused by the contrast agent, but a general influence on the training performance cannot be ruled out.
The network architecture used, which is derived from a U-net, has shown its potential for MR- and CBCT-based sCT conversion in previous works (Spadea et al 2019, Thummerer et al 2020). Our results confirm these findings. Recently, GANs have also been applied to radiotherapy-related image synthesis tasks (Liang et al 2019, Liu et al 2019). GANs have the advantage that they can be trained not only with paired image data (Jin et al 2019) but also with unpaired image data (Wolterink et al 2017, Maspero et al 2020). This eliminates the need for DIR during data preparation and therefore speeds up the training process and removes a possible source of error introduced by DIR. Another approach to improve MR-based sCTs in terms of network architecture might be the use of multiple MR-sequences for network training. This can help the neural network to better distinguish between different tissues and thereby lead to improved sCTs (Florkow et al 2020).
A limitation of this study is the reduced axial field of view due to the severity of the CBCT scatter artifacts below the shoulders. To allow a meaningful comparison, this FOV reduction was also applied to the MR-images. The removed part below the shoulders was still necessary for the recalculation of clinical treatment plans, and therefore parts of the pCT were stitched to the sCTs for clinical dose calculations. The influence of these stitched image parts on the results is minor. Only very limited parts of the target volumes and some OARs, located in the lower neck, were not covered by the cropped field of view of CBCT and MR. The image stitching was also used to add the patient couch and head support to the image. These structures are required for dose calculations since beams from certain angles traverse them and influence the dose distributions. Our dataset was almost completely limited to patients who received primary RT. Only a single patient received postoperative RT. Postoperative cases are likely to contain surgery-related features (e.g. surgical clips, staples, flaps) that can cause image artifacts and interfere with the image synthesis. This potential influence warrants investigation in future work.
We performed the comparison of sCTCBCT and sCTMR solely for head and neck cancer patients. CBCT- and MR-based deep learning techniques have been reported for many other anatomical locations, including brain (Spadea et al 2019, Han 2017, Kazemifar et al 2019, Koike et al 2019), breast (Maspero et al 2020), lung (Maspero et al 2020) and pelvis (Liu et al 2019, Maspero et al 2018, Harms et al 2019, Kurz et al 2019). Further work is necessary to also perform a comparison of sCTCBCT and sCTMR in these anatomical locations. For CBCT imaging, the head and upper neck are advantageous sites, since the patient diameter is quite limited and scatter artifacts increase with increasing diameter. Therefore, it can be expected that the CBCT image quality in anatomical locations such as lung or pelvis is lower than in the head and neck. This might lead to a reduced image quality of sCTCBCT. MR-based sCTs do not suffer from these scatter artifacts, and therefore the image quality difference might be smaller in other anatomical locations. In the thorax, breathing motion can lead to image artifacts on CBCT and MR. These image artifacts might impair the image quality and thereby also the accuracy of sCTs. MR-images were acquired on a diagnostic MR-scanner, while for CBCTs an on-board imaging device was used. Gantry-mounted MR scanners are not yet clinically available, but the image quality would likely be lower on such a device. This would probably also influence the image quality of MR-based sCTs, and future investigations are required to study this impact.
This study was performed with a relatively limited dataset of 27 patients. Consistent results were observed across the patient cohort and no major outliers were detected. A larger patient cohort would be desirable since it is more likely to include rare edge cases that might lead to sCT-conversion and dose calculation errors. Stringent quality assurance procedures (van Harten et al 2020) are required to detect these errors and establish trust in the accuracy of sCTs. Especially for MR-systems, which are known to be susceptible to geometric distortions, further evaluation and QA-mechanisms for sCTs have to be introduced. Only then can MR-based sCTs also provide reliable position verification, useful in future MR-only scenarios.
5. Conclusion
In this work, we presented a comparison of CBCT- and MR-based sCTs generated with DCNNs, using the same set of patients. CBCT-based sCTs showed a higher image similarity to the pCT images than MR-based sCTs. As a consequence, the dosimetric evaluation using gamma analysis showed higher agreement for sCTCBCT than for sCTMR. A recalculation of clinical treatment plans, however, revealed that the influence of the lower image quality is insignificant for dose-volume parameters of target volumes and selected OARs. From a dosimetric point of view, sCTCBCT and sCTMR for head and neck patients seem to be equally suited for daily adaptive PT.
Acknowledgments
Support by the European Association for Cancer Research in the form of a travel grant for Paolo Zaffino is gratefully acknowledged. The authors would also like to thank the developer teams of openREGGUI, Plastimatch and Elastix for the provision of their open source software tools. This work was financially supported by a grant from the Dutch Cancer Society (KWF research project 11518).
References
Albertini F, Matter M, Nenoff L, Zhang Y and Lomax A 2019 Online daily adaptive proton therapy
Br. J. Radiol. 93 20190594
Arabi H, Dowling J A, Burgos N, Han X, Greer P B, Koutsouvelis N and Zaidi H 2018 Comparison
of synthetic CT generation algorithms for MRI-only radiation planning in the pelvic region
2018 IEEE Nuclear Science Symp. and Medical Imaging Conf. Proc. (NSS/MIC) (Piscataway,
NJ: IEEE) 1–3
Chen L, Liang X, Shen C, Jiang S and Wang J 2020 Synthetic CT generation from CBCT images via
deep learning Med. Phys. 47 1115–25
Dinkla A M, Florkow M C, Maspero M, Savenije M H F, Zijlstra F, Doornaert P A H and Stralen M 2019 Dosimetric evaluation of synthetic CT for head and neck radiotherapy generated by a patch-based three-dimensional convolutional neural network Med. Phys. 46 4095–104
Florkow M C et al 2020 Deep learning–based MR-to-CT synthesis: the influence of varying gradient echo–based MR images as input channels Magn. Reson. Med. 83 1429–41
Guerreiro F, Burgos N, Dunlop A, Wong K, Petkar I, Nutting C and Harrington K 2017 Evaluation
of a multi-atlas CT synthesis approach for MRI-only radiotherapy treatment planning Phys.
Medica 35 7–17
Guerreiro F, Koivula L, Seravalli E, Janssens G O, Maduro J H, Brouwer C L and Korevaar E W 2019 Feasibility of MRI-only photon and proton dose calculations for pediatric patients with abdominal tumors Phys. Med. Biol. 64 5
Han X 2017 MR-based synthetic CT generation using a deep convolutional neural network method Med. Phys. 44 1408–19
Hansen D C, Landry G, Kamp F, Li M, Belka C, Parodi K and Kurz C 2018 ScatterNet: a convolutional neural network for cone-beam CT intensity correction Med. Phys. 45 4916–26
Harms J, Lei Y, Wang T, Zhang R, Zhou J, Tang X and Curran W J 2019 Paired cycle-GAN-based image correction for quantitative cone-beam computed tomography Med. Phys. 46 3998–4009
Hoffmann A, Oborn B, Moteabbed M, Yan S, Bortfeld T, Knopf A and Fuchs H 2020 MR-guided
proton therapy: a review and a preview Radiother. Oncol. 15 1–13
Hoffmann L, Alber M, Jensen M F, Holt M I and Møller D S 2017 Adaptation is mandatory for intensity modulated proton therapy of advanced lung cancer to ensure target coverage Radiother. Oncol. 122 400–5
Hua C, Yao W, Kidani T, Tomida K, Ozawa S, Nishimura T and Fujisawa T E 2017 A robotic C-arm cone beam CT system for image-guided proton therapy: design and performance Br. J. Radiol. 90 1079
Jin C-B, Kim H, Liu M, Jung W, Joo S, Park E and Cui X 2019 Deep CT to MR synthesis using paired
and unpaired data Sensors 19 2361
Karlsson M, Karlsson M G, Nyholm T, Amies C and Zackrisson B 2009 Dedicated magnetic resonance imaging in the radiotherapy clinic Int. J. Radiat. Oncol. Biol. Phys. 74 644–51
Kazemifar S, McGuire S, Timmerman R, Wardak Z, Nguyen D, Park Y, Jiang S and Owrangi A 2019 MRI-only brain radiotherapy: assessing the dosimetric accuracy of synthetic CT images generated using a deep learning approach Radiother. Oncol. 136 56–63
Kida S, Nakamoto T, Nakano M, Nawa K, Haga A, Kotoku J and Nakagawa K 2018 Cone beam computed tomography image quality improvement using a deep convolutional neural network Cureus 10 4
Klein S, Staring M, Murphy K, Viergever M A and Pluim J P W 2010 Elastix: a toolbox for intensity-based medical image registration IEEE Trans. Med. Imaging 29 196–205
Koike Y, Akino Y, Sumida I, Shiomi H, Mizuno H, Yagi M and Isohashi F 2019 Feasibility of synthetic computed tomography generated with an adversarial network for multi-sequence magnetic resonance-based brain radiotherapy J. Radiat. Res. 61 92–103
Koivula L, Wee L and Korhonen J 2016 Feasibility of MRI-only treatment planning for proton therapy in brain and prostate cancers: dose calculation accuracy in substitute CT images Med. Phys. 43 4634–42
Kupelian P and Sonke J-J 2014 Magnetic resonance-guided adaptive radiotherapy: a solution to the future Semin. Radiat. Oncol. 24 227–32
Kurz C, Dedes G, Resch A, Reiner M, Ganswindt U, Nijhuis R and Thieke C 2015 Comparing cone-beam CT intensity correction methods for dose recalculation in adaptive intensity-modulated photon and proton therapy for head and neck cancer Acta Oncol. (Madr) 54 1651–7
Kurz C, Kamp F, Park Y-K, Zöllner C, Rit S, Hansen D and Podesta M 2016 Investigating deformable
image registration and scatter correction for CBCT-based dose calculation in adaptive IMPT
Med. Phys. 43 5635–46
Kurz C, Maspero M, Savenije M H F, Landry G, Kamp F, Pinto M and Li M 2019 CBCT correction
using a cycle-consistent generative adversarial network and unpaired training to enable
photon and proton dose calculation Phys. Med. Biol. 64 22
Landry G, Hansen D, Kamp F, Hoyle B, Weller J, Parodi K and Kurz C 2019 Comparing Unet training with three different datasets to correct CBCT images for prostate radiotherapy dose calculations Phys. Med. Biol. 64 035011
Landry G, Nijhuis R, Dedes G, Handrack J, Thieke C, Janssens G and Orban de Xivry J 2015 Investigating CT to CBCT image registration for head and neck proton therapy as a tool for daily dose recalculation Med. Phys. 42 3
Langendijk J A, Lambin P, De Ruysscher D, Widder J, Bos M and Verheij M 2013 Selection of patients for radiotherapy with protons aiming at reduction of side effects: the model-based approach Radiother. Oncol. 107 267–73
Liang X, Chen L, Nguyen D, Zhou Z, Gu X, Yang M and Wang J 2019 Generating synthesized computed tomography (CT) from cone-beam computed tomography (CBCT) using CycleGAN for adaptive radiation therapy Phys. Med. Biol. 64 12
Lim-Reinders S, Keller B M, Al-Ward S, Sahgal A and Kim A 2017 Online adaptive radiation therapy
Liu Y, Lei Y, Wang Y, Shafai-Erfani G, Wang T, Tian S and Yang X 2019 Evaluation of a deep learning-based pelvic synthetic CT generation technique for MRI-based prostate proton treatment planning Phys. Med. Biol. 64 20
Maspero M, Houweling A C, Savenije M H F, van Heijst T C F, Verhoeff J J C, Kotte A N T J and van den Berg C A T 2020 A single neural network for cone-beam computed tomography-based radiotherapy of head-and-neck, lung and breast cancer Phys. Imaging Radiat. Oncol. 14 24–31
Maspero M, Savenije M, Dinkla A M, Seevinck P R, Intven M, Jurgenliemk-Schulz I M, Kerkmeijer L and van den Berg C 2018 Dose evaluation of fast synthetic-CT generation using a generative adversarial network for general pelvis MR-only radiotherapy Phys. Med. Biol. 63 185001
Meyer P, Noblet V, Mazzara C and Lallement A 2018 Survey on deep learning for radiotherapy
Comput. Biol. Med. 98 126–46
Nenoff L, Matter M, Hedlund Lindmar J, Weber D C, Lomax A J and Albertini F 2019 Daily adaptive
proton therapy–the key to innovative planning approaches for paranasal cancer treatments
Acta Oncol. (Madr) 58 1423–8
Oborn B M, Dowdell S, Metcalfe P E, Crozier S, Mohan R and Keall P J 2017 Future of medical physics: real-time MRI-guided proton therapy: real-time Med. Phys. 44 e77–e90
Park Y-K, Sharp G C, Phillips J and Winey B A 2015 Proton dose calculation on scatter-corrected
CBCT image: feasibility study for adaptive proton therapy Med. Phys. 42 4449–59
Pileggi G, Speier C, Sharp G C, Izquierdo D, Catana C, Pursley J and Amato M F 2018 Proton range shift analysis on brain pseudo-CT generated from T1 and T2 MR Acta Oncol. (Madr) 57 1521–31
Shen C, Nguyen D, Zhou Z, Jiang S B, Dong B and Jia X 2020 An introduction to deep learning in
medical physics: advantages, potential, and challenges Phys. Med. Biol. 65 05TR01
Spadea M F, Pileggi G, Zaffino P, Salome P, Catana C, Izquierdo-Garcia D and Amato F 2019 Deep
Convolution Neural Network (DCNN) multiplane approach to synthetic CT generation from
MR images—application in brain proton therapy Int. J. Radiat. Oncol. Biol. Phys. 105 495–503
Staring M, Klein S and Pluim J P W 2007 A rigidity penalty term for nonrigid registration Med.
Phys. 34 4098–108
Stock M, Georg D, Ableitinger A, Zechner A, Utz A, Mumot M and Kragl G 2018 The technological
basis for adaptive ion beam therapy at MedAustron: status and outlook Z. Med. Phys. 28
196–210
Theano Development Team et al 2016 A Python framework for fast computation of mathematical expressions arXiv:1605.02688
Thummerer A, Zaffino P, Meijers A, Marmitt G G, Seco J, Steenbakkers R J H M and Knopf A-C 2020
Comparison of CBCT based synthetic CT methods suitable for proton dose calculations in
adaptive proton therapy Phys. Med. Biol. 65 095002
van Harten L, Wolterink J M, Verhoeff J J C and Išgum I 2020 Automatic online quality control of
synthetic CTs Proc. SPIE 11313 113131M
Veiga C, Alshaikhi J, Amos R, Lourenço A M, Modat M, Ourselin S and Royle G 2015 Cone-beam
computed tomography and deformable registration-based "Dose of the Day" calculations
for adaptive proton therapy Int. J. Part. Ther. 2 404–14
Veiga C, Janssens G, Teng C-L, Baudier T, Hotoiu L, Mcclelland J R and Royle G 2016 First clinical
investigation of cone beam computed tomography and deformable registration for adaptive
proton therapy for lung cancer Int. J. Radiat. Oncol. Biol. Phys. 95 549–59
Wang T, Manohar N, Lei Y, Dhabaan A, Shu H-K, Liu T and Curran W J 2019 MRI-based treatment planning for brain stereotactic radiosurgery: dosimetric validation of a learning-based pseudo-CT generation method Med. Dosim. 44 199–204
Widder J, Van Der Schaaf A, Lambin P, Marijnen C A M, Pignol J-P, Rasch C R and Slotman B J 2016
The quest for evidence for proton therapy: model-based approach and precision medicine
Wolterink J M, Dinkla A M, Savenije M H F, Seevinck P R, van den Berg C A T and Išgum I 2017 Deep MR to CT synthesis using unpaired data Simulation and Synthesis in Medical Imaging. SASHIMI 2017. Lecture Notes in Computer Science ed S Tsaftaris (Berlin: Springer) 14–23 vol 10557
Zaffino P, Raudaschl P, Fritscher K, Sharp G C and Spadea M F 2016 Plastimatch MABS, an open
source tool for automatic image segmentation Med. Phys. 43 5155–60
A) Visualization of image cropping
Figure S1 a) Red area indicates CBCT cropping due to severe scatter artifacts below the neck. b) Image parts outside the red mask were stitched to the synthetic CTs to allow clinical proton dose reconstructions.
B) Training performance
Table S1a presents the number of slices that were available for each network training (without data augmentation). The number of axial slices was about half the number of sagittal and coronal slices. Table S1b reports the epochs at which the stopping criterion (no decrease in validation loss for five consecutive epochs) was reached.
Table S1a. Slices used for training

            Fold 1          Fold 2          Fold 3
            CBCT    MR      CBCT    MR      CBCT    MR
axial       2115    2160    2091    2123    2120    2193
sagittal    3865    4321    3873    4344    3998    4297
coronal     4070    4129    4124    4156    4138    4163

Table S1b. Epochs selected for evaluation

            Fold 1          Fold 2          Fold 3
            CBCT    MR      CBCT    MR      CBCT    MR
axial       26      16      26      25      13      10
sagittal    13      17      13      16      12      9
coronal     19      25      13      14      25      17
Figure S2 shows validation loss curves for axial and sagittal training of Fold 1. CBCT
loss is significantly lower than MRI loss but convergence times are similar for both
CBCT and MRI.
Figure S2 Comparison of loss curves of axial and coronal trainings of fold 1.
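The stopping criterion described in this section (no decrease in validation loss for five consecutive epochs) can be sketched as follows. This is a minimal illustration: the function name and the loss values are hypothetical, and the sketch does not settle whether the stopping epoch or the epoch with the lowest validation loss was used for evaluation, so it returns both.

```python
def stopping_epoch(val_losses, patience=5):
    """Return (stopping epoch, epoch with lowest validation loss), 1-based."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: remember it and keep training.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop.
            return epoch, best_epoch
    # Criterion never met: training ran through all supplied epochs.
    return len(val_losses), best_epoch

# Hypothetical validation loss curve (not study data).
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9, 0.74, 0.6]
print(stopping_epoch(losses))
```

In this example the lowest loss before stopping occurs at epoch 3, and training stops at epoch 8 after five epochs without improvement; the later value of 0.6 is never reached.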
C) Additional image similarity metrics
In addition to MAE, ME and DSC we also calculated peak signal to noise ratio (PSNR) and structural similarity (SSIM). PSNR can be defined as

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{Q^{2}}{\mathrm{MSE}}\right) \tag{Eq. S1}$$

where $Q$ is the maximum HU value between pCT and sCT and MSE the mean squared error between pCT and sCT. Structural similarity is defined as

$$\mathrm{SSIM} = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^{2} + \mu_y^{2} + C_1)(\sigma_x^{2} + \sigma_y^{2} + C_2)} \tag{Eq. S2}$$

where $\mu_x$ is the mean pixel value of the sCT, $\mu_y$ is the mean pixel value of the pCT, $\sigma_x^{2}$ the variance of the sCT, $\sigma_y^{2}$ the variance of the pCT, $\sigma_{xy}$ the covariance of pCT and sCT, and $C_1 = (0.01\,Q)^{2}$ and $C_2 = (0.02\,Q)^{2}$ two variables to stabilize the division with weak denominators. $Q$ is the maximum pixel value of sCT and pCT.
An average PSNR of 31.10 ± 1.49 dB and 25.27 ± 1.74 dB was observed for sCTCBCT and sCTMR, respectively. Figure S3a presents the results for each patient individually. Figure S3b shows a similar plot for SSIM, where average values of 0.951 ± 0.011 and 0.893 ± 0.018 were observed for sCTCBCT and sCTMR, respectively.
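The PSNR and SSIM definitions of this section can be sketched in a few lines. The following is a minimal illustration under stated assumptions: the function names are hypothetical, the SSIM is evaluated once over the whole image rather than in local sliding windows (the text does not specify which variant was used), and the input values are invented for the example.

```python
import numpy as np

def psnr(sct, pct):
    """Peak signal to noise ratio in dB, with Q the maximum of both images."""
    sct = np.asarray(sct, dtype=float)
    pct = np.asarray(pct, dtype=float)
    q = max(sct.max(), pct.max())
    mse = np.mean((sct - pct) ** 2)   # identical images give mse = 0 (PSNR undefined)
    return 10.0 * np.log10(q ** 2 / mse)

def ssim_global(sct, pct):
    """Structural similarity computed once over the whole image (assumption)."""
    sct = np.asarray(sct, dtype=float)
    pct = np.asarray(pct, dtype=float)
    q = max(sct.max(), pct.max())
    c1, c2 = (0.01 * q) ** 2, (0.02 * q) ** 2   # stabilizing constants from the text
    mu_x, mu_y = sct.mean(), pct.mean()
    var_x, var_y = sct.var(), pct.var()
    cov_xy = np.mean((sct - mu_x) * (pct - mu_y))
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Hypothetical 2 x 2 example (not study data): sCT offset by 10 from the pCT.
pct_demo = np.array([[0.0, 100.0], [200.0, 1000.0]])
sct_demo = pct_demo + 10.0
print(f"PSNR = {psnr(sct_demo, pct_demo):.2f} dB")
print(f"SSIM = {ssim_global(sct_demo, pct_demo):.4f}")
```

A constant offset leaves variances and covariance unchanged, so in this example only the mean term of the SSIM is reduced, while the PSNR depends on the offset through the mean squared error.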
Figure S3 Additional Results for a) PSNR and b) SSIM for each patient individually.
D) Additional NTCP results
Figure S4 presents NTCP results for xerostomia and dysphagia of grade 3 or higher. Similar to the results for grade 2 or higher, very high agreement between NTCP calculated on the reference pCT and both sCTs was observed. For xerostomia grade 3 or higher, the maximum ΔNTCP was 0.3% for sCTMR (patient 19) and −0.4% for sCTCBCT (patient 24). The mean ΔNTCP value for the entire patient cohort was 0.0 ± 0.1% for both sCTMR and sCTCBCT. For dysphagia grade 3 or higher, maximum ΔNTCP values were −0.8% for sCTMR (patient 18) and −0.5% for sCTCBCT (patient 12). The mean ΔNTCP value for dysphagia was −0.1 ± 0.2% for sCTMR and 0.0 ± 0.2% for sCTCBCT.
Figure S4 Normal tissue complication probability for a) xerostomia grade 3 or higher and b) dysphagia
grade 3 or higher calculated on pCT, sCTCBCT and sCTMR.
Chapter 4
Range probing as a quality control
tool for CBCT-based synthetic CTs:
In vivo application for head and neck
cancer patients
Carmen Seller Oria1, Adrian Thummerer1, Jeffrey Free1, Johannes A. Langendijk1, Stefan Both1, Antje C. Knopf1, Arturs Meijers1
1 Department of Radiation Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Published in:
Medical Physics
August 2021, Volume 48, Issue 8
DOI 10.1002/mp.15020
Abstract
Purpose
Cone-beam CT (CBCT)-based synthetic CTs (sCT) produced with a deep convolutional neural network (DCNN) show high image quality, suggesting their potential usability in adaptive proton therapy workflows. However, the nature of such workflows involving DCNNs prevents the user from having direct control over their output. Therefore, quality control (QC) tools that monitor the sCTs and detect failures or outliers in the generated images are needed.
This work evaluates the potential of using a range-probing (RP)-based QC tool to verify sCTs generated by a DCNN. Such an RP QC tool experimentally assesses the CT number accuracy in sCTs.
Methods
An RP QC dataset consisting of repeat CTs (rCT), CBCTs, and RP acquisitions of seven head and neck cancer patients was retrospectively assessed. CBCT-based sCTs were generated using a DCNN. The CT number accuracy in the sCTs was evaluated by computing relative range errors between measured RP fields and RP field simulations based on rCT and sCT images.
Results
Mean relative range errors showed agreement between measured and simulated
RP fields, ranging from −1.2% to 1.5% in rCTs, and from −0.7% to 2.7% in sCTs.
Conclusions
The agreement between measured and simulated RP fields suggests the suitability of sCTs for proton dose calculations. This outcome brings sCTs generated by DCNNs closer to clinical implementation within adaptive proton therapy treatment workflows. The proposed RP QC tool allows for CT number accuracy assessment in sCTs and can provide a means of in vivo range verification.
1. Introduction
The outcome of proton therapy treatments can be compromised by anatomical variations.1 To evaluate and mitigate the impact of anatomical variations on dose distributions, adaptive treatment strategies can be adopted.2-3 Treatment plan adaptations require recurrent feedback, which is supplied by different imaging techniques on which dose calculations are performed.4
CBCTs are currently used in some proton therapy centers for patient alignment purposes.5-7 Since CBCTs are acquired on a frequent basis, they contain up-to-date information about the patient anatomy, and they can provide input within an adaptive proton treatment workflow.8-10
However, CBCTs are subject to various image artifacts that prevent them from
being used directly for proton dose calculations.11 To this end, various corrective
approaches were developed to transform CBCTs into images suitable for proton
dose calculations, typically called synthetic CT (sCT) images.10,12 With the rapid
development of artificial intelligence, a growing number of deep learning-based
correction approaches has recently been presented.13,14
For head and neck cancer (HNC) patients, Thummerer et al. recently showed an enhanced performance of a deep convolutional neural network (DCNN) approach to transform CBCTs into sCTs.15 The DCNN strategy was compared against a deformable image registration and an analytical image-based correction method. Furthermore, comparable sCT image quality and dosimetric accuracy were found with respect to other deep learning sCT generation strategies.15-18 Preliminary dosimetric tests showed the potential for employing DCNN CBCT-based sCTs in an adaptive proton therapy workflow.15 However, DCNNs are trained on specific datasets, which does not guarantee a predictable performance when they receive input images that fall outside the training dataset. Outliers from the training dataset could arise due to different patient anatomy or image acquisition settings.19 Therefore, quality control (QC) tools that monitor the sCT generation and detect failures or outliers in the output images are needed.
Range probing (RP) has been suggested as a QC tool for in vivo proton range verification.20-26 Farace et al. presented a method to detect setup errors, in which proton spots in an RP field are measured by a multilayer ionization chamber (MLIC) at the exit of a head phantom. Meijers et al. acquired RP measurements in HNC patients and evaluated the discrepancies between measured and simulated depth dose profiles, demonstrating the feasibility of employing RP-based QC in clinical practice.26
Although many studies have demonstrated the feasibility of deep learning-based methods for sCT generation,13-15,17,18 their implementation into clinical practice remains a challenge due to the lack of QC tools to verify the output images. The work presented here aims to investigate the potential of using RP as a QC tool to verify CBCT-based sCTs. For the first time, RP patient measurements are used to experimentally assess the CT number accuracy in sCT images.
2. Materials and methods
Retrospective RP QC measurements from seven HNC patients were retrieved, together with CBCT and repeat CT (rCT) images acquired on the same days as the RP.26 RP QC measurements were performed in our clinic as an in vivo QC check for HNC patients treated with proton therapy.26 RP fields in two different fractions (referred to as sessions 1 and 2) were collected for each patient (numbered from 1 to 7), resulting in a dataset of 14 RP fields with their corresponding rCT and CBCT.
2.1 CBCT and rCT features
CBCT images were acquired using an on-board imaging device of an IBA Proteus®PLUS gantry (IBA, Louvain-la-Neuve, Belgium), with a tube voltage of 100 kVp and a tube current of 160 mA. CBCTs were reconstructed on a grid of 0.51 mm x 0.51 mm x 2.50 mm. They covered a cylindrical field of view with an axial diameter of 260 mm and an inferior–superior length of 175 mm (70 slices).
Figure 1. Coronal and sagittal views of an example patient geometry (patient 3). The treatment isocenter is shown in yellow and the edges of the RP field are highlighted in orange. In the coronal view, the beam direction is marked by the arrow and the MLIC would be located at the left side of the patient (not depicted). In the sagittal view, the proton spots are directed from behind the patient toward the observer.
rCT scans were acquired on a Siemens SOMATOM Confidence scanner (Siemens, Erlangen, Germany), using an image reconstruction grid of 0.98 mm x 0.98 mm x 2.00 mm, a varying scan length (between 198 and 229 slices), and an axial field of view with a diameter of 500 mm. A fixed tube voltage of 120 kV and a variable tube current were used for rCT acquisition.
2.2 RP acquisition
The setup for the acquisition of the retrieved RP measurements was in accordance
with the methodology described by Meijers et al.26 Each RP field was composed of
9 x 9 proton spots with a spacing of 5 mm, resulting in 81 proton spots covering an
area of 40 x 40 mm². The center of the RP field was aligned with the treatment
isocenter, allowing proton spots to intersect a wide variety of tissues. Figure 1 shows
one example patient geometry, in which the edges of the RP field are highlighted.
Each RP field delivered a dose of approximately 1 cGyRBE. The proton spot energy
was 210 MeV, resulting in a spot size of FWHM = 8.2 mm in air at the isocenter.
Given that MLIC acquisitions are only possible from cardinal angles, and easy access
to the patient with the equipment is desired, the beams were directed toward the
patient from a gantry angle of 90 degrees.
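The spot layout of one RP field can be sketched in a few lines of Python. This is a minimal illustration that assumes a square, regularly spaced grid centered on the isocenter; the function name `rp_spot_positions` and the coordinate convention are hypothetical and are not taken from the clinical delivery system.

```python
def rp_spot_positions(n_per_side=9, spacing_mm=5.0):
    """Return (x, y) spot offsets in mm relative to the isocenter.

    Assumption: a square, regularly spaced grid centered on the isocenter.
    """
    half_extent = (n_per_side - 1) * spacing_mm / 2.0
    offsets = [i * spacing_mm - half_extent for i in range(n_per_side)]
    return [(x, y) for y in offsets for x in offsets]


spots = rp_spot_positions()
xs = [x for x, _ in spots]
field_width_mm = max(xs) - min(xs)  # distance between outermost spot centers
print(len(spots), field_width_mm)
```

With the default arguments the grid contains 81 spots whose centers span 40 mm in each direction, matching the field description.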
A Giraffe MLIC (IBA Dosimetry, Schwarzenbruck, DE), composed of 180 parallel
plane ionization chambers, was used to measure the residual integral depth dose
profile (IDD) of each proton spot. Prior to each RP acquisition, a gain calibration
of the MLIC was performed in air. After the CBCT-based patient positioning
procedure was completed, the MLIC was placed along the beam axis at the exit of the
patient on a trolley (Figure 2), and the RP field was delivered before the start of the
treatment.
Figure 2. Setup for RP acquisition. The gantry is set to an angle of 90 degrees, directing proton beams
from right to left through a patient (not depicted) lying on the table. The MLIC is positioned on a
trolley.26
2.3 sCT generation
sCTs were generated using a DCNN, initially implemented for MR-to-CT conversion
by Spadea et al.,27 and later shown to be suitable also for CBCT-based sCT
generation.15,16 CBCTs and CTs of 28 HNC patients treated with proton therapy at our
institution were used for training and validation of the neural network (25 for
training, 3 for validation). CBCTs and rCTs were acquired using the devices and
parameters described above.
To generate sCTs for the seven HNC patients, a rigid registration of CBCT to
planning CT images and an automatic segmentation of the patient outline were
performed in Plastimatch (www.plastimatch.org). A 25 mm margin was added to the
resulting segmentation mask to ensure full coverage of the patient and the
immobilization devices, which were sometimes partially excluded by the automatic
segmentation. Afterwards, the trained DCNN was used to generate the sCTs. More
details on the DCNN architecture, as well as a visualization of CBCT, sCT, and rCT
for each patient, can be found in the supplementary material section.
2.4 RP simulation in sCT and rCT
In order to evaluate the quality of the sCTs, RP simulations based on rCT and sCT
were compared to RP measurements. The seven patients for whom RP measurements
were acquired were not part of the training or validation datasets of the
DCNN.
RP simulations were performed using the clinical Monte Carlo dose engine of
RayStation 9A (RaySearch, Stockholm, Sweden), with a statistical uncertainty of
0.5%.28 The MLIC detector was represented by a homogeneous water volume
attached to each CT at the beam exit side of the patient.20 An isotropic dose grid with
a voxel size of 1 mm was used. The IDD in the beam direction for each proton spot
was extracted by integrating the dose in the water volume over the axes
perpendicular to the beam direction, using the scripting capabilities of RayStation.
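The IDD extraction step amounts to collapsing a 3D dose grid onto the beam axis. The sketch below illustrates that reduction in plain Python; it is not the RayStation script used in the study. The function name, the nested-list data layout, and the choice of the last index as beam direction are assumptions made for illustration.

```python
def integral_depth_dose(dose, voxel_mm=1.0):
    """Integrate a 3D dose grid over the two axes perpendicular to the beam.

    Assumption: dose[i][j][k] holds the dose of one voxel, with index k
    running along the beam direction and i, j across it.
    Returns one integrated value per depth bin.
    """
    n_depth = len(dose[0][0])
    area_mm2 = voxel_mm * voxel_mm  # cross-sectional area of one voxel
    idd = [0.0] * n_depth
    for plane in dose:
        for row in plane:
            for k, value in enumerate(row):
                idd[k] += value * area_mm2
    return idd


# Toy grid: 2 x 2 lateral positions, each with the same 3-bin depth profile.
profile = [1.0, 2.0, 4.0]
toy_dose = [[list(profile), list(profile)], [list(profile), list(profile)]]
idd = integral_depth_dose(toy_dose)
print(idd)
```

If the dose grid is held as a NumPy array, the same reduction can be written as `dose.sum(axis=(0, 1))` multiplied by the voxel area.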
RP simulations were performed for both rCT and sCT. In order to reproduce the
treatment position as closely as possible and to have agreement with the CBCT
registration, a rigid registration of the rCT to planning CT was performed in
RayStation. The RP simulations based on rCTs were used as ground truth. The rCT was
acquired on the same day as the RP measurement, so it was used as a reference
regarding the anatomy of that day. However, unlike the CBCTs, the rCTs were not
acquired in the treatment room, but in a separate CT room within the building.
2.5 Data preparation
For some patients, the isocenter was located close to the shoulders, causing some
proton spots to pass through the shoulder area (see RP field edges and beam direction
in Figure 3a). Given that the field of view of the CBCTs was limited and did not enclose
the shoulders and trapezius muscles entirely (Figure 3), IDDs going through the
shoulders and trapezius muscles were excluded from the dataset.
Figure 3. Fusion of a rCT (magenta) with the corresponding sCT (green) in an example patient
(patient 1). The treatment isocenter is marked in yellow and the RP field edges are marked in orange. (a):
Coronal view of the patient, in which the beam direction is indicated by an orange arrow. (b): Sagittal
view of the patient in which a region referred to as “A” is highlighted by a blue circle. Region A encloses
an exemplary area in the throat that is anatomically unstable.
Since rCTs were not acquired in the treatment room, there were inconsistencies
between the anatomical configuration of the patient during the rCT and the CBCT
acquisition. For instance, the base of the tongue could be in a different position
in the rCT compared to the CBCT. For this reason, IDDs in the base of tongue and
swallowing muscles areas (e.g., Region A in Figure 3(b)), which are anatomically
unstable regions, were excluded for the purpose of this study. The resulting
dataset is referred to as the “post-processed” dataset. It enables a comparison between rCT
and sCT in anatomically stable areas. After post-processing, 18–76 IDDs remained
per RP field, depending on the patient.
2.6 Data analysis
The comparison between a measured RP field and its corresponding simulated RP
fields (based on rCT and sCT) was performed by computing the residual range
error for each proton spot in the RP field. Residual range errors were obtained as the
offset that provides the best alignment between a measured and a simulated IDD
along the beam axis, calculated using the least-squares method.20, 29 This
calculation was performed in openREGGUI20,30 (openreggui.org). With this procedure,
residual range errors were obtained with an accuracy of 0.5 mm. The final analysis
is expressed in terms of relative range errors (RREs) with respect to the
water-equivalent path length of each proton spot across the patient. The water-equivalent
path length was extracted from each measured IDD by calculating the shift with
respect to an air IDD measurement.31
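The least-squares alignment and the conversion to a relative range error can be illustrated with a brute-force search over candidate shifts. This is a minimal sketch, not the openREGGUI implementation, whose internals are not described here. The function names, the 0.5 mm search step, the uniform depth grid, the sign convention (positive shift means the measured range is longer than the simulated one), and the 100 mm water-equivalent path length in the example are all assumptions.

```python
import math


def interp(depths, values, d):
    """Linearly interpolate `values` at depth `d`; zero outside the grid."""
    if d < depths[0] or d > depths[-1]:
        return 0.0
    step = depths[1] - depths[0]  # assumes a uniform depth grid
    pos = (d - depths[0]) / step
    lo = min(int(pos), len(depths) - 2)
    frac = pos - lo
    return values[lo] * (1.0 - frac) + values[lo + 1] * frac


def residual_range_error(depths, measured, simulated, max_shift=10.0, step=0.5):
    """Shift (mm) of the simulated IDD that minimizes the squared difference."""
    best_shift, best_cost = 0.0, math.inf
    n_steps = int(round(2 * max_shift / step))
    for i in range(n_steps + 1):
        shift = -max_shift + i * step
        cost = sum(
            (m - interp(depths, simulated, d - shift)) ** 2
            for d, m in zip(depths, measured)
        )
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift


def relative_range_error(range_error_mm, wepl_mm):
    """Range error as a fraction of the water-equivalent path length."""
    return range_error_mm / wepl_mm


def peak(center_mm, depths):
    """Synthetic Gaussian-shaped curve standing in for an IDD."""
    return [math.exp(-((d - center_mm) ** 2) / 50.0) for d in depths]


depths = [float(d) for d in range(201)]
simulated = peak(100.0, depths)
measured = peak(102.0, depths)  # measured curve lies 2 mm deeper
shift = residual_range_error(depths, measured, simulated)
rre = relative_range_error(shift, wepl_mm=100.0)  # hypothetical WEPL
print(shift, rre)
```

A grid search is used only because it is easy to read; any one-dimensional least-squares minimizer would serve the same purpose.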
In total, taking into account 2 RP fields for 7 patients, 14 RRE maps were obtained
comparing RP field measurements and the corresponding simulated RP fields
based on the rCT, and another 14 maps for a comparison between measured RP fields
and simulated RP fields based on the sCT. For each of the maps, the mean and 1.5
times the standard deviation (1.5SD) of the RREs were computed.
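The per-map summary can be sketched as follows. This is an illustrative helper only: the name `summarize_rre_map`, the use of `None` to mark excluded spots, and the choice of the sample standard deviation (n − 1 in the denominator) are assumptions, since the text does not specify them.

```python
import statistics


def summarize_rre_map(rre_map, k=1.5):
    """Mean and k * SD of the RREs in one map; excluded spots are None."""
    kept = [r for r in rre_map if r is not None]
    mean = statistics.mean(kept)
    spread = k * statistics.stdev(kept)  # sample SD (n - 1): an assumption
    return mean, spread, len(kept)


# Toy map with two excluded spots; values are fractions, not percent.
rre_map = [-0.01, None, 0.0, 0.01, None, 0.02]
mean_rre, spread, n_spots = summarize_rre_map(rre_map)
print(mean_rre, spread, n_spots)
```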
3. Results
Figure 4 shows two RRE maps for rCT- and sCT-based RP simulations. Figure 4
displays in different colors which proton spots were excluded from the dataset due
to proximity to the shoulders (black positions) and due to anatomical instability
(white positions). The post-processed dataset is thus composed of the spots
corresponding to the yellow positions.
Figure 4. RRE maps obtained for patient 1 in session 1 overlaid with a sagittal view of the corresponding
CT where RP simulations were performed. Left and right side maps correspond to RP simulations
performed in rCT and sCT, respectively. RREs corresponding to proton spots close to the shoulders
or in anatomically unstable regions are shown in black and white, respectively. RREs included in the
post-processed dataset are shown in yellow. The edges of the RP field are highlighted in orange.
Figure 5. Mean RREs and 1.5SD (error bars) for each patient using the post-processed dataset. Mean
RREs are displayed as a result of comparing RP measurements and RP simulations based on rCT (blue
color) or sCT (red color). The quantification is reported for both measurement sessions (session 1 and
session 2).
Figure 5 shows RREs for all patients using the post-processed dataset. Each data
point in the graph represents the mean RRE, which refers to a comparison between
the RP field measured in each patient and the corresponding RP field simulation
in the rCT (blue) or in the sCT (red). Error bars represent 1.5SD. There are
four data points for each patient, corresponding to rCT and sCT of sessions 1 and 2.
Figure 5 shows that RP simulations in rCT and sCT lead to similar results in terms
of mean and standard deviations of RREs. Mean RREs range from −1.2% to 1.5% in
rCTs, and from −0.7% to 2.7% in sCTs. Standard deviations lie between −3% and
+3% in rCTs, and between −3% and 4.5% in sCTs.
4. Discussion
In this work, RP measurements were investigated as a QC tool to verify CBCT-based
sCTs generated by a DCNN. The CT number accuracy of sCTs was assessed by
evaluating the agreement between measured and simulated IDDs. RP measurements
acquired for seven HNC patients were retrospectively assessed and compared to
corresponding RP simulations based on rCT and sCT in terms of RREs.
Figure 5 shows the agreement between measured and simulated RP fields based
on rCTs, confirming the previously demonstrated reliability of the RP measurements.26
Furthermore, mean RREs and standard deviations based on rCTs and sCTs are
consistent, with a difference in mean RREs of about −1% and standard deviations that
lie mostly within the ±3% boundaries. Thummerer et al.15 described in detail the
generation of CBCT-based sCTs by means of the DCNN employed in this study.
Image quality as well as dosimetric evaluations indicated the potential suitability
of sCTs for proton dose calculations and, thus, their integrability within adaptive
proton therapy workflows.15, 16 Our outcomes suggest that CT numbers in the sCT
images are representative and that sCTs can be used for proton dose calculations in
HNC patients, supporting the hypothesis of Thummerer et al.
A tendency toward positive mean RREs in sCTs was observed in Figure 5. In most
of the patients, the difference in mean RRE between simulations based on rCT and
sCT is about −1%, meaning that mean RREs in sCTs are slightly higher than
mean RREs in rCTs, although not in all cases. A t-test was carried out between all
RREs based on rCT and all RREs based on sCT, showing that the difference in mean
RREs between rCT and sCT is statistically significant (P-value = 1.4e-27, see Table S1
in supplementary material). If such sCTs were used for dose calculations, a higher
range uncertainty should be considered than the one employed for dose
calculations based on rCTs.
The −1% mean RRE difference between simulations based on rCT and sCT could
originate from the sCT generation process, resulting in a shift toward lower values
in the CT numbers of the sCTs. To confirm or discard the relevance of a systematic
shift in CT numbers of the sCTs, further investigations would be required. The
appearance of this effect, however, demonstrates the importance of QC tools for sCTs
and the capability of RP measurements to detect small range errors. With the
proposed RP QC procedure, residual range errors can be obtained with an accuracy of
0.5 mm.20, 26, 29, 31, 32 For the current dataset, Meijers et al. estimated an overall
uncertainty for the RRE of 1%, taking into account energy fluctuations,
interfractional motion, residual setup errors, rigid registration of the planning CT and rCT,
and anatomical inconsistencies between acquisitions in the treatment room and in
the CT imaging room.26
Some acquisition settings for future studies specific to sCT validation could be
improved. The location of the RP field sometimes led to an acquisition too close to
the shoulders, which resulted in a weak MLIC signal. Furthermore, the size of the
CBCTs was limited and did not enclose the shoulders and trapezius muscles entirely.
In addition, the center of the RP fields was aligned with the treatment isocenter,
where anatomically unstable areas such as the swallowing muscles and the base
of the tongue were present. In order to make CBCTs and rCTs anatomically
comparable, a post-processing of the dataset was required, in which IDDs going across
the shoulders or in anatomically unstable areas were excluded. The post-processed
dataset considered for this study still contained a total of 596 proton spots which
intersected a wide variety of tissues and allowed for a reliable evaluation of the CT
number accuracy in sCTs. Future RP QC acquisitions could include a pre-selection
of spots, avoiding the delivery of those that would be excluded from the analysis.
The features of the RP field (size, number of proton spots, and energy) are based
on prior studies performed with the same MLIC detector.20 However, CT numbers
in the sCT are only assessed within the limits of the RP field, meaning that if the
sCT generation process introduced an artifact outside of the RP field, it would not
be detected by this QC procedure. Therefore, the ability to acquire bigger RP fields
would be desired. The development of larger size detectors would allow for bigger
RP field acquisitions.33
In the current proton therapy treatment workflow, rCTs are frequently acquired on
a weekly basis for QC purposes. The acquisition is done in the same immobilization
as used during the treatment, but often outside of the treatment room.
Anatomical variations can lead to discrepancies between measured RP fields (in the
treatment room) and RP simulations based on the rCT (outside the treatment room). If
an RP QC procedure for sCT validation were established clinically, sCTs could be
used reliably for the purpose of dose calculations. In this way, more frequent and
accurate information on the patient anatomy would be provided by sCTs, and the
necessity of regular rCT acquisition would be reduced.
High dosimetric accuracy is a prerequisite for any sCT generation method to be
suitable for clinical implementation in adaptive proton therapy workflows. The
dosimetric accuracy of the sCT generation method used in this study was
investigated previously15,16 by recalculating clinical treatment plans on sCTs and same-day
rCTs. However, this procedure requires a reference CT image. In contrast, the
proposed RP QC tool does not rely on reference CT acquisitions to verify the CT
number accuracy of sCTs. Furthermore, it provides in vivo range measurements of
the patient in the treatment room and in treatment position, acquired immediately
after the CBCT acquisition, thus minimizing anatomical differences and position
inaccuracies between RP and CBCT acquisitions.
Previous studies have demonstrated the feasibility of deep learning-based
methods to generate sCTs with a high image quality.13-18, 27 However, the clinical
implementation of sCTs has been hampered by the lack of QC tools that verify the images
generated by the DCNN. The proposed QC tool provides a direct assessment of the
CT number accuracy in the sCTs, by means of in vivo range measurements. Given
that the proposed QC tool is independent from the method to generate sCTs, future
investigations could apply the QC tool developed in this study to verify synthetic
CTs generated by different deep learning models.
The RP QC tool presented here could support adaptive proton therapy workflows
by providing means of in vivo range verification, targeted to the assessment of the
CT number accuracy in sCT images. If the analysis of the RP QC measurements
against RP simulations based on sCT was performed online, the proposed
procedure could be compatible with online adaptive proton therapy treatment workflows.
5. Conclusions
The potential of RP as a QC tool for CBCT-based sCT verification has been
demonstrated.
RP could offer means of in vivo range verification and assessment of the CT number
accuracy of the sCTs, thus detecting outliers in the sCTs generated by the DCNN.
The agreement between measured and simulated RP fields indicates the suitability
of sCTs for proton dose calculations in HNC patients. This brings sCTs generated by
means of a DCNN closer toward clinical implementation within adaptive proton
therapy treatment workflows.
Acknowledgment
This study was financially supported by a grant from the Dutch Cancer Society
(KWF research project 11518) called “INCONTROL- Clinical Control Infrastructure
for Proton Therapy Treatments”.
Conflict of Interest
Langendijk JA is a consultant for proton therapy equipment provider IBA.
Disclosures
University of Groningen, University Medical Centre Groningen, Department of
Radiation Oncology has active research agreements with RaySearch, Philips, IBA,
Mirada, Orfit.
Meijers A discloses being in a paid working relationship with Varian Medical
Systems, USA, as of 01/Apr/2020 outside of the scope of the work reported on this
manuscript.
References
[1] Lim-Reinders S, Keller BM, Al-Ward S, Sahgal A, Kim A. Online adaptive radiation thera-
py. Int J Radiat Oncol Biol Phys. 2017; 99: 994– 1003.
[2] Yan D, Vicini F, Wong J, Martinez A. Adaptive radiation therapy. Phys Med Biol. 1997; 42:
123– 132.
Br J Radiol. 2020; 93: 20190594.
[4] Sonke JJ, Aznar M, Rasch C. Adaptive radiotherapy for anatomical changes. Semin Radiat
Oncol. 2019; 29: 245– 257.
[5] Veiga C, Janssens G, Teng C-L, et al. First clinical investigation of cone beam computed
tomography and deformable registration for adaptive proton therapy for lung cancer. Int
J Radiat Oncol Biol Phys. 2016; 95: 549– 559.
[6] Veiga C, Alshaikhi J, Amos R, et al. Cone-beam computed tomography and deformable
registration-based “dose of the day” calculations for adaptive proton therapy. Int J Part
Ther. 2015; 2: 404– 414.
[7] Cho MK, Kim JS, Cho Y-B, et al. CBCT/CBDT equipped with the x-ray projection system
for image-guided proton therapy. Med Imaging 2009 Phys Med Imaging. 2009; 7258:
72582V.
[8] Posiewnik M, Piotrowski T. A review of cone-beam CT applications for adaptive radiothe-
rapy of prostate cancer. Phys Medica. 2019; 59: 13– 21.
[9] Kurz C, Dedes G, Resch A, et al. Comparing cone-beam CT intensity correction methods
for dose recalculation in adaptive intensity-modulated photon and proton therapy for
head and neck cancer. Acta Oncol (Madr). 2015; 54: 1651– 1657.
[10] Kurz C, Kamp F, Park Y-K, et al. Investigating deformable image registration and scatter
correction for CBCT-based dose calculation in adaptive IMPT. Med Phys. 2016; 43: 5635–
5646.
[11] Nagarajappa A, Dwivedi N, Tiwari R. Artifacts: the downturn of CBCT image. J Int Soc
Prev Community Dent. 2015; 5: 440.
CBCT image: feasibility study for adaptive proton therapy. Med Phys. 2015; 42: 4449–
4459.
[13] Hansen DC, Landry G, Kamp F, et al. ScatterNet: a convolutional neural network for cone-
beam CT intensity correction. Med Phys. 2018; 45: 4916– 4926.
[14] Landry G, Hansen D, Kamp F, et al. Corrigendum: comparing Unet training with three
different datasets to correct CBCT images for prostate radiotherapy dose calculations
(Physics in Medicine and Biology (2019) 64 (035011) DOI: 10.1088/1361-6560/aaf496).
Phys Med Biol. 2019; 64:089501.
[15] Thummerer A, Zaffino P, Meijers A, et al. Comparison of CBCT based synthetic CT met-
hods suitable for proton dose calculations in adaptive proton therapy. Phys Med Biol.
2020; 65:095002.
[16] Thummerer A, de Jong BA, Zaffino P, et al. Comparison of the suitability of CBCT- And
MR-based synthetic CTs for daily adaptive proton therapy in head and neck patients.
Phys Med Biol. 2020; 65: 235036.
[17] Yuan N, Dyer B, Rao S, et al. Convolutional neural network enhancement of fast-scan
low-dose cone-beam CT images for head and neck radiotherapy. Phys Med Biol. 2020;
65:035003.
[18] Liang X, Chen L, Nguyen D, et al. Generating synthesized computed tomography (CT)
from cone-beam computed tomography (CBCT) using cyclegan for adaptive radiation
therapy. Phys Med Biol. 2019; 64: 125002.
[19] van Harten LD, Wolterink JM, Verhoeff JJC, Išgum I. Automatic online quality control of
synthetic CTs. Med Imaging 2020 Image Process Int Soc Opt Photonics. 2020;11313.
[20] Farace P, Righetto R, Meijers A. Pencil beam proton radiography using a multilayer ioni-
zation chamber. Phys Med Biol. 2016; 61: 4078– 4087.
[21] Knopf AC, Lomax A. In vivo proton range verification: a review. Phys Med Biol. 2013; 58:
131– 160.
[22] Schneider U, Pedroni E, Lomax A. The calibration of CT Hounsfield units for radiotherapy
treatment planning. Phys Med Biol. 1996; 41: 111– 124.
[23] Schneider U, Pemler P, Besserer J, Pedroni E, Lomax A, Kaser-Hotz B. Patient specific
optimization of the relation between CT-Hounsfield units and proton stopping power with
proton radiography. Med Phys. 2005; 32: 195– 199.
[24] Schneider U, Pedroni E. Proton radiography as a tool for quality control in proton
therapy. Med Phys. 1994; 22: 353– 363.
[25] Doolan PJ, Testa M, Sharp G, Bentefour EH, Royle G, Lu HM. Patient-specific stopping
power calibration for proton therapy planning based on single-detector proton radiogra-
phy. Phys Med Biol. 2015; 60: 1901– 1917.
[26] Meijers A, Seller Oria C, Free J, Langendijk JA, Knopf AC, Both S. Technical Note: first re-
port on an in vivo range probing quality control procedure for scanned proton beam the-
rapy in head and neck cancer patients. Med Phys. 2021; 48: 1372– 1380.
[27] Spadea MF, Pileggi G, Zaffino P, et al. Deep Convolution Neural Network (DCNN) multi-
plane approach to synthetic CT generation from MR images—application in brain proton
therapy. Int J Radiat Oncol. 2019; 105: 495– 503.
[28] Widesott L, Lorentini S, Fracchiolla F, Farace P, Schwarz M. Improvements in pencil beam
scanning proton therapy dose calculation accuracy in brain tumor cases with a commer-
cial Monte Carlo algorithm. Phys Med Biol. 2018; 63: 145016.
[29] Meijers A, Free J, Wagenaar D, et al. Validation of the proton range accuracy and optimi-
zation of CT calibration curves utilizing range probing. Phys Med Biol. 2020; 65:03NT02.
[30] Deffet S, Macq B, Righetto R, Vander Stappen F, Farace P. Registration of pencil beam pro-
ton radiography data with X-ray CT. Med Phys. 2017; 44: 5393– 5401.
[31] Meijers A, Seller Oria C, Free J, et al. Assessment of range uncertainty in lung-like tissue
using a porcine lung phantom and proton radiography. Phys Med Biol. 2020; 65: 155014.
[32] Farace P, Righetto R, Deffet S, Meijers A, Vander SF. Technical Note: a direct ray-tracing
method to compute integral depth dose in pencil beam proton radiography with a multi-
layer ionization chamber. Med Phys. 2016; 43: 6405– 6412.
[33] Harms J, Maloney L, Sohn JJ, Erickson A, Lin Y, Zhang R. Flat-panel imager energy-dependent
proton radiography for a proton pencil-beam scanning system. Phys Med Biol. 2020; 65: 0– 10.
Supplementary Materials
Figure S1 shows the architecture of the DCNN used. It consists of an encoding path
to extract features from the CBCT followed by a decoding path to generate the sCTs
with accurate CT numbers. Prior to training the neural network, CBCTs from the
training and validation set were resampled and rigidly registered to their
corresponding rCT. Then, the patient outline was automatically segmented on CBCTs
and rCTs and a 25 mm margin was added to the resulting segmentation to fully
include the patient immobilization mask. A deformable registration from rCT to
CBCT was performed using openREGGUI [20], [30] (openreggui.org) to further
reduce anatomical variations between CBCT and rCT.
Mean absolute error, in combination with L1 regularization, was used as the loss
function to train the neural network. To increase the dataset size and avoid overfitting,
data augmentation in the form of small translations and mirroring of slices was used.
A batch size of one and a learning rate of 1 x 10⁻⁵ were used. The training was
stopped when the validation loss did not decrease within five consecutive epochs.
Following the approach of Spadea et al., three individual trainings were performed
using only axial, sagittal or coronal slices. Outputs of each network were averaged
into a final sCT. A detailed image quality and dosimetric evaluation was presented
for a similar HNC patient cohort in previous studies [15], [16].
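The stopping rule and the multi-plane averaging can be sketched without any deep learning framework. The code below is an illustration under stated assumptions, not the training code used in the study: "did not decrease" is interpreted as no improvement over the best validation loss seen so far, predictions are represented as flat lists of CT numbers, and both function names are hypothetical.

```python
def epoch_of_early_stop(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None.

    Assumption: training stops once the validation loss has failed to
    improve on its best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


def average_predictions(axial, sagittal, coronal):
    """Voxel-wise mean of three sCT predictions given as flat lists of HU."""
    return [(a + s + c) / 3.0 for a, s, c in zip(axial, sagittal, coronal)]


# Toy validation curve: the best value (42) is reached in epoch 3.
stop_epoch = epoch_of_early_stop([50, 45, 42, 43, 44, 42, 43, 45])
final_sct = average_predictions([0.0, 30.0], [3.0, 30.0], [6.0, 33.0])
print(stop_epoch, final_sct)
```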
Figure S1. Schematic representation of the architecture of the DCNN. The left and right sides of the
architecture correspond to the encoding and decoding paths, respectively. Convolutional, max pooling,
upscaling and copying layers are depicted in green and blue, orange, yellow and red colors.
Figure S2. CBCT, sCT and rCT corresponding to session 1 in all patients, numbered from P1 to P7. The
edges of the RP field are highlighted in orange in each CT image.
Table S1 shows the outcome of a t-test assuming unequal variances carried out
between all RREs based on rCTs and all RREs based on sCTs. The hypothesized mean
difference between the two datasets was set to 0, and the significance level was
0.05. The resulting p-value was 1.4e-27, rejecting the null hypothesis and
supporting the hypothesis that the mean RREs based on rCT and sCT are different.
Specifically, the mean value of RREs based on sCT is greater than the mean of RREs based
on rCT.
                                RREs rCT    RREs sCT
Mean                            0.003362    0.012031
Variance                        0.000155    0.000189
Observations                    572         572
Hypothesized Mean Difference    0
P(T<=t) two-tail                1.38E-27
Table S1. t-test between RREs based on rCT and on sCT.
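The test statistic behind this table can be recomputed from the summary values alone. The sketch below applies the standard formulas for a t-test with unequal variances (Welch's test) to the tabulated means, variances and sample sizes; the function name `welch_t` is hypothetical. The p-value itself is not computed here, since it requires the cumulative distribution function of Student's t-distribution, which is available in statistics libraries but not in the Python standard library.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    se1, se2 = var1 / n1, var2 / n2  # squared standard errors of the means
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    dof = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, dof


# Summary values taken from the table above (rCT first, sCT second).
t_stat, dof = welch_t(0.003362, 0.000155, 572, 0.012031, 0.000189, 572)
print(round(t_stat, 2), round(dof))
```

The negative sign of the statistic reflects that the mean RRE based on rCT is smaller than the mean RRE based on sCT.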
Chapter 5
Clinical suitability of deep learning
based synthetic CTs for adaptive
proton therapy of lung cancer
Adrian Thummerer1, Carmen Seller Oria1, Paolo Zaffino2, Arturs Meijers1, Gabriel
Guterres Marmitt1, Robin Wijsman1, Joao Seco3,4, Johannes A. Langendijk1, Antje C.
Knopf1,5, Maria F. Spadea2,6, Stefan Both1,6
1 Department of Radiation Oncology, University Medical Center Groningen,
University of Groningen, Groningen, The Netherlands
2 Department of Experimental and Clinical Medicine,
Magna Graecia University, Catanzaro, Italy

4 Department of Physics and Astronomy,
Heidelberg University, Heidelberg, Germany

Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany

Published in:
Medical Physics
December 2021, Volume 48, Issue 12
DOI 10.1002/mp.15333
Abstract
Purpose
Adaptive proton therapy (APT) of lung cancer patients requires frequent volumetric
imaging of diagnostic quality. Cone-beam CT (CBCT) can provide these daily
images, but x-ray scattering limits CBCT image quality and hampers dose
calculation accuracy. The purpose of this study was to generate CBCT-based synthetic CTs
(sCTs) using a deep convolutional neural network (DCNN) and investigate image quality
and clinical suitability for proton dose calculations in lung cancer patients.
Methods
A dataset of 33 thoracic cancer patients, containing CBCTs, same-day repeat CTs
(rCT), planning-CTs (pCTs), and clinical proton treatment plans, was used to train
and evaluate a DCNN with and without a pCT-based correction method. Mean
absolute error (MAE), mean error (ME), peak signal-to-noise ratio, and structural
similarity were used to quantify image quality. The evaluation of clinical suitability
was based on recalculation of clinical proton treatment plans. Gamma pass ratios,
mean dose to target volumes and organs at risk, and normal tissue complication
probabilities (NTCP) were calculated. Furthermore, proton radiography
simulations were performed to assess the HU-accuracy of sCTs in terms of range errors.
Results
On average, sCTs without correction resulted in a MAE of 34 ± 6 HU and ME of 4 ±
8 HU. The correction reduced the MAE to 31 ± 4 HU (ME to 2 ± 4 HU). Average 3%/3
mm gamma pass ratios increased from 93.7% to 96.8% when the correction was
applied. The patient-specific correction reduced mean proton range errors from 1.5
to 1.1 mm. Relative mean target dose differences between sCTs and rCT were below
± 0.5% for all patients and both synthetic CTs (with/without correction). NTCP
values showed high agreement between sCTs and rCT (<2%).
Conclusions
CBCT-based sCTs can enable accurate proton dose calculations for APT of lung
cancer patients. The patient-specific correction method increased the image
quality and dosimetric accuracy but had only a limited influence on clinically relevant
parameters.
1. Introduction
Proton therapy can deliver highly conformal dose distributions, leading to lower
normal tissue dose, sparing of organs at risk (OAR), and target dose escalation.
The dosimetric advantages of proton therapy arise from the characteristic
depth-dose profile: as protons traverse matter, they deposit most of their dose at an
energy-dependent depth (Bragg peak), after which they rapidly stop.1 Compared
to conventional photon beam therapy, this behavior results in a lower entrance and
exit dose.
However, the beneficial depth-dose characteristic of proton beams leads to an
increased sensitivity of proton dose distributions to density changes along the beam
path. Density shifts can occur due to anatomical variations, patient alignment
errors, and changes in tumor size (growth/shrinkage). Adaptive proton therapy
(APT) aims at detecting such anatomical changes and re-adjusting treatment plans
according to the updated patient anatomy.2-4 Frequent patient imaging is a pivotal
part of APT and provides the foundation for adaptation decisions. For daily APT,
cone-beam computed tomography (CBCT) images have the potential to serve as an
alternative to conventional computed tomography (CT). In proton therapy, CBCTs
are often routinely acquired for daily pre-treatment patient alignment. The CBCT
acquisition protocols used in radiotherapy are optimized to deliver significantly
lower imaging dose than conventional diagnostic CT scans, making CBCTs more
suitable for repeated imaging. Although the image quality of CBCTs is sufficient for
position verification, they suffer from severe image artifacts which prevent them
from being suitable for accurate proton dose calculations.
To correct CBCT deficiencies and enable CBCT-based adaptive radiotherapy
workflows, several methods have been developed and investigated in the context of
photon and proton dose calculations in various anatomical locations. For the
thorax, this includes techniques based on CBCT calibration,5-7 HU-overrides,8-11
deformable image registration,11-15 and Monte Carlo simulations.16, 17 Recently, research
activities focused heavily on developing and evaluating deep learning methods to
correct CBCTs and generate the so-called synthetic CTs (sCTs).18-22 In previous
studies, deep learning methods have shown the ability to generate sCTs suitable for
proton dose calculations for head and neck,23, 24 pelvis,25 and prostate cancer
patients.26, 27 However, regarding lung cancer treatment adaptation, only results for
photon dose calculations have been reported.22, 28
In this study, we investigated the generation of CBCT-based sCTs for adaptive
proton therapy of lung cancer patients. Synthetic CTs were generated using a deep
convolutional neural network (DCNN) which was previously evaluated for H&N
cancer patients.23, 24 Furthermore, we proposed an accompanying patient-specific
correction strategy to further improve image quality of the resulting sCTs.
Synthetic CTs were evaluated in terms of image quality and proton range error. The
clinical suitability was assessed by recalculating clinically used treatment plans on
sCTs and same-day repeat CTs. Based on these dose distributions, gamma pass
ratios, dose statistics for target volumes (TV) and organs at risk (OAR), and normal
tissue complication probabilities (NTCP) were calculated.
2. Materials and Methods

2.1 Patient datasets
A dataset containing 33 thoracic cancer patients, treated with pencil beam scanning
proton therapy (PBS-PT) at the University Medical Center Groningen (UMCG), was
used in this study to train and evaluate a DCNN with an accompanying patient-
specific correction technique. 27 patients were treated for lung cancer, while the re-
maining six patients were either diagnosed with thoracic thymoma or mediastinal
lymphoma. All 33 patients were imaged with the same acquisition protocols. Only
lung cancer patients were included in the dosimetric evaluation of the sCTs. The
lung cancer patients (15 female, 12 male) were aged between 46 and 83 years, with
a median age of 69 years. For eight patients, the tumor was located in the left lung;
for 18 in the right lung; and in one patient, tumor tissue was present on both sides.
The tumor position also varied between upper (13 patients) and middle/lower lobe
(14 patients). A table with patient demographic information is available in the Sup-
porting Information (Table S1).
2.2 Imaging data
For each patient, CBCT, planning CT (pCT), and repeat CT (rCT) images were used. CBCT images
were acquired with an IBA Proteus Plus (IBA, Belgium) and reconstructed with the
clinically used protocol. Repeat 4D-CT and planning 4D-CT scans were acquired
on a Siemens SOMATOM Confidence (Siemens Healthineers, Germany) and on a
Siemens SOMATOM Definition AS scanner, respectively, using the same imaging
protocol. For treatment planning and dose calculation, average 4DCTs were gene-
rated from the 10 breathing phases of pCTs and rCTs. More detailed imaging and
reconstruction parameters for CBCT, rCT, and pCT are listed in the Supporting In-
formation (Table S2).
Repeat CT scans were acquired on the same day as the CBCT, used for training of
the DCNN and selected as reference for image quality and dosimetric evaluation.
There was a time difference of a few weeks between the pCT acquisition, used wit-
hin the patient specific correction workflow, and the rCT/CBCT acquisition. For all
patients, the first available rCT-CBCT pair (acquired in the first week of treatment)
was chosen.
2.3 Image pre-processing
Before training the DCNN with CBCT-rCT image pairs, several pre-processing
steps were performed. First, CBCTs were rigidly registered to the same-day rCT
and the patient outline was automatically segmented on CBCT and rCT using Plas-
timatch29 (www.plastimatch.org). Voxels outside the patient outline were set to
−1000 HU on CBCT and rCT. To account for the limited CBCT field-of-view (FOV)
in superior-inferior direction, the rCT and the respective mask were cropped to co-
ver the same FOV. To reduce residual errors between CBCT and rCT, a deforma-
ble image registration (DIR) algorithm, implemented in the open-source MATLAB
toolbox openREGGUI (www.openreggui.org), was used to deformably register the
rCT to the CBCT. This DIR-algorithm has been found to be suitable for CBCT/CT
image registration in previous studies.14, 30, 31 The resulting image pairs of CBCT
and deformed rCT were used to train the neural network.
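The masking and cropping steps described above can be illustrated with a minimal sketch. It assumes the images are already rigidly registered NumPy arrays with the superior-inferior direction along axis 0; the function names and the slice-index interface are hypothetical and are not part of Plastimatch or openREGGUI.

```python
import numpy as np

AIR_HU = -1000.0  # value assigned outside the patient outline (taken from the text)


def mask_outside_patient(image_hu, patient_mask):
    """Return a copy in which every voxel outside the patient outline is set to air."""
    return np.where(patient_mask, image_hu, AIR_HU)


def crop_to_cbct_fov(rct_hu, rct_mask, first_slice, last_slice):
    """Crop the rCT and its mask to the CBCT slice range.

    Assumption: axis 0 is the superior-inferior direction and the slice
    indices (inclusive) are known from the rigid registration.
    """
    sl = slice(first_slice, last_slice + 1)
    return rct_hu[sl], rct_mask[sl]


# Toy example: a 6-slice rCT with a 2 x 2 "patient" in the centre of each slice.
rct = np.full((6, 4, 4), 40.0)
outline = np.zeros((6, 4, 4), dtype=bool)
outline[:, 1:3, 1:3] = True
rct_masked = mask_outside_patient(rct, outline)
rct_cropped, outline_cropped = crop_to_cbct_fov(rct_masked, outline, 1, 4)
```

The deformable registration step itself is not sketched here, since it depends on the registration toolbox used.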
2.4 Neural network
To generate CBCT-based sCTs, a deep convolutional neural network (DCNN), previously
described by Spadea et al.,32 was utilized. Figure S3 in the Supporting Information
depicts the network architecture. The DCNN is composed of an encoding
and decoding path to extract features from the CBCT and reconstruct it with ac-
curate CT-numbers. Similar to Spadea et al., three individual networks were trai-
ned exclusively with axial, sagittal, or coronal slices. A final sCT was created by
averaging the network outputs from each training. Mean absolute error together
with L1-regularization was used as loss function to train the network. Due to the
limited dataset size, threefold cross validation was applied. This allowed utilizing
all 33 patients for evaluation purposes. We randomly split the dataset into three
subsets of 11 patients each. Two subsets were used for training, and the third subset
was used for evaluation. The training was repeated so that each subset was used
for evaluation once. Based on previous experience with this network architecture,
a batch size of 1 was used, and the training process was stopped when no decrease
in loss was observed for five consecutive epochs.23, 24 An NVIDIA GTX 1080 Ti with 11
GB of VRAM was used for training and inference of the neural network.
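The threefold cross validation and the stopping criterion can be sketched as follows. This is an illustration only: the function names, the random seed, and the interpretation of "no decrease in loss for five consecutive epochs" as "the best loss of the last five epochs does not beat the best loss seen before them" are assumptions, not details taken from the training code.

```python
import random


def threefold_splits(patient_ids, seed=0):
    """Yield (training, evaluation) patient lists for a threefold cross validation."""
    ids = list(patient_ids)
    if len(ids) % 3 != 0:
        raise ValueError("this sketch assumes a patient count divisible by three")
    random.Random(seed).shuffle(ids)  # random split, reproducible via the seed
    size = len(ids) // 3
    folds = [ids[k * size:(k + 1) * size] for k in range(3)]
    for k in range(3):
        training = [p for j, fold in enumerate(folds) if j != k for p in fold]
        yield training, folds[k]


def should_stop(loss_history, patience=5):
    """Return True when the loss has not decreased during the last `patience` epochs."""
    if len(loss_history) <= patience:
        return False
    best_before = min(loss_history[:-patience])
    return min(loss_history[-patience:]) >= best_before


splits = list(threefold_splits(range(1, 34)))  # 33 patients, numbered 1 to 33
```

Each patient appears in exactly one evaluation subset, so all 33 patients contribute to the evaluation while never being part of the training data of the network that generates their sCT.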
2.5 Planning CT-based patient-specific correction method
In addition to the DCNN, we introduced a patient-specific correction method. The
correction workflow utilizes each patient's pCT, which contains accurate lung CT-numbers
but was acquired several weeks before the CBCT acquisition. In a first
step, the planning CT was deformably registered to the synthetic CT using openREGGUI.
Afterward, the registered pCT was subtracted from the original sCT (sCTorig),
generating a difference image. A threshold was applied to the difference image
to exclude differences exceeding ±150 HU. This thresholding makes the correction
method insensitive to major anatomical changes (e.g., tumor growth, alignment
errors) since it excludes areas that change significantly between pCT and
CBCT acquisition. The threshold value (150 HU) controls the impact the pCT has on
the final sCT and was found empirically by calculating gamma pass ratios for sCTs
with a variety of threshold values. Afterwards, three HU-regions were segmented
on the sCT: region 1, containing air volumes and lung tissue (−1000 HU to −300
HU); region 2, covering soft tissues (−300 to 200 HU); and region 3, containing bo-
nes (> 200 HU). Using the above-described segmentation masks, each HU-region
of the difference image was smoothed individually using a 3D Gaussian kernel (6
voxel standard deviation) and combined into the final correction map. Smoothing
each region individually ensures sharp edges between varying tissues (e.g., lung
tissue—soft tissue, soft tissue—bone). In a final step, the correction map was sub-
tracted from the original sCT to create the corrected sCT (sCTcor).
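The correction steps described above can be sketched in a few lines of NumPy. The sketch is not the implementation used in this study and rests on several assumptions that are not stated in the text: excluded differences are set to zero, each HU region is smoothed by a mask-normalized Gaussian filter so that values do not leak across region borders, and the Gaussian kernel is truncated at four standard deviations. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel_1d(sigma, truncate=4.0):
    """Normalized 1D Gaussian kernel (truncation at 4 sigma is an assumption)."""
    radius = max(1, int(truncate * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def _smooth_line(values, kernel):
    """Zero-padded convolution of one line, cropped back to the input length."""
    radius = len(kernel) // 2
    return np.convolve(values, kernel, mode="full")[radius:radius + len(values)]


def smooth3d(volume, sigma):
    """Separable 3D Gaussian smoothing built from three 1D passes."""
    kernel = gaussian_kernel_1d(sigma)
    out = volume.astype(float)
    for axis in range(3):
        out = np.apply_along_axis(_smooth_line, axis, out, kernel)
    return out


def patient_specific_correction(sct, pct_registered, threshold=150.0, sigma=6.0):
    """Return (corrected sCT, correction map) following the steps in the text."""
    diff = sct.astype(float) - pct_registered.astype(float)  # sCT minus registered pCT
    diff[np.abs(diff) > threshold] = 0.0  # assumption: excluded voxels contribute zero
    regions = [sct < -300, (sct >= -300) & (sct <= 200), sct > 200]  # lung/air, soft tissue, bone
    correction = np.zeros_like(diff)
    for mask in regions:
        if not mask.any():
            continue
        weight = mask.astype(float)
        smoothed = smooth3d(diff * weight, sigma)
        norm = smooth3d(weight, sigma)  # positive inside the mask (kernel centre weight)
        correction[mask] = smoothed[mask] / norm[mask]
    return sct - correction, correction
```

In this form, a uniform offset of the sCT within one HU region is removed completely, while a voxel whose difference exceeds the threshold only lowers the correction in its neighborhood instead of pulling the sCT toward outdated pCT anatomy.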
2.6 Image evaluation
The image similarity between the synthetic CTs (sCTorig/sCTcor) and the deformed
same-day rCT was evaluated by calculating mean absolute error (MAE), mean error
(ME), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM),33, 34
defined in Equations (1) to (4):
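$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|rCT_i - sCT_i\right| \tag{1}$$
$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(rCT_i - sCT_i\right) \tag{2}$$
$$\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{Q^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(rCT_i - sCT_i\right)^{2}}\right) \tag{3}$$
$$\mathrm{SSIM} = \frac{\left(2\mu_{sCT}\,\mu_{rCT} + c_1\right)\left(2\delta_{sCT,rCT} + c_2\right)}{\left(\mu_{sCT}^{2} + \mu_{rCT}^{2} + c_1\right)\left(\sigma_{sCT}^{2} + \sigma_{rCT}^{2} + c_2\right)} \tag{4}$$
where $rCT_i$ and $sCT_i$ are the HU values of voxel $i$ in the rCT and sCT, respectively, $n$ is the total
number of voxels within the patient outline, $Q$ is the maximum HU value of sCT and
rCT, $\mu_{sCT}$ and $\mu_{rCT}$ are the mean pixel values of sCT and rCT, $\sigma_{sCT}^{2}$ and $\sigma_{rCT}^{2}$ are the
variances of sCT and rCT, $\delta_{sCT,rCT}$ is the covariance between sCT and rCT, and $L$ is
the dynamic range of sCT and rCT, which enters through the stabilizing constants
$c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ ($k_1 = 0.01$ and $k_2 = 0.03$ in the standard SSIM definition).
All image similarity metrics were only calculated
for voxels within the patient outline. To analyze the MAE of various tissues, an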
MAE spectrum was generated for sCTorig and sCTcor by grouping voxels in bins of 20
HU and calculating MAE for each bin. Wilcoxon signed-rank tests were used to check
for statistical significance of differences between sCTorig and sCTcor.
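A minimal sketch of the voxel-wise metrics and the MAE spectrum is given below, using plain Python lists of HU values from voxels inside the patient outline. Two points are assumptions rather than statements from the text: the spectrum bins are defined by the HU value of the reference rCT, and the PSNR of two identical images is reported as infinity. SSIM is omitted for brevity.

```python
import math
from collections import defaultdict


def mae(rct, sct):
    """Mean absolute error in HU."""
    return sum(abs(r - s) for r, s in zip(rct, sct)) / len(rct)


def me(rct, sct):
    """Mean error in HU; positive values mean lower HU on the sCT than on the rCT."""
    return sum(r - s for r, s in zip(rct, sct)) / len(rct)


def psnr(rct, sct):
    """Peak signal-to-noise ratio in dB, with Q the maximum HU value of both images."""
    q = max(max(rct), max(sct))
    mse = sum((r - s) ** 2 for r, s in zip(rct, sct)) / len(rct)
    return math.inf if mse == 0 else 10.0 * math.log10(q ** 2 / mse)


def mae_spectrum(rct, sct, bin_width=20):
    """MAE per HU bin; voxels are binned by their rCT value (assumption)."""
    bins = defaultdict(list)
    for r, s in zip(rct, sct):
        lower_edge = math.floor(r / bin_width) * bin_width
        bins[lower_edge].append(abs(r - s))
    return {edge: sum(v) / len(v) for edge, v in sorted(bins.items())}


# Toy example with four voxels.
rct_hu = [0, 10, -1000, 50]
sct_hu = [5, 10, -990, 40]
spectrum = mae_spectrum(rct_hu, sct_hu)
```

Because ME keeps the sign of each difference, errors of opposite sign cancel, which is why it is reported together with MAE.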
2.7 Dosimetric evaluation
For the 27 lung cancer patients, clinical treatment plans were recalculated on both
sCTs (sCTorig/sCTcor) and compared to the same-day rCTs using global gamma analysis
(dose threshold of 10%, 2%/2 mm, and 3%/3 mm criteria). Dose calculations
were performed in RayStation Research (Version 9A) using the clinical Monte Carlo
dose engine with an uncertainty of 1% and a dose grid of 3 × 3 × 3 mm3. Clinical
treatment plans consisted of three beam directions and were generated on the ave-
rage 4D pCT using multi-field and robust optimization, with a range uncertainty
of ±3% and a setup error of 6 mm as defined in our clinical protocol.35 Dose was
prescribed to the ITV. All dosimetric evaluations were performed for the entire plan
(all fields combined). The clinical suitability was evaluated by calculating mean
dose differences in TVs (GTV, CTV) and selected OARs (heart, lung, esophagus, spi-
nal cord). For the CTV, additional dosimetric parameters were also investigated
(Dmax, D95, D98, V95, and V100). For the spinal cord, maximum dose instead of
mean dose was reported. Delineations of TVs and OARs were transferred from the
pCT, which for this purpose was deformably registered to the rCT.
2.8 Comparison between sCTcor and deformed pCT
Deformable image registration (DIR)-based strategies for CBCT-based proton dose
calculations have been investigated in previous studies.14, 31 Within these approaches,
the patient-specific pCT, containing accurate HUs, is deformed to the CBCT
to represent the daily patient geometry. Our proposed sCT correction method also
relies on the deformed pCT for HU correction but is combined with additional
smoothing and thresholding operations. Therefore, we performed an image quali-
ty and dosimetric comparison between a pure DIR-based strategy (pCTdef) and our
corrected sCT (sCTcor). pCTdef was generated with the same DIR-algorithm and
settings used in the patient-specific correction described above. The comparison
includes image quality metrics (MAE and ME) and gamma analysis. The same-day
rCT was used as reference for image quality and dosimetric evaluations.
2.9 NTCP
Based on the recalculation of clinical treatment plans, we also calculated NTCP
using NTCP models described in the Dutch National Indication Protocol for proton
therapy of lung cancer (NIPP).36 NTCP models are used to estimate the risk of
developing certain side effects during radiation therapy. The indication protocol
for lung cancer includes models for radiation pneumonitis,37 acute dysphagia,38
and 2-year mortality.39 Certain clinical parameters (e.g., age, smoking status, and
tumor location) and mean dose values of OARs (heart, lung, and esophagus) are
used as input parameters for the NTCP models. At Dutch proton therapy centers,
NTCP models are used within the patient selection process for proton therapy. In
our work, we used these models to evaluate the clinical similarity of rCTs and sCTs,
by calculating the NTCP difference (ΔNTCP) between them.
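The ΔNTCP calculation can be illustrated with a generic logistic dose-response model. The logistic form with a single mean-dose term is an assumption made for illustration, the clinical covariates are omitted, and the coefficients below are hypothetical placeholders; they are not the coefficients of the NIPP models.

```python
import math


def ntcp_logistic(mean_dose_gy, intercept, dose_coeff):
    """NTCP from a logistic model with one mean-dose predictor (illustrative form)."""
    linear_predictor = intercept + dose_coeff * mean_dose_gy
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def delta_ntcp(mean_dose_rct, mean_dose_sct, intercept, dose_coeff):
    """NTCP difference between rCT- and sCT-based dose (rCT minus sCT)."""
    return (ntcp_logistic(mean_dose_rct, intercept, dose_coeff)
            - ntcp_logistic(mean_dose_sct, intercept, dose_coeff))


# Hypothetical placeholder coefficients, NOT taken from the NIPP.
INTERCEPT = -3.0
DOSE_COEFF = 0.1

# Example: mean OAR dose of 10.0 Gy on the rCT and 10.3 Gy on the sCT.
example_delta = delta_ntcp(10.0, 10.3, INTERCEPT, DOSE_COEFF)
```

A negative ΔNTCP in this convention means that the sCT-based dose predicts a higher complication probability than the rCT-based dose.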
2.10 Radiography simulations
To visualize and quantify the similarity between rCT and sCTs in terms of proton
range, we performed proton radiography (PR) simulations on rCT, sCTorig, and
sCTcor using a dedicated proton radiography module of openREGGUI (www.openreggui.org).
It employs a direct ray tracing algorithm to simulate PRs as acquired
with a multi-layer ionization chamber.40 PRs were simulated from a gantry angle
of 0 degrees (anterior–posterior direction), a pencil beam spacing similar to the
rCT imaging grid (1 mm left-to-right, 2 mm inferior–superior), and an energy of
210 MeV. Range error maps were computed between rCT and sCTorig and between
rCT and sCTcor for all patients. Range error was calculated similarly to previous stu-
dies.41-43 The resulting range error maps were analyzed by calculating mean abso-
lute range error (MARE) and mean range error (MRE). The influence of lung tissue
on range errors was investigated by calculating MARE only including pencil beams
traversing the lungs. To select these beams, the lungs were segmented on the rCT
and the resulting lung mask was projected along the proton beam direction. This
resulting 2D-lung mask was applied to the range error maps.
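Given a range error map, the summary statistics described above reduce to simple averages. The sketch below assumes that the map is available as a 2D list with one range error (in mm) per pencil beam and that the projected lung mask has the same shape; how the range error itself is derived from the simulated radiographs is not covered. The function name is hypothetical.

```python
def range_error_stats(range_error_mm, beam_mask=None):
    """Return (MARE, MRE) in mm over all pencil beams, or only those selected by beam_mask."""
    values = [
        err
        for i, row in enumerate(range_error_mm)
        for j, err in enumerate(row)
        if beam_mask is None or beam_mask[i][j]
    ]
    if not values:
        raise ValueError("no pencil beams selected")
    mare = sum(abs(v) for v in values) / len(values)
    mre = sum(values) / len(values)
    return mare, mre


# Toy 2 x 2 range error map; the right column traverses the lungs.
errors = [[1.0, -2.0],
          [0.5, 3.0]]
lung_projection = [[False, True],
                   [False, True]]
overall = range_error_stats(errors)
lung_only = range_error_stats(errors, lung_projection)
```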
3 Results
Figure 1 presents an overview of CBCT, sCTorig, sCTcor, and the reference rCT together
with difference images between rCT and sCTorig/sCTcor. On average, sCTcor resulted
in a significantly lower MAE than sCTorig with respective values of 30.7 ± 4.4 HU
and 34.1 ± 5.5 HU (p-value: < 1 × 10^-6). Average ME changed from 4.3 ± 7.7 HU for
sCTorig to 2.4 ± 3.9 HU for sCTcor, but the difference was not found to be statistically
significant (p-value > 0.05). Overall, ME showed a trend toward positive values in-
dicating lower HU values on sCTs when compared to rCTs. SSIM remained virtually
unchanged with values of 0.938 ± 0.019 for sCTorig and 0.941 ± 0.019 for sCTcor. The
PSNR showed a slight improvement from sCTorig to sCTcor (30.7 ± 3.3 dB and
31.2 ± 3.4 dB, respectively), but the difference was not statistically significant (p-value > 0.05).
Figure 1. Axial and coronal slices of CBCT, sCTorig, sCTcor, and the reference rCT together with difference
maps between sCTorig/cor and rCT. A HU-window of 2000 (width)/ 0 (level) was used for sCTorig, sCTcor,
and rCT
Detailed MAE and ME results for each patient individually are visualized in Figure 2.
Individual results for the other metrics are reported in the Supporting Information
(Figure S4). Figure 2c shows the MAE spectra for sCTorig and sCTcor. For
voxels below 300 HU, sCTcor shows slightly lower MAE than sCTorig, indicating the
effectiveness of the patient-specific correction. Error regions, however, overlap for
the entire HU-range.
Figure 2. (a) MAE and (b) ME for sCTorig and sCTcor for each patient. The dashed lines indicate the
average values. (c) Average MAE spectrum of sCTorig and sCTcor. The shaded area indicates one stan-
dard deviation. In green, an average image histogram is presented.
Results from the gamma analysis of clinical treatment plans are shown in Figure
3a (3%/3 mm and 2%/2 mm criteria). The patient-specific correction technique in-
creased average 3%/3 mm gamma pass ratios from 93.7 ± 4.8% to 96.8 ± 2.4%. This
difference was found to be statistically significant (p-value: 4 × 10^-4). Furthermore,
the lowest observed 3%/3 mm pass ratio increased from 82.8% (sCTorig, patient 23)
to 90.7% (sCTcor, patient 20). A similar trend was observed for 2%/2 mm pass ratios.
Figure 3b presents boxplots of relative dose differences for TVs and OARs. For GTV
and CTV, both sCTs showed very good agreement with mean doses calculated on
the rCT. The mean dose differences were within ±0.5% for all patients. For the CTV,
Dmax, D95, D98, and V95 also agreed well with the rCT for both sCTorig and sCTcor.
Differences for all patients were within ±5%, and mean differences were close to 0%.
Only V100 values showed larger discrepancies of up to −15% for sCTorig. Applying
the correction reduced the maximum V100 difference to −7%. A figure containing
CTV dose differences for these dosimetric parameters is presented in the Suppor-
ting Information (Figure S5). For OAR, mean doses varied greatly, with values bet-
ween 0.3 and 37.9 GyRBE. Results for OAR with an absolute dose below 1 Gy were ex-
cluded. Overall, higher relative dose differences were observed for OAR. The largest
differences occurred in the heart (mean dose) and the spinal cord (max dose), with
values up to 10%. For lung and esophagus, values were within ±5% for all patients.
Across all TVs and OARs, sCTcor resulted in a slightly lower variance than sCTorig.
In Figure 4, HU and dose profiles of rCT, sCTorig, and sCTcor are presented for patient
12. The selected profiles run parallel to the proton beam direction (gantry angle of
300 degrees) and the displayed dose values are only for the 300 degree beam direction
instead of the entire plan. The HU- and dose-profiles visualize the relationship
between lung tissue inaccuracies of sCTorig and the resulting dose shift. Applying the
patient-specific correction (sCTcor) restored accurate proton range, and good agreement
with the rCT dose profile can be observed.
3.3 Comparison between pCTdef and sCTcor
Similar MAE and ME values were observed for pCTdef and sCTcor. Average
MAE/ME values were 31.5 ± 6.6 HU/2.9 ± 4.7 HU for pCTdef and 30.7 ± 4.4 HU/2.4 ±
3.9 HU for sCTcor. The comparison of proton dose distributions resulted in higher
gamma pass rates for sCTcor than for pCTdef (3%/3 mm: 96.8% vs. 95.6%, 2%/2 mm:
93.1% vs. 91.6%). For three patients, a 3%/3 mm pass rate below 85 % was observed
for pCTdef, while for sCTcor, all patients achieved pass rates above 90%. These pCTdef
outliers appeared when a significant anatomical change occurred between acquisition
of pCT and CBCT. An example of such an anatomical change and figures showing
evaluation results are presented in the Supporting Information (Figure S7).
Figure 3. (a) Gamma pass ratios (top: 3%/3 mm, bottom: 2%/2 mm) of sCTorig and sCTcor for each
patient individually. The dotted line in the corresponding color indicates the mean value of sCTorig and
sCTcor. This figure shows results for lung cancer patients only. (b) Relative dose differences between
sCTs and rCT for target volumes and selected organs at risk. Mean dose was used for all structures,
except the spinal cord (max dose).
Figure 4. HU and dose profiles for rCT, sCTorig, and sCTcor. The selected profile is indicated with the
blue arrow. Solid lines represent the HU-profiles; dashed lines the corresponding dose profiles. The
displayed dose is from the 330° beam direction only and does not represent the full plan dose.
3.4 NTCP
A boxplot of ΔNTCP values for sCTorig and sCTcor is presented in Figure 5. A high
level of agreement between NTCP values, calculated on rCT and both sCTs, was
observed for all patients and across all predicted toxicities. For radiation pneumo-
nitis, the average ΔNTCP values were 0.0 ± 0.3% for sCTorig and 0.0 ± 0.2% for sCTcor
(max. ΔNTCP values for pneumonitis: −1% for sCTorig, −0.9% for sCTcor). For dyspha-
gia, average ΔNTCP values were −0.2 ± 0.4% for sCTorig and −0.1 ± 0.3% for sCTcor
(max. values of −1.3% and −0.9%, respectively). The endpoint of 2-year mortality
resulted in average ΔNTCP values of 0.0 ± 0.3% for sCTorig and 0.0 ± 0.2% for sCTcor
(max. values of −1.1% for sCTorig and −0.7% for sCTcor). Individual NTCP values for
each patient and toxicity are presented in the Supporting Information (Figure S6).
3.5 Proton radiography simulations
In Figure 6, range error maps between the reference rCT and the two synthetic CTs
(Figure 6a: sCTorig; Figure 6b: sCTcor) are presented for patient 23, in which the lar-
gest relative MARE decrease was achieved by using the patient-specific correction.
Figure 6c shows an accompanying water-equivalent thickness map. A reduction
in range errors is clearly visible in the lungs, while the surrounding areas remain
mainly unchanged.
Figure 5. Delta NTCP values (NTCPrCT − NTCPsCT) for dysphagia, radiation pneumonitis, and 2-year
mortality, calculated on sCTorig and sCTcor.
Figure 6. Range error maps for patient 23 between rCT and sCTorig (a) and between rCT and sCTcor (b).
Panel (c) shows the corresponding water equivalent thickness map (calculated based on the rCT).
Positive range errors indicate larger range on sCTs than rCTs; negative range errors lower range on
sCTs than rCTs
On average, MARE was reduced from 1.5 ± 0.5 mm on sCTorig to 1.1 ± 0.4 mm on
sCTcor. This difference was found to be statistically significant (p-value: < 10^-5).
MRE decreased from 0.6 ± 0.8 mm on sCTorig to 0.3 ± 0.6 mm on sCTcor (not signifi-
cant, p-value: 0.2).
Figure 7 shows MARE and MRE results for each patient individually. Overall, MRE
is shifted toward positive values, indicating that an increased proton range was
found on PR simulations based on sCTs. This is consistent with the observation of
a positive shift in ME values. The MARE calculation considering only beams traversing
lung tissue resulted in an average MARE of 2.1 ± 0.9 mm for sCTorig and 1.6 ± 0.6 mm
for sCTcor. The MARE for the remaining beams is significantly lower on both sCTorig
and sCTcor, with values of 0.9 ± 0.6 and 0.7 ± 0.5 mm, respectively. This indicates
that the overall range error is mainly determined by the range error in lung tissue.
Additional range error maps are presented in the Supporting Information.
Figure 7. (a) Mean absolute range error for sCTorig and sCTcor. (b) Mean range error of sCTorig and sCTcor.
The dashed lines indicate the mean values.
4 Discussion
Frequent imaging is a prerequisite for APT in lung cancer patients. Deep convolu-
tional neural networks have previously shown their ability to correct HU deficien-
cies of routinely acquired CBCTs and thus enable CBCT-based APT in other treat-
ment sites. This study aimed at investigating image quality, dosimetric accuracy,
and clinical suitability of deep learning based sCTs for lung cancer patients. We
also proposed an accompanying patient-specific correction technique, utilizing HU
information from the pCT, which further improved the sCT in terms of dosimetric
accuracy and image quality.
The patient-specific correction method reduced the average MAE from 34.1 ± 5.5
HU to 30.7 ± 4.4 HU. This MAE is lower than results previously reported in the literature.
Maspero et al. achieved an MAE of 83 ± 10 HU. Their study used a single network
for head and neck, lung, and breast cancer patients and a different network architecture
(generative adversarial network, GAN).22 Eckl et al. used a similar GAN architecture
and reported a comparable MAE of 94 ± 32 HU.28 However, image quality
comparisons between different studies are challenging, since sCT image quality
depends on the specific CBCT acquisition protocols, and image similarity metrics
such as MAE are sensitive to the CBCT field-of-view and the reference image used
(e.g., same-day, rigidly or deformably registered).
The studies by Maspero et al. and Eckl et al. only reported results for photon dose
calculations. In the present study, for the first time, proton dose calculation accura-
cy of deep learning based synthetic CTs for APT of lung cancer patients was presen-
ted. We achieved average gamma pass ratios (3%/3 mm) of 93.7 ± 4.8% for the un-
corrected and 96.8 ± 2.4% for the corrected sCTs. The lowest observed pass ratios
increased from 82.8% to 90.7%. The results showed that sCTs with the lowest pass
ratio benefited most from the patient-specific correction strategy. This outcome is
relevant for clinical implementation of sCTs, where outliers have to be avoided and
consistent sCTs for a large patient cohort are desired. Besides investigating global
dose differences using gamma analysis, we also performed a local dose evaluation
for target volumes and organs at risk. The observed difference in mean dose to tar-
get volumes was below ± 0.5% for both sCTorig and sCTcor. Larger differences of up
to 12% were measured for organs at risk (spinal cord and heart). However, the ab-
solute dose in organs at risk varied greatly between patients and the largest relative
deviations were seen for the lowest absolute doses. By calculating NTCP, we also
showed that these dose differences are negligible for calculating the risk of develo-
ping certain side effects (dysphagia, radiation pneumonitis and 2-year mortality).
The average differences between NTCP calculated on the reference rCT and both
sCTorig and sCTcor were below 0.2% and maximum ΔNTCP values did not exceed 2%.
The evaluation of several image quality and dose metrics shows the need for a
broad evaluation of sCTs. Global image quality (MAE, ME, PSNR, SSIM) and dose
(gamma analysis) metrics alone do not provide enough insights to assess the cli-
nical suitability of sCTs. Local dose metrics (target and organ at risk doses, NTCP
models) and proton radiography are valuable tools that provide evidence on the
locality and clinical impact of sCT errors.
Proton radiography simulations served two purposes: 1) to visualize the similarity
of HU values obtained in sCTs with respect to rCTs and thereby highlight anatomical
areas with less accurate HUs and 2) to quantify the differences in proton range
between rCTs and sCTs (with and without correction). Applying the patient-speci-
fic correction reduced the overall MARE between rCT and sCT from 1.5 ± 0.5 mm
to 1.1 ± 0.4 mm. This confirms the effectiveness of the correction strategy and is
consistent with improvements seen in MAE and gamma pass ratios. Exclusively
evaluating range errors for the lung region showed significantly higher MARE for
both sCTorig (2.1 ± 0.9 mm) and sCTcor (1.6 ± 0.6 mm), which approximately doub-
les the range error observed in the remaining tissues. Based on these results, we
conclude that it is more challenging for the DCNN to accurately reproduce HUs of
lung tissues and that the patient-specific correction technique is able to partially
correct for it. The proton radiography simulations were performed for the entire
patient, which was beneficial for assessing the HU accuracy but does not represent a
clinically feasible field size. Due to the detector size, in vivo proton radiography
measurements using a multi-layer ionization chamber are usually limited to small
fields of a few square centimeters.40, 42 In vivo proton radiography measurements
are envisioned to be utilized as a quality control tool to verify deep learning based
sCTs within APT workflows.44
The proposed patient-specific correction technique was introduced to correct low-frequency
HU variations of the sCTs, which were most prevalent in lung tissue.
Lung tissue is prone to scatter artifacts on the CBCT, which lead to a lack of detail
and obscure the fine structure. Correction maps and proton radiography simulations
highlight the larger errors of lung tissue. The patient-specific correction
relies on accurate CT HU-values. For the first few treatment fractions, the pCT is
the only available CT image and was therefore chosen for our study. However, in a
clinical workflow, pCTs can also be replaced with more recent rCT images if avai-
lable. The results presented in our study show a worst-case scenario, in which the
correction is based on a CT image acquired several weeks before treatment (pCT).
However, by deforming the pCT to the sCT, thresholding the difference map, and
applying a smoothing filter, the influence of the time gap and anatomical differences of the
pCT is mitigated. Some patients (e.g., patients 8, 17, and 20) showed increased ME
after applying the DIR-based patient-specific correction. However, no correlation
between the deformation vector field properties (e.g., mean vector amplitude, Ja-
cobian) and the ME was found. An increased ME after applying the correction does
not necessarily indicate worse image quality. ME alone is not suitable as an image
quality and similarity metric, since negative and positive HU errors can cancel each
other out. ME should be considered in combination with a metric that also uses the
absolute differences between two images (e.g., MAE).
The comparison of pure deformable image registration (pCTdef) and sCTcor highl-
ighted the benefit of the combination of DCNN-based sCT and patient-specific HU-
correction. sCTcor resulted in an accurate representation of the daily anatomy and
provided improved HU accuracy. For many patients, pCTdef resulted in dosimetric
accuracy comparable to that of sCTcor, but for some patients, anatomical changes between
acquisition of pCT and CBCT, which cannot always be modeled accurately by DIR,
led to outliers with significantly lower dose calculation performance (gamma
pass ratio decrease of up to 17%). This makes clinical implementation of pure DIR-based
sCT generation challenging and favors DCNN-based sCTs in combination
with a pCT-based correction method.
This study was performed with a limited dataset of 33 thoracic cancer patients. To
efficiently use the entire dataset for image and dose evaluation, a threefold cross
validation procedure was used. Our dataset was limited since the treatment of lung
cancer patients with proton therapy only recently started at our institution. The
patient number will increase in the future and will enable studies to investigate
the influence of the dataset size on image and dosimetric quality. The image quality
of the initial CBCT has a major influence on sCT quality too. Higher image quali-
ty, especially of lung tissue, could further improve sCT accuracy. Currently, CBCT
imaging parameters are chosen for patient alignment purposes. Better CBCT image
quality would most likely be connected to an increased imaging dose, which would
have a large impact on the imaging dose burden, particularly if CBCT images are
acquired daily. Therefore, potential gains in image quality have to be carefully ba-
lanced with the dose exposure of patients. The influence of imaging parameters on
lung sCTs should be investigated in further studies.
We used a similar neural network architecture in previous head and neck cancer
patient studies,23, 24 which enables a comparison of image quality
and dosimetric accuracy between these anatomical locations. Lung sCTs (sCTorig)
resulted in an average MAE of 34.1 ± 5.5 HU, which is lower than the MAE
observed for H&N sCTs (40.2 ± 3.9 HU). This is contrary to the observed 3%/3 mm
gamma pass ratios of clinical treatment plans, which were significantly higher for
H&N cancer patients (98.8% vs. 93.7%). Even with the applied patient specific cor-
rection, lung sCTs resulted in a lower pass ratio (98.8% vs. 96.8%). This discrepan-
cy is caused by the different tissue compositions, larger tissue heterogeneity, and
the increased radiological depth in the lung region. The evaluation of NTCP values
and target/OAR doses shows similar accuracy for H&N and lung cancer patients,
indicating similar clinical suitability of deep learning-based sCTs for both treatment
sites.
5 Conclusion
In this study, we proposed and evaluated a CBCT-based sCT generation method for
APT of lung cancer patients. We have shown that a DCNN in combination with a
patient-specific correction method can generate accurate sCTs for proton dose cal-
culations. Clinically relevant dose statistics and NTCP values showed high agree-
ment between sCTs and same-day rCTs, indicating the potential suitability of the
generated sCTs for application in APT workflows for lung cancer patients.
Acknowledgment
This work was financially supported by a grant from the Dutch Cancer Society (KWF research project 11518).
J. A. Langendijk is a consultant for International Scientific Advisory Committees
of IBA and RaySearch. The Department of Radiation Oncology, University Medical
Centre Groningen, has active research agreements with IBA, RaySearch, Siemens,
Elekta, Leoni, and Mirada. A. Meijers is employed by Varian Medical Systems. Work
in the context of this manuscript was conducted prior to the employment with Va-
rian.
References
[1] Newhauser WD, Zhang R. The physics of proton therapy. Phys Med Biol. 2015; 60(8):
R155– R209.
[2] Sonke JJ, Belderbos J. Adaptive radiotherapy for lung cancer. Semin Radiat Oncol. 2010;
20(2): 94– 106.
Oncol. 2019; 29(3): 245– 257.
Br J Radiol. 2020; 93(1107):20190594.
[5] Fotina I, Hopfgartner J, Stock M, Steininger T, Lütgendorf-Caucig C, Georg D. Feasibility
of CBCT-based dose calculation: comparative analysis of HU adjustment techniques. Ra-
diother Oncol. 2012; 104(2): 249– 256.
[6] de Smet M, Schuring D, Nijsten S, Verhaegen F. Accuracy of dose calculations on kV cone
beam CT images of lung cancer patients. Med Phys. 2016; 43(11): 5934– 5941.
[7] Kaplan LP, Elstrøm UV, Møller DS, Hoffmann L. Cone beam CT based dose calculation in
the thorax region. Phys Imaging Radiat Oncol. 2018; 7: 45– 50.
[8] Usui K, Ichimaru Y, Okumura Y, et al. Dose calculation with a cone beam CT image in
image-guided radiation therapy. Radiol Phys Technol. 2012; 6(1): 107– 114.
[9] Dunlop A, McQuaid D, Nill S, et al. Comparison of CT number calibration techniques for
CBCT-based dose calculation. Strahlentherapie und Onkol. 2015; 191(12): 970– 978.
[10] Chen S, Le Q, Mutaf Y, et al. Feasibility of CBCT-based dose with a patient-specific stepwise
HU-to-density curve to determine time of replanning. J Appl Clin Med Phys. 2017;
18(5): 64– 69.
[11] Giacometti V, King RB, Agnew CE, et al. An evaluation of techniques for dose calculation
on cone beam computed tomography. Br J Radiol. 2019; 92(1096):20180383.
[12] Peroni M, Ciardo D, Spadea MF, et al. Automatic segmentation and online virtualCT in
head-and-neck adaptive radiation therapy. Int J Radiat Oncol. 2012; 84(3): e427– e433.
[13] Veiga C, McClelland J, Moinuddin S, et al. Toward adaptive radiotherapy for head and
neck patients: feasibility study on using CT-to-CBCT deformable registration for “dose of
the day” calculations. Med Phys. 2014; 41(3):31703.
[14] Veiga C, Janssens G, Teng CL, et al. First clinical investigation of cone beam computed
tomography and deformable registration for adaptive proton therapy for lung cancer. Int
J Radiat Oncol Biol Phys. 2016; 95(1): 549– 559.
[15] Yuan Z, Rong Y, Benedict SH, Daly ME, Qiu J, Yamamoto T. “Dose of the day” based on cone
beam computed tomography and deformable image registration for lung cancer
radiotherapy. J Appl Clin Med Phys. 2019; 21(1): 88– 94.
clinical cone beam computed tomography imaging made possible by the combination
of Monte Carlo simulations and a ray tracing algorithm. Acta Oncol (Madr). 2013; 52(7):
1477– 1483.
[17] Thing RS, Bernchou U, Hansen O, Brink C. Accuracy of dose calculation based on artefact
corrected cone beam CT images of lung cancer patients. Phys Imaging Radiat Oncol. 2017;
1: 6– 11.
[18] Kida S, Nakamoto T, Nakano M, et al. Cone beam computed tomography image quality
improvement using a deep convolutional neural network. Cureus. 2018; 10(4):e2548.
[19] Harms J, Lei Y, Wang T, et al. Paired cycle-GAN-based image correction for quantitative
cone-beam computed tomography. Med Phys. 2019; 46(9): 3998– 4009.
[20] Liu Y, Lei Y, Wang T, et al. CBCT-based synthetic CT generation using deep-attention
cycleGAN for pancreatic adaptive radiotherapy. Med Phys. 2020; 47(6): 2472– 2483.
from cone-beam computed tomography (CBCT) using CycleGAN for adaptive radiation
therapy. Phys Med Biol. 2019; 64(12):125002.
[22] Maspero M, Houweling AC, Savenije MH, et al. A single neural network for cone-beam
computed tomography-based radiotherapy of head-and-neck, lung and breast cancer.
Phys Imaging Radiat Oncol. 2020; 14: 24– 31.
2020; 65(9):95002.
[24] Thummerer A, de Jong BA, Zaffino P, et al. Comparison of the suitability of CBCT- and
Phys Med Biol. 2020; 65(23):235036.
[25] Zhang Y, Yue N, Su M, et al. Improving CBCT quality to CT level using deep learning with
generative adversarial network. Med Phys. 2021; 48(6): 2816– 2826.
[26] Landry G, Hansen D, Kamp F, et al. Comparing Unet training with three different
datasets to correct CBCT images for prostate radiotherapy dose calculations. Phys Med Biol.
2019; 64(3):35011.
[27] Kurz C, Maspero M, Savenije MHF, et al. CBCT correction using a cycle-consistent
generative adversarial network and unpaired training to enable photon and proton dose
calculation. Phys Med Biol. 2019; 64(22):225004.
[28] Eckl M, Hoppen L, Sarria GR, et al. Evaluation of a cycle-generative adversarial
network-based cone-beam CT to synthetic CT conversion algorithm for adaptive radiation
therapy. Phys Medica. 2020; 80: 308– 316.
[29] Zaffino P, Raudaschl P, Fritscher K, Sharp GC, Spadea MF. Technical note: plastimatch
mabs, an open source tool for automatic image segmentation. Med Phys. 2016; 43(9):
5155– 5160.
Ther. 2015; 2(2): 404– 414.
[31] Landry G, Nijhuis R, Dedes G, et al. Investigating CT to CBCT image registration for head
and neck proton therapy as a tool for daily dose recalculation. Med Phys. 2015; 42(3):
1354– 1366.
[32] Spadea MF, Pileggi G, Zaffino P, et al. Deep convolution neural network (DCNN) multi-
therapy. Int J Radiat Oncol Biol Phys. 2019; 105(3): 495– 503.
[33] Wang Z, Simoncelli EP, Bovik AC. Multi-scale structural similarity for image quality
assessment. Conference Record of the Asilomar Conference on Signals, Systems and
Computers. IEEE; 2003; 2: 1398– 1402. https://doi.org/10.1109/acssc.2003.1292216
[34] Renieblas GP, Nogués AT, González AM, Gómez-Leon N, del Castillo EG. Structural
similarity index family for image quality assessment in radiological images. J Med Imaging.
2017; 4(3):035501.
[35] van der Laan HP, Anakotta RM, Korevaar EW, et al. Organ sparing potential and
interfraction robustness of adaptive intensity modulated proton therapy for lung cancer. Acta
Oncol (Madr). 2019; 58(12): 1775– 1782.
[36] Nederlandse vereniging voor Radiotherapie en Oncologie. Landelijk indicatieprotocol
protonen thorax. https://nvro.nl/images/documenten/rapporten/LIPP_longen_final_01122019.pdf.
Accessed November 9, 2021.
[37] Appelt AL, Vogelius IR, Farr KP, Khalil AA, Bentzen SM. Towards individualized dose
constraints: adjusting the QUANTEC radiation pneumonitis model for clinical risk
factors. Acta Oncol (Madr). 2014; 53(5): 605– 612.
[38] Dankers FJWM, Wijsman R, Troost EGC, et al. External validation of an NTCP model for
acute esophageal toxicity in locally advanced NSCLC patients treated with
intensity-modulated (chemo-)radiotherapy. Radiother Oncol. 2018; 129(2): 249– 256.
[39] Defraene G, Dankers FJWM, Price G, et al. Multifactorial risk factors for mortality after
chemotherapy and radiotherapy for non-small cell lung cancer. Radiother Oncol. 2020;
152: 117– 125.
zation chamber. Phys Med Biol. 2016; 61(11): 4078– 4087.
[41] Farace P, Righetto R, Deffet S, Meijers A, Vander Stappen F. Technical note: a direct
ray-tracing method to compute integral depth dose in pencil beam proton radiography with
a multilayer ionization chamber. Med Phys. 2016; 43(12): 6405– 6412.
[42] Meijers A, Seller Oria C, Free J, Langendijk JA, Knopf AC, Both S. Technical note: first
report on an in vivo range probing quality control procedure for scanned proton beam
therapy in head and neck cancer patients. Med Phys. 2021; 48(3): 1372– 1380.
[43] Seller Oria C, Marmitt GG, Both S, Langendijk JA, Knopf AC, Meijers A. Classification of
various sources of error in range assessment using proton radiography and neural
networks in head and neck cancer patients. Phys Med Biol. 2020; 65(23):235009.
[44] Seller Oria C, Thummerer A, Free J, et al. Range probing as a quality control tool for CBCT-
based synthetic CTs: in vivo application for head and neck cancer patients. Med Phys.
2021; 48(8): 4498– 4505.
Supplementary Materials
S1 Patient demographics
Nr. of Patients                  27
   female                        15
   male                          12
Median Age (years)               69
Tumor stage
   T1 N0 M0                       3
   T1 N1 M0                       1
   T1 N2 M0                       4
   T1 N2 M1                       1
   T2 N0 M0                       2
   T2 N2 M0                       3
   T4 N0 M0                       3
   T4 N2 M0                       8
   T4 N3 M0                       2
Location
   Left                           9
      upper                       6
      lower/middle                3
   Right                         17
      upper                       6
      lower/middle               11
   Both sides                     1
      upper                       1
Motion amplitude
   < 10 mm (small movers)        21
      average amplitude (mm)      7
   > 10 mm (big movers)           6
      average amplitude (mm)     13
Table S1 Demographics for lung cancer patients included in the dosimetric evaluation (27 out of 33).
S2 Imaging parameters
                                 CBCT               rCT                          pCT
Scanner                          IBA Proteus Plus   Siemens SOMATOM Confidence   Siemens SOMATOM Definition AS
Voltage [kVp]                    110                120                          120
Current [mA]                     320                variable                     variable
Acq. matrix                      768 x 768 x 70     512 x 512 x ~191             512 x 512 x ~191
Voxel size [mm]                  0.6 x 0.6 x 2.5    1.0 x 1.0 x 2.0              1.0 x 1.0 x 2.0
FOV [mm]                         RL: 500            RL: 500                      RL: 500
                                 AP: 500            AP: 500                      AP: 500
                                 IS: 175            IS: ~400                     IS: ~400
Source-Isocenter-Distance [mm]   2536               -                            -
Imaging panel resolution         1440 x 1440        -                            -
Imaging panel pixel size [mm]    0.3 x 0.3          -                            -
Table S2 Imaging and reconstruction parameters used for CBCT and CT images.
S3 Neural network architecture
Figure S3 Schematic overview of the used neural network, originally described by Spadea et al. The left
side (green background) shows the encoding pathway to extract features, consisting of stacked
convolutions and max pooling sequences. The right side (blue background) shows the decoding pathway
with up-scalings, skip connections and convolutional layers.
S4 Additional image quality metrics
Figure S4a PSNR results of sCTorig and sCTcor for each patient individually. The dotted line indicates
mean PSNR values.
Figure S4b SSIM results of sCTorig and sCTcor for each patient individually. The dotted line indicates
mean SSIM values.
S5 CTV dosimetric parameters
Figure S5 Relative dose difference between sCTcor/orig and the reference rCT, for several dosimetric
parameters (Dmean, Dmax, D95, D98, V95, V100) of the CTV.
S6 NTCP results
Figure S6 NTCP values calculated on rCT, sCTorig and sCTcor for radiation pneumonitis, dysphagia,
and 2-year mortality. These figures only show results for lung cancer patients.
S7 Comparison between pCTdef and sCTcor
Figure S7a Comparison of MAE and ME values between the corrected sCT (sCTcor) and the deformed
planning CT (pCTdef). The dotted lines indicate the mean values.
Figure S7b Comparison of 3%/3mm and 2%/2mm gamma pass ratios of sCTcor and pCTdef. The dotted
lines indicate the mean values.
Figure S7c Axial slices of pCTdef, rCT, sCTcor of patient 17 highlighting anatomical changes in the lung
tissue that lead to the decreased dosimetric performance of pCTdef. On sCTcor and CBCT the anatomy
is represented accurately.
S8 Additional range error maps
Figure S8a Worst case scenario (highest MARE): Patient 31
Comments: For patient 31, heavy scatter artifacts were observed in the lungs. This
scatter led to missing tissue at the interface of lung and rib cage on the sCT, which
in turn resulted in significant range errors across the entire lung. This behavior was
unique to this patient. Comparing range error maps for sCTorig and sCTcor shows
the improvement achieved by applying the patient-specific correction method. This
patient was not included in the dosimetric evaluation since he was treated for thymoma
instead of lung cancer.
Figure S8b 2nd worst scenario (2nd highest MARE): Patient 20
Comments: Patient 20 was significantly larger than the average patient. This led
to lower CBCT image quality and more scatter in the lungs. The neural network
also had difficulties accurately generating HU for areas close to the image
boundaries (area indicated by yellow square). Additionally, structures in the lung show
much less detail on the CBCT (due to scatter) than they do on the rCT (red area).
Due to this lack of detail, the neural network couldn’t recover correct HUs for these
regions. The axial slices of CBCT and rCT show the difference in scatter between
CBCT and rCT. The patient-specific correction was able to correct the sCT to a
limited extent.
Figure S8c Best case scenario (lowest MARE): Patient 9 and Average scenario (average MARE): Patient 25
Comments: Patient 25 shows average MARE and is representative of the behavior
of the patient-specific correction method. Parts in the lung show significantly lower
error when the correction is applied, but a residual range error remains.
Chapter 6
Deep learning–based 4D-synthetic
CTs from sparse-view CBCTs for
dose calculations in adaptive
proton therapy
Adrian Thummerer1, Carmen Seller Oria1, Paolo Zaffino2, Arturs Meijers3,
Gabriel Guterres Marmitt1, Robin Wijsman1, Joao Seco4,5, Johannes A. Langendijk1,
Antje C. Knopf6, Maria F. Spadea2, Stefan Both1
1 Department of Radiation Oncology, University Medical Center Groningen,
  University of Groningen, Groningen, Netherlands
2 Department of Experimental and Clinical Medicine,
  Magna Graecia University, Catanzaro, Italy
3 Center for Proton Therapy, Paul Scherrer Institute, Villigen, Switzerland
4 Department of Biomedical Physics in Radiation Oncology,
  Deutsches Krebsforschungszentrum (DKFZ), Heidelberg, Germany
5 Department of Physics and Astronomy, Heidelberg University, Heidelberg, Germany
6 Department I of Internal Medicine, Center for Integrated Oncology Cologne,
  University Hospital of Cologne, Cologne, Germany
Published in:
Medical Physics
November 2022, Volume 49, Issue 11
DOI 10.1002/mp.15930
Abstract
Background
Time-resolved 4D cone beam–computed tomography (4D-CBCT) allows a daily
assessment of patient anatomy and respiratory motion. However, 4D-CBCTs suffer
from imaging artifacts that affect the CT number accuracy and prevent accurate
proton dose calculations. Deep learning can be used to correct CT numbers and
generate synthetic CTs (sCTs) that can enable CBCT-based proton dose calculations.
Purpose
In this work, sparse-view 4D-CBCTs were converted into 4D-sCTs utilizing a deep
convolutional neural network (DCNN). 4D-sCTs were evaluated in terms of image
quality and dosimetric accuracy to determine if accurate proton dose calculations
for adaptive proton therapy workflows of lung cancer patients are feasible.
Methods
A dataset of 45 thoracic cancer patients was utilized to train and evaluate a DCNN
to generate 4D-sCTs, based on sparse-view 4D-CBCTs reconstructed from
projections acquired with a 3D acquisition protocol. Mean absolute error (MAE) and mean
error were used as metrics to evaluate the image quality of single phases and
average 4D-sCTs against 4D-CTs acquired on the same day. The dosimetric accuracy
was checked globally (gamma analysis) and locally for target volumes and
organs-at-risk (OARs) (lung, heart, and esophagus). Furthermore, 4D-sCTs were also
compared to 3D-sCTs. To evaluate CT number accuracy, proton radiography
simulations in 4D-sCT and 4D-CTs were compared in terms of range errors. The clinical
suitability of 4D-sCTs was demonstrated by performing a 4D dose reconstruction
using patient-specific treatment delivery log files and breathing signals.
Results
4D-sCTs resulted in average MAEs of 48.1 ± 6.5 HU (single phase) and 37.7 ± 6.2 HU
(average). The global dosimetric evaluation showed gamma pass ratios of 92.3% ±
3.2% (single phase) and 94.4% ± 2.1% (average). The clinical target volume showed
high agreement in D98 between 4D-CT and 4D-sCT, with differences below 2.4%
for all patients. Larger dose differences were observed in mean doses of OARs (up
to 8.4%). The comparison with 3D-sCTs showed no substantial image quality and
dosimetric differences for the 4D-sCT average. Individual 4D-sCT phases showed
slightly lower dosimetric accuracy. The range error evaluation revealed that lung
tissues cause range errors about three times higher than the other tissues.
Conclusions
In this study, we have investigated the accuracy of deep learning–based 4D-sCTs
for daily dose calculations in adaptive proton therapy. Despite image quality
differences between 4D-sCTs and 3D-sCTs, comparable dosimetric accuracy was
observed globally and locally. Further improvement of 3D and 4D lung sCTs could be
achieved by increasing CT number accuracy in lung tissues.
1. Introduction
The intent of adaptive radiation therapy (ART) is to ensure accurate dose delivery
throughout the treatment course by closely monitoring the patient's anatomy and
restoring dose conformity by adapting the treatment in accordance with anatomical
changes.1-4 To detect anatomical changes, frequent volumetric imaging is essential
for any ART workflow. Cone beam–computed tomography (CBCT) is routinely
used for daily patient position verification in photon and proton radiotherapy.
CBCTs are considered valuable for ART, given that they provide a daily
representation of the patient anatomy in the actual treatment position.5 However, due to
imaging artifacts such as scatter, beam hardening, or a smaller field of view (FOV),
CBCT image quality is inferior to diagnostic CT image quality.6 Although this is
acceptable for patient alignment, it affects the dose calculation accuracy and thereby
the suitability for ART, especially in proton therapy. Hence, within ART workflows,
a correction of CBCTs is required before CBCT-based dose calculations can be
performed.7
In proton therapy, the characteristic dose falloff after the dose maximum (Bragg
peak) and the associated sensitivity to density changes along the beam path lead
to even higher demands on the CT number accuracy than in conventional photon
dose calculations.8 Various CBCT correction approaches have been previously
investigated for photon and proton dose calculations, including methods based on
histogram matching,9 deformable image registration (DIR),10-12 projection-based
corrections,13-15 and Monte Carlo simulations.16, 17 Recently, the focus shifted
towards deep learning–based correction methods.18 These approaches utilize deep
convolutional neural networks (DCNNs) for image-to-image translation and learn
a correction of CBCT artifacts from presumably artifact-free diagnostic CT images.
The corrected images, also referred to as synthetic CTs (sCT), have shown
promising results for dose calculations in various anatomical regions, such as brain,19
head and neck,20-22 lung,23 and prostate.24, 25
Previous studies on deep learning–based sCTs focused on 3D imaging, neglecting
any internal motion of the patient (e.g., respiratory and cardiac). In the thorax and
abdomen, respiratory motion affects the position of target volumes and healthy
tissues and may have a severe impact on the dose distribution and treatment
quality. In clinical practice, the quasiperiodic breathing motion, in combination with
the high sensitivity of proton beams to density changes, motivates the routine use
of time-resolved 4D imaging during treatment planning and treatment
verification. Patient alignment CBCTs can also be acquired with dedicated 4D-CBCT
acquisition protocols and reconstruction algorithms. Although 4D-CBCTs are more
commonly used in the photon radiotherapy field, they are slowly being adopted
in particle therapy.26 4D-CBCT acquisitions with an image quality comparable to
that of 3D-CBCTs require more projection data, resulting in higher imaging dose
and longer acquisition times, and are not yet commonly used in proton therapy.
To overcome these limitations and still achieve acceptable image quality,
advanced reconstruction algorithms have been developed to reconstruct 4D-CBCTs from
sparse-view acquisitions (a limited number of projections) originally used for
3D-CBCTs.27, 28
Previous studies already investigated non–deep learning strategies to correct
4D-CBCTs and enable dose calculations for proton therapy. Niepel et al. and
Bondesson et al. investigated a DIR-based method,29, 30 whereas Schmitz et al. extended
a previously investigated projection- and prior-based scatter correction algorithm
to a 4D scenario.31 All three studies utilized an ex vivo porcine lung phantom to
validate their CBCT correction strategies and showed the general feasibility of
4D-CBCT-based proton dose calculations. However, none of these approaches used
actual patient 4D-CBCT data. Madesta et al. presented a self-contained deep learning
method to improve 4D-CBCT image quality.32 This method showed considerable
improvements in image quality, but it was not evaluated in the context of photon
or proton dose calculations.
Daily time-resolved 4D imaging can reduce the impact of breathing motion–induced
uncertainties, which affect the quality of proton therapy treatments in the
thorax. The purpose of this study was to investigate the ability of a DCNN to
generate 4D-sCTs from sparse-view 4D-CBCTs and to evaluate the suitability of
4D-sCTs for proton dose calculations in adaptive proton therapy workflows.
2. Materials and Methods

2.1 Patient datasets
A dataset of 45 thoracic cancer patients, treated at the University Medical Center
Groningen, was used to train and evaluate a DCNN for 4D-sCT generation. Patients
were aged between 18 and 83 years (mean age: 61.3 years). For each patient, a
4D-CT, raw CBCT projections, structure sets, treatment plans, breathing signals, and
treatment delivery log files were retrospectively collected. All patients were treated
with pencil beam scanning intensity-modulated proton therapy (PBS-IMPT) using
three beam angles. Patients' breathing rates varied from 13.4 to 24.5 breathing
cycles/min in the testing set and from 11.4 to 26.4 cycles/min in the training set. The
Supporting Information lists the breathing rates for each patient
(Table S2 and Figure S2).
2.2 Imaging data
2.2.1 CBCT reconstruction

To reconstruct 4D-CBCTs, raw CBCT projections from a single fraction were
collected for each patient. CBCT projections were acquired using the gantry-mounted
CBCT scanner of an IBA Proteus Plus proton therapy system (IBA, Belgium), with a
resolution of 1440 × 1440 pixels, a pixel size of 0.3 × 0.3 mm, a full 360-degree arc, a
detector offset to enlarge the axial FOV (500 mm), a tube voltage of 110 kVp, a tube
current of 320 mA, and an exposure time of 12.5 ms. Each acquisition consisted
of 472 projections, with an angular spacing of 0.8 degrees and a total acquisition
time of approx. 70 s. Two different FOV settings in cranio-caudal (CC) direction
were used: (1) long scan: all 1440 pixel rows were irradiated; (2) short scan: only
the central 720 rows were irradiated. Short scans were chosen for imaging dose
reduction in patients that did not require the full FOV for pretreatment position
verification.
CBCT projections were reconstructed into 4D-CBCTs with six breathing phases
using the iterative MA-ROOSTER reconstruction algorithm.28 The algorithm
performs spatial and temporal regularization utilizing a priori information from a
planning 4D-CT scan. Similar reconstruction parameters as the ones used by den
Otter et al. and Mory et al. were chosen for the reconstruction of 4D-CBCTs, except
for the number of breathing phases.26, 28 To have a uniform FOV in CC direction,
CBCTs for all patients were reconstructed with a reduced FOV using only the
central 720 pixel rows in CC direction. Reconstruction was performed using the
MA-ROOSTER implementation of the open-source reconstruction toolkit RTK
(www.openrtk.org).33 Preprocessing of projections and motion estimation, using a
diffeomorphic morphons algorithm, were performed using the 4D-CBCT reconstruction
workflow of the open-source MATLAB toolbox open-REGGUI (www.openreggui.org).
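The restriction to the central 720 detector rows can be pictured as a simple array crop. The following is a minimal sketch, not the RTK or open-REGGUI workflow; the function name `crop_central_rows` and the (projection, row, column) array layout are assumptions made for illustration.

```python
import numpy as np


def crop_central_rows(projections: np.ndarray, n_rows: int = 720) -> np.ndarray:
    """Keep the central n_rows detector rows of every projection.

    Assumed (hypothetical) layout: (n_projections, detector_rows, detector_cols),
    with detector rows running along the cranio-caudal direction.
    """
    total_rows = projections.shape[1]
    if n_rows > total_rows:
        raise ValueError("n_rows exceeds the number of detector rows")
    start = (total_rows - n_rows) // 2
    return projections[:, start:start + n_rows, :]


# Small stand-in stack whose values equal the row index;
# a full acquisition would be shaped (472, 1440, 1440).
row_index = np.arange(1440, dtype=np.float32)[None, :, None]
stack = row_index * np.ones((3, 1, 4), dtype=np.float32)
cropped = crop_central_rows(stack)
print(cropped.shape)
```

With a 1440-row detector, the crop keeps rows 360 to 1079, which matches the short-scan coverage so that long and short scans share the same cranio-caudal extent.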
2.2.2 4D-CTs
In this study, 4D-CTs were used for two purposes. First, the planning 4D-CT was used
for the reconstruction of 4D-CBCTs with the MA-ROOSTER algorithm: deformable vector
fields were generated between each breathing phase and a reference phase (50% phase,
maximum exhale) of the planning 4D-CT. Second, follow-up 4D-CTs, acquired on the
same day as the CBCT projections, were used as ground-truth images to evaluate
the quality of 4D-sCTs.
Planning 4D-CT images were acquired 1–2 weeks before the start of treatment, using
a Siemens SOMATOM Definition AS scanner (Siemens Healthineers, Germany),
whereas verification CT scans were acquired on a Siemens SOMATOM Confidence
scanner, directly at the proton therapy center. Both planning and follow-up
4D-CTs were acquired with a tube voltage of 120 kVp, a variable tube current, a pixel
size of 1.0 × 1.0 mm², an axial resolution of 512 × 512 pixels, and a slice thickness of
2 mm. All 4D-CTs were reconstructed with 10 breathing phases.
To reduce the impact of anatomical and positional differences on the sCT
evaluation, a phase-by-phase DIR between 4D-CBCT and the ground-truth 4D-CT was
performed. 4D-CTs were reconstructed into 10 breathing phases, whereas
4D-CBCTs only into 6. Therefore, to generate a six-phase ground-truth image, the
closest 4D-CT phase was chosen and registered to the respective 4D-CBCT phase (e.g.,
the 4D-CT 20% phase for the 4D-CBCT 17% phase). A diffeomorphic morphons algorithm,
implemented in open-REGGUI and extensively investigated for deformable
registration of CBCT and CT images,11, 12, 34 was used for the DIR. For simplicity, we will
still refer to the deformed 4D-CT as just 4D-CT in the following sections.
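The selection of the closest 4D-CT phase for each 4D-CBCT phase can be sketched as a nearest-neighbour lookup on phase percentages. This is an illustrative sketch only: the function name is hypothetical, and treating the breathing cycle as cyclic (so that 95% counts as close to 0%) is an assumption that the text does not state.

```python
from typing import Dict, Sequence


def closest_phase(target: float, candidates: Sequence[float]) -> float:
    """Return the candidate phase (in % of the breathing cycle) closest to target.

    Distance is measured cyclically (assumption), so 95% is 5 points from 0%.
    """
    def cyclic_distance(a: float, b: float) -> float:
        d = abs(a - b) % 100.0
        return min(d, 100.0 - d)

    return min(candidates, key=lambda c: cyclic_distance(target, c))


cbct_phases = [0, 17, 33, 50, 67, 83]      # six 4D-CBCT phases
ct_phases = list(range(0, 100, 10))        # ten 4D-CT phases
pairs: Dict[int, int] = {p: closest_phase(p, ct_phases) for p in cbct_phases}
print(pairs)
```

For the six-phase and ten-phase grids used here, each 4D-CBCT phase lies within 3 percentage points of its matched 4D-CT phase.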
2.3 Synthetic CTs
To generate sCTs, a U-net-like DCNN architecture, originally proposed by Spadea et
al.,35 was utilized. This network architecture was already thoroughly investigated
for image synthesis in the context of MR- and CBCT-based proton dose
calculations for brain,35 head-and-neck,20, 21 and thoracic23 cancer patients. The
neural network architecture is depicted in Figure S1. A special feature
of the DCNN proposed by Spadea et al. is that separate networks are trained with
axial, coronal, and sagittal slices. The three resulting sCTs are then combined into a
final sCT by averaging voxel values from the three anatomical views. For more details on
the network architecture, the reader is referred to previous work by Spadea et al.35
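The combination of the three view-specific predictions amounts to a voxel-wise mean. A minimal sketch, assuming the three sCT volumes have already been resampled onto the same voxel grid (the function name is hypothetical and the networks themselves are not shown):

```python
import numpy as np


def combine_views(axial: np.ndarray, coronal: np.ndarray, sagittal: np.ndarray) -> np.ndarray:
    """Voxel-wise mean of three sCT volumes defined on the same grid."""
    volumes = [np.asarray(v, dtype=np.float32) for v in (axial, coronal, sagittal)]
    if not (volumes[0].shape == volumes[1].shape == volumes[2].shape):
        raise ValueError("all three volumes must share the same shape")
    return np.mean(volumes, axis=0)


# Toy volumes with constant HU values standing in for the three predictions.
shape = (2, 3, 4)
final_sct = combine_views(np.full(shape, -10.0), np.full(shape, 0.0), np.full(shape, 40.0))
print(final_sct[0, 0, 0])
```

Averaging the three views suppresses slice-to-slice inconsistencies that a single 2D network would leave along its out-of-plane direction.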
Our dataset was randomly split into a training (27 patients), validation (3
patients), and testing set (15 patients). Training of the DCNN was performed with pairs
of 0%-phase images of 4D-CBCT and 4D-CT. To generate a full 4D-sCT during
inference, the networks, trained exclusively with 0%-phase images, were applied to all
other 4D-CBCT phases. Similarly to previous studies, a batch size of 1 was used, and
training was stopped when no decrease in training loss was observed for five
consecutive epochs. Training and inference were performed on an NVIDIA GTX 1080 Ti
graphics processing unit (GPU) with 11 GB of VRAM.
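The stopping rule can be sketched as a patience check on the per-epoch training loss. This is one possible reading of the criterion, assumed here for illustration: training stops once none of the last five epochs improved on the best loss seen before them. The function name and the example loss values are hypothetical.

```python
from typing import Sequence


def should_stop(losses: Sequence[float], patience: int = 5) -> bool:
    """True once the last `patience` epochs all failed to beat the best earlier loss."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    return all(loss >= best_before for loss in losses[-patience:])


# Illustrative loss histories (made-up numbers).
plateaued = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.75, 0.73]
improving = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
print(should_stop(plateaued), should_stop(improving))
```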
2.4 Comparison with 3D-sCTs
For comparison purposes, 3D-sCTs, based on the clinically used 3D-CBCTs, were
generated for all patients. These 3D-CBCTs were reconstructed from the same set
of projections as the 4D-CBCTs, but with the reconstruction algorithm
and settings of the clinically used IBA Adapt Insight software (IBA, Belgium). The
reconstructed 3D-CBCTs were converted into 3D-sCTs with a previously trained
network. A full description of this DCNN for 3D lung sCTs is provided by
Thummerer et al.23 The entire workflow for 4D and 3D evaluation of sCTs
is depicted in Figure S3.
Image quality of sCTs was evaluated against ground-truth 4D-CTs for a variety
of scenarios: (1) for the two extreme 4D-sCT phases (0% maximum inhale, called
“4D-sCT-0%” and 50% maximum exhale, called “4D-sCT-50%”); (2) for a
“4D-sCT-average” obtained by averaging voxel values of the six breathing phases; and (3) for
the 3D-sCT. For the single-phase 4D-sCTs, the respective 4D-CT phases were used
as a ground-truth image. For the 4D-sCT-average, an average 4D-CT was
generated in the same way the 4D-sCT-average was created and used as reference. For the
3D-sCT, the average projection of a same-day verification 4D-CT was deformed to
the 3D-CBCT and used as ground truth. Detailed image quality results for the other
breathing phases (17%-, 33%-, 67%-, and 83%-phase) are presented in Figure S4
and Table S5.
Image similarity between sCTs and the reference CTs was quantified via mean
absolute error (MAE), mean error (ME), peak signal-to-noise ratio (PSNR), structural
similarity (SSIM), and the Dice similarity coefficient (DSC). These metrics are
defined in the following equations:
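In their standard forms, and using the symbols of this section, the metrics can be written as below. Several details are assumptions rather than statements from the text: the sign convention of ME (sCT minus rCT), the stabilizing constants C_1 and C_2 of SSIM (set to the commonly used defaults), and the notation A and B for the thresholded regions on sCT and rCT in the DSC. For PSNR, the sum runs over all voxels of the image rather than only those inside the patient outline.

```latex
\begin{align}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{sCT}_i-\mathrm{rCT}_i\right| \\
\mathrm{ME}   &= \frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{sCT}_i-\mathrm{rCT}_i\right)
               % sign convention assumed \\
\mathrm{PSNR} &= 10\,\log_{10}\!\left(
                 \frac{Q^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{sCT}_i-\mathrm{rCT}_i\right)^{2}}
                 \right) \\
\mathrm{SSIM} &= \frac{\left(2\,\mu_{sCT}\,\mu_{rCT}+C_1\right)\left(2\,\delta_{sCT,rCT}+C_2\right)}
                      {\left(\mu_{sCT}^{2}+\mu_{rCT}^{2}+C_1\right)\left(\sigma_{sCT}+\sigma_{rCT}+C_2\right)},
               \qquad C_1=(0.01\,L)^{2},\; C_2=(0.03\,L)^{2}
               % C_1, C_2 assumed defaults; \sigma denotes a variance here \\
\mathrm{DSC}  &= \frac{2\,\left|A\cap B\right|}{\left|A\right|+\left|B\right|}
\end{align}
```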
where rCTi and sCTi are the respective HU values of the i-th voxel of rCT and sCT, n
is the total number of voxels within the patient outline, Q stands for the highest
observed HU value of sCT and rCT, μsCT and μrCT for the average pixel values of sCT
and rCT, σsCT and σrCT for the variances of sCT and rCT, δsCT,rCT for the covariance
between sCT and rCT, and L for the dynamic range of sCT and rCT. MAE and ME
were only calculated for voxels within the patient volume, PSNR and SSIM for the
entire image. Air and bone regions were approximately segmented using a simple
thresholding technique (air < −400 HU, bone > 300 HU) to calculate the DSC
between these regions on sCTs and rCTs. Finally, an MAE spectrum was obtained by
grouping voxels into bins of 20 HU and calculating MAE for each bin. This allows
one to quantify the MAE of various tissues across a HU range (−1000 to 500 HU).
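The masked error metrics, the threshold-based DSC, and the MAE spectrum can be sketched with NumPy as follows. This is a hedged illustration and not the evaluation code of the study: the function names are hypothetical, ME is taken as sCT minus rCT, and voxels are assigned to HU bins by their reference CT value, which the text does not specify.

```python
import numpy as np


def mae_and_me(sct, rct, body_mask):
    """MAE and ME (sCT minus rCT, an assumed sign convention) inside the body mask."""
    diff = sct[body_mask].astype(np.float64) - rct[body_mask].astype(np.float64)
    return float(np.abs(diff).mean()), float(diff.mean())


def dice(mask_a, mask_b):
    """Dice similarity coefficient of two boolean masks."""
    a, b = np.asarray(mask_a, dtype=bool), np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else float(2.0 * np.logical_and(a, b).sum() / total)


def mae_spectrum(sct, rct, body_mask, hu_min=-1000, hu_max=500, bin_width=20):
    """MAE per HU bin; voxels are binned by their reference CT value (assumption)."""
    ref = rct[body_mask].astype(np.float64)
    err = np.abs(sct[body_mask].astype(np.float64) - ref)
    edges = np.arange(hu_min, hu_max + bin_width, bin_width)
    bin_index = np.digitize(ref, edges) - 1   # values outside the range fall out of 0..n-1
    spectrum = np.full(edges.size - 1, np.nan)
    for k in range(edges.size - 1):
        in_bin = bin_index == k
        if in_bin.any():
            spectrum[k] = err[in_bin].mean()
    return edges, spectrum


# Toy 1D "images" with known errors.
rct = np.array([-1000.0, -990.0, 0.0, 10.0, 400.0])
sct = rct + np.array([10.0, -10.0, 20.0, 20.0, -40.0])
body = np.ones_like(rct, dtype=bool)

mae, me = mae_and_me(sct, rct, body)
edges, spectrum = mae_spectrum(sct, rct, body)
bone_dsc = dice(sct > 300, rct > 300)       # bone threshold as given in the text
print(mae, me, bone_dsc)
```

Empty bins are reported as NaN so that tissues absent from a patient do not appear as zero error in the spectrum.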
To assess the difference between 4D-CTs and 4D-sCTs in terms of proton range, we
performed proton radiography (PR) simulations using the PR module of the open-
source toolkit openREGGUI. It features a direct ray-tracing algorithm to simulate
PR acquisitions with a multilayered ionization chamber.36 All PRs were simulated
with a gantry angle of 0 degrees (anterior–posterior direction), an energy of 210
MeV, and a spacing between individual pencil beams of 1 mm in left–right and 2
mm in CC direction (similar to the 4D-CT grid). Range errors between PRs were
calculated according to previous studies.37-39
Range error maps were computed for 4D-sCT-50%, 4D-sCT-average, and the
3D-sCT. The range error maps were analyzed by calculating mean and standard
deviation for range probes within the patient outline and reported for each patient
individually. Range probes were divided into two groups: range probes traversing
lung tissue and range probes not traversing lung tissue. This allowed one to isolate
the contribution of lung tissue in terms of HU accuracy and to perform a
comparison to the remaining tissues. The lung area was selected by first generating a lung
segmentation on the 4D-CT (using a threshold of −600 HU) and then projecting
the 3D lung segmentation along the beam direction (0 degrees). This resulted in a
2D mask that could be applied to the range error maps.
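The lung mask projection and the split of range probes into two groups can be sketched as follows. The sketch assumes parallel rays along one grid axis and restricts the threshold segmentation to a body mask so that air outside the patient is not counted as lung; both are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np


def lung_projection_mask(ct_hu, body_mask, beam_axis=0, threshold_hu=-600.0):
    """2D mask of rays that cross at least one lung voxel (HU below threshold, inside body)."""
    lung_3d = (ct_hu < threshold_hu) & body_mask
    return lung_3d.any(axis=beam_axis)


def summarize_range_errors(range_error_mm, probe_mask, lung_2d):
    """Mean and standard deviation of range errors for lung and non-lung probes."""
    summary = {}
    groups = (("lung", probe_mask & lung_2d), ("non_lung", probe_mask & ~lung_2d))
    for name, group in groups:
        values = range_error_mm[group]
        if values.size:
            summary[name] = (float(values.mean()), float(values.std()))
        else:
            summary[name] = (float("nan"), float("nan"))
    return summary


# Toy volume with axes (beam, cranio-caudal, left-right) and a single lung voxel.
ct = np.zeros((2, 2, 3))
ct[0, 0, 0] = -800.0
body = np.ones_like(ct, dtype=bool)
lung_2d = lung_projection_mask(ct, body)

range_errors = np.array([[4.0, 1.0, 1.0],
                         [1.0, 1.0, 1.0]])   # made-up values in mm
inside_patient = np.ones_like(range_errors, dtype=bool)
summary = summarize_range_errors(range_errors, inside_patient, lung_2d)
print(summary)
```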
To quantify dosimetric differences between 4D-CT and 4D-sCT, clinically used
PBS-IMPT treatment plans were recalculated on 4D-sCT-0%, 4D-sCT-50%,
4D-sCT-average, and 3D-sCT. Dose calculations were performed using the Monte Carlo
dose engine of RayStation 10B (RaySearch, Sweden), with a statistical uncertainty
setting of 1% and a dose grid of 3 × 3 × 3 mm³. The resulting dose distributions
were evaluated globally by performing gamma analysis using a 3%/3-mm criterion
and a 10% dose threshold. Local dose differences were investigated for the clinical
target volume (CTV) D98 and organs-at-risk (OARs) (Dmean of lung, heart, and
esophagus). Structures were transferred from the planning CT to the
corresponding ground-truth image using the DIR feature of RayStation.
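To make the gamma criterion and the D98 parameter concrete, the sketch below evaluates a simplified gamma pass rate on 1D dose profiles and computes D98 as a percentile. It is a didactic simplification, not the 3D analysis used in the study: it is one-dimensional, brute-force, without sub-voxel interpolation, and assumes global normalization to the maximum reference dose. All function names are hypothetical.

```python
import numpy as np


def gamma_pass_rate_1d(ref_dose, eval_dose, spacing_mm,
                       dose_crit_pct=3.0, dta_mm=3.0, cutoff_pct=10.0):
    """Global gamma pass rate (%) for two 1D dose profiles on the same grid."""
    ref = np.asarray(ref_dose, dtype=float)
    ev = np.asarray(eval_dose, dtype=float)
    x = np.arange(ref.size) * spacing_mm
    d_max = ref.max()
    dose_tol = dose_crit_pct / 100.0 * d_max           # global normalization (assumed)
    evaluated = ref >= cutoff_pct / 100.0 * d_max      # low-dose cutoff
    # Rows index reference points, columns index evaluated points.
    dist_term = ((x[:, None] - x[None, :]) / dta_mm) ** 2
    dose_term = ((ev[None, :] - ref[:, None]) / dose_tol) ** 2
    gamma = np.sqrt((dist_term + dose_term).min(axis=1))
    return 100.0 * float(np.mean(gamma[evaluated] <= 1.0))


def d98(dose, structure_mask):
    """Dose received by at least 98% of the structure: 2nd percentile of its voxel doses."""
    return float(np.percentile(dose[structure_mask], 2))


profile = np.array([0.0, 10.0, 50.0, 100.0, 50.0, 10.0, 0.0])
same = gamma_pass_rate_1d(profile, profile, spacing_mm=1.0)
shifted = gamma_pass_rate_1d(profile, np.roll(profile, 1), spacing_mm=1.0)
scaled = gamma_pass_rate_1d(profile, 1.5 * profile, spacing_mm=1.0)

ctv_dose = np.arange(1.0, 101.0)
ctv_mask = np.ones_like(ctv_dose, dtype=bool)
print(same, shifted, scaled, d98(ctv_dose, ctv_mask))
```

A 1 mm shift of the profile stays within the 3 mm distance-to-agreement and passes everywhere, whereas a 50% dose scaling fails, which illustrates how the criterion trades spatial against dosimetric deviations.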
To showcase a real 4D use case of 4D-sCTs, a log-file-based 4D dose reconstruction
was performed for each patient using treatment delivery log files and
patient-specific breathing signals acquired during treatment using a pressure belt system
(ANZAI, Japan). For 10 patients, breathing signals were not available retrospectively. In
those cases, instead of recorded pressure belt signals, an artificial breathing signal
with a cycle duration of 4.5 s was used.
The 4D dose reconstruction was done according to the procedure described by
Meijers et al.40 Six subplans, one per breathing phase, were generated
based on treatment delivery log files, breathing signals, and the treatment plan. Each
subplan contained only spots delivered in the corresponding breathing phase. For
each subplan, dose was calculated on the respective 4D-sCT and 4D-CT phase.
Afterward, the subplan doses were warped and accumulated onto a reference phase
(50% phase) using the RayStation treatment planning system. The resulting
accumulated dose distributions were evaluated similarly to the other dose distributions
(gamma analysis, local dose differences for CTV, lung, heart, and esophagus).
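The sorting of delivered spots into phase-specific subplans can be sketched for the artificial breathing signal case, where the cycle duration is constant. This is a simplified illustration and not the procedure of Meijers et al.: the function names, the spot identifiers, and the choice of equal-width phase bins starting at the beginning of each cycle are assumptions. With a recorded pressure belt signal, the phase would be derived from the measured signal instead.

```python
import numpy as np


def assign_phase_bins(spot_times_s, cycle_s=4.5, n_phases=6, t0_s=0.0):
    """Phase bin (0..n_phases-1) of each spot for a strictly periodic breathing signal."""
    times = np.asarray(spot_times_s, dtype=float)
    cycle_fraction = ((times - t0_s) % cycle_s) / cycle_s
    bins = np.floor(cycle_fraction * n_phases).astype(int)
    return np.clip(bins, 0, n_phases - 1)


def split_into_subplans(spot_ids, spot_times_s, n_phases=6, **kwargs):
    """Group spot identifiers by the phase bin in which they were delivered."""
    bins = assign_phase_bins(spot_times_s, n_phases=n_phases, **kwargs)
    subplans = {k: [] for k in range(n_phases)}
    for spot, k in zip(spot_ids, bins):
        subplans[int(k)].append(spot)
    return subplans


# Hypothetical spot identifiers and delivery times (seconds since signal start).
spot_ids = ["s0", "s1", "s2", "s3", "s4"]
spot_times = [0.1, 0.8, 2.3, 4.4, 4.6]
subplans = split_into_subplans(spot_ids, spot_times)
print(subplans)
```

Each resulting group corresponds to one subplan whose dose would then be calculated on the matching 4D-sCT or 4D-CT phase.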
2.8 Normal tissue complication probability (NTCP)
Normal tissue complication probability (NTCP) models predict the risk of developing
specific radiotherapy-induced side effects. In the Netherlands, NTCP models are
used to determine which patients benefit most from proton radiotherapy. In the
Dutch national indication protocol for proton therapy,41 NTCP models for radiation
pneumonitis (RP),42 acute esophageal toxicity (AET),43 and 2-year mortality44 are
included. Clinical parameters (e.g., age, smoking status, and tumor location) and
mean dose parameters for OARs (heart, lungs, and esophagus) are used as inputs
for NTCP models. In this study, we used clinically employed NTCP models to
translate dosimetric differences into more clinically relevant parameters by calculating
the NTCP difference between 4D-CTs and 4D-sCTs.
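How a mean-dose difference translates into an NTCP difference can be illustrated with a generic logistic model. The logistic form is an assumption for illustration, and the coefficients below are placeholders, not the published coefficients of the RP, AET, or mortality models; the function names and the example mean doses are likewise hypothetical.

```python
import math


def logistic_ntcp(mean_dose_gy, intercept, dose_coeff, clinical_terms=0.0):
    """Generic logistic NTCP: 1 / (1 + exp(-S)), S = intercept + dose term + clinical terms."""
    s = intercept + dose_coeff * mean_dose_gy + clinical_terms
    return 1.0 / (1.0 + math.exp(-s))


def delta_ntcp(mean_dose_sct_gy, mean_dose_ct_gy, **model):
    """NTCP on the sCT-based dose minus NTCP on the CT-based dose."""
    return logistic_ntcp(mean_dose_sct_gy, **model) - logistic_ntcp(mean_dose_ct_gy, **model)


# Placeholder coefficients for illustration only (NOT taken from any published model).
PLACEHOLDER_MODEL = {"intercept": -3.0, "dose_coeff": 0.1}

# Made-up mean OAR doses: 10.4 Gy on the sCT versus 10.0 Gy on the CT.
diff = delta_ntcp(10.4, 10.0, **PLACEHOLDER_MODEL)
print(f"{100.0 * diff:.2f} percentage points")
```

Because the logistic curve is flat at low doses and steepest near its midpoint, the same dose difference can yield very different NTCP differences depending on the absolute dose level.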
Chapter 6
3 Results
3.1 Image quality
Figure 1 presents axial slices of CBCTs (4D-CBCT-0%, 3D-CBCT), sCTs (4D-sCT-0%,
4D-sCT-average, and 3D-sCT), and ground-truth CT images (4D-CT-0%, 4D-CT-average,
and 3D-CT) of patient 2. A visual inspection of Figure 1 clearly shows a lower image
quality of 4D-CBCTs with respect to 3D-CBCTs. However, after converting CBCTs into
sCTs, this image quality difference is substantially reduced. 4D-sCTs, especially
single-phase images, show fewer details and more artifacts in soft tissues and lung
than 3D-sCTs, as shown in Figure 2 with HU difference maps between sCTs and
ground-truth CTs.
Figure 1. Overview of axial slices from cone beam–computed tomographies (CBCTs) (4D-0%, 3D), syn-
thetic computed tomographies (sCTs) (4D-0%, 4D-average, 3D) generated using a deep convolutional
neural network compared to the CTs (4D-0%, 4D-average, and 3D) for patient 2.
Image similarity differences were globally quantified via MAE, ME, PSNR, and SSIM.
On average, the lowest MAE was observed for the 3D-sCT and 4D-sCT-average, with MAEs
of 37.2 ± 7.7 and 37.7 ± 6.2 HU, respectively. 4D-sCT-0% and 4D-sCT-50% resulted in
a higher MAE of 47.5 ± 6.1 and 48.1 ± 6.5 HU, respectively. The remaining breathing
phases showed a variation of less than 1 HU, ranging from 47.9 ± 6.3 to 48.6 ± 6.6 HU.
Average MEs were −3.0 ± 7.2 HU for 4D-sCT-average, 0.1 ± 7.4 HU for 4D-sCT-0%,
−4.5 ± 7.1 HU for 4D-sCT-50%, and −3.1 ± 7.8 HU for the 3D-sCT.
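For reference, MAE, ME, and PSNR can be computed as in the following minimal sketch. It assumes that both images have been flattened to equally long sequences of HU values inside the evaluation region, that ME is defined as sCT minus CT, and that the PSNR data range is passed in explicitly; these conventions are assumptions and may differ from the implementation used in this study. SSIM is omitted because it requires a windowed computation.

```python
import math


def mean_absolute_error(sct, ct):
    """MAE in HU between two equally long sequences of voxel values."""
    return sum(abs(s - c) for s, c in zip(sct, ct)) / len(ct)


def mean_error(sct, ct):
    """ME in HU, assumed sign convention: sCT minus CT."""
    return sum(s - c for s, c in zip(sct, ct)) / len(ct)


def psnr(sct, ct, data_range):
    """PSNR in dB; data_range is the assumed HU span used for normalization."""
    mse = sum((s - c) ** 2 for s, c in zip(sct, ct)) / len(ct)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy voxel values (HU), chosen only to make the arithmetic easy to follow.
sct_voxels = [10.0, -20.0, 30.0]
ct_voxels = [0.0, 0.0, 0.0]
mae_value = mean_absolute_error(sct_voxels, ct_voxels)
me_value = mean_error(sct_voxels, ct_voxels)
```

A near-zero ME combined with a larger MAE, as reported above, indicates that positive and negative HU deviations largely cancel on average.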
Figure 2. Difference maps for 0% 4D-synthetic computed tomography (sCT), 4D-sCT average, and the
3D-sCT of patient 2.
Figure 3. (a) Mean absolute error for 0% (blue) and 50% (orange) 4D-synthetic computed tomography
(sCT) phases, 4D-sCT average (green), and the 3D-sCT (red). The dashed lines indicate mean values of
the entire dataset. (b) Mean error for 0% and 50% 4D-sCT phases, 4D-sCT average, and the 3D-sCT.
Figure 3 shows bar charts of MAE and ME for 4D-sCT-0%, 4D-sCT-50%, 4D-sCT-average,
and the 3D-sCT. Consistent with the MAE results, the highest PSNR values were observed
for the 3D-sCT (45.4 ± 1.9 dB) and 4D-sCT-average (45.4 ± 1.9 dB). The 0% and 50%
phases resulted in lower PSNRs of 43.3 ± 1.5 and 43.3 ± 1.4 dB, respectively. The
remaining breathing phases showed almost no differences, with PSNR ranging from 43.1
to 43.3 dB. High uniformity was observed for individual breathing phases, with all
phase images resulting in an SSIM value of 0.93 ± 0.02. The 4D-sCT-average and
3D-sCT resulted in slightly higher SSIM of 0.94 ± 0.02 and 0.95 ± 0.02, respectively.
For bone and air regions, DSC was used to assess the similarity between sCTs and
reference CTs. For the air region, almost no difference was observed between
individual phase images, the average sCT, and the 3D-sCT. 3D-sCT, 4D-sCT-average, and
4D-sCT-0% resulted in a DSC of 0.98 ± 0.01. 4D-sCT-50% and all other 4D phases
showed a DSC of 0.97. Significantly lower DSC values were observed for bones, with
values of 0.66 ± 0.08 for the 3D-sCT, 0.65 ± 0.06 for the 4D-sCT-average, and 0.60
± 0.07 for both 4D-sCT-0% and 4D-sCT-50%. The remaining breathing phases showed
values between 0.60 and 0.61. Additional charts for PSNR, SSIM, and DSC and
results for all breathing phases are presented in Figure S4 and Table S5.
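The DSC comparison of bone and air regions can be sketched as below. The HU thresholds used to define the two regions are hypothetical values chosen for illustration, since the thresholds applied in this study are not stated in this section, and the images are again assumed to be flattened voxel sequences.

```python
BONE_MIN_HU = 250    # hypothetical threshold, not taken from the study
AIR_MAX_HU = -900    # hypothetical threshold, not taken from the study


def threshold_mask(hu_values, lower=None, upper=None):
    """Boolean mask of voxels with lower <= HU <= upper (open bounds if None)."""
    return [
        (lower is None or v >= lower) and (upper is None or v <= upper)
        for v in hu_values
    ]


def dice(mask_a, mask_b):
    """Dice similarity coefficient: 2 |A and B| / (|A| + |B|)."""
    size_a = sum(mask_a)
    size_b = sum(mask_b)
    if size_a + size_b == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return 2.0 * overlap / (size_a + size_b)


# Toy voxel values (HU) for an sCT and its reference CT.
sct_hu = [300, 100, 400, -900]
ct_hu = [350, 260, 500, -950]
dsc_bone = dice(threshold_mask(sct_hu, lower=BONE_MIN_HU),
                threshold_mask(ct_hu, lower=BONE_MIN_HU))
dsc_air = dice(threshold_mask(sct_hu, upper=AIR_MAX_HU),
               threshold_mask(ct_hu, upper=AIR_MAX_HU))
```

In the toy example one bone voxel of the CT falls below the threshold on the sCT, which lowers the bone DSC while the air DSC stays at 1.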
In Figure 4, the MAE spectrum and an average HU histogram are presented. As ex-
pected, the spectrum confirms the higher MAE of the single-phase 4D-sCT when
compared to 4D-sCT-average and 3D-sCT. All sCT-types show the lowest MAE in
soft tissues between −150 and 50 HU. This is also the HU region with the most vo-
xels. The MAE-spectrum also confirms the larger error for bone structures obser-
ved with the DSC. 3D-sCT shows a peak for very low HUs, whereas 4D-sCTs do not
show this behavior. However, this appears in an HU region with very low number
of voxels and, therefore, has only a negligible impact on the overall MAE of 3D-
sCTs.
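An MAE spectrum of this kind can be obtained by grouping voxels into HU bins and averaging the absolute error per bin. The sketch below assumes that voxels are binned by their reference CT value and uses a hypothetical bin width of 50 HU; both choices are assumptions for illustration.

```python
import math
from collections import defaultdict


def mae_spectrum(sct, ct, bin_width=50):
    """Return {bin_lower_edge: (mae_in_bin, voxel_count)}, binned by CT HU value."""
    error_sums = defaultdict(float)
    counts = defaultdict(int)
    for s, c in zip(sct, ct):
        edge = math.floor(c / bin_width) * bin_width
        error_sums[edge] += abs(s - c)
        counts[edge] += 1
    return {edge: (error_sums[edge] / counts[edge], counts[edge])
            for edge in sorted(error_sums)}


# Toy voxel values (HU): two soft-tissue voxels share the 0..50 HU bin.
ct_hu = [-10, 10, 20, -990]
sct_hu = [0, 0, 40, -900]
spectrum = mae_spectrum(sct_hu, ct_hu)
```

Returning the voxel count alongside the per-bin MAE makes it visible when a large error occurs in a sparsely populated HU range, as described for the low-HU peak of the 3D-sCT.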
Figure 4. Average mean absolute error spectrum for 4D-synthetic computed tomography (sCT)-50%
(orange), 4D-sCT average (green), and the 3D-sCT (red). The corresponding error regions represent
the standard deviation of the dataset. The dashed black line shows an average image histogram.
Figure 5 presents HU profiles of patient 2 for 4D-sCT-0%, 4D-sCT-average, and
the 3D-sCT compared to HU profiles of the respective reference images. Figure 5a
shows a profile in anterior–posterior direction, whereas the profile in Figure 5b
runs from right to left. Both profiles were chosen to intersect the CTV. High
agreement between sCT and CT profiles was observed. In general, synthetic CT profiles
are smoother than reference CT profiles. Tissue boundaries (e.g., soft tissue–lung
or soft tissue–bone), however, are well represented on the sCTs, and differences
between profiles are primarily seen within tissues and not at interfaces.
Figure 5. Comparison of HU profiles of 4D-sCT 0%, average 4D-sCT, and 3D-sCT to their reference
images along the red line indicated on CT scan. The clinical target volume (CTV) is shown in yellow on
the CT scan: (a) right-to-left direction, (b) anterior-to-posterior direction.
Figure 6 presents range error maps of patient 2 for 4D-sCT-50%, 4D-sCT-average,
and the 3D-sCT, overlaid with patient surface and lung contours. Figure 6d shows
a water-equivalent thickness map of the same patient. For all sCT types, the largest
range errors were observed for range probes traversing lung tissue, whereas the
surrounding soft tissues and bones show lower range errors.
Figure 6. Range error maps of 4D-synthetic computed tomography (sCT) 50% (a), 4D-sCT average (b),
and the 3D-sCT (c) for patient 2, accompanied by a water-equivalent thickness map of this patient (d).
The green contour shows the patient outline, the orange contour the lung region.
This is consistent throughout the dataset, as shown in Figure 7, where the mean
and standard deviation of range errors for each patient are depicted, for either the
entire patient (Figure 7a), only lung tissues (Figure 7b), or for everything besides
lung tissues (Figure 7c). This confirms the visual observation of a systematically
higher range error in lung tissues compared to surrounding tissues.
To quantify the range error for the entire dataset, we calculated mean absolute
range errors (MARE). On average, the 3D-sCT resulted in the lowest overall (for the
entire patient) MARE of 1.5 ± 0.6 mm. 4D-sCT-average and 4D-sCT-50% showed a
slightly higher MARE of 1.6 ± 0.5 and 1.8 ± 0.6 mm, respectively. When only range
probes traversing lung tissue were considered, the average MARE increased for all
sCT types: for the 3D-sCT to 2.2 ± 1.2 mm, for 4D-sCT-average to 2.3 ± 1.1 mm, and
for 4D-sCT-50% to 2.8 ± 1.4 mm. The opposite effect was observed when excluding
range probes that went through the lungs: MARE decreased to 0.7 ± 0.3 mm for the
3D-sCT, to 0.9 ± 0.3 mm for 4D-sCT-average, and to 1.0 ± 0.3 mm for 4D-sCT-50%.
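The MARE and its split into lung and non-lung range probes can be sketched as follows. The sketch assumes that each range probe yields one range value (in mm) on the sCT and one on the reference CT, and that a per-probe flag marks whether the probe traverses lung tissue; the variable names and example values are hypothetical.

```python
def mean_absolute_range_error(range_sct_mm, range_ct_mm, keep=None):
    """MARE in mm over all probes, or only over probes where keep[i] is True."""
    pairs = list(zip(range_sct_mm, range_ct_mm))
    if keep is not None:
        pairs = [pair for pair, flag in zip(pairs, keep) if flag]
    if not pairs:
        return float("nan")
    return sum(abs(s - c) for s, c in pairs) / len(pairs)


# Made-up probe ranges (mm) and lung flags for illustration.
range_sct = [100.0, 152.0, 201.0, 248.0]
range_ct = [101.0, 150.0, 200.0, 251.0]
through_lung = [False, True, False, True]

mare_all = mean_absolute_range_error(range_sct, range_ct)
mare_lung = mean_absolute_range_error(range_sct, range_ct, keep=through_lung)
mare_other = mean_absolute_range_error(
    range_sct, range_ct, keep=[not flag for flag in through_lung]
)
```

The overall MARE lies between the lung and non-lung values, weighted by the number of probes in each group, which matches the pattern of the reported results.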
Results from the global dosimetric evaluation using gamma analysis are shown in
Figure 8. The clear differences in image quality were not similarly observable in
the dose calculation accuracy. Average 3%/3-mm gamma pass ratios did not show
large differences between the various sCT types and ranged from 92.3 ± 3.2% for
4D-sCT-50% to 94.4 ± 2.1% for 4D-sCT-average. 4D-sCT-0% and the 3D-sCT resulted in
average pass ratios of 93.2 ± 2.1% and 93.7 ± 2.1%, respectively.
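The principle behind the gamma pass ratio can be illustrated with a simplified one-dimensional version. The sketch assumes a global dose criterion (3% of the maximum reference dose), a 3 mm distance-to-agreement criterion, a 10% dose threshold, and a brute-force search over grid points without interpolation. The analysis in this study was performed on 3D dose distributions, so this is a didactic reduction and not the implementation used here.

```python
def gamma_pass_ratio_1d(ref, ev, spacing_mm, dose_pct=3.0, dta_mm=3.0,
                        threshold_pct=10.0):
    """Percentage of reference points with gamma <= 1 (global normalization)."""
    d_max = max(ref)
    dose_crit = dose_pct / 100.0 * d_max
    cutoff = threshold_pct / 100.0 * d_max
    passed = 0
    total = 0
    for i, d_ref in enumerate(ref):
        if d_ref < cutoff:
            continue  # low-dose points are excluded from the evaluation
        total += 1
        # Squared gamma: minimum over all evaluated points of the combined
        # normalized distance and dose difference.
        gamma_sq = min(
            ((j - i) * spacing_mm / dta_mm) ** 2 + ((d_ev - d_ref) / dose_crit) ** 2
            for j, d_ev in enumerate(ev)
        )
        if gamma_sq <= 1.0:
            passed += 1
    return 100.0 * passed / total if total else float("nan")


# Toy dose profiles on a 10 mm grid; the evaluated dose is 10% too high at the peak.
reference = [0.0, 50.0, 100.0, 50.0, 0.0]
evaluated = [0.0, 50.0, 110.0, 50.0, 0.0]
ratio_same = gamma_pass_ratio_1d(reference, reference, spacing_mm=10.0)
ratio_peak = gamma_pass_ratio_1d(reference, evaluated, spacing_mm=10.0)
```

In the toy example two of the three points above the dose threshold pass, so a single local deviation lowers the pass ratio to about 67%.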
Figure 7. Results from the range error evaluation for (a) the entire patient, (b) all tissues except lung
tissues, and (c) only lung tissues. Error bars extend to one standard deviation.
Figure 8. Results from the gamma analysis using a 3%/3-mm criterion and a dose threshold of 10%.
Whiskers extend to the min/max observed values.
Figure 9 shows boxplots for local dose differences in the CTV and selected OARs
(lung, heart, and esophagus). Excellent agreement was found for CTV and lung
doses, with differences below 2.4% for all patients and sCT types. Larger dose
differences were observed for heart and esophagus. Due to their smaller size and
location in close proximity to steep dose gradients near the target volume, these
OARs are more sensitive to dose shifts. For all sCT types, the highest dose
differences were observed in the heart (3D-sCT: 8.4%, 4D-sCT-average: 4.6%,
4D-sCT-0%: 6.7%, and 4D-sCT-50%: 8.0%). No systematic differences in OAR doses were
found among 4D-sCT-0%, 4D-sCT-50%, 4D-sCT-average, and the 3D-sCT.
Figure 9. Relative dose differences for clinical target volume (CTV), lung, heart, and esophagus calcu-
lated for 4D-synthetic computed tomography (sCT)-0%, 4D-sCT-50%, 4D-sCT-average, and the 3D-
sCT. Whiskers extend to the last value within the interquartile range. Outliers are visualized by a dot.
3.4 4D dose accumulation
A comparison between 4D accumulated doses based on 4D-sCT and 4D-CT resulted in
an average gamma pass ratio of 93.7 ± 4.9%. This is similar to average pass ratios
of 3D-sCT (93.7%) and 4D-sCT-average (94.4%). In target volumes and OARs,
similar dose differences as in the other sCT types were observed. The heart showed
the largest mean dose difference with up to 4.5%. For esophagus, lung (both mean
dose), and CTV (D98), maximum dose differences did not exceed 1.4%, 2.3%, and
2.4%, respectively. Figure 10 shows the reconstructed doses of 4D-sCT and 4D-CT
for each breathing phase together with the accumulated dose. Due to averaging
effects, the accumulated dose shows noticeably lower dose differences than the in-
dividual breathing phases. Table 1 presents mean 3%/3-mm and 2%/2-mm gamma
pass ratios for subplan doses of each breathing phase.
Figure 10. Rows 1–6 show dose distributions for individual breathing phases (calculated with the corre-
sponding subplans used in the dose accumulation) and the difference between 4D-computed tomography
(CT) and 4D-synthetic computed tomography (sCT) doses. Row 7 presents the accumulated dose and dose
difference. Results are shown for patient 2.
3.5 Normal tissue complication probabilities (NTCP)
OAR doses were used in combination with clinical features to calculate NTCP for
grade ≥2 RP, grade ≥2 AET, and 2-year mortality. Results for each model and sCT
type are presented in Figure 11. As expected from the relatively low-dose differences
observed in OARs, NTCP values showed high agreement between sCTs and ground-
truth CTs. Median values for all sCT types and toxicities were close to zero. Maxi-
mum differences did not exceed ±1.7%. No substantial differences were observed
between the various sCT types.
Figure 11. Normal tissue complication probability (NTCP) differences between synthetic computed
tomographies (sCTs) and CTs for 2-year mortality (2yM), grade ≥2 acute esophageal toxicity (AET),
and grade ≥2 radiation pneumonitis (RP). Whiskers extend to the last value within the interquartile
range. Outliers are visualized by a dot.
4 Discussion
In this study, we investigated the use of a DCNN to correct sparse view 4D-CBCTs
and generate 4D-sCTs to enable accurate daily proton dose calculations. Further-
more, a comparison against 3D-sCTs was performed to assess the impact of a 4D-
CBCT reconstruction using the same set of projections as for the 3D-CBCTs.
Visually, 4D-CBCTs showed considerably lower image quality than 3D-CBCTs.
This difference can be attributed to the low number of projections available for a
single phase of the 4D-CBCT (1/6 of the total number of projections). The low number
of projections causes additional sparse view artifacts and further decreases the
image quality of 4D-CBCTs. Our study used a retrospective dataset; hence, we were
limited to the clinically used acquisition protocol, which is currently
optimized for 3D-CBCTs. An advanced iterative 4D reconstruction, in the form of
the MA-ROOSTER algorithm, was used to optimize the 4D-CBCT image quality.
Furthermore, 4D-CBCTs were only reconstructed with six phases, whereas for the
reference 4D-CTs 10 breathing phases were available. The reconstruction with fewer
phases is a tradeoff between the temporal resolution of 4D-CBCTs and a higher image
quality of each individual phase image. Because we utilized DIR to deform reference
CT images to the CBCTs, the mismatch in breathing phases had a negligible
influence on the results.
In the future, better image quality and/or a larger number of breathing phases
could be achieved by using dedicated 4D-CBCT acquisition protocols containing a
significantly higher number of projections. However, this comes at the cost of an
increase in imaging dose and acquisition time, which lengthens the time a patient
stays in the treatment room and might be counterproductive for efficient online
adaptive proton therapy workflows.
During sCT generation, the DCNN was able to mostly compensate for the large
image quality differences between 3D- and 4D-CBCTs. Although individual 4D-
sCT phases resulted in a 10-HU higher MAE than 3D-sCTs (47 vs. 37 HU), 4D-sCT-
average had a similar MAE as the 3D-sCT. This outcome suggests that averaging six
low-quality single-phase images results in a similar MAE as a 3D-sCT based on a
high-quality 3D-CBCT.
The evaluation of proton dose calculations on the various synthetic CT types showed
small differences between 3D and 4D images. 3D-sCT and 4D-sCT-average resulted in
the highest pass ratios (93.7% and 94.4%, respectively). Single-phase 4D-sCTs showed
a marginally lower average pass ratio than 3D-sCTs (by up to 1.4 percentage points),
a trend that was also observed in local dose differences and NTCP differences,
however, at a lower magnitude. Highest agreement was observed for CTV and lung
structures. The dosimetric results show a comparable performance of 4D-sCTs
with respect to 3D-sCTs, suggesting the potential suitability of sparse view
4D-CBCT-based sCTs for proton dose calculations in adaptive proton therapy workflows.
The clinical suitability of 4D-sCTs was demonstrated by performing 4D dose
reconstructions using treatment log files and breathing signals. Good agreement was
found between doses calculated on 4D-CT and 4D-sCT for single breathing phases
and the accumulated dose. For the accumulated doses, an average pass ratio of
94.2% was measured. For comparison, the phantom-based proton dose calculation
study by Schmitz et al.31 resulted in higher 3%/3-mm gamma pass ratios between
97.3% and 99.7%. Niepel et al. reported 3%/3-mm pass ratios >95% for a dual beam
plan.29 Bondesson et al.30 performed a 4D dose accumulation similar to this study
and reported gamma pass ratios of 96.7%. However, all the previous studies utilized
a porcine lung phantom to mimic lung tissue properties, and no dosimetric results
of actual patients have been reported yet. Further studies with actual patient data
are required to compare these non–deep learning methods to our proposed deep
learning approach.
PR simulations were used to visualize and quantify range errors of sCTs. The higher
HU accuracy of 3D-sCT and 4D-sCT-average was reflected in slightly lower range
errors compared to single-phase 4D-sCTs. PR simulations also revealed that the
main error contribution in thoracic sCTs stems from lung tissue. Range probes
through lung tissue showed range errors roughly three times higher than range
probes not crossing the lungs. This was consistent throughout all 3D- and 4D-sCTs.
In clinical practice, the impact of range errors is mitigated by using multiple beam
directions from target-specific angles. There are multiple causes that could lead to
increased range errors in lung tissue: First, the low-density lung tissue shows low
HU accuracy because it is heavily affected by streaking and scattering artifacts in
CBCTs. For 4D-CBCTs, this effect is amplified due to the low number of projections
used for reconstruction. Second, we observed substantial density variations of lung
tissues between different patients. Such density variations might be represented
on a diagnostic CT scan, but this may not be the case for CBCTs or sCTs, due to the
previously mentioned shortcomings of CBCTs. Third, these large variations in lung
density and the disparity between CTs and CBCTs hinder the mapping of lung HUs
between CBCT and CT performed by the DCNN. A visualization of differences in
lung tissues of CBCTs, CTs, and sCTs, as observed in our dataset, is presented in
Figure S6. The presented results confirm our previous conclusion23 that the lung
region is the most challenging part for the deep learning–based sCT generation. The
large influence of lung tissues on sCT HU accuracy suggests that a further impro-
vement of sCTs should focus on this area. Advances of imaging hardware and soft-
ware of proton therapy CBCT systems might be necessary to improve the accuracy
in the representation of lung tissues of sCTs. Besides directly improving CBCTs, uti-
lizing and developing more advanced neural network architectures might further
improve image quality. The network architecture used in this study was similar to
previous 3D-studies and not adapted to 4D input data. The DCNN was solely trai-
ned with 0% phase images for two reasons: First, the difference between indivi-
dual phase images of a single patient is much smaller than the difference between
patients. Second, given the number of patients we wanted to include in the training
set, the limited memory of the GPU used in this study (NVIDIA GTX 1080 Ti) did not
allow us to utilize all breathing phases at the same time. Our results
showed that training our type of neural network on a single phase does not have
a negative impact on other phases. However, with a more sophisticated network
architecture, the relation between individual breathing phases or between 3D and
4D images could potentially be exploited and further improve the image quality of
4D-sCTs.
No real ground-truth images were available for this study because 4D-CBCT and
4D-CT were acquired in two different sessions. To overcome this limitation, DIR
was used to generate reference images by registering individual phases of the
same-day 4D-CT and 4D-CBCT. Our results, therefore, depend on the accuracy of
the DIR itself. A quantification of DIR accuracy is very challenging,45, 46 and in this
work, we relied on visual inspection of deformation results. Furthermore, DIR was
also used during 4D-CBCT reconstruction, dose accumulation, and to transfer
structures from the planning CT to sCTs, which makes this study quite dependent
on DIR accuracy.
Our previous study about thoracic 3D-sCTs proposed a patient-specific correction
method to increase sCT accuracy in lungs.23 This method showed promising results
but relies on computationally expensive DIR of diagnostic CTs. For 4D-sCTs, this
would lead to a substantial prolongation of the sCT conversion because it has to be
performed individually for each breathing phase. Therefore, this patient-specific
correction was considered to be unsuitable for (online) adaptive proton therapy
workflows and was not investigated in this study.
Deep learning–based sCT generation has been shown to be sufficiently fast for online
adaptive proton therapy. Conversion times as fast as a few seconds have been
reported in literature.24, 47, 48 The DCNN method presented in this work requires
about 2–3 min to generate a full resolution 3D-sCT or single 4D-sCT phase image
(using an NVIDIA GTX 1080 Ti GPU). However, with further software and hardware
optimizations, a substantial conversion time reduction could be achieved. The 4D-
sCT generation can also be parallelized because each phase is independent from
the other phases. Therefore, if enough computing hardware is available, 4D-sCT
generation should not be more time-consuming than 3D-sCT generation. What
currently requires significantly more time is the reconstruction of 4D-CBCTs using
the MA-ROOSTER reconstruction algorithm. Reconstructing a six-phase 4D-CBCT
required on average 45 min. To reconstruct 4D-CBCTs with optimal image quality
and clinically feasible characteristics (e.g., reconstruction duration), further
investigation into (alternative) 4D reconstruction algorithms (e.g., MA-ROOSTER,
4D-FDK, and MC-FDK) and their impact on proton dose calculation accuracy is
required. Vendor-side implementation of various reconstruction algorithms in
clinically used systems is desirable for clinical implementation in online adaptive
proton therapy workflows.
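Because each breathing phase is converted independently, the phase-wise sCT generation mentioned above lends itself to a simple parallel map. The sketch below uses a thread pool from the Python standard library; `convert_phase` is a hypothetical stand-in for the DCNN inference on one phase image and simply copies its input.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_phase(cbct_phase):
    """Hypothetical placeholder for DCNN inference on a single 4D-CBCT phase."""
    return list(cbct_phase)  # identity copy; a real model would return the sCT


def convert_all_phases(cbct_phases, max_workers=6):
    """Convert all phases concurrently; output order matches input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert_phase, cbct_phases))


# Six toy "phase images", each a short list of HU values.
phases = [[index * 10, index * 10 + 1, index * 10 + 2] for index in range(6)]
sct_phases = convert_all_phases(phases)
```

Whether this yields a real speed-up depends on the available hardware, for example one GPU per phase or sufficient memory to hold several phases on a single GPU.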
The standard deviations of the image similarity metrics (e.g., 6.2 HU for MAE of
4D-sCT-average), dosimetric metrics (e.g., 2.1% for gamma pass ratio of
4D-sCT-average), and the variability in the magnitude of range errors (e.g., 0.5 mm
for 4D-sCT-average) indicate that sCT quality varies across patients. Quality control workflows
for sCTs need to be developed to ensure reliable sCT quality, to identify potential
outliers, and to enable the clinical translation of sCTs. The lack of ground-truth
images for daily CBCT-based sCTs complicates efforts to introduce such quality
control tools. Recently, Seller Oria et al. demonstrated the use of in vivo PR
acquisitions as a quality control tool for sCTs in head-and-neck cancer patients.49
A similar approach could be taken for lung cancer patients, but as respiratory motion
would have to be considered, the PR image interpretation would be more challenging.
An option for automated quality control of sCTs would be to use uncertainty
measures directly linked to deep neural networks.50, 51
5 Conclusion
In this study, we have shown the ability of a DCNN to generate 4D-sCTs based on
4D-CBCTs reconstructed from a sparse view projection set. Despite image quality
differences between 4D- and 3D-sCTs, comparable dosimetric accuracy and NTCP
accuracy were observed. For further improvement of 3D and 4D lung sCTs, the HU
accuracy in lung tissues should be targeted. This study evaluated the image quality
and dosimetric accuracy in reference to same-day CT images. Further studies are
required to study the role of 4D-CBCT-based sCTs in clinical adaptive proton therapy
workflows and the criteria to determine the clinical suitability of 4D-sCTs on
a patient-specific basis.
Acknowledgment
(KWF research project 11518).
Langendijk JA is a consultant for International Scientific Advisory Committees of
IBA and RaySearch.
The Department of Radiation Oncology, University Medical Centre Groningen, has
active research agreements with IBA, RaySearch, Siemens, Elekta, Leonie and Mirada.
References
[1] Lim-Reinders S, Keller BM, Al-Ward S, Sahgal A, Kim A. Online adaptive radiation
therapy. Int J Radiat Oncol Biol Phys. 2017; 99(4): 994-1003.
https://doi.org/10.1016/j.ijrobp.2017.04.023
Oncol. 2019; 29(3): 245-257. https://doi.org/10.1016/j.semradonc.2019.02.007
Br J Radiol. July 2019; 93:20190594. https://doi.org/10.1259/bjr.20190594
[4] Paganetti H, Botas P, Sharp GC, Winey B. Adaptive proton therapy. Phys Med Biol. 2021;
66(22): 10. https://doi.org/10.1088/1361-6560/ac344f
[5] Landry G, Hua CH. Current state and future applications of radiological image
guidance for particle therapy. Med Phys. 2018; 45(11): e1086-e1095.
https://doi.org/10.1002/MP.12744
[6] Schulze R, Heil U, Groß D, et al. Artefacts in CBCT: a review. Dentomaxillofac Radiol. 2011;
40(5): 265-273. https://doi.org/10.1259/DMFR/30642039
[7] Giacometti V, Hounsell AH, McGarry CK. A review of dose calculation approaches with
cone beam CT in photon and proton therapy. Physica Med. 2020; 76: 243-276.
https://doi.org/10.1016/J.EJMP.2020.06.017
[8] Newhauser WD, Zhang R. The physics of proton therapy. Phys Med Biol. 2015; 60(8):
R155-R209. https://doi.org/10.1088/0031-9155/60/8/r155
[9] Arai K, Kadoya N, Kato T, et al. Feasibility of CBCT-based proton dose calculation using
a histogram-matching algorithm in proton beam therapy. Physica Med. 2017; 33: 68-76.
https://doi.org/10.1016/j.ejmp.2016.12.006
[10] Landry G, Nijhuis R, Dedes G, et al. Investigating CT to CBCT image registration for head
and neck proton therapy as a tool for daily dose recalculation. Med Phys. 2015; 42(3):
1354-1366. https://doi.org/10.1118/1.4908223
[11] Kurz C, Kamp F, Park YK, et al. Investigating deformable image registration and scatter
correction for CBCT-based dose calculation in adaptive IMPT. Med Phys. 2016; 43(10):
5635-5646. https://doi.org/10.1118/1.4962933
[12] Veiga C, Janssens G, Teng CL, et al. First clinical investigation of cone beam computed
tomography and deformable registration for adaptive proton therapy for lung cancer. Int J
Radiat Oncol Biol Phys. 2016; 95(1): 549-559. https://doi.org/10.1016/j.ijrobp.2016.01.055
CBCT image: feasibility study for adaptive proton therapy. Med Phys. 2015; 42(8):
4449-4459. https://doi.org/10.1118/1.4923179
[14] Hansen DC, Sørensen TS. Fast 4D cone-beam CT from 60 s acquisitions. Phys Imaging
Radiat Oncol. 2018; 5: 69-75. https://doi.org/10.1016/j.phro.2018.02.004
[15] Andersen AG, Park YK, Elstrøm UV, et al. Evaluation of an a priori scatter correction
algorithm for cone-beam computed tomography based range and dose calculations in
proton therapy. Phys Imaging Radiat Oncol. 2020; 16: 89-94.
https://doi.org/10.1016/J.PHRO.2020.09.014
[16] Thing RS, Bernchou U, Hansen O, Brink C. Accuracy of dose calculation based on artefact
corrected cone beam CT images of lung cancer patients. Phys Imaging Radiat Oncol. 2017;
1: 6-11. https://doi.org/10.1016/j.phro.2016.11.001
clinical cone beam computed tomography imaging made possible by the combination
of Monte Carlo simulations and a ray tracing algorithm. Acta Oncol (Madr). 2013; 52(7):
1477-1483. https://doi.org/10.3109/0284186x.2013.813641
[18] Spadea MF, Maspero M, Zaffino P, Seco J. Deep learning-based synthetic-CT generation in
radiotherapy and PET: A review. Med Phys. 2021; 48: 6537-6566.
https://doi.org/10.1002/mp.15150
[19] Harms J, Lei Y, Wang T, et al. Paired cycle-GAN-based image correction for
quantitative cone-beam computed tomography. Med Phys. 2019; 46(9): 3998-4009.
https://doi.org/10.1002/mp.13656
2020; 65(9):95002. https://doi.org/10.1088/1361-6560/ab7d54
Phys Med Biol. 2020; 65(23):235036. https://doi.org/10.1088/1361-6560/abb1d6
therapy. Phys Med Biol. 2019; 64(12):125002. https://doi.org/10.1088/1361-6560/ab22f9
[23] Thummerer A, Seller Oria C, Zaffino P, et al. Clinical suitability of deep learning based
synthetic CTs for adaptive proton therapy of lung cancer. Med Phys. 2021; 48(12):
7673-7684. https://doi.org/10.1002/MP.15333
[24] Landry G, Hansen D, Kamp F, et al. Comparing Unet training with three different datasets
to correct CBCT images for prostate radiotherapy dose calculations. Phys Med Biol. 2019;
64(3):35011. https://doi.org/10.1088/1361-6560/aaf496
[25] Kurz C, Maspero M, Savenije MHF, et al. CBCT correction using a cycle-consistent
generative adversarial network and unpaired training to enable photon and proton dose
calculation. Phys Med Biol. 2019; 64(22):225004. https://doi.org/10.1088/1361-6560/AB4D8C
[26] den Otter LA, Chen K, Janssens G, et al. Technical note: 4D cone-beam CT reconstruction
from sparse-view CBCT data for daily motion assessment in pencil beam scanned proton
therapy (PBS-PT). Med Phys. 2020; 47(12): 6381-6387. https://doi.org/10.1002/MP.14521
[27] Shieh CC, Gonzalez Y, Li B, et al. SPARE: sparse-view reconstruction challenge for
4D cone-beam CT from a 1-min scan. Med Phys. 2019; 46(9): 3799-3811.
https://doi.org/10.1002/MP.13687
[28] Mory C, Janssens G, Rit S. Motion-aware temporal regularization for improved 4D
cone-beam computed tomography. Phys Med Biol. 2016; 61(18): 6856-6877.
https://doi.org/10.1088/0031-9155/61/18/6856
[29] Niepel K, Kamp F, Kurz C, et al. Feasibility of 4DCBCT-based proton dose calculation:
an ex vivo porcine lung phantom study. Z Med Phys. 2019; 29: 249-261.
https://doi.org/10.1016/j.zemedi.2018.10.005
[30] Bondesson D, Meijers A, Janssens G, et al. Anthropomorphic lung phantom based
validation of in-room proton therapy 4D-CBCT image correction for dose calculation. Z Med
Phys. 2022; 32(1): 74-84. https://doi.org/10.1016/J.ZEMEDI.2020.09.004
[31] Schmitz H, Rabe M, Janssens G, et al. Validation of proton dose calculation on scatter
corrected 4D cone beam computed tomography using a porcine lung phantom. Phys Med
Biol. 2021; 66(17):175022. https://doi.org/10.1088/1361-6560/ac16e9
[32] Madesta F, Sentker T, Gauer T, Werner R. Self-contained deep learning-based boosting
of 4D cone-beam CT reconstruction. Med Phys. 2020; 47(11): 5619-5631.
https://doi.org/10.1002/MP.14441
[33] Rit S, Vila Oliva M, Brousmiche S, Labarbe R, Sarrut D, Sharp GC. The Reconstruction
Toolkit (RTK), an open-source cone-beam CT reconstruction toolkit based on the
Insight Toolkit (ITK). J Phys Conf Ser. 2014; 489(1):012079.
https://doi.org/10.1088/1742-6596/489/1/012079
Ther. 2015; 2(2): 404- 414. https://doi.org/10.14338/IJPT-14-00024.1
[35] Spadea MF, Pileggi G, Zaffino P, et al. Deep convolution neural network (DCNN) multi-
therapy. Int J Radiat Oncol Biol Phys. 2019; 105(3): 495- 503. https://doi.org/10.1016/j.ij-
robp.2019.06.2535
zation chamber. Phys Med Biol. 2016; 61(11): 4078- 4087. https://doi.org/10.1088/0031-
9155/61/11/4078
[37] Farace P, Righetto R, Deffet S, Meijers A. Vander Stappen F. Technical note: a direct ray-
tracing method to compute integral depth dose in pencil beam proton radiography
with a multilayer ionization chamber. Med Phys. 2016; 43(12): 6405- 6412. https://doi.
org/10.1118/1.4966703
[38] Meijers A, Seller Oria C, Free J, Langendijk JA, Knopf AC, Both S. Technical Note: first
report on an in vivo range probing quality control procedure for scanned proton beam
therapy in head and neck cancer patients. Med Phys. 2021; 48(3): 1372- 1380. https://doi.
org/10.1002/mp.14713
[39] Oria CS, Marmitt GG, Both S, Langendijk JA, Knopf AC, Meijers A. Classification of va-
rious sources of error in range assessment using proton radiography and neural net-
works in head and neck cancer patients. Phys Med Biol. 2020; 65(23):235009. https://doi.
org/10.1088/1361-6560/ABC09C
[40] Meijers A, Jakobi A, Stützer K, et al. Log file based dose reconstruction and accumulation
for 4D adaptive pencil beam scanned proton therapy in a clinical treatment planning sys-
tem: implementation and proof-of-concept. Med Phys. 2019; 46(3): 1140- 1149. https://
doi.org/10.1002/mp.13371. Published online 2019.
[41] Nederlandse Vereiniging voor Radiotherapie en Oncologie. Rapporten. https://nvro.nl/
images/documenten/rapporten/LIPP_longen_final_01122019.pdf. Accessed August 23,
2022.
[42] Appelt AL, Vogelius IR, Farr KP, Khalil AA, Bentzen SM. Towards individualized dose cons-
traints: adjusting the QUANTEC radiation pneumonitis model for clinical risk factors.
Acta Oncol (Madr). 2014; 53(5): 605- 612. https://doi.org/10.3109/0284186X.2013.820341
[43] Dankers FJWM, Wijsman R, Troost EGC, et al. External validation of an NTCP model for
acute esophageal toxicity in locally advanced NSCLC patients treated with intensity-
modulated (chemo-)radiotherapy. Radiother Oncol. 2018; 129(2): 249- 256. https://doi.
org/10.1016/j.radonc.2018.07.021
[44] Defraene G, Dankers FJWM, Price G, et al. Multifactorial risk factors for mortality after
chemotherapy and radiotherapy for non-small cell lung cancer. Radiother Oncol. 2020;
152: 117- 125. https://doi.org/10.1016/j.radonc.2019.09.005
[45] Nenoff L, Ribeiro CO, Matter M, et al. Deformable image registration uncertainty for in-
ter-fractional dose accumulation of lung cancer proton therapy. Radiother Oncol. 2020;
147: 178- 185. https://doi.org/10.1016/J.RADONC.2020.04.046
[46] Amstutz F, Nenoff L, Albertini F, et al. An approach for estimating dosimetric uncertain-
ties in deformable dose accumulation in pencil beam scanning proton therapy for lung
cancer. Phys Med Biol. 2021; 66(10):105007. https://doi.org/10.1088/1361-6560/ABF8F5
[47] Kida S, Nakamoto T, Nakano M, et al. Cone beam computed tomography image quali-
ty improvement using a deep convolutional neural network. Cureus. 2018; 10(4):e2548.
https://doi.org/10.7759/cureus.2548. Published online 2018.
[48] Maspero M, Houweling AC, Savenije MH, et al. A single neural network for cone-beam
Phys Imaging Radiat Oncol. 2020; 14: 24- 31. https://doi.org/10.1016/j.phro.2020.04.002
[49] Seller Oria C, Thummerer A, Free J, et al. Range probing as a quality control tool for CBCT-
based synthetic CTs: in vivo application for head and neck cancer patients. Med Phys.
2021; 48(8): 4498- 4505. https://doi.org/10.1002/mp.15020
[50] Maspero M, Bentvelzen LG, Savenije MHF, et al. Deep learning-based synthetic CT gene-
ration for paediatric brain MR-only photon and proton radiotherapy. Radiother Oncol.
2020; 153: 197- 204. https://doi.org/10.1016/j.radonc.2020.09.029
[51] van Harten LD, Wolterink JM, Verhoeff JJC, Išgum I. Automatic online quality control of
synthetic CTs. Proc. SPIE 11313, Medical Imaging 2020: Image Processing. 2020:11313.
https://doi.org/10.1117/12.2549286
Chapter 6
S1 Neural network architecture
Figure S1 Neural network architecture used in this study
S2 Breathing rates
TRAINING SET
BREATHING CYCLES/MIN
P1 20.72 ± 1.65
P2 22.98 ± 2.89
P3 23.33 ± 5.40
P4 20.87 ± 3.65
P5 21.98 ± 4.33
P6 16.43 ± 5.88
P7 19.29 ± 2.62
P8 16.22 ± 2.08
P9 16.45 ± 2.00
P10 14.54 ± 1.01
P11 17.06 ± 1.83
P12 11.35 ± 1.22
P13 26.43 ± 4.36
P14 18.46 ± 2.93
P15 15.18 ± 3.77
P16 16.12 ± 1.82
P17 13.01 ± 2.90
P18 20.87 ± 2.48
P19 18.04 ± 2.87
P20 23.54 ± 4.47
P21 22.79 ± 4.95
P22 15.98 ± 1.58
P23 22.80 ± 4.04
P24 15.19 ± 2.67
P25 17.21 ± 2.12
P26 20.14 ± 1.65
P27 17.84 ± 2.50
P28 14.82 ± 2.02

P29 22.00 ± 2.99
P30 19.34 ± 3.83
MEAN 18.70 ± 2.95
TEST SET
BREATHING CYCLES/MIN
P1 18.70 ± 1.16
P2 18.39 ± 2.37
P3 13.52 ± 1.70
P4 14.69 ± 2.81
P5 18.13 ± 1.68
P6 17.16 ± 1.55
P7 13.35 ± 1.90
P8 16.46 ± 1.98
P9 16.23 ± 2.52
P10 13.37 ± 2.56
P11 19.12 ± 1.50
P12 14.29 ± 2.78
P13 23.49 ± 3.47
P14 18.19 ± 3.04
P15 20.77 ± 2.03
MEAN 17.06 ± 2.20
Table S2 Breathing rates for training and testing dataset. Breathing rates were calculated based on
pressure belt signal used during treatment delivery.
Figure S2 Exemplary breathing signal extracted from the pressure belt system for patient 2. Each red
line indicates a peak in the breathing signal.
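The breathing rates in Table S2 are derived from peaks in the pressure belt signal. The following is a minimal sketch of one way such a rate could be computed, assuming a simple local-maximum peak detector and a known sampling rate. The function names, the height threshold and the synthetic test signal are illustrative assumptions, not the processing used in this study.

```python
import math
import statistics

def find_peaks(signal, min_height):
    """Return indices of local maxima whose value exceeds min_height."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > min_height and signal[i - 1] < signal[i] >= signal[i + 1]
    ]

def breathing_rate(signal, sample_rate_hz, min_height=0.0):
    """Mean and standard deviation of the breathing rate in cycles/min."""
    peaks = find_peaks(signal, min_height)
    # Each peak-to-peak interval (in seconds) is treated as one breathing cycle.
    intervals = [(b - a) / sample_rate_hz for a, b in zip(peaks, peaks[1:])]
    rates = [60.0 / t for t in intervals]
    return statistics.mean(rates), statistics.stdev(rates)

# Synthetic stand-in for a belt signal: a sine at 0.3 Hz (18 cycles/min),
# sampled at an assumed 25 Hz for 60 seconds.
fs = 25.0
signal = [math.sin(2 * math.pi * 0.3 * n / fs) for n in range(int(60 * fs))]
mean_rate, std_rate = breathing_rate(signal, fs)
print(f"{mean_rate:.2f} ± {std_rate:.2f} cycles/min")
```

A real belt signal is noisier than this sine wave, so a practical implementation would likely need smoothing or a minimum peak distance before counting cycles.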
S3 Reconstruction and correction workflow
Figure S3 Figure depicting the workflow of 3D/4D-CBCT reconstruction and 3D/4D-sCT generation
using deep convolutional neural networks.
S4 Figures for PSNR, SSIM and DSC
Figure S4a (top) PSNR for 0% (blue) and 50% (orange) 4DsCT phases, 4DsCT average (green) and the
3DsCT (red). The dashed lines indicate mean values of the entire dataset. (bottom) SSIM of bones for
0% and 50% 4D-sCT phases, 4D-sCT average and the 3D-sCT.
Figure S4b (top) DSC of air cavities for 0% (blue) and 50% (orange) 4DsCT phases, 4DsCT average
(green) and the 3DsCT (red). The dashed lines indicate mean values of the entire dataset. (bottom)
DSC of bones for 0% and 50% 4D-sCT phases, 4D-sCT average and the 3D-sCT.
S5 Results for all breathing phases
MAE (HU)
Patient 4DsCT-00% 4DsCT-16% 4DsCT-33% 4DsCT-50% 4DsCT-66% 4DsCT-83% 4DsCT-ave 3DsCT
P1 41.9 44.8 44.0 42.8 43.1 44.0 31.6 46.3
P2 45.1 44.4 43.0 43.3 43.7 44.6 32.0 29.0
P3 55.7 58.7 57.0 56.3 57.3 58.3 45.3 40.0
P4 53.3 55.4 55.3 54.6 54.0 53.8 45.4 43.4
P5 50.8 48.9 49.1 49.3 49.4 50.1 40.5 38.3
P6 40.4 40.5 40.8 40.7 40.7 40.4 30.9 29.6
P7 57.9 57.2 57.4 57.0 57.6 57.0 45.1 50.1
P8 56.0 57.1 58.0 57.2 57.4 57.4 50.7 49.7
P9 44.9 43.8 43.4 42.8 43.8 44.3 32.1 31.5
P10 49.8 52.9 51.5 52.6 52.8 51.7 36.8 35.2
P11 41.6 41.2 40.6 40.5 41.7 41.4 31.2 29.8
P12 38.6 39.7 39.8 41.6 42.6 40.4 31.0 26.0
P13 40.6 43.3 41.3 40.8 41.0 42.7 32.5 32.2
P14 44.2 44.6 43.9 43.5 43.8 43.9 34.4 30.3
P15 49.6 51.0 51.1 51.0 51.6 50.9 37.0 39.0
mean 47.3 48.2 47.8 47.6 48.0 48.1 37.1 36.7

std 6.2 6.4 6.5 6.4 6.3 6.2 6.4 7.6
ME (HU)
Patient 4DsCT-00% 4DsCT-16% 4DsCT-33% 4DsCT-50% 4DsCT-66% 4DsCT-83% 4DsCT-ave 3DsCT
P1 2.3 9.1 8.7 8.6 6.1 5.4 6.6 -9.8
P2 0.1 3.5 2.7 4.2 4.8 5.4 3.5 9.3
P3 -3.4 1.6 -1.9 0.8 1.0 0.9 -0.1 1.3
P4 13.6 17.8 16.3 15.0 12.8 12.5 14.7 10.4
P5 -3.1 4.9 3.3 3.7 0.0 0.0 1.5 9.1
P6 -3.0 1.1 0.4 1.2 -0.8 -0.7 -0.3 3.4
P7 -8.9 -3.8 -3.3 -3.0 -3.9 -4.6 -4.5 -12.6
P8 1.7 4.8 5.5 9.1 7.8 5.7 5.8 14.7
P9 -5.6 -1.1 0.1 -0.5 -0.4 -1.2 -1.4 -7.0
P10 -2.3 1.7 1.9 5.1 -3.9 -1.2 0.9 2.1
P11 -7.0 -4.2 -5.9 -3.8 -3.6 -3.3 -4.5 -2.6
P12 -3.9 0.2 1.3 3.5 4.0 1.6 1.2 0.0
P13 7.7 10.0 9.7 9.5 9.1 9.8 9.4 3.6
P14 1.5 5.3 3.0 5.5 0.6 1.1 2.9 7.6
P15 -1.0 1.8 2.2 3.6 1.9 2.5 1.8 14.6
mean -0.8 3.5 2.9 4.2 2.4 2.2 2.5 2.9

std 5.5 5.4 5.3 4.8 4.8 4.6 4.9 8.1
PSNR (dB)
Patient 4DsCT-00% 4DsCT-16% 4DsCT-33% 4DsCT-50% 4DsCT-66% 4DsCT-83% 4DsCT-ave 3DsCT
P1 45.1 44.4 44.4 44.9 44.9 44.4 47.6 43.9
P2 42.8 42.9 43.3 43.3 43.2 42.9 46.1 46.7
P3 43.2 42.7 43.0 43.3 43.2 42.9 45.2 46.2
P4 43.4 43.0 43.0 43.1 43.3 43.3 44.9 45.3
P5 41.3 41.9 41.7 41.7 41.7 41.6 43.3 44.5
P6 43.2 43.2 43.2 43.2 43.2 43.3 45.8 46.1
P7 42.4 42.4 42.4 42.4 42.4 42.5 44.5 43.6
P8 41.1 41.0 40.9 40.9 41.2 41.0 41.6 42.1
P9 42.8 42.9 43.0 43.2 43.0 42.9 45.6 45.9
P10 43.3 42.5 43.0 43.1 42.7 43.1 46.3 46.3
P11 44.8 45.2 45.2 45.3 44.8 44.9 47.4 46.8
P12 46.9 46.7 46.7 46.1 46.0 46.5 49.0 50.2
P13 43.9 43.5 43.8 44.1 44.0 43.5 46.0 46.1
P14 43.5 43.6 43.8 43.9 43.8 43.8 45.9 47.3
P15 42.9 42.8 42.7 42.8 42.6 42.8 45.3 46.0
mean 43.4 43.2 43.3 43.4 43.3 43.3 45.6 45.8
std 1.4 1.3 1.3 1.3 1.2 1.3 1.7 1.8
SSIM
Patient 4DsCT-00% 4DsCT-16% 4DsCT-33% 4DsCT-50% 4DsCT-66% 4DsCT-83% 4DsCT-ave 3DsCT
P1 0.945 0.942 0.943 0.946 0.945 0.943 0.954 0.956
P2 0.913 0.914 0.916 0.916 0.915 0.914 0.929 0.944
P3 0.943 0.943 0.944 0.946 0.946 0.944 0.952 0.965
P4 0.942 0.941 0.942 0.943 0.943 0.942 0.948 0.961
P5 0.922 0.923 0.922 0.922 0.922 0.922 0.933 0.938
P6 0.897 0.897 0.897 0.897 0.897 0.897 0.910 0.919
P7 0.939 0.939 0.939 0.939 0.938 0.939 0.946 0.951
P8 0.905 0.905 0.905 0.899 0.904 0.904 0.905 0.925
P9 0.913 0.912 0.914 0.914 0.913 0.913 0.929 0.925
P10 0.941 0.938 0.940 0.941 0.938 0.940 0.952 0.960
P11 0.949 0.949 0.950 0.950 0.949 0.948 0.957 0.963
P12 0.960 0.960 0.960 0.960 0.959 0.960 0.967 0.972
P13 0.931 0.929 0.931 0.932 0.932 0.930 0.943 0.954
P14 0.937 0.938 0.939 0.940 0.940 0.939 0.948 0.957
P15 0.943 0.943 0.943 0.943 0.942 0.942 0.954 0.961
mean 0.93 0.93 0.93 0.93 0.93 0.93 0.94 0.95

std 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
DSC air
Patient 4DsCT-00% 4DsCT-16% 4DsCT-33% 4DsCT-50% 4DsCT-66% 4DsCT-83% 4DsCT-ave 3DsCT
P1 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.98
P2 0.98 0.98 0.98 0.98 0.98 0.98 0.99 0.98
P3 0.96 0.96 0.96 0.96 0.96 0.96 0.97 0.97
P4 0.97 0.96 0.96 0.96 0.97 0.97 0.97 0.97
P5 0.94 0.94 0.93 0.94 0.93 0.94 0.95 0.96
P6 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.98
P7 0.97 0.97 0.97 0.97 0.97 0.97 0.98 0.97
P8 0.94 0.95 0.94 0.95 0.94 0.94 0.95 0.96
P9 0.98 0.98 0.98 0.98 0.98 0.98 0.99 0.98
P10 0.98 0.98 0.98 0.98 0.97 0.98 0.98 0.98
P11 0.98 0.99 0.99 0.98 0.98 0.98 0.99 0.99
P12 0.99 0.99 0.99 0.98 0.98 0.99 0.99 0.99
P13 0.98 0.98 0.98 0.98 0.99 0.98 0.99 0.99
P14 0.98 0.98 0.98 0.98 0.98 0.98 0.99 0.98
P15 0.98 0.98 0.98 0.98 0.98 0.98 0.99 0.99
mean 0.98 0.97 0.97 0.97 0.97 0.97 0.98 0.98

std 0.01 0.01 0.02 0.01 0.02 0.01 0.01 0.01
DSC bone
Patient 4DsCT-00% 4DsCT-16% 4DsCT-33% 4DsCT-50% 4DsCT-66% 4DsCT-83% 4DsCT-ave 3DsCT
P1 0.61 0.59 0.59 0.61 0.61 0.59 0.68 0.69
P2 0.65 0.66 0.66 0.65 0.64 0.64 0.71 0.76
P3 0.69 0.68 0.69 0.69 0.70 0.70 0.71 0.78
P4 0.71 0.70 0.70 0.71 0.73 0.72 0.73 0.73
P5 0.57 0.58 0.58 0.58 0.58 0.57 0.59 0.58
P6 0.53 0.55 0.53 0.52 0.52 0.52 0.59 0.58
P7 0.52 0.53 0.53 0.52 0.51 0.52 0.56 0.59
P8 0.59 0.59 0.58 0.60 0.61 0.60 0.60 0.51
P9 0.61 0.61 0.62 0.63 0.64 0.63 0.64 0.76
P10 0.57 0.56 0.58 0.61 0.58 0.58 0.69 0.68
P11 0.58 0.62 0.62 0.62 0.59 0.59 0.67 0.69
P12 0.64 0.64 0.62 0.60 0.61 0.64 0.68 0.75
P13 0.61 0.62 0.62 0.62 0.60 0.60 0.63 0.63
P14 0.63 0.62 0.62 0.63 0.66 0.65 0.67 0.75
P15 0.42 0.43 0.40 0.39 0.43 0.44 0.54 0.58
mean 0.60 0.60 0.60 0.60 0.60 0.60 0.65 0.67

std 0.07 0.06 0.07 0.07 0.07 0.07 0.06 0.08
Table S5 Results of MAE, ME, PSNR, SSIM and DSC evaluation for each breathing phase, the 4D-sCT-
average and the 3D-sCT
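For readers who want to reproduce metrics of the kind listed in Table S5, the sketch below shows how MAE, ME, PSNR and DSC can be computed from paired voxel values. It is an illustration under stated assumptions: the sign convention for ME (sCT minus CT), the intensity range used for PSNR and the HU thresholds used to segment air and bone are hypothetical choices, and SSIM is omitted because it requires windowed statistics.

```python
import math

def error_stats(sct, ct):
    """Return (MAE, ME) in HU; ME uses the assumed sign convention sCT - CT."""
    diffs = [s - c for s, c in zip(sct, ct)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    me = sum(diffs) / len(diffs)
    return mae, me

def psnr(sct, ct, data_range):
    """Peak signal-to-noise ratio in dB for an assumed intensity range."""
    mse = sum((s - c) ** 2 for s, c in zip(sct, ct)) / len(ct)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(data_range ** 2 / mse)

def dice(sct, ct, in_class):
    """Dice similarity coefficient of the voxel sets selected by in_class."""
    a = [in_class(v) for v in sct]
    b = [in_class(v) for v in ct]
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * overlap / total

# Toy voxel values in HU (made up for illustration).
ct = [-1000, -800, 0, 40, 300, 700]
sct = [-990, -700, 10, 20, 200, 720]

mae, me = error_stats(sct, ct)
psnr_db = psnr(sct, ct, data_range=4095)        # hypothetical 12-bit range
dsc_air = dice(sct, ct, lambda hu: hu < -400)   # hypothetical air threshold
dsc_bone = dice(sct, ct, lambda hu: hu > 250)   # hypothetical bone threshold
```

The example also shows why MAE and ME are reported together: large positive and negative errors can cancel in ME while still contributing fully to MAE.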
S6 Lung density variations
Figure S6 Comparison of lung tissue between 4D-CBCT, 4D-sCT and 4D-CT for patient 3, 5, 10 and 14.
The red boxes indicate regions of interest that show less details on the sCT than on the reference CT.
Summarizing Discussion
Chapter 7
CBCT-based dose calculations play a pivotal role in daily adaptive proton therapy
strategies that aim to account for changes in patient anatomy throughout the
treatment. Daily, high-quality images make it possible to assess the dosimetric
impact of anatomical changes and establish a foundation for treatment plan
adaptations. It is not feasible to acquire daily CT images due to the additional
dose burden and time constraints. As an alternative, CBCTs acquired for patient
positioning provide a daily representation of the patients' anatomy in treatment
position but lack accurate Hounsfield units (HU) for (proton) dose calculations.
Therefore, CBCTs have to be corrected to facilitate proton dose calculations in
adaptive proton therapy workflows. This thesis investigated a deep learning
technique, among other approaches, to correct CBCTs and enable CBCT-based proton
dose calculations for head and neck, and lung cancer patients.
7.1 Comparison of CBCT correction methods
A variety of CBCT correction methods for radiotherapy applications have been
proposed in the literature. However, these studies mainly focused on the
suitability for photon dose calculations [1]. The increased demands of proton
therapy on image quality warranted an investigation into CBCT correction methods
specifically for proton therapy applications. Therefore, Chapter 2 aimed to
identify a method suitable to correct CBCTs for accurate proton dose calculations.
Three CBCT correction methods were compared: a deformable image registration
(DIR) method, an analytical image-based correction (AIC) method and a deep
learning (DL) method. A cohort of head and neck cancer patients, treated with
proton therapy at our institution, was used to evaluate these methods. In terms of
image quality and dosimetric accuracy, the best results were observed for the
DL- and DIR-based methods, with mean absolute errors (MAE) of 37 HU and 44 HU,
respectively. The AIC method showed significantly higher errors with an MAE of
77 HU. The dosimetric evaluation resulted in 3%/3mm gamma pass rates of 99.3%,
98.7% and 97.3% for the DL, DIR and AIC methods, respectively.
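To make the 3%/3mm criterion concrete, the sketch below computes a global gamma pass rate for a one-dimensional dose profile by brute-force search. This is a simplified illustration, not the analysis used in this thesis: clinical gamma evaluations operate on 3D dose grids, usually with interpolation, and the 10% low-dose cutoff and the toy profiles are assumptions.

```python
def gamma_pass_rate(ref, ev, spacing_mm, dose_pct=3.0, dta_mm=3.0, cutoff_pct=10.0):
    """Global 1D gamma pass rate (%) of evaluated dose ev against reference ref."""
    d_max = max(ref)
    dose_tol = dose_pct / 100.0 * d_max   # global dose criterion
    cutoff = cutoff_pct / 100.0 * d_max   # assumed low-dose cutoff
    passed = evaluated = 0
    for i, d_ref in enumerate(ref):
        if d_ref < cutoff:
            continue
        evaluated += 1
        # Squared gamma: minimum over all evaluated points of the combined
        # distance-to-agreement and dose-difference terms.
        gamma_sq = min(
            ((j - i) * spacing_mm / dta_mm) ** 2 + ((d_ev - d_ref) / dose_tol) ** 2
            for j, d_ev in enumerate(ev)
        )
        if gamma_sq <= 1.0:
            passed += 1
    return 100.0 * passed / evaluated

# Toy dose profile on a 1 mm grid, and two shifted copies of it.
ref = [0, 10, 50, 100, 100, 100, 50, 10, 0]
shift_1mm = [0] + ref[:-1]
shift_5mm = [0] * 5 + ref[:-5]

rate_small = gamma_pass_rate(ref, shift_1mm, spacing_mm=1.0)
rate_large = gamma_pass_rate(ref, shift_5mm, spacing_mm=1.0)
```

A 1 mm shift lies within the 3 mm distance criterion, so every evaluated point passes, whereas a 5 mm shift causes most points to fail.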
Although the metrics showed comparable results for the DL- and DIR-based methods,
specific strengths and weaknesses are associated with each method. Fast CBCT
correction is essential for online adaptive proton therapy workflows, where the
aim is to assess and adapt the treatment plan while the patient is in the
treatment room. Once DL models are trained, DL provides the fastest CBCT
correction. Conversion times for our neural network implementation were about
1-2 minutes for a full CBCT. However, in a clinical environment a significant
acceleration is possible by using more powerful graphics processing units (GPUs)
or optimizing the neural network code for faster conversion. Conversion times as
low as a few seconds for an entire image stack have been reported in literature
[2–4]. Our implementation of DIR-based CBCT correction, which resulted in highly
accurate deformations, used an open source diffeomorphic morphons DIR algorithm
which required up to 45 minutes. Although time requirements for DIR could be
reduced by further optimizing registration parameters or using a different DIR
algorithm, it will not be possible with conventional DIR algorithms to reach the
speed of DL-based conversion. Recently, deep learning has also been investigated
for DIR and showed a significant speed-up over conventional DIR algorithms [5–7].
Deep learning-based DIR has, however, not yet been investigated for CBCT
correction in adaptive radiotherapy.
Another advantage of DL is that no prior CT is required to correct a CBCT and
generate a synthetic CT. DIR, on the other hand, always requires such a prior
image, which is deformed to the daily CBCT anatomy. In clinical practice a gap of
up to several weeks can exist between CBCT and CT acquisitions. Significant
anatomical changes can occur in this time span, which might affect the DIR
results since large anatomical differences between CT and CBCT might not always
be correctly modelled (e.g., tumor shrinkage, (partial) lung collapse, nasal
cavity filling). An example of this scenario is presented in Figure 1. It shows a
patient with the sinus cavity appearing empty on the CBCT and filled on the
planning CT. In this situation DIR is not able to accurately represent the
patient anatomy of the day since the planning CT, with the filled sinus, is
deformed to the CBCT. In contrast, DL successfully shows an empty cavity on the
sCT, given that it relies exclusively on the CBCT.
Figure 1. Visualization of sCT failure using a DIR-based method. Panels: CBCT, sCT DL, sCT DIR and
same-day CT. Region of interest indicated by red rectangle. The sinus cavity was filled on the day of
planning CT acquisition, but not on the day of the CBCT. DIR cannot model an empty cavity. Deep
learning relies solely on the CBCT and correctly shows the empty cavity.
The main drawback of DL-based methods is that large amounts of training data are
required to accurately learn the relationship between input (CBCT) and target
images (CT). As a result, the outcomes of trained neural networks are conditioned
by the images represented in the training data set. Images that differ
significantly from images included in the training set are referred to as
out-of-distribution data. When such out-of-distribution data is presented to
neural networks, unexpected outputs might be observed. In the case of CBCT-to-CT
conversion, out-of-distribution data could be images acquired with different
imaging protocols, showing rare features (e.g., implants), from a different
anatomical region, or using unusual patient positioning. To improve the
robustness of DL solutions, larger and more comprehensive datasets should be
used, so that a wider variety of cases is represented. Currently, most studies
concerning DL-based CBCT image correction, including the work presented in this
thesis, utilize imaging data from a single imaging system, a single institution
and a single anatomical region. Maspero et al. recently showed that a single
neural network can be trained with a dataset containing CBCTs of breast, pelvis
and lung cancer patients without losing image quality or dose calculation
accuracy compared to individual neural networks for each anatomy [8]. For DIR and
AIC methods, complications due to dataset size, varying anatomical location or
imaging system do not occur, since a prior image (usually a planning CT) is
utilized for the CBCT correction.
All chapters of this thesis make use of a U-net like deep convolutional neural
network, originally described by Spadea et al. [9]. This neural network
architecture has been shown to generate accurate synthetic CTs based on MRs and
CBCTs [9,10]. In literature, multiple variations of this network architecture and
more advanced architectures, such as generative adversarial networks (GANs), have
been investigated [11–14]. However, there is no clear evidence yet about which
neural network architecture performs best for applications in photon or proton
radiotherapy. While there is a consensus about evaluation metrics for synthetic
CTs, there is a lack of shared, public datasets and currently each study uses its
own dataset for training and testing. This makes intercomparison of DL-based
methods quite challenging since the results are highly dependent on the training
data. In the future, public datasets could be used to overcome this limitation
and allow a fair and proper comparison of neural network architectures for CBCT
correction.
The AIC method showed the lowest image quality and dose calculation accuracy.
However, after the investigation presented in Chapter 2, the method was further
improved and is now available within the commercial treatment planning system
RayStation 11B (RaySearch, Sweden). Preliminary results showed improved image
quality and dose calculation accuracy. Having the CBCT correction available in a
commercial treatment planning system has the advantage that end-users do not
have to develop and implement it in-house and it can easily be integrated in
existing workflows. This might be beneficial for smaller institutions that do not
have the staff and resources to develop and train their own deep-learning models.
Due to the above-mentioned advantages of DL techniques for CBCT correction,
the following chapters focus mainly on DL approaches. Currently CBCT is the most
suitable in-treatment room imaging modality to enable daily adaptive proton
therapy workflows. In the field of photon radiotherapy, the recent integration of
MR scanners into linear accelerators facilitates daily MR-based adaptive
radiotherapy. For proton therapy there are currently no on-board or in-room MR
scanners available. In the future, however, in-room MRs might become available
and provide a basis for daily adaptive proton therapy workflows, similar to CBCTs
today. First in-room MR prototypes have shown promising results but are still
in an experimental state and not yet available for clinical applications [15].
In Chapter 3, we utilize MR images acquired for treatment planning and the neural
network architecture presented in Chapter 2 to compare CBCT- and MR-based
synthetic CT generation. CBCT- and MR-based synthetic CTs have two distinct
applications in the proton therapy workflow. CBCT-based synthetic CTs enable daily
proton dose calculations for treatment plan adaptation decisions. MR-based
synthetic CTs, on the other hand, are currently mainly intended to eliminate the
need for planning CT images, since no daily MRs are available yet. The results
presented in Chapter 3 showed higher image quality for CBCT-based sCTs than for
MR-based sCTs, with an MAE of 40 HU and 60 HU respectively. The dose calculation
accuracy, quantified with a 2%/2mm gamma analysis, was also higher for CBCT-based
sCTs than for MR-based sCTs, with pass rates of 96.1% and 93.3% respectively.
The higher accuracy of CBCT-based sCTs is likely due to the higher similarity
between input CBCT and target CT image. MR and CT are created by two very
different physical contrast mechanisms, which makes the image synthesis more
challenging than between CBCT and CT. Chapter 3 demonstrates the high flexibility
of neural network architectures, as we utilized the same neural network
architecture for CBCT- and MR-based synthetic CTs with no modifications besides
the use of different training data sets.
7.2 Synthetic CTs for the thorax
A study by Hoffmann et al. showed that for a large proportion of advanced lung
cancer patients treated with intensity modulated proton therapy, treatment
adaptation is mandatory to ensure target coverage [16]. This demands the use of
frequent thoracic imaging to decide on treatment plan adaptations. In Chapter 5,
the application of the previously investigated deep convolutional neural network
architecture to CBCTs of thoracic cancer patients was investigated. As an initial
step, clinically used CBCTs, acquired for patient positioning, were used to
investigate the proton dose calculation accuracy for a static scenario (3D).
Results showed that overall proton dose calculations were less accurate in the
thorax than in the head and neck region. Due to more inhomogeneous tissues,
respiratory motion and deep-seated tumors, proton dose calculations are more
challenging in the thorax. The global dose evaluation, a gamma analysis with a
3%/3mm passing criterion, showed a mean pass ratio of 93.7% for lung cancer
patients, inferior to the 98.7% rate observed for head and neck cancer patients.
Proton radiography simulations were performed to compare sCTs to same-day repeat
CTs in terms of proton range. The results revealed a significantly larger range
error for proton beams traversing lung tissue (2.1 ± 0.9 mm), which was more than
twice as large as for beams traversing all other tissues (0.9 ± 0.6 mm).
Since lung tissue was identified as a major contributor to the increased range
error and decreased dose calculation accuracy of sCTs in the thorax region, a
patient-specific correction method using a prior CT was proposed in Chapter 5.
This correction method increased mean pass rates from 93.7% to 96.8%, which is
closer to those achieved in head and neck patients. Range errors in beams
traversing lung tissue were reduced from 2.1 ± 0.9 mm to 1.6 ± 0.6 mm. The
correction technique, however, relies on deformable image registration between a
prior CT and the sCT. Although the deformation parameters were tuned to minimize
the processing time, the patient-specific correction still required about 7-10
minutes per sCT. As a result, this correction approach is more suitable for
offline adaptive proton therapy (APT) than for online APT. The use of a prior CT
in the patient-specific correction technique also eliminates the advantage of
DL-based synthetic CTs being independent of prior images. Proton radiography
simulations in Chapter 5 not only made it possible to identify lung tissue as a
cause of the decreased image quality, but also provided a visual representation
of these deficiencies to localize such errors. The evaluation of
dose-volume-histogram parameters showed high agreement between synthetic CT and
same-day reference CT, suggesting clinical suitability of synthetic CTs for
thoracic cancer patients.
The thorax region is heavily influenced by respiratory motion. Static 3D imaging
cannot represent the patient's quasi-periodically moving anatomy and only provides
an average representation of the patient's anatomy. To capture breathing motion,
4D imaging is routinely applied in radiotherapy [17]. Chapter 6 extended the work
on thoracic synthetic CTs of Chapter 5 to a dynamic scenario. To generate 4D-sCTs,
several challenges had to be overcome. First, our institution currently does not
use 4D-CBCTs for patients receiving proton therapy. Clinically, motion is only
assessed at the planning stage, with a 4D planning CT, and throughout the
treatment by acquiring weekly 4D repeat CTs. In order to reconstruct 4D-CBCTs, raw
projection data was collected directly from the treatment machine, and a
state-of-the-art iterative reconstruction algorithm (MA-ROOSTER) was used to
generate 4D-CBCTs from projection data acquired for 3D-CBCTs [18,19]. The image
quality of 4D-CBCTs was significantly lower than that of 3D-CBCTs. Improvements
in 4D-CBCT image quality are possible with dedicated acquisition protocols.
Second, the low number of projections led to the reconstruction of only six
breathing phases for 4D-CBCTs, compared to the ten breathing phases used for the
4D-CT reconstruction. This prompted the use of deformable image registration to
generate reference images by matching the closest 4D-CT phases to the 4D-CBCT
phases. And third, due to memory limitations, it was not feasible to adjust the
neural network architecture to the time-resolved 4D imaging data. A neural
network similar to the one used for 3D thoracic sCTs was trained, meaning that
the neural network was trained with 2D slices from a single breathing phase
(0% phase).
Despite the large image quality difference between 3D- and 4D-CBCTs, 4D-sCTs
allowed reasonably accurate proton dose calculations. A single phase of the 4D-sCT
showed an average 3%/3mm gamma pass rate of 93.2 ± 2.1 % (0% phase), which is
similar to the pass rate observed for 3D-sCTs (93.7 ± 2.1 %) in this patient cohort.
Figure 2. Image quality difference between 3D- and 4D-CBCTs and the corresponding synthetic CTs
(panels: 3D-CBCT, 4D-CBCT, 3D-sCT, 4D-sCT). Despite large CBCT image quality differences, synthetic
CTs show comparable image quality.
Figure 2 illustrates the image quality difference between 3D- and 4D-CBCT,
reconstructed from the same set of projections, and the resulting 3D-sCT and 4D-sCT
generated by the respective neural networks. This example illustrates the general
ability of neural networks to extract useful information from highly deteriorated
images.
Currently, our implementation of 4D image synthesis is relatively naive in the
sense that it treats each phase image as an individual 3D image. The existing
relation between phase images is not exploited on any level for further image
quality improvements. Furthermore, generating a 4D-sCT is currently too time
consuming for an online implementation in adaptive proton therapy treatments.
Reconstructing the 4D-CBCT takes approximately 45 minutes, followed by converting
the phase images, which requires 2-3 minutes per phase. Further (vendor-side)
improvements in sparse-view 4D image reconstruction might be required to reduce
the 4D-sCT generation time to a level suitable for online adaptive proton therapy.
One offline use of daily 4D-sCTs is retrospective proton dose reconstruction [20].
Chapter 6 includes a dose reconstruction using 4D-sCTs, patient-specific
breathing signals, and a log-file interpreter. Based on the treatment log-files,
the original treatment plan was split into individual treatment plans containing
only spots delivered in the respective breathing phases. The dose was then
calculated for each breathing phase using the corresponding subplan and
accumulated onto a reference phase. For the evaluation of 4D-sCTs, the accumulated
dose based on the 4D-sCT was compared to the accumulated dose of the clinically
used 4D repeat CT and resulted in an average gamma pass ratio of 93.7%, which is
comparable to gamma pass rates observed in 3D-sCTs.
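The plan-splitting step described above can be sketched as follows. This is an illustration only: the spot record layout (`time_s`, `mu`), the use of breathing-signal peaks as cycle boundaries, the assumption that the 0% phase starts at a peak, and the equal-width phase bins are all hypothetical, and the actual log-file interpreter is not reproduced here. Dose calculation per subplan and deformable accumulation onto the reference phase are omitted.

```python
from bisect import bisect_right
from collections import defaultdict

N_PHASES = 6  # six reconstructed phases: 0%, 16%, 33%, 50%, 66%, 83%

def phase_of(t, peak_times):
    """Phase bin index (0..N_PHASES-1) at time t, or None outside the signal."""
    k = bisect_right(peak_times, t) - 1
    if k < 0 or k >= len(peak_times) - 1:
        return None  # no complete breathing cycle encloses this time point
    start, end = peak_times[k], peak_times[k + 1]
    fraction = (t - start) / (end - start)  # position within the cycle, 0..1
    return min(int(fraction * N_PHASES), N_PHASES - 1)

def split_plan(spots, peak_times):
    """Group delivered spots into one subplan per breathing phase."""
    subplans = defaultdict(list)
    for spot in spots:
        phase = phase_of(spot["time_s"], peak_times)
        if phase is not None:
            subplans[phase].append(spot)
    return subplans

# Hypothetical inputs: breathing peaks every 3 s and five logged spots.
peak_times = [0.0, 3.0, 6.0]
spots = [{"time_s": t, "mu": 1.0} for t in (0.1, 1.6, 2.9, 4.5, 7.0)]
subplans = split_plan(spots, peak_times)
```

In this toy case the spots at 1.6 s and 4.5 s fall into the same phase bin although they belong to different breathing cycles, which is the behaviour a phase-sorted subplan relies on.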
The patient-specific correction method proposed in Chapter 5 was not applied to
4D-sCTs since it was too time consuming to apply it to each phase image
individually.
7.3 Evaluation of synthetic CTs
The aim of this thesis was to identify a CBCT correction method that can enable
accurate proton dose calculations in adaptive proton therapy. Metrics that quantify
the image quality and dosimetric accuracy are essential for finding and
characterizing the most suitable method. This section discusses the most relevant
and novel metrics used throughout this thesis and some of their strengths and
weaknesses.
A quantitative evaluation of synthetic CTs is challenging since no real
ground-truth volumetric image that can be used as a reference exists. To get as
close as possible to a ground-truth image, all studies presented in this thesis
used same-day CT and CBCT scans. In addition, deformable image registration was
used to reduce the impact of residual anatomical differences caused by variations
in the patient position between CT and CBCT acquisition. The use of deformable
image registration allowed us to focus mainly on the image quality difference
between sCT and CT and avoid disturbance caused by anatomical differences.
Remaining minor anatomical differences, however, are unavoidable and can still
impact the evaluation metrics.
MAE is one of the most established image similarity metrics in the field of
medical image synthesis [21]. It compares the synthetic CT and a (deformed)
reference CT on a voxel-by-voxel basis. Table 1 compares mean absolute error
results from all chapters in this thesis to some comparable DL studies in the
field.
Study                           Site (description)      MAE
Thummerer et al. (Chapter 2)    H&N                     36 ± 6 HU
Thummerer et al. (Chapter 3)    H&N                     40 ± 4 HU
Liang et al. [22]               H&N                     30 ± 5 HU
Yuan et al. [23]                H&N                     49 HU
Thummerer et al. (Chapter 5)    Thorax 3D               34 ± 6 HU
Thummerer et al. (Chapter 6)    Thorax 3D               37 ± 8 HU
Thummerer et al. (Chapter 6)    Thorax 4D (0% phase)    48 ± 6 HU
Thummerer et al. (Chapter 6)    Thorax 4D (average)     38 ± 6 HU
Eckl et al. [24]                Thorax 3D               94 ± 32 HU
Table 1. A comparison of mean absolute error results for all chapters in this thesis and some
comparable studies from literature. H&N = head and neck.
Across all chapters of this thesis, we observed quite consistent mean absolute
error results (34–40 HU), even across head-and-neck and thorax patient cohorts.
Only the 4D single phase image shows a significantly higher MAE of 48 HU. The
comparison to other studies in the field reveals larger discrepancies. For head
and neck, Liang et al. achieved a lower MAE of 30 HU while Yuan et al. presented
worse MAE results of 49 HU [22,23]. In the thorax, results by Eckl et al. also
showed a significantly higher MAE (94 HU) than presented in Chapters 5 and 6 of
this thesis [24]. However, caution has to be taken when comparing MAE results
between studies, since MAE is influenced by where and how it is calculated (e.g.,
field-of-view, calculation inside patient outline vs. over the entire image). The
studies presented above also varied in training/testing dataset size, neural
network architecture and training procedure, all of which influence the MAE
results. In addition to its use as an image similarity metric, MAE is also
commonly used as a loss function to train and optimize convolutional neural
networks for image related tasks.
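The dependence of MAE on the evaluated region can be illustrated with a small example. The sketch below compares the MAE inside a patient outline with the MAE over the entire image for the same image pair. The voxel values and the mask are made up for illustration and do not correspond to data from this thesis.

```python
def masked_mae(sct, ct, mask):
    """MAE in HU over the voxels where mask is True."""
    diffs = [abs(s - c) for s, c, m in zip(sct, ct, mask) if m]
    return sum(diffs) / len(diffs)

# Flattened toy slice: 12 background air voxels and 4 voxels inside the patient.
ct = [-1000] * 12 + [0, 20, 40, 60]
sct = [-1000] * 12 + [40, 60, 80, 100]   # 40 HU error inside the patient
body = [False] * 12 + [True] * 4         # hypothetical patient outline mask
whole = [True] * 16                      # entire image

mae_body = masked_mae(sct, ct, body)     # 40.0 HU
mae_whole = masked_mae(sct, ct, whole)   # 10.0 HU
```

Because the background air is reproduced without error, including it lowers the reported MAE by a factor of four in this example, although the error inside the patient is unchanged.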
To assess the proton dose calculation accuracy of synthetic CTs, we focused on two
types of evaluations. First, we performed a gamma analysis to compare the dose
distribution between sCT and CT. Gamma analysis provides insights into the
similarity of the global dose distribution but does not provide much information
about clinically relevant regions such as target volumes or organs-at-risk (OARs).
For example, larger dose differences in low dose regions might be more acceptable
than in high dose regions such as target volumes or radiation sensitive areas such
as OARs. Therefore, to investigate the dose locally in target volumes and OARs, we
also performed a dose evaluation using dose-volume histogram (DVH) parameters. For
target volumes, mainly the dose that at least 98% of the target volume receives
(D98) was used; for OARs, mostly the mean dose (Dmean). These DVH parameters made
it possible to show the similarity between synthetic CTs and reference CTs using
clinically established parameters relevant for clinical decision making. Table 2
summarizes gamma pass ratios for all studies presented in this thesis.
Thesis chapter    Site (description)             Gamma pass ratio (3%/3mm)
Chapter 2         H&N (non-clinical plan)        99.9 %
Chapter 3         H&N (clinical plan)            98.8 %
Chapter 5         Thorax 3D                      93.7 %
Chapter 5         Thorax 3D (with correction)    96.8 %
Chapter 6         Thorax 3D                      93.2 %
Chapter 6         Thorax 4D (0% phase)           92.3 %
Chapter 6         Thorax 4D (average)            94.4 %
Table 2. A comparison of gamma analysis results for all chapters in this thesis.
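The DVH parameters D98 and Dmean used alongside the gamma analysis can be computed directly from the dose values of the voxels inside a structure. The following is a minimal sketch assuming equal voxel volumes. The dose values are made up for illustration, and a treatment planning system may interpolate the DVH rather than pick a single voxel value.

```python
import math

def d_percent(doses, volume_pct):
    """Dose received by at least volume_pct percent of the voxels (e.g. D98)."""
    ordered = sorted(doses, reverse=True)
    # Number of voxels that make up the requested volume fraction.
    n_needed = max(1, math.ceil(volume_pct * len(ordered) / 100.0))
    return ordered[n_needed - 1]

def d_mean(doses):
    """Mean dose over all voxels of the structure."""
    return sum(doses) / len(doses)

# Toy target with 100 equally sized voxels (dose in Gy).
target_ct = [70.0] * 98 + [60.0] * 2    # 98% of the volume receives 70 Gy
target_sct = [70.0] * 97 + [60.0] * 3   # one additional underdosed voxel
oar_ct = [10.0, 20.0, 30.0]

d98_ct = d_percent(target_ct, 98.0)     # 70.0 Gy
d98_sct = d_percent(target_sct, 98.0)   # 60.0 Gy
oar_mean = d_mean(oar_ct)               # 20.0 Gy
```

The example shows how sensitive D98 is to a small cold region in the target, whereas Dmean of an organ-at-risk changes only gradually with individual voxel values.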
During the work on this thesis, we observed that differences in image quality
metrics are often not reflected in the dose metrics. Especially local dose
metrics, such as dose to organs-at-risk and target volumes, were not sensitive to
increased voxel-wise image quality differences. An example of such an observation
is the larger image difference (quantified by MAE) between 4D- and 3D-sCTs, which
did not lead to a significant difference in dose measured in target volumes or
OARs.
Further clinical interpretability was provided by calculating normal tissue
complication probabilities (NTCPs) based on CT and sCT. NTCP metrics are routinely
employed for proton therapy patient selection in the Netherlands [25]. In this
work, NTCP evaluations made it possible to compare predicted toxicities between
sCT and CT and showed the clinical impact of the reduced image quality of
synthetic CTs on certain toxicities. Overall, we observed very low NTCP
differences between synthetic CTs and reference CTs in all studies presented in
this thesis, suggesting clinical equivalence between CT and sCTs.
This thesis proposed a novel evaluation tool, specific to proton therapy, which was
centered on performing proton radiography simulations. Proton radiography utilizes
a (simulated) beam with characteristics similar to those of the treatment beam and
thereby allows direct conclusions to be drawn about the suitability of synthetic CTs
for proton dose calculations. In Chapters 5 and 6, range error maps, generated by
comparing proton radiography simulations between synthetic and reference CTs,
made it possible to identify lung tissue as a major contributor to lower dose
calculation accuracy. Proton radiography is a projectional imaging technique that
generates 2D images. As a result, proton radiographies are angle dependent. Proton
radiography simulations in Chapters 5 and 6 were performed from a 0-degree gantry
angle (anterior-posterior direction). To further refine proton radiography as a
synthetic CT image evaluation metric, PR simulations from multiple angles should be
considered, ideally from angles that coincide with the beam angles used for patient
treatment.
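The idea of a range error map can be sketched as the difference between two water-equivalent thickness (WET) projections. The sketch assumes parallel beams along one array axis and a simplified, hypothetical linear conversion from HU to relative stopping power (RSP); a clinical conversion would use a calibrated look-up table, and the proton radiography simulations of Chapters 5 and 6 are not reproduced here.

```python
import numpy as np

def hu_to_rsp(hu):
    """Hypothetical linear HU-to-RSP conversion (illustration only)."""
    return np.clip(1.0 + np.asarray(hu, dtype=float) / 1000.0, 0.0, None)

def wet_projection(hu_volume, voxel_size_mm, beam_axis=1):
    """Water-equivalent thickness in mm, integrated along the beam axis."""
    return hu_to_rsp(hu_volume).sum(axis=beam_axis) * voxel_size_mm

def range_error_map(sct_hu, ct_hu, voxel_size_mm, beam_axis=1):
    """2D map of WET differences (sCT minus reference CT) in mm."""
    return (wet_projection(sct_hu, voxel_size_mm, beam_axis)
            - wet_projection(ct_hu, voxel_size_mm, beam_axis))

# Toy volumes (illustration only): water everywhere, with a 10-voxel segment
# along the beam axis that is 50 HU too high on the synthetic CT.
ct = np.zeros((3, 50, 4))
sct = ct.copy()
sct[1, 20:30, 2] = 50.0

errors = range_error_map(sct, ct, voxel_size_mm=2.0)
print(errors.shape, errors[1, 2])
```

Projecting along a different array axis, or rotating the volume before projecting, would give range error maps for other beam directions, in line with the multi-angle evaluation suggested above.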
7.4 Quality assurance of synthetic CTs
Due to the large number of parameters and their convoluted nature, neural networks
are often referred to as ‘black boxes’, meaning that it is not obvious how
inputs and outputs are connected, or which features are extracted to generate an
output. In addition, neural networks are sensitive to the aforementioned
out-of-distribution data, which might cause unexpected anomalies in the outputs.
Problematic variations of input data can be caused by imaging artifacts (e.g., due
to implants or device malfunction), unusual patient anatomies or changes in
acquisition parameters. These perturbations are not always clearly identifiable
beforehand and, due to the nature of neural networks, it is unclear what impact they
have on the output images. This warrants stringent quality assurance
procedures to catch potential outliers and to safely deploy synthetic CT generation
for clinical applications. Figure 3 shows two examples of synthetic CT failures
observed during the work on this thesis.
To detect outliers and failures, Chapter 4 presents a quality control procedure
based on in-vivo proton radiography measurements in head and neck cancer patients.
Proton radiography utilizes proton beams to image (part of) the patient and
allows proton range information to be derived directly. In Chapter 4, in-vivo PR
measurements, acquired before the treatment delivery [26], were compared against
PR simulations based on synthetic CTs and repeat CTs in terms of range errors. This
made it possible to evaluate the HU accuracy of synthetic CTs and to compare the range
errors of synthetic CTs and same-day repeat CTs, which are the current standard in our
clinic to monitor the impact of changing patient anatomy during treatment. Repeat
CTs were acquired in treatment position but not in the treatment room, which
led to the removal of some range probes in anatomically unstable regions.
Figure 3. Synthetic CT failures observed during the work on this thesis (columns:
CBCT, sCT, CT). The first row shows a head and neck cancer patient with imaging
artifacts on the CBCT. The neural network interpreted this artifact as bony structure.
The second row shows a lung cancer patient with unusual lung tissue that is
misrepresented on the synthetic CT.
Results presented in Chapter 4 showed comparable range errors for synthetic CTs
compared to repeat CTs, with mean relative range errors ranging from -1.2% to 1.5%
in repeat CTs, and from -0.7% to 2.7% in synthetic CTs. These results suggest the
general suitability of DL-based synthetic CTs for APT of head and neck cancer
patients. To facilitate fast quality control checks of synthetic CTs using proton
radiography, some limitations of the proposed technique still have to be overcome. With
a size of 4 by 4 cm, the PR field used was limited in size and therefore only a limited
area around the treatment isocenter was verified in the synthetic images. A
complete sCT verification requires imaging at least the areas that are within
the beam path, or ideally the entire sCT field-of-view. An increase in PR-field size
would require the use of a detector with a larger readout area. The multi-layer
ionization chamber used had a maximum read-out diameter of 12 cm. For clinical
implementations, multi-layer ionization chambers with larger sensitive areas or flat
panel detectors could be investigated [27,28]. To facilitate online APT, fast
acquisition of PR fields is essential. The current measurement setup requires manual
positioning of the bulky multi-layer ionization chamber on a trolley next to the patient.
Ideally, the detector would be integrated into the treatment machine (e.g., gantry or
couch mounted) to allow PR measurements without the need to manually position
the detector next to the patient. Once these two issues are solved, proton radiography
will be closer to being routinely integrated into online APT workflows to verify the
accuracy of CBCT-based synthetic CTs.
Besides measurement-based techniques like PR, uncertainty predictions inherent
to the neural network architecture can be utilized for quality control of sCTs [29].
They make it possible not only to generate an output image, but also to provide an
uncertainty map that describes the network's uncertainty associated with each voxel.
The so-called Monte Carlo dropout technique randomly deactivates a fraction of the
neurons or convolutional layers of a neural network while repeatedly performing
inference [30]. The multiple output images are then used to calculate the variation
of each voxel, which can be visualized in an uncertainty map. For voxels where the
network is very certain, a low variability is expected; for voxels where it is
uncertain, a high variability is expected. This approach has already been investigated
for medical image synthesis, but further work is required to establish its
value for application in radiotherapy. If a correlation between uncertainty map and
dose calculation accuracy exists, Monte Carlo dropout could be a valuable tool to
assess the quality of synthetic CTs for adaptive proton therapy in the future. Figure
4 shows such an uncertainty map together with a corresponding image difference
and range error map. This example shows that the projection of the image difference,
the uncertainty map and the range error map highlight similar areas.
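The Monte Carlo dropout procedure described above can be sketched with a toy network in plain NumPy. The sketch assumes a hypothetical per-voxel network with one hidden layer rather than the U-net used in this thesis; the function name, weights and dropout rate are illustrative assumptions. In a deep learning framework, the same idea would be realized by keeping the dropout layers active during inference.

```python
import numpy as np

def mc_dropout_predict(image, weights_in, weights_out,
                       n_samples=50, drop_rate=0.2, seed=0):
    """Repeated inference of a toy per-voxel network with active dropout.

    Returns the voxel-wise mean prediction and the voxel-wise standard
    deviation, the latter serving as an uncertainty map.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(image, dtype=float)[..., None]          # shape (..., 1)
    outputs = []
    for _ in range(n_samples):
        hidden = np.maximum(x * weights_in, 0.0)            # ReLU hidden units
        keep = rng.random(weights_in.shape) >= drop_rate    # random unit mask
        hidden = hidden * keep / (1.0 - drop_rate)          # inverted dropout
        outputs.append(hidden @ weights_out)                # one output per voxel
    outputs = np.stack(outputs)
    return outputs.mean(axis=0), outputs.std(axis=0)

# Toy input and weights (illustration only).
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
weights_in = np.linspace(0.5, 1.5, 16)
weights_out = np.full(16, 1.0 / 16.0)

prediction, uncertainty = mc_dropout_predict(image, weights_in, weights_out)
print(uncertainty.shape, uncertainty[0, 0], uncertainty[3, 3])
```

In this toy case the output for zero-valued voxels does not depend on which units are dropped, so their uncertainty is zero, while voxels inside the square vary between the repeated inference runs.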
While proton radiography measurements provide ground-truth information on
proton range, which cannot be derived from other imaging methods, uncertainty
maps make it possible to localize potential errors and increase the interpretability
of the neural networks. The combination of measurement- and neural network-based
quality assurance techniques could facilitate the safe implementation of DL-based
synthetic CTs in the future.
Figure 4. a) Uncertainty map, generated using a Monte Carlo dropout technique (2D projection), b)
Difference between sCT and pCT (2D projection), c) Range error map between sCT and pCT. All images
show a projection along the anterior-posterior direction. This figure shows preliminary results
courtesy of Arthur Galapon, UMCG.
7.5 Clinical implementation and future perspectives
The results presented in Chapters 2 to 6 of this thesis, and by other authors,
suggest a general suitability of deep learning based synthetic CTs to enable accurate
proton dose calculations. However, to date there is a lack of reports on the actual
clinical implementation of deep learning-based image synthesis for daily adaptive
proton therapy workflows. Previous studies, including the ones presented in this
thesis, focus mainly on a preclinical evaluation of image quality and dose
calculation accuracy in a controlled setting (e.g., by generating reference images using
deformable image registration). It remains challenging to judge and quantify the
image quality of synthetic CTs under clinical conditions on a patient specific basis,
since multiple sources of error, such as sCT generation, variation in patient
position, and anatomical changes, are superimposed and hard to isolate.
Follow-up studies are necessary to investigate how reliable synthetic CTs are in real
clinical environments to assist with clinical adaptation decisions. As a transitional
phase, CBCT-based synthetic CTs can be employed in parallel with already existing
adaptive workflows (e.g., based on weekly repeat CTs). This would allow close
monitoring of the accuracy and reliability of synthetic CTs for plan adaptation decisions
in relation to the current standard. In the future, synthetic CTs might then be able
to completely replace the existing adaptation workflows based on repeat CTs. Our
results revealed that synthetic CT generation is more challenging in the thoracic
region than in the head and neck area. This suggests that synthetic CTs should
first be implemented for head and neck cancer patients before moving on to more
challenging regions.
In the thorax, the interference of pencil beam scanning delivery and target motion,
also known as interplay effects, can have an adverse effect on the delivered dose
distribution. In the future 4D-sCTs in combination with log-file-based 4D proton
dose reconstructions can enable a daily assessment of these effects.
The lack of established quality control and quality assurance procedures further
hampers the clinical implementation of synthetic CTs in adaptive proton therapy
workflows. In Chapter 4, a proton radiography-based patient specific quality control
method was proposed, but further improvements are required to deploy this
technique on a large scale (see section 7.4). Additional quality assurance
methodologies and tools have to be developed to achieve broad acceptance and trust in
the radiotherapy community. The clinical acceptance of DL-based sCTs by non-physicist
staff could benefit from advanced visualization tools that present synthetic
CTs in combination with quality control images, such as range error or uncertainty
maps. A focus should also be on the automation of QA tools to keep the workload
for clinical staff to a minimum.
CBCT correction is only one of the steps in the online adaptive proton therapy
chain. For a widespread deployment of online adaptive proton therapy, a high degree
of automation is required for each step in the process. Automation of deep
learning based synthetic CT generation is easily achievable, as shown in this thesis.
However, the decision whether treatment plan adaptations are required is challenging
to automate, since it is usually based on the expert opinions of physicians and
medical physicists, which are difficult to capture in quantitative metrics. Further
investigations of image and dosimetric features that can identify the need for plan
adaptations are required for fully automatic online adaptive workflows.
New opportunities also arise in the model-based clinic where NTCP metrics toge-
ther with target coverage metrics may guide the adaptation process. Dose recons-
tructions using daily synthetic CTs can also provide an opportunity for verification
of the initial NTCP calculations based on a single pre-treatment CT image.
Introducing deep learning-based tools into the clinic also raises regulatory
questions that, due to the novelty and scope of deep learning tools, hinder fast adoption
into clinical adaptive proton therapy. Multiple studies recently addressed the
challenging implementation of deep learning tools in radiation oncology [31–33].
All chapters in this thesis, and most of the deep learning based CBCT correction
approaches reported in literature, act on the reconstructed CBCT image. CBCT
correction, however, can also be performed in the projection domain, as shown by
Landry et al. [34]. Projection-based approaches utilize pairs of raw projections, affected
by typical CBCT imaging artifacts, and corrected, presumably artifact-free,
projections generated using prior CTs or Monte Carlo simulations. Due to the low
number of studies using projection-based approaches, no clear conclusions can be
drawn on which approach is optimal for proton dose calculations. Further studies
are required to investigate the potential benefit of correction approaches in the
projection domain.
All chapters of this thesis relied on a single neural network architecture, namely a
U-net like deep convolutional neural network. This relatively simple neural network
architecture was first introduced for image segmentation by Ronneberger
et al. in 2015 [35]. Since then, it has been applied to numerous tasks in the field of
image processing, including synthetic CT generation. During the last
few years, a multitude of alternative, more complex strategies for image synthesis
have been proposed [21,36]. A popular class of network architectures is generative
adversarial networks (GANs). GANs combine sub-networks for image generation
and classification to learn from unpaired data, meaning that for training, CBCT and
CT images do not necessarily have to be from the same patient cohort. However,
although the requirements on the training data are lower, learning from unpaired
data is a harder task, so GANs are more challenging to train and might require more
resources in the form of training data and hardware.
A general issue in the field of medical image synthesis is the lack of shared or public
data to compare the various neural network architectures in a fair manner. Each
study currently uses its own, closed dataset to evaluate its neural network
approaches. For that reason, it is currently not possible to draw definitive conclusions
about which neural network architecture is most suitable to enable accurate
proton dose calculation. Only the availability of high-quality public data sets would
allow for such a comparison, and it is highly desirable for the future of deep learning
based synthetic CTs in the field of radiation oncology. Inter-institutional
collaborations would also benefit the generalizability and robustness of DL solutions.
In conclusion, this thesis focused on deep learning based CBCT correction to generate
synthetic CTs and enable proton dose calculations in head and neck and thoracic
cancer patients. Synthetic CTs are an essential step towards daily, offline and
online adaptive proton therapy workflows and have the ability to improve patient
treatment by reducing the imaging dose and detecting the need for treatment
plan adaptations early. This is particularly relevant for proton therapy treatments
since they are highly sensitive to anatomical changes. Deep learning seems to be
a promising technique to correct CBCTs, since it provides a fast and automatic
generation of highly accurate images. The work presented in Chapters 2, 3, 5 and 6
demonstrates a high agreement between synthetic CTs and repeat CTs, the current
clinical standard for adaptation decisions. The need for QA tools for deep learning
based synthetic CTs was addressed in Chapter 4 by investigating in-vivo proton
radiography measurements to verify synthetic CTs.
References
[1] Spadea MF, Maspero M, Zaffino P, Seco J. Deep learning based synthetic-CT generation in
radiotherapy and PET: A review. Published online 2021. doi:10.1002/mp.15150
[2] Maspero M, Bentvelzen LG, Savenije MHF, et al. Deep learning-based synthetic CT gene-
ration for paediatric brain MR-only photon and proton radiotherapy. Radiotherapy and
Oncology. 2020;153:197-204. doi:10.1016/J.RADONC.2020.09.029
[3] Li Y, Zhu J, Liu Z, et al. A preliminary study of using a deep convolution neural network
to generate synthesized CT images based on CBCT for adaptive radiotherapy of nasopha-
ryngeal carcinoma. Phys Med Biol. 2019;64(14):145010. doi:10.1088/1361-6560/AB2770
[4] Barateau A, de Crevoisier R, Largent A, et al. Comparison of CBCT-based dose calculation
methods in head and neck cancer radiotherapy: from Hounsfield unit to density calibrat-
ion curve to deep learning. Med Phys. 2020;47(10):4683-4693. doi:10.1002/MP.14387
[5] Xiao H, Teng X, Liu C, et al. A review of deep learning-based three-dimensional medical
image registration methods. Quant Imaging Med Surg. 2021;11(12):4895-4916.
doi:10.21037/QIMS-21-175
[6] Islam KT, Wijewickrema S, O’Leary S. A deep learning based framework for the registration
of three dimensional multi-modal medical images of the head. Scientific Reports.
2021;11(1):1-13. doi:10.1038/s41598-021-81044-7
[7] Chen X, Diaz-Pinto A, Ravikumar N, Frangi AF. Deep learning in medical image
registration. Progress in Biomedical Engineering. 2021;3(1):012003.
doi:10.1088/2516-1091/ABD37C
[8] Maspero M, Houweling AC, Savenije MHF, et al. A single neural network for cone-beam [...]
Phys Imaging Radiat Oncol. 2020;14:24-31. doi:10.1016/J.PHRO.2020.04.002
[9] Spadea MF, Pileggi G, Zaffino P, et al. Deep Convolution Neural Network (DCNN)
Multiplane Approach to Synthetic CT Generation From MR images—Application in
Brain Proton Therapy. International Journal of Radiation Oncology*Biology*Physics.
2019;105(3):495-503. doi:10.1016/j.ijrobp.2019.06.2535
[10] [...] Phys Med Biol. 2020;65(23):235036. doi:10.1088/1361-6560/abb1d6
[11] Kurz C, Maspero M, Savenije MHF, et al. CBCT correction using a cycle-consistent ge-
nerative adversarial network and unpaired training to enable photon and proton dose
calculation. Phys Med Biol. 2019;64(22):225004. doi:10.1088/1361-6560/AB4D8C
[12] Gao L, Xie K, Wu X, et al. Generating synthetic CT from low-dose cone-beam CT by
using generative adversarial networks for adaptive radiotherapy. Radiation Oncology.
2021;16(1). doi:10.1186/s13014-021-01928-w
[13] [...] therapy. Phys Med Biol. 2019;64(12). doi:10.1088/1361-6560/ab22f9
[14] Kida S, Kaji S, Nawa K, et al. Visual enhancement of Cone-beam CT by use of CycleGAN.
Med Phys. 2020;47(3):998-1010. doi:10.1002/MP.13963
[15] Hoffmann A, Oborn B, Moteabbed M, et al. MR-guided proton therapy: A review and a
preview. Radiation Oncology. 2020;15(1):1-13. doi:10.1186/S13014-020-01571-X
[16] Hoffmann L, Alber M, Jensen MF, Holt MI, Møller DS. Adaptation is mandatory for inten-
sity modulated proton therapy of advanced lung cancer to ensure target coverage. Radio-
therapy and Oncology. 2017;122(3):400-405. doi:10.1016/J.RADONC.2016.12.018
[17] Li G, Citrin D, Camphausen K, et al. Advances in 4D medical imaging and 4D radiation
therapy. Technol Cancer Res Treat. 2008;7(1):67-81. doi:10.1177/153303460800700109
[18] den Otter LA, Chen K, Janssens G, et al. Technical Note: 4D cone-beam CT reconstruction
from sparse-view CBCT data for daily motion assessment in pencil beam scanned proton
therapy (PBS-PT). Med Phys. 2020;47(12):6381-6387. doi:10.1002/MP.14521
[19] Mory C, Janssens G, Rit S. Motion-aware temporal regularization for improved 4D cone-
beam computed tomography. Phys Med Biol. 2016;61(18):6856-6877. doi:10.1088/0031-
9155/61/18/6856
[20] Meijers A, Jakobi A, Stützer K, et al. Log file based dose reconstruction and accumula-
tion for 4D adaptive pencil beam scanned proton therapy in a clinical treatment plan-
ning system: Implementation and proof-of-concept. Med Phys. Published online 2019.
doi:10.1002/mp.13371
[21] Spadea MF, Maspero M, Zaffino P, Seco J. Deep learning based synthetic‐CT generation in
radiotherapy and PET: A review. Med Phys. 2021;48(11):6537-6566. doi:10.1002/mp.15150
[22] Chen L, Liang X, Shen C, Jiang S, Wang J. Synthetic CT generation from CBCT images via
deep learning. Med Phys. 2020;47(3):1115-1125. doi:10.1002/mp.13978
[23] Yuan N, Dyer B, Rao S, et al. Convolutional neural network enhancement of fast-
scan low-dose cone-beam CT images for head and neck radiotherapy. Phys Med Biol.
2020;65(3):035003. doi:10.1088/1361-6560/ab6240
[24] Eckl M, Hoppen L, Sarria GR, et al. Evaluation of a cycle-generative adversarial network-
based cone-beam CT to synthetic CT conversion algorithm for adaptive radiation thera-
py. Physica Medica. Published online 2020. doi:10.1016/j.ejmp.2020.11.007
[25] Langendijk JA, Lambin P, de Ruysscher D, Widder J, Bos M, Verheij M. Selection of pa-
tients for radiotherapy with protons aiming at reduction of side effects: the model-based
approach. Radiother Oncol. 2013;107(3):267-273. doi:10.1016/J.RADONC.2013.05.007
[26] Meijers A, Seller Oria C, Free J, Langendijk JA, Knopf AC, Both S. Technical Note: First
report on an in vivo range probing quality control procedure for scanned proton beam
therapy in head and neck cancer patients. Med Phys. 2021;48(3):1372-1380. doi:10.1002/
mp.14713
[27] Seller Oria C, Marmitt GG, Free J, et al. Optimizing calibration settings for accurate water
equivalent path length assessment using flat panel proton radiography. Phys Med Biol.
2021;66(21). doi:10.1088/1361-6560/AC2C4F
[28] Harms J, Maloney L, Sohn JJ, Erickson A, Lin Y, Zhang R. Flat-panel imager energy-de-
pendent proton radiography for a proton pencil-beam scanning system. Phys Med Biol.
2020;65(14). doi:10.1088/1361-6560/AB9981
[29] Tanno R, Worrall DE, Kaden E, et al. Uncertainty modelling in deep learning for safer neu-
roimage enhancement: Demonstration in diffusion MRI. Neuroimage. 2021;225:117366.
doi:10.1016/J.NEUROIMAGE.2020.117366
[30] Nguyen D, Sadeghnejad Barkousaraie A, Bohara G, et al. A comparison of Monte Carlo
dropout and bootstrap aggregation on the performance and uncertainty estimation in
radiation therapy dose prediction with deep learning neural networks. Phys Med Biol.
2021;66(5):054002. doi:10.1088/1361-6560/ABE04F
[31] Brouwer CL, Dinkla AM, Vandewinckele L, et al. Machine learning applications in radia-
tion oncology: Current use and needs to support clinical implementation. Phys Imaging
Radiat Oncol. 2020;16:144-148. doi:10.1016/J.PHRO.2020.11.002
[32] Barragan-Montero A, Bibal A, Dastarac MH, et al. Towards a safe and efficient clinical
implementation of machine learning in radiation oncology by exploring model interpre-
tability, explainability and data-model dependency. Phys Med Biol. 2022;67(11):11TR01.
doi:10.1088/1361-6560/AC678A
[33] Huang D, Bai H, Wang L, et al. The Application and Development of Deep Lear-
ning in Radiotherapy: A Systematic Review. Technol Cancer Res Treat. 2021;20.
doi:10.1177/15330338211016386
[34] Landry G, Hansen D, Kamp F, et al. Comparing Unet training with three different data-
sets to correct CBCT images for prostate radiotherapy dose calculations. Published online
2019:1-24. doi:10.1088/1361-6560/aaf496
[35] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image
segmentation. Lecture Notes in Computer Science (including subseries Lecture Notes
in Artificial Intelligence and Lecture Notes in Bioinformatics). 2015;9351:234-241.
doi:10.1007/978-3-319-24574-4_28
[36] Wang T, Lei Y, Fu Y, et al. A review on medical imaging synthesis using deep learning
and its clinical applications. J Appl Clin Med Phys. 2021;22(1):11. doi:10.1002/ACM2.13121
Appendices
Acronyms
AET acute esophageal toxicity
AI artificial intelligence
AIC analytical image based correction
APT adaptive proton therapy
ART adaptive radiotherapy
BgRT biology guided RT
CBCT cone-beam CT
CNR contrast-to-noise ratio
CT computed tomography
CTV clinical target volume
DCNN deep convolutional neural network
DIR deformable image registration
DL deep learning
DSC dice similarity coefficient
DVH dose volume histogram
FOV field of view
FWHM full width half maximum
GAN generative adversarial network
GPU graphical processing unit
GTV gross tumor volume
H&N head and neck
HU Hounsfield unit
IDD integral depth dose profile
IGRT image guided radiotherapy
IMPT intensity modulated proton therapy
LINAC linear accelerator
LUT look-up table
MAE mean absolute error
MARE mean absolute range error
ME mean error
MeV megaelectronvolt
MLIC multi-layer ionization chamber
MR magnetic resonance
MRE mean range error
NTCP normal tissue complication probability
OAR organ-at-risk
PBS pencil beam scanning
PCM pharyngeal constrictor muscle
pCT planning CT
PET positron emission tomography
PR proton radiography
PSNR peak signal-to-noise ratio
PT proton therapy
QA quality assurance
QC quality control
RBE relative biological effectiveness
rCT repeat CT
ROI region of interest
RP radiation pneumonitis
RP range probing
RRE relative range errors
RT radiotherapy
sCT synthetic CT
SD standard deviation
SNR signal-to-noise ratio
SNU structural non-uniformity
SOBP spread-out Bragg peak
SPGRE spoiled gradient recalled echo
SPR stopping power ratio
SSIM structural similarity
STD standard deviation
TPS treatment planning system
TV target volume
UMCG University Medical Center Groningen
Summary
Cone-beam computed tomography (CBCT) has the potential to play a central role
in adaptive proton therapy workflows. However, due to imaging artifacts, CBCT
Hounsfield unit (HU) accuracy is impaired and does not allow for precise proton
dose calculations. Therefore, currently the use of CBCTs in proton therapy is res-
tricted mainly to patient positioning. To overcome this limitation, various CBCT
correction techniques have been proposed in literature. This thesis aimed at inves-
tigating the suitability of some of these CBCT correction methods in the context
of adaptive proton therapy, with a strong focus on a deep learning-based method.
Corrected CBCTs, also referred to as synthetic CTs, were investigated in terms of
image quality and dosimetric accuracy for head and neck and lung cancer patients.
Chapter 2 presents a comparison of various CBCT correction techniques focusing
on proton dose calculation accuracy for adaptive proton therapy in head and neck
cancer patients and includes a deep learning-based technique, a deformable image
registration-based method and an analytical image-based correction method. Pro-
ton dose calculation accuracy was investigated by creating proton treatment plans
with a simulated target volume in the brainstem region using a single beam angle.
The highest image quality and dose calculation accuracy were observed for the deep
learning-based method, with a mean absolute error (MAE) of 37 HU and a gamma
pass ratio (2%/2mm) of 99.3 %. This motivated the choice to focus mainly on deep
learning-based techniques in the remaining chapters of this thesis.
Magnetic resonance imaging (MR) provides superior soft tissue contrast while
also avoiding the use of ionizing radiation, but does not facilitate dose calcula-
tions for radiotherapy due to the lack of electron density information. However,
the beneficial imaging characteristics can be advantageous for future applications
in adaptive proton therapy workflows and MR-only radiotherapy. Deep learning
techniques, as investigated in Chapter 2, are capable of converting MR images into
synthetic CTs and can thereby enable MR-based dose calculations. Chapter 3 pre-
sents a unique comparison of proton dose calculations on CBCT- and MR-based
synthetic CTs for head and neck cancer patients. Chapter 3 also uses an improved
dosimetric evaluation that mimics a clinical scenario by recalculating the clinical
treatment plans used for patient treatment. Results of Chapter 3 showed that
CBCT-based synthetic CTs resulted in higher image quality and dosimetric accuracy
than MR-based synthetic CTs, with MAEs of 40 HU and 65 HU and
2%/2mm gamma pass ratios of 96.1 % and 93.3 %, respectively. Although the results
showed higher accuracy for CBCT-based synthetic CTs, MR-based synthetic
CTs also resulted in acceptable image quality that allowed accurate proton dose
calculations, which suggests suitability for clinical applications.
To verify the image quality of synthetic CTs within adaptive proton therapy treat-
ments and to facilitate a clinical implementation of synthetic CTs, quality control
(QC) procedures are essential. Chapter 4 proposes a QC-tool based on in-vivo pro-
ton radiography measurements. These ground truth proton range measurements
in seven head and neck cancer patients were compared to simulations in synthetic
CTs and reference CTs to verify the Hounsfield unit accuracy of deep learning based
synthetic CTs. Deep learning based synthetic CTs and repeat CT acquisitions, the
current clinical standard for adaptive proton therapy, showed comparable results.
The agreement between simulations and measured proton radiographies
ranged from -1.2% to 1.5% in repeat CTs, and from -0.7% to 2.7% in synthetic
CTs, indicating clinical suitability of synthetic CTs.
While Chapters 2 to 4 focus on the head and neck region, Chapters 5 and 6
investigate the use of synthetic CTs for lung cancer patients. Due to more
heterogeneities, deep seated tumors and respiratory motion, the thorax region can be
considered more challenging for synthetic CT generation than the head and neck region.
Chapters 5 and 6 utilize the same U-net neural network architecture as used in
Chapters 2 to 4.
Chapter 5 first investigates synthetic CTs for lung cancer patients in a static ‘3D’
scenario. The evaluation focuses on image quality, range uncertainty and dosimetric
accuracy. Range uncertainty was investigated by performing proton radiography
simulations, which revealed larger errors in lung tissue compared to other
tissues (2.1 mm vs. 0.9 mm). These lung tissue inaccuracies had a noticeable impact
on proton dose calculations and motivated the introduction of a patient specific
correction technique. This correction technique utilizes a prior CT image to
improve the HU accuracy on a patient specific basis. With the correction applied,
synthetic CTs of lung cancer patients showed comparable dosimetric accuracy to
head-and-neck cancer patients, with an average MAE of 31 HU and an average gamma
pass ratio of 96.8 % (3 %/3 mm).
Chapter 6 extends the results of Chapter 5 to a dynamic ‘4D’ scenario. Since our
proton clinic does not yet routinely use 4D-CBCTs, 4D-CBCTs had to
be reconstructed from a sparse-view projection set dedicated to 3D-CBCT
reconstruction. An iterative reconstruction algorithm that has shown high performance
with sparse-view projection data was used for the 4D-CBCT reconstruction. Although
the image quality of 4D-CBCTs was significantly lower compared to 3D-CBCTs,
the neural network was able to restore accurate HU values that allowed dose
calculations on 4D synthetic CTs with similar accuracy as in 3D-CBCTs, with an MAE
of 48.1 HU for a single breathing phase and an MAE of 37.7 HU for the average of all
breathing phases. Average gamma pass ratios were 92.3% for an individual breathing
phase and 94.4% for the average phase image. To investigate the 4D aspect,
a 4D dose reconstruction was performed utilizing patient specific breathing signals
and treatment log files. The dose distributions of individual breathing phases
and the dose accumulated onto a reference phase showed high similarity to doses
calculated on the clinically used 4D repeat CT, with a pass ratio of 93.7 % for the
accumulated dose.
In conclusion, this thesis thoroughly investigated the suitability and accuracy of
deep learning-based synthetic CTs for adaptive proton therapy of head and neck
and lung cancer patients. Across the various treatment sites, synthetic CTs have
shown high image quality and dosimetric accuracy, suggesting suitability for
clinical applications. Proton radiography was proposed as a quality control procedure
to verify synthetic CTs on a patient specific basis. Future work should focus on
the safe implementation of deep learning based synthetic CTs into clinical workflows.
Samenvatting
Cone-beam computertomografie (CBCT) speelt potentieel een centrale rol in adap-
tieve protonentherapie-workflows. Vanwege beeldartefacten is de nauwkeurig-
heid van de CBCT Hounsfield-eenheid (HU) echter aangetast en is het niet moge-
lijk nauwkeurige protonendosisberekeningen uit te voeren. Daarom is momenteel
het gebruik van CBCT's bij protonentherapie voornamelijk beperkt tot hulpmiddel
bij het positioneren van de patiënt. Om deze beperking te ondervangen, worden in
de literatuur verschillende CBCT-correctietechnieken voorgesteld. Dit proefschrift
is gericht op het onderzoeken van de bruikbaarheid van een aantal van deze CBCT-
correctiemethoden in de context van adaptieve protonentherapie, met een sterke
focus op een methode die gebaseerd is op Deep Learning. De beeldkwaliteit en do-
simetrische nauwkeurigheid van deze gecorrigeerde CBCT's, ook wel synthetische
CT's genoemd, werd onderzocht voor hoofd-hals- en longkankerpatiënten.
Chapter 2 presents a comparison of different CBCT correction techniques, focusing on
the accuracy of proton dose calculations for adaptive proton therapy in head and
neck cancer patients. It includes a deep learning-based technique, a deformable
image registration-based method and an analytical image-based correction method.
The accuracy of the proton dose calculation was investigated by creating proton
treatment plans with a simulated target volume in the brainstem region using a
single beam. The highest image quality and dose calculation accuracy were observed
for the deep learning-based method, with a mean absolute error (MAE) of 37 HU and a
gamma pass ratio (2%/2 mm) of 99.3%. These results were the reason to focus
primarily on deep learning-based techniques in the remaining chapters of this
thesis.
Magnetic resonance (MR) imaging offers superior soft-tissue contrast and does not
use ionizing radiation, but it is not suitable for dose calculations because it
lacks electron density information. The favorable characteristics of this modality
could nevertheless be beneficial for future applications in adaptive proton therapy
and MR-only radiotherapy workflows. Deep learning techniques, as investigated in
Chapter 2, are able to transform MR images into synthetic CTs and can thereby enable
MR-based dose calculations. Chapter 3 presents a unique comparison of proton dose
calculations on CBCT- and MR-based synthetic CTs for head and neck cancer patients.
Chapter 3 also uses an improved dosimetric evaluation that simulates a clinical
scenario by recalculating the clinical treatment plans. The results of Chapter 3
showed that CBCT-based synthetic CTs resulted in higher image quality and
dosimetric accuracy than MR-based synthetic CTs, with MAEs of 40 HU and 65 HU and
2%/2 mm gamma pass ratios of 96.1% and 93.3%, respectively. Although the results
showed better accuracy for the CBCT-based synthetic CTs, the MR-based synthetic CTs
also achieved an acceptable image quality that allowed accurate proton dose
calculations, making them usable in clinical practice.
To verify the image quality of synthetic CTs within adaptive proton therapy
treatments and to enable clinical implementation of synthetic CTs, quality control
(QC) procedures are essential. Chapter 4 proposes a QC tool based on in vivo proton
radiography measurements. These measurements, which verify the proton range, were
compared for seven head and neck cancer patients with simulations on both synthetic
CTs and reference CTs to verify the accuracy of the Hounsfield units in the deep
learning-based synthetic CTs. The results were comparable for deep learning-based
synthetic CTs and repeat CTs, the current clinical standard for adaptive proton
therapy. The agreement between the simulations and the measured proton radiographs
ranged from -1.2% to 1.5% for repeat CTs and from -0.7% to 2.7% for synthetic CTs,
demonstrating the clinical suitability of synthetic CTs.
While Chapters 2 to 4 focus on the head and neck region, Chapters 5 and 6
investigate the use of synthetic CTs in lung cancer patients. Because of the
increased heterogeneities, the deeper-seated tumors and the respiratory motion in
the thoracic region, this indication is considered more challenging for synthetic
CT generation than head and neck cancer. Chapters 5 and 6 use the same U-net neural
network architecture as applied in Chapters 1 to 3.
Chapter 5 investigates synthetic CTs for lung cancer patients in a static '3D'
scenario. The evaluation focuses on image quality, range uncertainties and
dosimetric accuracy. Range uncertainties were investigated by performing proton
radiography simulations, which showed larger errors in lung tissue than in other
tissues (2.1 mm vs. 0.9 mm). These inaccuracies in lung tissue had a noticeable
influence on the proton dose calculations and motivated the use of a
patient-specific correction technique. This correction technique uses a previous CT
image to improve the HU accuracy on a patient-specific basis. With the correction
applied, synthetic CTs of lung cancer patients showed a dosimetric accuracy
comparable to that of head and neck cancer patients, with an average MAE of 31 HU
and an average gamma pass ratio of 96.8% (3%/3 mm).
Chapter 6 extends the results of Chapter 5 to a dynamic '4D' scenario. Since our
proton clinic does not yet routinely use 4D-CBCTs, 4D-CBCTs had to be reconstructed
from a sparse-view projection set intended specifically for 3D-CBCT reconstruction.
For the 4D-CBCT reconstruction, an iterative reconstruction algorithm was applied
that has shown high performance with sparse-view projection data. Although the
image quality of 4D-CBCTs was considerably lower than that of 3D-CBCTs, the neural
network was able to restore accurate HU values, which allowed dose calculations on
4D synthetic CTs with similar accuracy as in 3D-CBCTs, with an MAE of 48.1 HU for a
single breathing phase and an MAE of 37.7 HU for the average of all breathing
phases. The average gamma pass ratios were 92.3% for an individual breathing phase
and 94.4% for the phase-averaged CT. To investigate the 4D aspect, a 4D dose
reconstruction was performed using patient-specific breathing signals and treatment
log files. The dose distributions of individual breathing phases and the dose
accumulated onto a reference phase showed high similarity to the dose calculated on
the clinically used 4D repeat CT, with a pass ratio of 93.7% for the accumulated
dose.
In conclusion, this thesis thoroughly investigated the suitability and accuracy of
deep learning-based synthetic CTs for adaptive proton therapy of head and neck and
lung cancer patients. Across the different treatment indications, synthetic CTs
demonstrated high image quality and dosimetric accuracy, which means that synthetic
CTs could be deployed in the clinical workflow. Proton radiography was presented as
a quality control procedure to check synthetic CTs on a patient-specific basis.
Future work should focus further on the safe implementation of deep learning-based
synthetic CTs into clinical workflows.
Acknowledgments
First, I would like to thank my supervisors, who supported me throughout my PhD
and made this thesis possible. Antje, you gave me the chance to join your research
group in Groningen, which I never regretted. I learned so much from you during the
time we were working together, and I cannot think of a better supervisor for
freshman PhD students making their first steps in academia. I also want to thank
you again for your hospitality at the beginning of my PhD.
Stefan, you started being my main supervisor after 1½ years, and from the very
beginning we found a great way to work together. I am grateful for all the
opportunities you gave me and the trust you put in me. You taught me how to be an
independent researcher and you always allowed me to follow my own ideas and
interests.
Hans, it was a pleasure to be part of your department for the last four years. You
always showed great interest in my work, and I was impressed by how well you could
grasp my topic and the way you provided constructive feedback and comments on my
work.
Besides my supervision team, I was also always supported by my dear colleagues and
friends: Carmen, I was so lucky to have started my PhD at the same time as you and
to have you as a companion throughout my PhD. It was great to always have someone
to share the highs and lows of PhD life with. I really enjoyed working with you on
our projects and we also had a lot of fun outside of work. You are such a talented
researcher and I wish you all the best for your future career!
Sabine, you are for sure the happiest and liveliest person I have ever met and
worked with. Thank you for your continuous support during my PhD. I am so impressed
by how you manage to excel at work while you are continuously educating yourself
and now also taking over clinical duties. However, I hope the future also brings
some less stressful times in which you can continue to fully enjoy life :)
Cassia, when Carmen and I started our PhDs, you always made sure we felt welcome
and that everything went smoothly. I envy your talent for always making sure people
feel comfortable. Although you changed your career path in the middle of my PhD, we
managed to stay in contact and still had a lot of fun times together. You were
always so welcoming and hosted me so many times for dinner or drinks. I especially
look back with good memories on our paranymph duties for Carmen. We were a great
team!
Filipe, I really enjoyed that we hung out more often during the last year of my PhD.
I’m very grateful that you introduced me to the gym, to your soccer group and to all
the other fun activities we did together ;). You are such a wise and interesting
person. I hope we manage to keep in contact in the future.
Bas, the two of us started working at the department at around the same time. It was
great to have you as a friend and colleague during my PhD. I look back with a lot of
good memories on our trip to Florida and New York. During COVID times it was very
nice that we met many times to have dinner together and watch movies. It made this
period much more enjoyable! I wish you all the best for the final year of your PhD!
Rutger, it was a great pleasure to work with you during your Master's project and
afterwards. You are always very kind to anyone who requires help. In particular, I
would like to thank you again for rescuing me and my bike when I lost my keys :)
I would also like to thank all other colleagues who made my PhD in Groningen such a
great experience. Adriaan, Arno, Art, Elske, Gabriel, Giuliano, Hans Paul, Ilse,
Irene, Yong, Lisa, Lydia, Makbule, Senquan, and Alessia, we had a lot of fun during
our extended lunch breaks, conferences and weekend activities.
Furthermore, I would like to thank the Medical Physics team, the clinicians and the
RTTs in the UMCG and GPTC for always being available if there were any questions or
problems. In particular, I would like to thank Arturs for supporting me at the
beginning of my PhD. You made sure that I had all the necessary data and tools to
work on my projects and gave very valuable input and feedback on my work.
I would also like to thank my friends back home for their support. Whenever I came
home it felt like the old days, as if I had never left. Nikolaus, it is great to
have someone to talk to about everything, and you always give great advice on life
and career. I also really enjoy our heated discussions :) Andrea and Patrick,
thanks for visiting me in Groningen, I have great memories of that weekend. Daniela
and Matthias (and Bruno), thanks for being my taxi on several occasions and making
sure I arrived home well. It was great to see you more often again throughout the
last year! Jakob, Sarah, Melanie, Christine, Christoph, Laura and Simon, I was
always looking forward to seeing you whenever I came home. Our camping trips and
New Year’s Eve celebrations were always great occasions to reconnect. I look
forward to spending more time with you all, now that I’m closer to home again!
Finally, I would like to thank my family for their never-ending support. Dear Mama
and Papa, you helped me so much during my move to Groningen, while I was there and
now also with my departure. I’m very grateful that you took the long journey to
Groningen so many times. Without you it would have been so much harder. Thank you
for making sure, whenever I came home, that everything was perfect and that it
still felt like my home :) Jakob, it is great to have you as a brother and that we
are still in contact so frequently. Thank you also for your help with moving!
Veronika, Gregor, Theodor and Jonathan, you are such a nice family. I was always
looking forward to seeing you again. You made sure that it was never boring when I
came home, and I really enjoyed spending time with all of you. You all developed so
quickly, and I hope I can see you more often again in the future.
Curriculum Vitae
Research Experience
Jan 2019 – Jan 2023   PhD candidate, Department of Radiation Oncology, University
                      Medical Center Groningen, Groningen, The Netherlands
Apr 2017 – Apr 2018   Master thesis research, Department of Radiation Oncology,
                      Medical University Vienna, Vienna, Austria / MedAustron center
                      for ion therapy and research, Wiener Neustadt, Austria
Apr 2015 – Sep 2015   Bachelor thesis research, Atominstitut, Vienna University of
                      Technology, Vienna, Austria
Education
Jan 2016 – Apr 2018   Master’s Degree, Biomedical Engineering, Vienna University of
                      Technology, Vienna, Austria
Sep 2011 – Jan 2016   Bachelor’s Degree, Technical Physics, Vienna University of
                      Technology, Vienna, Austria
Skills
Radiation therapy, online adaptive proton therapy
Medical image acquisition (CT, CBCT, MRI)
Image reconstruction, processing and evaluation
Artificial Intelligence, Deep Learning
Radiation dosimetry, thermoluminescence dosimetry, film dosimetry
Computational: Linux, bash, Python, MATLAB, Theano, TensorFlow, PyTorch,
RayStation, ITK, RTK
Languages
German (native), English (fluent), Dutch (basics)
List of publications
Thummerer A, Zaffino P, Meijers A, et al. Comparison of CBCT based synthetic
CT methods suitable for proton dose calculations in adaptive proton therapy.
Phys Med Biol. 2020;65(9):95002. doi:10.1088/1361-6560/ab7d54
Thummerer A, de Jong BA, Zaffino P, et al. Comparison of the suitability of
CBCT- and MR-based synthetic CTs for daily adaptive proton therapy in head
and neck patients. Phys Med Biol. 2020;65(23):235036.
doi:10.1088/1361-6560/abb1d6
Seller Oria C, Thummerer A, Free J, et al. Range probing as a quality control
tool for CBCT-based synthetic CTs: In vivo application for head and neck cancer
patients. Med Phys. 2021;48(8):4498-4505. doi:10.1002/mp.15020
Thummerer A, Seller Oria C, Zaffino P, et al. Clinical suitability of deep learning
based synthetic CTs for adaptive proton therapy of lung cancer. Med Phys.
2021;48(12):7673-7684. doi:10.1002/MP.15333
Thummerer A, Seller Oria C, Zaffino P, et al. Deep learning–based 4D-synthetic
CTs from sparse-view CBCTs for dose calculations in adaptive proton therapy.
Med Phys. Published online 2022. doi:10.1002/MP.15930
Conference records
[Oral presentation] Thummerer A, Seller Oria C, Visser S, Zaffino P, Meijers A,
Guterres Marmitt G, Seco J, Langendijk JA, Knopf A, Spadea MF, Both S, AI driven
developments in Radiotherapy, ECMP 2022, Dublin, Ireland
[Oral presentation] Thummerer A, Seller Oria C, Visser S, Zaffino P, Meijers A,
Wijsman R, Guterres Marmitt G, Seco J, Langendijk JA, Knopf A, Spadea MF, Both S,
4D dose reconstruction on deep learning based 4D synthetic CTs generated from
CBCTs, ECMP 2022, Dublin, Ireland
[Poster] Thummerer A, Seller Oria C, Visser S, Zaffino P, Meijers A, Wijsman R,
Guterres Marmitt G, Seco J, Langendijk JA, Knopf A, Spadea MF, Both S, Deep
learning based 4D synthetic CTs generated from CBCTs for proton dose calculations
in adaptive proton therapy, AAPM 2022, Washington DC, USA
[Poster] Thummerer A, Seller Oria C, Visser S, Zaffino P, Guterres Marmitt G,
Seco J, Langendijk JA, Knopf A, Spadea MF, Both S, Deep learning based
4D-synthetic CTs from CBCTs for proton dose calculations in adaptive proton
therapy workflows, PTCOG 2022, Miami, USA
[Poster] Thummerer A, Seller Oria C, Zaffino P, Veldman K, Meijers A, Seco J,
Wijsman R, Langendijk JA, Knopf A, Spadea MF, Both S, Deep learning based 4D
synthetic CTs for daily proton dose calculations in lung cancer patients, ESTRO
2022, Copenhagen, Denmark
[Poster] Thummerer A, Zaffino P, Guterres Marmitt G, Meijers A, Seco J,
Langendijk JA, Knopf A, Spadea MF, Both S, Evaluation of patient specific
synthetic CT correction methods for lung tissue, PTCOG 2021, online
[Oral presentation] Thummerer A, Zaffino P, Seller Oria C, Meijers A, Guterres
Marmitt G, Seco J, Langendijk JA, Knopf A, Spadea MF, Both S, OC-0478: Neural
Network Based Synthetic CTs for Adaptive Proton Therapy of Lung Cancer, ESTRO
2021, Madrid, Spain
[Oral presentation] Thummerer A, Zaffino P, Meijers A, Guterres Marmitt G, Seco J,
Steenbakkers RJHM, Langendijk JA, Both S, Spadea MF, Knopf AC, Synthetic CTs for
proton dose calculations in adaptive proton therapy, NVKF 2021, online
[Poster] Thummerer A, Zaffino P, Meijers A, Guterres Marmitt G, Seco J,
Steenbakkers RJHM, Langendijk JA, Both S, Spadea MF, Knopf AC, PD-0309:
Comparison of CBCT based synthetic CT methods for adaptive proton therapy, ESTRO
2020, online
[Poster] Thummerer A, Kuess P, Georg M, Clausen M, EP-1731: A comparative study
of passive detectors in active scanning proton beams, ESTRO 37 2018, Barcelona,
Spain
[Oral presentation] Thummerer A, Kuess P, Georg M, Clausen M, Comparison of
passive detectors in active scanning proton beams, ÖGMP-Jahrestagung 2018
(Austrian Society for Medical Physics)