Department of Mathematics
College of Science
Master's Thesis

Body Parts Partition in Computed Tomography by Wavelet Analysis and Ovarian Tumor Classification by Deep Learning

Jinglun Huang

Advisor: Weichung Wang, Ph.D.

August 2021
Acknowledgements
I thank Dr. Gigin Lin of Linkou Chang Gung Memorial Hospital for providing the dataset and assisting with the body parts annotation, which allowed this thesis to be completed smoothly.
Chinese Abstract
Ovarian cancer is one of the most dangerous gynecologic cancers. It is not only hard to detect early but also has no warning signs. When a patient consults a gynecologist due to some symptoms, the ovarian tumor has often already spread throughout the pelvis or even the abdomen. In recent years, thanks to advances in medicine and technology, the mortality rates of many cancers have declined or plateaued, but the mortality rate of ovarian cancer has instead risen. Computed tomography (CT) is a three-dimensional imaging modality commonly used to diagnose ovarian cancer and many other cancers, so an analysis pipeline and machine learning classification models built on CT images can be widely used in many situations.
In medical image analysis, machine learning workflows often require image annotations of organs and tumors, but ovaries and ovarian tumors are very difficult to annotate. Therefore, this thesis focuses on building an analysis pipeline for CT that does not use annotations of ovaries and ovarian tumors. To this end, we develop a body parts partition algorithm based on wavelet transforms to find the body part breakpoints. The algorithm requires only a small number of body part annotations. In our experiments, the median errors of the six body part breakpoints predicted by the algorithm are about two centimeters, which is accurate enough for data preprocessing. We apply the algorithm to the dataset from Linkou Chang Gung Memorial Hospital, which contains 240 benign tumor cases and 161 malignant tumor cases. We then train deep learning models and report cross-validation results. Overall, the mean and standard deviation of the area under the receiver operating characteristic curve on the test set are 0.8129 and 0.0154, which shows that our analysis pipeline has the potential to distinguish malignant ovarian tumors from benign ones.
Keywords: computed tomography, wavelet transform, body parts partition, deep learning, ovarian tumor classification
Abstract
Ovarian cancer is one of the most dangerous cancers for women. It is hard to detect early and has no warning signs. When a patient consults a gynecologist due to some symptoms, the ovarian tumor has usually already spread within the pelvis and even the abdomen. In recent years, the mortality of ovarian cancer has been increasing, while the mortality of some other kinds of cancer is either decreasing or not increasing due to the improvement of medical science and techniques. Computed tomography (CT) is one kind of three-dimensional image and is used for the diagnosis of ovarian cancer as well as many other kinds of cancer, so an analysis pipeline and machine learning models built upon CT images can be widely used in many situations.
Image annotations of organs and tumors are usually needed in machine learning workflows in medical image analysis, but image annotations of ovaries and ovarian tumors are hard to label. Therefore, this thesis aims to build up a pipeline for distinguishing ovarian cancerous tumors from ovarian benign tumors in CT images by deep learning models without using image annotations of ovaries and ovarian tumors. For this purpose, we develop a body parts partition algorithm to find the breakpoints of body parts by using wavelet transforms. Only a few body parts annotations are needed in this algorithm. In our experiments, the prediction errors of 6 body part breakpoints have medians of approximately 2 cm, which is accurate enough for data preprocessing. We use our algorithm to crop images to the pelvis and lower abdomen, which are related to ovarian cancer, on the dataset from Linkou Chang Gung Memorial Hospital. The dataset consists of 161 cancerous cases and 240 benign cases. Then we train deep learning models and provide cross-validation results. Overall, the mean test ROC AUC is 0.8129 and the standard deviation is 0.0154, which shows the pipeline has the potential to distinguish cancerous ovarian tumors from benign ones.
Contents

Acknowledgements
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
Chapter 1  Introduction
Chapter 2  Method
  2.1  Overview
  2.4.2  Feature Extraction by Wavelet Transforms
  2.6.5  Training
Chapter 3  Results and Discussion
  3.3.4  Registration
Chapter 4  Conclusion
References
Appendix A
  A.1  Introduction
Appendix B — Cross-Validation Results
  B.1  Introduction
  B.2  Results
List of Figures
3.1  An example of a CT slice in axial view and its bone segmentation obtained by global thresholding
3.2  Bone segmentation and contrast-enhanced structure segmentation. In both plots, black pixels mean the background. White pixels in the left plot mean the bone pixels and those in the right plot mean the contrast-enhanced structure. Gray pixels in the left plot mean the contrast-enhanced structure and those in the right plot mean the bone pixels.
3.3  Bone segmentation and contrast-enhanced structure segmentation. The white pixels mean the foreground and the gray and black pixels mean the background.
3.4  Original axial images of case 1.
3.5  Bone segmentation in axial view of case 1.
3.6  Bone signal and bone segmentation in coronal view of case 1.
3.7  Original axial images of case 2.
3.8  Bone segmentation in axial view of case 2.
3.9  Bone signal and bone segmentation in coronal view of case 2.
3.10 Original axial images of case 1.
3.11 Internal air segmentation in axial view of case 1.
3.12 Air segmentation in axial view of case 1.
3.13 Original axial images of case 2.
3.14 Internal air segmentation in axial view of case 2.
3.15 Air segmentation in axial view of case 2.
3.16 Bounds of body parts of the reference image.
3.17 Air segmentation in axial view of the reference image.
3.18 Bone segmentation in axial view of the reference image.
3.19 Signals transformed by Gaussian filter of the reference case.
3.20 Signals transformed by Gaussian filter of case 1.
3.21 Signals transformed by Gaussian filter of case 2.
3.22 Bone signal and its features extracted by wavelet transforms.
3.23 Time-scale plots of case 1.
3.24 Time-scale plots of case 2.
3.25 Registration results of case 1. The bone structure of the reference image and the transformed bone structure of the moving image.
3.26 Registration results of case 1. The air structure of the reference image and the transformed air structure of the moving image.
3.27 Registration results of case 2. The bone structure of the reference image and the transformed bone structure of the moving image.
3.28 Registration results of case 2. The air structure of the reference image and the transformed air structure of the moving image.
3.29 Body parts partition of case 1.
3.30 Body parts partition of case 2.
List of Tables
3.1  Error distribution of each body part. We denote lower, middle, upper, pelvis, and abdomen by l, m, u, pel, and abd.
3.2  Error distribution of each body part after excluding outliers. We denote lower, middle, upper, pelvis, and abdomen by l, m, u, pel, and abd.
3.3  Mean AUCs for different strategies. This table shows means and standard deviations of validation AUCs and test AUCs. The means and standard deviations are computed from 5-fold cross-validation results.
B.1  This table shows the metrics obtained in detail. The threshold is simply chosen as 0.5 for computing accuracy. Here, we denote validation, accuracy, upper pelvis, and lower abdomen by val, acc, u-pel, and l-abd, respectively, for short.
Chapter 1 Introduction
Ovarian cancer is one of the most dangerous cancers for women. In Taiwan, the number of deaths from ovarian cancer ranks seventh among female cancers. In recent years,
due to the improvement of medical science and techniques, mortality of some other kinds
of cancer is either decreasing or not increasing. However, the mortality of ovarian can
cer is increasing. According to the statistics of the Ministry of Health and Welfare (R.O.C.), the mortality of ovarian cancer was even higher than that of cervical cancer in 2020 (updated on 2021/06/18). Difficulty in early detection is a reason why the mortality of ovarian cancer
is still increasing. Some other kinds of cancer often follow some warning signs, such as
pain, bleeding, changes in physical appearance, or other symptoms. For example, bladder cancer usually causes hematuria, and breast cancer usually causes lumps in the breast or underarm. However, ovarian cancer is often difficult to detect. When a patient gets some warning signs and consults a gynecologist, for ovarian cancer, the cancerous tumor has often spread within the pelvis or even the abdomen. Moreover, the five-year survival rate of an ovarian cancer patient diagnosed at an early stage is far higher than that at a late stage, and the medical cost of a cancer detected early is lower than that of one detected late. Therefore, early detection of ovarian cancer is important. Computed tomography (CT) is a common tool used to detect ovarian cancer as well as many other cancers. Moreover, CT is one of the most common medical imaging modalities, so an analysis pipeline for CT images can be widely used in many situations. In this work, we use CT scans as the inputs of our analysis pipeline.
In medical science, there have been studies that aimed at ovarian tumors. S. E. Jung et al. [10] aimed to find features in CT scans for use in differential diagnosis. U. R. Acharya et al. [1] extracted features, such as deviation and entropy, from ultrasound images for classification with a decision tree. A. Vlahou et al. [21] used clinical data to establish a decision tree for the diagnosis of ovarian cancer.
Although machine learning has become a popular topic in medical image analysis, classification of ovarian tumors in CT remains relatively unexplored. One reason is that ovaries and ovarian tumors are difficult to segment, even by manual segmentation. Locations of ovaries as well as pelvic organs in CT images vary from case to case. Moreover, the ovary is a small organ, and sometimes the boundary of an ovary in CT is not clear. Hence, it is hard to define the ground truth of annotations of ovaries and ovarian tumors. Also, labeling ovaries and ovarian tumors is time-consuming. Therefore, an analysis pipeline that does not depend on image annotations of ovaries and ovarian tumors is valuable.
Many machine learning algorithms are based on image annotation. One may say a convolutional neural network (CNN) does not strongly depend on image annotation; instead, a CNN learns to extract useful features from images during the training process. However, training a CNN for CT tasks usually needs image annotations as a mask or for locating the organs. Cropping to a region of interest (ROI) is a standard technique for locating the organs. Medical images often contain a lot of organs or tissues and hence include large amounts of information, but many problems usually aim at one or a few organs. Moreover, CT images may capture different body parts. Hence, using the whole CT image as the input of a model does not make sense. Even if a model trained on whole images achieves good performance, it may not be explainable and is hard to use in medical practice. Therefore, a proper preprocessing algorithm that crops meaningful parts from CT images is a key step.
In this thesis, we design an analysis pipeline that does not depend on image annotations of ovaries and ovarian tumors. Instead, we use bounding boxes of the ROI as inputs of the deep learning (DL) model. The ROI can be the pelvis or the union of the pelvis and the lower abdomen, since the lower abdomen may provide some features related to ovarian cancer, such as ascites. Besides avoiding the difficulty of image annotation, cropping a bounding box is easier to achieve with a rule-based method than image segmentation, which usually requires learning-based models or manual labeling.
There are several types of ovarian tumors, such as benign, cancerous, and other types, but in this work we aim at distinguishing cancerous ovarian tumors from benign ones by using CT images. We design an algorithm for the body parts partition by using wavelet transforms and crop the pelvis and lower abdomen from CT images. The main steps of the algorithm for a query case are:

1. Segment the bone and the internal air
2. Construct the bone sequence and the air sequence from the segmentation of bone and internal air
3. Register the body parts by aligning the bone sequence and the air sequence to a reference case
4. Propagate the body-part breakpoints from the reference case to the query case

The algorithm is divided into a preparation stage and an inference stage, which only needs a few body parts labels rather than image annotations of ovaries and ovarian tumors.
We apply our algorithm to the dataset from Linkou Chang Gung Memorial Hospital, and we train and evaluate deep learning models for ovarian tumor classification by cross-validation and testing. We compare the body-part breakpoints obtained from our body parts partition algorithm with the ground truth labeled by a radiologist (Dr. Lin, Gigin). The mean absolute errors in lower pelvis, upper pelvis, middle abdomen, lower chest, middle chest, and upper chest are approximately 2.48 cm, 1.73 cm, 1.73 cm, 1.25 cm, 2.37 cm, and 1.42 cm after we remove 4 outliers, which shows the body parts partition algorithm is accurate enough for data preprocessing. We use cross-validation to train and evaluate the deep learning models. By cropping CTs to the pelvis only, we obtain a mean test AUC of 0.8129 with a standard deviation of 0.0154, which shows the analysis pipeline has the potential to distinguish cancerous ovarian tumors from benign ones.
Chapter 2 Method
In this chapter, we introduce our body parts partition algorithm and the training details of the deep learning models. First, we give an overview of our algorithm. Then we discuss the details of each step of the algorithm. In this thesis, we define images as follows for convenience.

Definition 2.1 (CT image). A CT image is a 3-dimensional array I of size m × n × p. The spacing in the x-axis is the physical distance between two adjacent voxels along the x-axis; the spacing in the y-axis and the spacing in the z-axis are defined similarly.

Remark 2.2. A CT image can be rendered as a volume for visualization. Figure 2.1 shows three different views of a CT image. The spacing is saved in the metadata of a CT image, and the size of a CT image is usually 512 × 512 in axial view.
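As an illustration of these conventions, the following sketch reads a CT volume and inspects its spacing and size; SimpleITK is assumed here, and the file name is only illustrative.

```python
# A minimal sketch, assuming the CT volume is stored in a format readable by SimpleITK.
import SimpleITK as sitk

image = sitk.ReadImage("case_0001.nii.gz")   # hypothetical file name
spacing = image.GetSpacing()                  # (x, y, z) spacing in mm
size = image.GetSize()                        # (m, n, p), usually (512, 512, p)
volume = sitk.GetArrayFromImage(image)        # NumPy array indexed as (z, y, x)
print(spacing, size, volume.shape)
```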
2.1 Overview
We give an overview of our analysis pipeline in this section. Due to the difficulty
of labeling image annotations of ovaries and ovarian tumors, using a bounding box of a
proper region instead of image segmentation is more applicable. In this work, the main
idea of body parts partition contains the following three steps. First, we segment the bone and the internal air in a CT image. Second, we compute the numbers of bone pixels and internal air pixels in each axial slice to define the bone sequence and the internal air sequence. The formal definitions of the bone sequence and the air sequence are given in Definition 2.3. Note that the bone sequence and the internal air sequence (or simply the air sequence) are two signals that carry the body parts information of the CT image. Hence, the third step is to align these two signals between cases to register the body parts. Eventually, we only need a few body parts annotations to automatically find the breakpoints of body parts of any query CT image by propagating the body parts annotations from the reference case.

Definition 2.3. Let I be a CT image of size m × n × p, let S be the bone segmentation of I, and let the set of bone voxels be B = S⁻¹({1}). The bone sequence of I is defined by

b_k = |{(x, y, z) ∈ B : z = k}|, for k = 1, 2, ..., p.

The air sequence is defined analogously from the internal air segmentation.
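A minimal sketch of Definition 2.3 in code, assuming the bone (or internal air) segmentation is available as a binary NumPy array indexed as (z, y, x); names are illustrative.

```python
import numpy as np

def bone_sequence(bone_mask: np.ndarray) -> np.ndarray:
    """Count bone voxels in each axial slice: b_k = |{(x, y, z) in B : z = k}|."""
    # bone_mask has shape (p, n, m); summing over each axial slice gives the sequence.
    return bone_mask.reshape(bone_mask.shape[0], -1).sum(axis=1)

# The air sequence is obtained in the same way from the internal air mask.
```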
We now formulate body parts registration as a curve registration problem. Let f, g : [0, 1] → R be two functions with some proper assumptions, such as f, g ∈ L²([0, 1]), being continuous, or being smooth. The registration problem is, roughly speaking, finding a one-to-one, increasing function h : [0, 1] → [0, 1] such that f ≈ g ∘ h in some sense, such as being close in L² distance. h is called a warping function.
There are different formulations of the registration problem between f and g, for example, landmark-based matching [2]. However, since CT images usually scan different body parts, landmark-based matching is difficult in our case, and assuming that the codomain of h is [0, 1] is not applicable. Moreover, the bone sequence and the air sequence are two types of information for a single CT, so they shall be taken into consideration at the same time. Although bone sequences and air sequences are discrete data, it is convenient to consider them as continuous signals. Therefore, in this thesis, the registration problem is formulated on continuous signals as follows.
Every CT image has its bone signal and its air signal, and we are going to align these two signals at the same time, which is an important difference between our problem and other curve registration settings. Assume that f1 and f2 are bone signals and g1 and g2 are air signals. We first apply a transform T to these signals; T is expected to have the ability to denoise and to represent the patterns of the bone signals and air signals. The codomain of T is not necessarily a subset of univariate functions. Then we define a distance between the pairs (T(f1), T(g1)) and (T(f2 ∘ h), T(g2 ∘ h)) by using the 2-norm or its generalization. Finally, we register body parts between CT images by minimizing this distance over warping functions h.
Now, we summarize the main ideas of body parts partition in the following steps.

Preparation Stage

1. Choose a reference CT image and obtain the breakpoints of body parts of interest
2. Segment the bone and the internal air of the reference image
3. Construct the bone sequence and the air sequence from the segmentations
4. Apply some proper transforms to the bone sequence and the air sequence and regard the outputs as the reference signals

We obtain reference signals and body parts breakpoints from the preparation stage. The reference image is expected to include all body parts of interest and to represent the images in the dataset well.
In functional data registration, we usually have a reference signal and a moving signal. We transform the moving signal into the coordinates of the reference signal. In our problem, we treat the transformation outputs of the bone signal and the air signal of the reference image as the reference signals. At the inference stage, if we are given a query CT image, then the transformation outputs of the bone signal and the air signal of the query image are treated as the moving signals.

In the inference stage, we label the body parts for a query CT image in the following steps.

1. Segment the bone and the internal air of the query image
2. Construct the bone sequence and the air sequence from the segmentations
3. Regard the bone sequence and the air sequence as continuous signals
4. Apply some proper transforms to the bone sequence and the air sequence and regard the outputs as the moving signals
5. Compute the optimal warping function by minimizing the distance between the reference signals and the moving signals
6. Get the body parts breakpoints by propagating the breakpoints from the reference signal to the moving signal
By these steps, we can propagate the body parts breakpoints from only a few labeled images to many query images. Finally, we use this body parts partition algorithm to preprocess CT images and then use deep learning techniques to establish models.
2.2 Bone Segmentation

Bone segmentation is a classical topic in medical image analysis, and the bone structure may be used as reference information for surgery. There are already some studies on this topic, and bone segmentation is one step in our body parts partition algorithm since the bone sequence relies on the bone segmentation. Here are some related works. By minimizing an energy function consisting of a per-pixel term and a boundary term, Y. Boykov and G. Funka-Lea [3] find a segmentation where pixels are labeled with the same class if they have high similarity. M. Krčah et al. [12] extend the work in [3] to another formulation that is more suitable for 3D images, which is more applicable to bone segmentation in CT.
By assuming that the intensities of bone and non-bone voxels are sampled from a mixture of distributions, an adaptive thresholding method was proposed by [22]. Starting with an initial segmentation (obtained from global thresholding, for instance), this method reclassifies pixels into the bone class by a Bayesian decision rule to update the segmentation iteratively.

Moreover, H. Lamecker et al. [13] use the statistical shape model (SSM) to segment the pelvic bone. SSM aims to find a deformation from one shape to another shape. Based on the deformation, SSM can transform shapes into a base space and obtain the distance between surfaces. The distance can be used to estimate the average shape as well as the variation in shape.

In our pipeline, however, we do not need an extremely accurate segmentation for the next step, and computational efficiency may be more important in medical practice. So, we try some other methods for bone segmentation. In fact, if the bone sequence from a bone segmentation can present its essential patterns, then we may accept the segmentation. We have tried the following approaches.
Global thresholding decides whether a voxel shall be classified into the object class by comparing the intensity of the voxel with a threshold. Given a threshold t ∈ R, the global thresholding segmentation S of I is defined by

S(x, y, z) = 1 if I(x, y, z) > t, and S(x, y, z) = 0 if I(x, y, z) ≤ t.
In our case, a voxel in CT is classified into the bone class by global thresholding if its grayscale value is larger than t; otherwise it is classified into the non-bone class. Global thresholding is useful when the intensity of the object is very different from that of the background. Bone voxels are very bright in CT (in other words, they are of high grayscale values in Hounsfield Units, HU) due to the physical properties of bone, so global thresholding is a common method to segment bone. However, since the CT images we use are venous phase CT images, this approach must include the contrast-enhanced structures, if any. Therefore, we also introduce the connectivity of voxels in CT images as follows.
Definition 2.5 (Voxel Connectivity). Let (x, y, z) and (x′, y′, z′) be voxels and let ∆x = x − x′, ∆y = y − y′, and ∆z = z − z′. The two voxels are said to be adjacent if they are distinct and max{|∆x|, |∆y|, |∆z|} ≤ 1.

Remark 2.6. Let F be a set of voxels. For voxels u, v ∈ F, we say u ∼ v if either u = v or there exist v1, v2, ..., vk−1 ∈ F such that consecutive voxels in the sequence u, v1, v2, ..., vk−1, v are adjacent. The relation ∼ is an equivalence relation on F.

Definition 2.7. The equivalence classes of ∼ defined in Remark 2.6 are called the connected components of F.
We observe that contrast-enhanced structures are usually not connected to the bone. In fact, soft tissues are usually covered by fat, so they are not connected to the bone. Moreover, the volume of bone is larger than that of contrast-enhanced structures in general. Hence, if we further assume that the bone is a connected component of the segmentation obtained by global thresholding, then we may compute the largest component of the segmentation to obtain the bone segmentation without including contrast-enhanced structures.
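The following sketch illustrates the global-thresholding-plus-largest-component idea; SciPy's connected component labeling is assumed, and the function name and default threshold are illustrative rather than taken from the original implementation.

```python
import numpy as np
from scipy import ndimage

def largest_component_bone(volume_hu: np.ndarray, t: float = 400.0) -> np.ndarray:
    """Global thresholding followed by keeping the largest connected component."""
    seg = volume_hu > t                      # global thresholding: S = 1 where I > t
    labels, num = ndimage.label(seg)         # connected components of the foreground
    if num == 0:
        return seg
    sizes = ndimage.sum(seg, labels, index=range(1, num + 1))
    largest = 1 + int(np.argmax(sizes))      # label of the largest component
    return labels == largest                 # assumed to be the bone
```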
However, sometimes contrast-enhanced structures are connected to the bone in images. They may just be near to each other, but in the sense of voxels they are connected. In this case, we consider a graph-based approach. Before we introduce the details of this approach, we assume the human bone is connected. We construct a graph G(V, E) whose vertices are the foreground voxels and whose edges connect adjacent foreground voxels. We may further define a weight for each edge as follows. For each edge uv ∈ E, define the weight of uv to be e^(−|I(u)−I(v)|/σ), where σ > 0 is a parameter. σ is usually chosen to be the standard deviation of |I(u) − I(v)| over all uv ∈ E. Then G(V, E) is a weighted graph. Let A be the adjacency matrix and D be the degree matrix of G. The Laplacian matrix of G is given by L = D − A. Shi, Jianbo and Malik, Jitendra [18] propose a method for image segmentation based on the generalized eigenvalue problem
Lx = λDx. (2.1)
We then use spectral clustering to revise the initial segmentation. We may construct a graph as above and solve the generalized eigenvalue problem on the corresponding Laplacian matrix to obtain two clusters on the graph. By choosing the larger cluster, we obtain a revised segmentation. This approach takes more information into consideration and can remove some contrast-enhanced structures that are connected to the bone in the sense of voxels. However, it is still not good enough since the bone itself may not be connected in the sense of voxels, which violates the basic assumption of connectivity of the graph in this approach. We introduce another approach in the next section.
Since the bone may not be connected in the sense of voxels, we consider the hysteresis thresholding proposed in [4]. This approach performs two-stage thresholding to avoid including some non-object voxels. Before we introduce hysteresis thresholding, let us define the connectivity between sets of voxels.

Definition 2.8. Let F and G be two sets of voxels and let u be a voxel. We say u is connected to G if u ∼ v for some v ∈ G, and we say F is connected to G if some voxel of F is connected to G.
Now we introduce the hysteresis thresholding. Let s, t ∈ R with s < t. At the first stage, compute the global thresholding segmentations S and T of I with the thresholds s and t, respectively. Let F = S⁻¹({1}) and G = T⁻¹({1}). At the second stage, the hysteresis thresholding segmentation is given by the union

⋃ {C : C is a connected component of F and C is connected to G}.
We apply the hysteresis thresholding with the high threshold t = 800 HU and the low threshold s = 400 HU to obtain the bone segmentation. An observation about true bone and contrast-enhanced soft tissue is that a true bone voxel is often of intensity greater than 1000 HU, while the contrast-enhanced soft tissue is often of intensity smaller than 800 HU. Although ribs are usually of intensity smaller than 800 HU and are filtered out in the first stage, they can usually be recovered at the second stage due to the connectivity between ribs and the spine. Moreover, 400 HU is a common threshold for bone segmentation.
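A sketch of this two-stage procedure, assuming SciPy; the thresholds follow the values above, while the connectivity used by `ndimage.label` and the function name are assumptions.

```python
import numpy as np
from scipy import ndimage

def hysteresis_bone(volume_hu: np.ndarray, low: float = 400.0, high: float = 800.0) -> np.ndarray:
    """Keep the components of the low-threshold mask that touch the high-threshold mask."""
    weak = volume_hu > low       # F = S^{-1}({1})
    strong = volume_hu > high    # G = T^{-1}({1})
    labels, _ = ndimage.label(weak)
    keep = np.unique(labels[strong])   # components of F that intersect G
    keep = keep[keep > 0]
    return np.isin(labels, keep)
```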
2.3 Internal Air Segmentation

Air is crucial information in a CT image. We can easily recognize the lung since it contains a lot of air, which has low intensity and appears black in the image. Moreover, the internal air signal is another important piece of information used to align the body parts between CT images. Therefore, we introduce some image processing techniques to segment the internal air. In this section, we assume that the external air is connected, the external air and the internal air are not connected, and the volume of the external air is greater than the volume of the internal air.

Here are two notes about these assumptions. First, the internal air and the external air are not connected in CT in almost all cases, even though the respiratory system is a path connecting the internal air and the external air. Second, the 3D volume of external air is greater than that of internal air in CT, even though this may be false in a specific single axial slice.
The main idea of segmenting the internal air is to segment the external air first and obtain the internal air by removing the external air. We use the grayscale value −700 HU as the threshold to apply global thresholding to segment air. In this case, voxels of intensity smaller than −700 HU are what we need. By assumptions 1 and 2, the external air is a connected component of the air segmentation, and assumption 3 implies the external air is the largest component of the air segmentation. Hence, we compute the largest component of the air segmentation to obtain the external air. Finally, we obtain the internal air by removing the external air from the air segmentation.
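A sketch of the internal air segmentation under the three assumptions above, again assuming SciPy; names are illustrative.

```python
import numpy as np
from scipy import ndimage

def internal_air(volume_hu: np.ndarray, t: float = -700.0) -> np.ndarray:
    """Internal air = all air minus the largest (external) air component."""
    air = volume_hu < t
    labels, num = ndimage.label(air)
    if num == 0:
        return air
    sizes = ndimage.sum(air, labels, index=range(1, num + 1))
    external = 1 + int(np.argmax(sizes))   # assumption: external air is the largest component
    return air & (labels != external)
```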
The performance of this method is good in almost all cases. As we discussed in previous sections, contrast-enhanced structures may be confused with bone in intensity, but there is no other tissue that would be confused with air. Therefore, the typical approach described above is sufficient.

2.4 Bone Sequences, Air Sequences, and Their Transformations
Our goal is body parts registration by aligning bone signals and air signals. Hence, we introduce the bone sequences and the internal air sequences in this section. By calculating the numbers of bone pixels and internal air pixels in each axial slice, we obtain two sequences indexed by the z-axis coordinate of the CT. The formal definitions of bone sequences and air sequences are given in Definition 2.3. The bone sequences and air sequences can be regarded as discrete versions of bone signals and air signals, which are usually considered in a continuous manner. Unfortunately, the two sequences might be noisy for the following reasons.
• We are using thick-cut CT scans, which means the spacing in the z-axis is 5 mm.
• We are using venous phase CT scans, which means the bone segmentation may include contrast-enhanced structures.
• The number of bone pixels depends on the amount of calcium in the bone of a patient.
To deal with these two noisy signals, we apply transformations to them for denoising
and recognizing patterns. We introduce the Gaussian filters and wavelet transform in the
following sections.
2.4.1 Gaussian Filters

Gaussian filters are widely used in signal processing. One of the most common uses of a Gaussian filter is denoising, and we use Gaussian filters to smooth our signals. For f1, f2 : R → R, the convolution of f1 and f2 is defined by

(f1 ∗ f2)(x) = ∫_R f1(t) f2(x − t) dt, x ∈ R,

provided the integral exists. Let K : R → R be an L¹ function and define T(f) = f ∗ K for f ∈ Lp(R). In this case, we call T a convolution transformation and K the kernel. If, moreover, K is a differentiable function, then [T(f)]′ = f ∗ K′ for f ∈ Lp(R). That is, the convolution transformation with a smooth kernel produces a smooth output.
Theorem 2.9 (Zygmund [17]). Let K ∈ L¹(R) ∩ L∞(R) with ∫_R K = 1, and suppose K(x) = o(1/|x|) as |x| → ∞. For ϵ > 0, define

K_ϵ(x) = (1/ϵ) K(x/ϵ).

Then f ∗ K_ϵ(x) → f(x) as ϵ → 0 at every point of continuity x of f.
This theorem roughly says that a kernel function that decays rapidly at infinity induces an approximation of the identity. As a special case, let K(x) = (1/√π) e^(−x²) be the Gaussian function; we call the convolution transformation induced by the kernel K_ϵ a Gaussian filter. Note that K ∈ L¹(R) with

∥K∥₁ = ∫_R |K| = ∫_R K = (1/√π) ∫_{−∞}^{∞} e^(−x²) dx = 1.

Since K(x) ≤ 1/√π, we have that K ∈ L∞(R). Finally, since

K(x) / (1/|x|) = (|x|/√π) e^(−x²) → 0 as |x| → ∞,

K satisfies the assumptions of Theorem 2.9. Therefore, Gaussian filters are approximations of the identity for continuous functions. Moreover, since K_ϵ is smooth, the convolution f ∗ K_ϵ is also smooth. Therefore, Gaussian filters give a smooth approximation of a continuous function f with a lower noise level. In the sense of this convergence, we may say the Gaussian filter preserves some patterns of f. Smoothness is an important reason why we use Gaussian filters. The smooth approximations obtained by Gaussian filters make the L² distances between the bone sequences (or air sequences) of different patients more robust. In practice, a discrete approximation of the Gaussian function is used, since the true Gaussian function is not compactly supported.
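The following sketch shows one possible discrete Gaussian smoothing of a bone sequence, assuming SciPy; the relation between the width ϵ used in this thesis and the `sigma` argument below is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

bone_seq = np.loadtxt("bone_sequence.txt")                      # hypothetical input file
smooth_bone = gaussian_filter1d(bone_seq.astype(float), sigma=2.0)  # width parameter assumed
```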
2.4.2 Feature Extraction by Wavelet Transforms

The wavelet transform is another widely used tool in signal processing; it is also used for edge detection in image processing. We introduce the wavelet transform in this section.
Let ψ ∈ L²(R) with ∫_{−∞}^{∞} ψ = 0. Define ψ^{ab} by

ψ^{ab}(t) = a^(−1/2) ψ((t − b)/a), for a > 0, b ∈ R, (2.2)

and ψ_{jk} by

ψ_{jk}(t) = 2^(j/2) ψ(2^j t − k), for j, k ∈ Z. (2.3)

The continuous wavelet transform W(f) of f and the coefficients w_{jk} are defined by

W(f)(a, b) = ∫_{−∞}^{∞} f ψ^{ab}, for a > 0, b ∈ R, (2.4)

w_{jk} = ∫_{−∞}^{∞} f ψ_{jk}, for j, k ∈ Z. (2.5)

We say ψ is the mother wavelet and ψ^{ab} and ψ_{jk} are wavelets. If {ψ_{jk}}_{j,k∈Z} forms an orthonormal basis for L²(R), then we say ψ is an orthonormal wavelet and w_{jk} is called the wavelet coefficient.
Although not every mother wavelet is an orthonormal wavelet, the wavelet transform still extracts some important features of a given function. Moreover, since the scale factors (a and j) make the wavelets have different widths, the wavelet transform can capture information at different scales. Due to the ability of the wavelet transform to handle multiscale problems, it can be used to recognize the patterns of a signal. Applying the wavelet transform with a wavelet of small width tends to extract local features, which are often high-frequency components and possibly noisy, while applying the wavelet transform with a wavelet of large width tends to extract wide features, which are often low-frequency components, and recognizes only the global patterns.
Let a1, a2, ..., ap > 0 be some scale factors with p ∈ N. For a signal f, define the transformation T by T(f) = (W(f)(a1, ·), W(f)(a2, ·), ..., W(f)(ap, ·)). Note that the scale factors ai's are expected to extract features from the input signal and filter out noise. In this work, we use the wavelet transform with the Mexican hat (or Ricker) wavelet

ψ(t) = (2 / (√3 π^(1/4))) (1 − t²) e^(−t²/2),
to extract features from the observed bone sequences and internal air sequences. Based on some observations from our experiments, we find that scale factors smaller than 3 are too noisy and scale factors larger than 10 tend to detect redundant features. Therefore, we choose the scale factors ai = i + 2 for i = 1, 2, ..., 8; in this case, p = 8.
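A sketch of this feature extraction, assuming SciPy's continuous wavelet transform with the Ricker (Mexican hat) wavelet; newer SciPy releases deprecate `signal.cwt`, in which case PyWavelets' `pywt.cwt` is an alternative. The input file name is illustrative.

```python
import numpy as np
from scipy import signal

scales = np.arange(3, 11)                         # scale factors 3, ..., 10 (p = 8)
bone_seq = np.loadtxt("bone_sequence.txt")        # hypothetical input
features = signal.cwt(bone_seq.astype(float), signal.ricker, scales)
# features has shape (8, len(bone_seq)); row i is W(f)(a_i, .) for the Ricker wavelet.
```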
Curve registration is a crucial step in our pipeline. We align the body parts between CT images by registering their bone signals and air signals. There are many studies in functional data analysis about curve registration. For example, [6] aligns curves by solving a global minimization problem and [2] matches curves by aligning landmarks.
However, there are two critical issues in our situation. First, each CT image may capture different body parts, and hence the bone sequences (and air sequences) usually contain different landmarks between images. Second, we have two kinds of information to use, i.e., bone sequences and air sequences, so the formulation in [6] cannot be directly used.

Even though all CT images are processed so that they are of size 512 × 512 in axial view and have a spacing of 5 mm in the z-axis for thick-cut images, they still have their own essential scales. For example, heights and body proportions differ between cases, so CT images have various numbers of axial slices even if some of them capture the same body parts. Therefore, we use an affine function h(t) = at + b as the warping function, where a and b lie in some compact intervals, respectively, to match the scales between cases.
Here are two notes about our method. First, an affine transformation cannot adjust the body proportion. However, finding a nonlinear transformation usually needs some accurate landmarks, but landmarks are not easy to locate and to recognize due to the different body parts between CT images, as discussed previously. More precisely, although [16] proposes a good approach to finding landmarks, the selection of landmarks for registration in our situation is still an issue. Therefore, we do not consider nonlinear transformations in this work. Second, we have considered 2D or 3D bone registration, but such registration is hard to do. Although there is a standard brain in medical science, bone structures are so various that the 3D registration problem is very nonconvex. Nevertheless, the bone signal and air signal of a chosen reference case can serve as standard signals in our formulation.
Assume that f1 and f2 are bone signals and g1 and g2 are air signals. The notation ∥·∥₂ denotes the 2-norm or its generalization. We register the body parts by solving the following minimization problem:

min_{h ∈ A} (1 − λ) ∥T(f1) − T(f2 ∘ h)∥₂² + λ ∥T(g1) − T(g2 ∘ h)∥₂²

where
• A is a set of affine warping functions h(t) = at + b with a contained in a compact interval contained in (0, ∞)
• λ ∈ [0, 1] is a parameter
If T : F → F^p for some p ∈ N, say

T(f1) = (w1^(1), w2^(1), ..., wp^(1)),
T(g1) = (v1^(1), v2^(1), ..., vp^(1)),
T(f2 ∘ h) = (w1^(2), w2^(2), ..., wp^(2)),
T(g2 ∘ h) = (v1^(2), v2^(2), ..., vp^(2)),

where wi^(j) ∈ F and vi^(j) ∈ F for i = 1, 2, ..., p and j = 1, 2, then we align the body parts by solving

min_{h ∈ A} Σ_{i=1}^{p} [ (1 − λ) ∥wi^(1) − wi^(2)∥₂² + λ ∥vi^(1) − vi^(2)∥₂² ] (2.8)

where
• A is a set of affine warping functions h(t) = at + b with a contained in a compact interval I contained in (0, ∞)
• λ ∈ [0, 1] is a parameter
The interval I is chosen to be [0.8, 1.2] in this work. Note that for b with large absolute value |b|, the warping function h makes f2 ∘ h = 0 and g2 ∘ h = 0 on [0, 1], since the signals are supported on a bounded interval; hence b can be restricted to a compact interval that depends on a. Since the heights of almost all people are in some fixed range, we may solve the minimization problem in a reasonable time even by grid search. Moreover, grid search can easily be computed in parallel, so we may accelerate the computation if needed.
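A sketch of the grid search over affine warps, assuming the transformed reference and moving signals are stored as arrays of shape (p, length); the search ranges for a and b and the interpolation used for warping are assumptions, not values from the original implementation.

```python
import numpy as np

def register(ref_bone_feat, ref_air_feat, mov_bone_feat, mov_air_feat, lam=0.3):
    """Brute-force search over affine warps h(t) = a*t + b minimizing the weighted distance."""
    t = np.arange(ref_bone_feat.shape[1])
    best = (None, np.inf)
    for a in np.linspace(0.8, 1.2, 21):                      # a in I = [0.8, 1.2]
        for b in np.arange(-100.0, 100.0, 1.0):              # assumed range for b
            idx = a * t + b                                   # warped coordinates in the moving signal

            def warp(feat):
                n_mov = feat.shape[1]
                return np.stack([np.interp(idx, np.arange(n_mov), row, left=0.0, right=0.0)
                                 for row in feat])

            cost = ((1 - lam) * ((ref_bone_feat - warp(mov_bone_feat)) ** 2).sum()
                    + lam * ((ref_air_feat - warp(mov_air_feat)) ** 2).sum())
            if cost < best[1]:
                best = ((a, b), cost)
    return best   # ((a, b), objective value)
```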
2.6 Deep Learning

Deep learning has been among the most popular topics in artificial intelligence. It is a kind of machine learning algorithm that establishes a model from data. Moreover, deep learning is a flexible framework, so it can be used in both supervised learning and unsupervised learning.
Machine learning has been widely used in medical image analysis and has obtained many good results on different problems. Classification, segmentation, and detection are the main topics in medical image analysis; classifying normal and abnormal CTs, segmenting organs and tumors in CT, and detecting lesions are examples of these topics, respectively. All of them usually rely on image annotation. For example, radiomics is a feature extraction approach that relies on image annotation. By using both the image and the image annotation, radiomics computes intensity-based and texture-based features in the ROI and analyzes shape-based features from the geometric properties of the ROI. With the radiomics features, one may train a machine learning model to classify images. Another approach of using radiomics is to divide an image into patches and compute features for each patch. This approach is usually used in classification, and the image annotation is used to assign a label to each patch. One possibility is
to label both the organ and the tumors. A machine learning algorithm is then used to build a model to distinguish patches that contain tumors from patches that do not contain tumors. There are several possible choices of machine learning algorithms, such as k-nearest neighbors (KNN), support vector machines (SVM), random forests, and XGBoost. The outputs of this approach are patch-based results. One may summarize the patch-based results in a heat map and draw conclusions from the heat map. Heat maps not only provide explainable results but also indicate the locations of suspicious regions.
However, as we previously discussed, image annotation is not a good option in our problem. Therefore, we use deep learning to establish models rather than traditional machine learning.
Deep learning has shown its power in many fields by constructing artificial neural
networks (ANN) of multiple layers. We introduce ANN and some related concepts in this
section.
An ANN is a composition of simple functions defined as follows. Let L ∈ N and n0, n1, ..., nL ∈ N. For i = 1, 2, ..., L, let fi : R → R be a nonlinear continuous function, Ai ∈ R^(ni × ni−1), and bi ∈ R^(ni × 1). For i = 1, 2, ..., L, define a single-layer neural network as

Fi(x) = fi(Ai x + bi), for x ∈ R^(ni−1 × 1),

where fi is applied entrywise. Then an ANN is defined as the composition FL ∘ FL−1 ∘ ... ∘ F1. We denote the ANN by F(x; θ), where x ∈ R^(n0 × 1) and θ is the parameter consisting of the Ai's and bi's, for convenience. In this case, L is called the number of layers, ni is called the number of neurons in the i-th layer, fi is called an activation function, Ai is called a weight matrix (or simply weight), and bi is called a bias vector (or simply bias), for i = 1, 2, ..., L.
A popular choice of activation function is the Rectified Linear Unit (ReLU), which is defined by

ReLU(x) = max{x, 0} = x if x ≥ 0, and 0 if x < 0, for x ∈ R.

A neural network that uses ReLU as its activation function at every layer is called a ReLU network.
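A small ReLU network written in PyTorch with the notation above; the layer sizes are illustrative only.

```python
import torch
from torch import nn

class ReLUNet(nn.Module):
    """A two-layer network F = F2 ∘ F1 with ReLU activation in the hidden layer."""
    def __init__(self, n0=16, n1=32, n2=1):
        super().__init__()
        self.f1 = nn.Sequential(nn.Linear(n0, n1), nn.ReLU())   # F1(x) = ReLU(A1 x + b1)
        self.f2 = nn.Linear(n1, n2)                              # F2(x) = A2 x + b2

    def forward(self, x):
        return self.f2(self.f1(x))

y = ReLUNet()(torch.randn(4, 16))   # forward pass on a batch of 4 inputs
```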
The configuration of an ANN means the number of layers and the numbers of neurons of the ANN. With a fixed configuration, we would like to find good weights and biases for the dataset and the task. Loss functions are smooth functions that evaluate how good an ANN is, and the choice of loss function depends on the task. Given a dataset and a task, we treat the loss function as the objective function and the weights and biases as independent variables. By minimizing the loss function, we may find optimized weights and biases.

The terms parameters and hyperparameters mean variables that are determined by training and variables that are defined at the first stage, respectively. Note that the configuration is defined at the first stage, while the weights and biases are randomly initialized and are optimized during training. As a result, weights and biases are parameters and the configuration consists of hyperparameters.
A loss function depends on the task we are given. In a classification task, we are given data (xi, yi) with labels yi ∈ {0, 1} and predictions ŷi = F(xi; θ) for i = 1, 2, ..., N, where θ denotes the parameters. In this case, a typical choice is

L(ŷ, y) = −[ y log(ŷ) + (1 − y) log(1 − ŷ) ].

The loss function L is called the binary cross-entropy loss. Note that the ŷi's are functions of θ. Let L(θ) = Σ_{i=1}^{N} L(ŷi, yi). Then we solve the minimization problem min_θ L(θ) to find a good θ.
Two natural questions are whether an ANN can approximate the functions of interest and how to solve the minimization problem. For the first question, the Universal Approximation Theorem for Width-Bounded ReLU Networks has shown that ANNs with ReLU activations have the ability to approximate any Lp function [15]. For the second question, gradient descent is a common method to solve minimization problems with differentiable objective functions. Gradient descent updates the parameters at each iteration by the recursive formula

θ_{j+1} = θ_j − η ∇L(θ_j),

where θ_j is the parameter at the j-th iteration for j ∈ N and η > 0 is called the step size or the learning rate. However, the number of data N and the dimension of the input data n are usually large, so it is difficult to compute L(θ) = Σ_{i=1}^{N} L(ŷi, yi) as well as its gradient. Therefore, mini-batch stochastic gradient descent is widely used in deep learning algorithms; in this case, we update the parameters by using a batch of data instead of the whole dataset. It has been shown that mini-batch stochastic gradient descent converges under mild conditions.
Although ReLU networks have the ability to approximate Lp functions, other activation functions still play some roles. In fact, the Parametric Rectified Linear Unit (PReLU) [8], defined by

PReLU(x) = max{x, 0} + a min{x, 0} = x if x ≥ 0, and ax if x < 0, for x ∈ R,

where a > 0 is a parameter, has been shown to improve model fitting [8]. Note that the parameter a can also be trained during the training process, and [8] has derived the update formula.
ANN is one of the most important concepts in deep learning. However, real-world applications do not use the plain ANN. The convolutional neural network is an extension of the ANN that is widely used in computer vision. We introduce convolutional neural networks in the next section.
2.6.3 Convolutional Neural Networks
ANNs can approximate many functions, but the plain ANN is not applicable to image analysis in practice. Computational cost is a big issue in this situation, since images are represented by large arrays and so are the numbers of neurons. Moreover, using an ANN with dense weights would destroy the structure of the input being an image. Hence, convolutional neural networks (CNNs) are widely used in image tasks. Note that a single-layer neural network applies a general linear transformation to its input; a CNN is a special case of ANN that replaces the general linear transformations in an ANN by convolution transformations.

There are many advantages of using CNNs in image tasks. For example, a convolutional layer uses far fewer parameters than a dense layer, and a convolution transformation extracts local features and takes the location information into consideration. Due to these advantages, CNNs have become the state of the art in image tasks. In this thesis, we choose a CNN architecture that has been validated on ImageNet and train it for the ovarian tumor classification task.
2.6.4 Dataset and Data Splitting
We use the dataset maintained by Linkou Chang Gung Memorial Hospital to establish models. After some rechecks and quality controls by Dr. Lin, Gigin, the dataset contains 411 CT images obtained from 401 patients, consisting of 161 cancerous cases and 240 benign cases.
In this thesis, we split the patient list into 5-fold training sets and an extra test set. First, we keep a test set of 81 patients. This test set is not involved in any training or validation process; it is only used for testing. Second, we split the remaining 320 patients into 5 folds in a stratified manner. Each fold contains 64 patients. We perform a 5-fold cross-validation on the folds and finally test the models on the test set.
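A sketch of this splitting scheme with scikit-learn; whether the held-out test set was drawn in a stratified way is an assumption, and the patient IDs below are placeholders.

```python
from sklearn.model_selection import StratifiedKFold, train_test_split

patients = [f"patient_{i:03d}" for i in range(401)]   # hypothetical IDs
labels = [0] * 240 + [1] * 161                        # benign = 0, cancerous = 1

# Hold out 81 patients for testing, then split the remaining 320 into 5 stratified folds.
trainval_ids, test_ids, trainval_y, test_y = train_test_split(
    patients, labels, test_size=81, stratify=labels, random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(trainval_ids, trainval_y)):
    train_patients = [trainval_ids[i] for i in train_idx]   # 4 folds, 256 patients
    val_patients = [trainval_ids[i] for i in val_idx]       # 1 fold, 64 patients
```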
2.6.5 Training
The training process of a deep learning model is precisely solving an optimization problem which aims to minimize the loss function. Note that the minimization problem has no analytic solution and the computational complexity is very high. A common approach to solving this optimization problem is stochastic gradient descent. However, the loss function is usually very nonconvex, so there are studies that aim to improve the convergence of stochastic gradient descent, and Adam [11] is the most popular one. Adam is a widely used optimizer that combines AdaGrad [5] and RMSProp [20]. It has been shown that Adam has high stability and high efficiency. NovoGrad [7] uses layer-wise gradient normalization to improve the performance and combines the advantages of stochastic gradient descent and Adam.
By using the registration methods discussed in previous sections, we can extract specific body parts from CT images. Two body parts selection strategies are used in preprocessing: cropping to the pelvis only, and cropping to the pelvis and lower abdomen.

As we discussed in Section 2.6.4, we perform a 5-fold cross-validation and test the resulting models on the test set. At the i-th stage, we treat the i-th fold as the validation set and treat the other 4 folds as the training set. We train models on the training set and monitor the training process by evaluating the models on the validation set at each epoch. After the training process, we evaluate the models on the test set. Finally, we compare the results of the two cropping strategies.
We choose DenseNet121 [9] with dropout [19] rate 0.2 as the classification model in the experiments and choose NovoGrad as the optimizer, as previously discussed, with learning rate 3 × 10⁻⁴, weight decay 10⁻⁴, and batch size 8. Also, we use the cosine annealing [14] learning rate scheduler to decay the learning rate during training, where the decay period is 50 epochs and the minimal learning rate is 10⁻⁶, and we train these models for 1000 epochs.
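A sketch of this training configuration, assuming the MONAI implementations of DenseNet121 and NovoGrad together with PyTorch's cosine annealing scheduler; the output dimension and the loss function below are assumptions, not details from the original implementation.

```python
import torch
from torch import nn
from monai.networks.nets import DenseNet121
from monai.optimizers import Novograd

model = DenseNet121(spatial_dims=3, in_channels=1, out_channels=1, dropout_prob=0.2)
optimizer = Novograd(model.parameters(), lr=3e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-6)
loss_fn = nn.BCEWithLogitsLoss()   # binary cross-entropy on the single logit output
```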
Each image is resampled to a spacing of 1 mm × 1 mm × 1 mm for a uniform spacing, and we use center cropping or padding to obtain an array from an image of uniform spacing. For the case of cropping to the pelvis, the size of the array is 224 × 224 × 192. For the case of cropping to the pelvis and lower abdomen, the size of the array is 224 × 224 × 256. Then we apply a random 3D affine transform and Gaussian noise for data augmentation. The parameters of the random 3D affine transform are maximal rotation angle π/12, maximal shear range 0.2, maximal translation range 0.1, and trilinear interpolation.
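A sketch of the preprocessing and augmentation described above, assuming MONAI transforms; the probabilities are assumptions, and the unit of `translate_range` (voxels in MONAI) may differ from the range 0.1 stated above.

```python
import numpy as np
from monai.transforms import Compose, Spacing, ResizeWithPadOrCrop, RandAffine, RandGaussianNoise

train_transform = Compose([
    Spacing(pixdim=(1.0, 1.0, 1.0), mode="trilinear"),     # resample to 1 mm isotropic spacing
    ResizeWithPadOrCrop(spatial_size=(224, 224, 192)),      # center crop or pad (pelvis-only setting)
    RandAffine(prob=0.5, rotate_range=np.pi / 12, shear_range=0.2,
               translate_range=0.1, mode="bilinear"),       # random 3D affine augmentation
    RandGaussianNoise(prob=0.5),                             # additive Gaussian noise
])
```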
Chapter 3 Results and Discussion
3.1 Bone Segmentation

We assume the bone is a connected component in Section 2.2.2 and Section 2.2.3 since the human bone is connected. However, due to some aspects of CT imaging techniques, the assumption does not always hold. For example, older CT scans usually have a spacing of 1 mm in the z-axis, which is called a thin-cut image, but nowadays a CT scan usually has a spacing of 5 mm in the z-axis, using compression techniques to save storage. Therefore, the bone in a CT scan may not be connected. Hence, we perform a morphological closing to retouch the segmentation obtained from global thresholding in practice. Although we expect that the closed binary segmentation is connected, the connectivity is still not guaranteed.
Without the connectivity assumption, the bone segmentation obtained by the largest component method may fail. In fact, the segmentation usually omits some parts of the bone, such as the mandible. In some cases, the segmentation may even omit the spine and ribs, which means the segmentation preserves the pelvis only. In that case, the bone sequence cannot represent the pattern of the amounts of bone in the axial slices. Moreover, the graph-cut method assumes the graph is connected, which is equivalent to connectivity of the bone in the sense of voxels; hence, using the graph-cut method to segment the 3D whole bone does not make sense if the connectivity assumption fails.
Our main purpose is to obtain a rough approximation of the amount of bone in each axial slice, and the patterns are recognized in the next step (registration by wavelet transform), so an extremely accurate segmentation is not necessary. Moreover, there are large numbers of CT images to be analyzed, and thus computational efficiency matters. Hence, we decide to use hysteresis thresholding for its efficiency and good enough performance.
As we previously discussed, the choice of the low threshold is based on medical knowledge; in fact, the CT value 400 HU is a common threshold for bone. The choice of the high threshold is based on experimental observation. We observe from the case shown in Figure 3.1 that the bone voxels are usually of intensities greater than 800 HU, while the contrast-enhanced structures usually are not.
Figure 3.1: An example of a CT slice in axial view and its bone segmentation obtained by
global thresholding
Note that the set of foreground pixels F in the segmentation consists of both the spine (at the center) and a contrast-enhanced structure (on the right). We can manually separate these two parts in this case and obtain the following results. In fact, there are 16 connected components in F. Let pi denote a representative intensity (for example, the maximal intensity) of the i-th connected component and define

A = {i ∈ {1, 2, ..., 16} | pi > 650} and B = {i ∈ {1, 2, ..., 16} | pi ≤ 650}.

The union of the components indexed by A is exactly the subset of bone pixels in F, and that indexed by B is exactly the subset of contrast-enhanced pixels, as shown in Figure 3.2.

Based on the separation, we plot the histogram of the intensities in the bone set A and that in the non-bone set B in Figure 3.3. Observe that the intensities of non-bone pixels are unlikely to be higher than a bound, say 650 HU. On the other hand, the intensities of bone pixels can reach high values, such as 1000 HU. Therefore, we use 800 HU as the high threshold of the hysteresis thresholding to separate contrast-enhanced structures from bone. Figure 3.4, Figure 3.5, and Figure 3.6 show the segmentation results of Case 1.
Figure 3.3: Bone segmentation and contrast-enhanced structure segmentation. The white pixels mean the foreground and the gray and black pixels mean the background.
Figure 3.5: Bone segmentation in axial view of case 1.
Figure 3.6: Bone signal and bone segmentation in coronal view of case 1.
Case 1 is a regular case, which means there is no distinct contrast-enhanced structure that interrupts the bone segmentation. Note that there is an M structure in the pelvis. This fact is highly related to anatomy in medical science. The other patterns in the bone signal also correspond to some bone structures and body parts. For example, the numbers of bone pixels are small in the lower abdomen since the only bone in that part is the spine, and the numbers of bone pixels increase as the ribs appear in the slices.
Case 2 is an irregular case, which means there are some distinct contrast-enhanced structures that interrupt the bone segmentation. CT slices of Case 2 are shown in Figure 3.7. In fact, there are several contrast-enhanced structures in the pelvis, and we may also see that the segmentation obtained by hysteresis thresholding includes these non-bone pixels in Figure 3.8. One of our main purposes is to find the pelvis slices in a CT scan, but these contrast-enhanced structures interrupt the pattern in the bone signal. Accordingly, the M structure is not perfect and a third peak appears in the bone signal in Figure 3.9.
Figure 3.8: Bone segmentation in axial view of case 2.
Figure 3.9: Bone signal and bone segmentation in coronal view of case 2.
3.2 Internal Air Segmentation
We use the approach described in Section 2.3 to segment the internal air. Fortunately, air is not easily confused with other structures during the segmentation process. The most important feature of air is its low grayscale value, such as −1000 HU. The other things in a CT scan are usually of grayscale values larger than −200 HU. For example, body fat is darker than many tissues in CT, and the grayscale values of body fat usually lie in the interval [−70, −30] HU. Therefore, the performance of the approach described in Section 2.3 is good enough for further analysis.
Figures 3.10 to 3.15 show the segmentation results of Case 1 and Case 2. The slices chosen in this section are near the lung, which is the organ that contains the most air in the human body. We may see the lung is segmented in Figure 3.12 and Figure 3.15. The gastrointestinal tract may also contain air, but the amount of air in the gastrointestinal tract is far less than that in the lung. As a result, we may see a part of the gastrointestinal tract is segmented in Figure 3.12. However, the whole lung is included in Case 2, and hence the gastrointestinal tract is less prominent in Figure 3.15.
Figure 3.10: Original axial images of case 1.
Figure 3.12: Air segmentation in axial view of case 1.
Figure 3.14: Internal air segmentation in axial view of case 2.
3.3 Registration and Partition of Body Parts
As we mentioned before, the bone signal and the internal air signal are often noisy, so directly calculating the Euclidean distances between the bone signals (or the internal air signals) of different cases does not make sense. Therefore, we need to apply some transformations to the signals for denoising and pattern recognition before we minimize the distances between these signals.
We use the following case as the reference image; the breakpoints of body parts are shown in Figure 3.16. Then we compute the bone segmentation and the internal air segmentation; the segmentation results, the bone signal, and the air signal are shown in Figure 3.17 and Figure 3.18.
Figure 3.17: Air segmentation in axial view of the reference image.
3.3.2 Gaussian Filters as the Transformations
We use the Gaussian filter with width ϵ = 2 described in Section 2.4.1 for denoising. The results for the reference image, Case 1, and Case 2 are shown in Figures 3.19 to 3.21. Note that the air signals have a large maximum in the part covering the lung, and the typical patterns of the signals become smoother. While the Gaussian filter denoises, it also removes some important potential patterns in the signals. Notice that the M structure disappears in the reference case and in Case 2, but it is still visible in Case 1.
Figure 3.20: Signals transformed by Gaussian filter of case 1.
3.3.3 Wavelet Transforms as the Transformations
Due to the ability of the wavelet transform to deal with multiscale problems, the wavelet transform recognizes the patterns of the bone sequence and the internal air sequence. Choosing a suitable set of scale factors is an important part of this step. Besides, normalization is also a necessary step. Since people have different body types, the ranges of bone sequences and air sequences are also different. Hence, even if we assume two people have bone structures that differ only in scale, the reference bone signal and the warped moving bone signal may be far from each other in the sense of the L² norm. However, using either the L² norm or the L∞ norm for normalization may be misleading, since every CT covers different body parts. If a CT contains the lung, then the L∞ norm of the air signal can be large; for example, it may be 50000 HU. On the other hand, if a CT does not contain any part of the lung, then the L∞ norm is small; for example, it may be 4000 HU. A similar situation also occurs for the bone signal. So we use some percentiles to normalize the signals. More precisely, we use the maximum of the 75th percentile of the bone sequence and 3000 HU to normalize the bone signal, and we use the maximum of the 97.5th percentile of the air sequence and 4000 HU to normalize the air signal. Taking the maximum prevents the normalization from being affected by outliers. These percentiles are expected to emphasize some landmarks, such as the M structure in the bone signal.
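A sketch of this percentile-based normalization in NumPy; the function names are illustrative.

```python
import numpy as np

def normalize_bone(bone_seq: np.ndarray) -> np.ndarray:
    """Normalize by max(75th percentile, 3000), as described above."""
    scale = max(np.percentile(bone_seq, 75), 3000.0)
    return bone_seq / scale

def normalize_air(air_seq: np.ndarray) -> np.ndarray:
    """Normalize by max(97.5th percentile, 4000)."""
    scale = max(np.percentile(air_seq, 97.5), 4000.0)
    return air_seq / scale
```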
Figure 3.22 shows an example of a bone signal and its wavelet features with normalization. If we use a lot of scale factors to form a basis, then there would be a lot of redundant vectors. By choosing a suitable set of scale factors, we may see some important patterns in the feature vectors. For example, the M structure and the local monotonicities appear in
some feature vectors for suitable scale factors. As we mentioned in Section 2.4, we use {3, 4, ..., 10} as the set of scale factors to perform the wavelet transforms.

Figure 3.22: Bone signal and its features extracted by wavelet transforms.
The time-scale plots of Case 1 and Case 2 are shown in Figures 3.23 and 3.24. In Case 1, we may see the M structure is captured by several transformed bone signals of different scale factors. The M structure and the third peak related to the contrast-enhanced structures are captured by the transformed bone signals in Case 2. Moreover, the peaks of the air signals are captured by the transformed signals of both cases. For either the bone signals or the air signals, due to the ability to deal with multiscale problems, the wavelet transforms capture more patterns than the Gaussian filters. Therefore, we use the transformation defined by the wavelet transforms in the registration step.
3.3.4 Registration
By solving the optimization problem (2.8) described in Section 2.4, we may register the bone signals and the air signals between cases. The parameter λ is chosen to be 0.3, since the main information should be given by the bone signals while the air signals regularize the registration.

Figure 3.23: Time-scale plots of case 1.
Figure 3.25: Registration results of case 1. The bone structure of the reference image and
the transformed bone structure of the moving image.
Recall that Case 1 is a regular case. The pattern of the bone structure is complete in both the segmentation and the bone signal. The registration result shown in Figure 3.25 is as good as we may expect.

On the other hand, since we use the air signal just for regularization, the registration result of the internal air of Case 1, shown in Figure 3.26, is not as good. In fact, the maxima of the air signals are aligned to each other due to our normalization approach. The registration result may not improve further since the lung structure in Case 1 is not completely captured. Moreover, we are not going to improve this registration result for the following reasons. First, the shape of the lung is not fixed. In fact, the lung is flexible. The shape
is difficult. Second, our purpose is the partition of body parts and the usage of air sig
nal is just regularizing the registration, so we do not focus on improving the internal air
53 doi:10.6342/NTU202102364
Figure 3.26: Registration results of case 1. The air structure of the reference image and
the transformed air structure of the moving image.
registration.
Although Case 2 is not so regular, the registration results are still good, as shown in Figures 3.27 and 3.28. Even though the transformed bone signals near the pelvis are not close enough to the reference case, we still obtain a good body parts partition due to the regularization of the air signals. In fact, the internal air registration of Case 2 is good, as shown in Figure 3.28.

As we see in these examples, the air segmentation is usually good enough but the bone segmentation may be bad. However, even though the bone segmentation seems irregular, as in Case 2, our body parts partition algorithm can still obtain a good registration and hence a good body parts partition, as long as the bone segmentation is not totally broken. We show the final body parts partition of Case 1 and Case 2 in Figure 3.29 and Figure 3.30.
Figure 3.27: Registration results of case 2. The bone structure of the reference image and the transformed bone structure of the moving image.

Figure 3.28: Registration results of case 2. The air structure of the reference image and the transformed air structure of the moving image.

Figure 3.29: Body parts partition of case 1.

During the research, we listed 44 cases that were difficult for the bone segmentation or had other problems, such as wrong body parts or wrong phases. We discussed these 44 CT images with Dr. Lin, who rechecked them and then labeled their body parts. After that, 8 images were excluded from the analysis; some were replaced by correct ones and some were removed because of the wrong body part, such as chest CT images.
We test the body parts partition algorithm on the remaining 36 cases and compare the partition obtained by the algorithm with the ground truth labeled by Dr. Lin. For each case, we compute the absolute error (in cm) between the breakpoints predicted by our algorithm and the ground truth.

It is clear that there are some outliers in these error distributions. These outlier cases have average errors larger than 10 slices; they include a low-resolution image, a leg CT, and an image with artifacts. Their segmentation and partition results are left to Appendix A.2.

We exclude the outliers described above to obtain the robust statistics shown in Table 3.2. We are mainly interested in lpel, upel, and mabd, since these three breakpoints are used in the preprocessing of our deep learning models. The mean errors in lpel, upel, and mabd are 2.48 cm, 1.73 cm, and 1.73 cm, and the respective standard deviations are 1.96 cm, 1.87 cm, and 1.46 cm.
        lpel  upel  mabd  lchest  mchest  uchest
count     32    32    31      31      31      17
mean    2.48  1.73  1.73    1.25    2.37    1.42
std     1.96  1.87  1.46    1.00    1.40    1.45
min     0.06  0.07  0.07    0.04    0.02    0.09
Q1      1.02  0.49  0.78    0.47    1.43    0.51
Q2      2.23  1.20  1.33    1.02    2.37    1.01
Q3      3.29  2.17  2.51    1.73    3.34    2.08
max     7.27  9.33  6.91    4.48    5.50    6.14

Table 3.2: Error distribution (cm) of each body part breakpoint after excluding outliers. We denote lower, middle, upper, pelvis, and abdomen by l, m, u, pel, and abd.
These results show that the algorithm performs well enough for the preprocessing of our deep learning models.
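As a sketch of how such a summary can be produced, the snippet below drops the outlier cases from a per-case error table and reports the count, mean, standard deviation, and quartiles in the same layout as Table 3.2. The DataFrame layout and variable names are assumptions, not part of our pipeline.

```python
import pandas as pd

def robust_summary(errors: pd.DataFrame, outlier_cases) -> pd.DataFrame:
    """Drop outlier cases and summarize the remaining absolute errors (cm).

    `errors` has one row per test case and one column per breakpoint
    (lpel, upel, mabd, ...); `outlier_cases` lists the index labels to drop.
    """
    kept = errors.drop(index=outlier_cases)
    summary = kept.describe(percentiles=[0.25, 0.5, 0.75])
    return summary.rename(index={"25%": "Q1", "50%": "Q2", "75%": "Q3"})
```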
We use two cropping strategies to preprocess the CT images. The first is cropping to the pelvis, that is, cropping each CT image to the region between lpel and upel determined by our registration results. The second is cropping to the pelvis and lower abdomen, that is, cropping each CT image to the region between lpel and mabd determined by our registration results.
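A minimal sketch of this cropping step is shown below, assuming the CT volume is stored as a (slices, height, width) array and the breakpoints are given as axial slice indices with lpel below upel and mabd; the function and argument names are hypothetical.

```python
import numpy as np

def crop_to_body_part(volume: np.ndarray, breakpoints: dict, strategy: str = "pelvis") -> np.ndarray:
    """Crop a CT volume along the axial axis using predicted body part breakpoints.

    `volume` is a (slices, height, width) array and `breakpoints` maps breakpoint
    names ('lpel', 'upel', 'mabd') to slice indices from the registration step.
    """
    lower = breakpoints["lpel"]
    upper = breakpoints["upel"] if strategy == "pelvis" else breakpoints["mabd"]
    return volume[lower:upper]

# Example with hypothetical slice indices:
ct = np.zeros((200, 64, 64), dtype=np.int16)
pelvis_only = crop_to_body_part(ct, {"lpel": 20, "upel": 80, "mabd": 130})
pelvis_abdomen = crop_to_body_part(ct, {"lpel": 20, "upel": 80, "mabd": 130},
                                   strategy="pelvis_and_lower_abdomen")
```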
Table 3.3 shows the means and standard deviations of the AUCs over the 5 folds; the details are left to Table B.1. By cropping to the upper pelvis, we obtain a mean validation AUC of 0.8601 and a mean test AUC of 0.8129. The corresponding AUCs obtained by cropping to the lower abdomen are lower, as reported in Table 3.3.
The first observation is that the validation performance is better than the test performance in both cases. Note that the validation set is used to monitor the training process: validation performance is evaluated at the end of each epoch, and the model checkpoint with the highest validation AUC is selected. Hence, the selected model may overfit the validation set when the dataset is not large enough. In principle, this phenomenon can be reduced by adding more data.
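The checkpoint-selection idea can be sketched as follows in Keras, assuming the model is compiled with a binary cross-entropy loss and an AUC metric named "auc"; the actual training configuration is the one described in Section 2.6.5, and the names here are illustrative.

```python
import tensorflow as tf

def train_with_best_val_auc(model, train_ds, val_ds, epochs=100):
    """Train and keep the weights from the epoch with the highest validation AUC."""
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        "best_model.h5",
        monitor="val_auc",   # validation AUC is evaluated at the end of each epoch
        mode="max",          # keep the checkpoint with the highest validation AUC
        save_best_only=True)
    model.fit(train_ds, validation_data=val_ds,
              epochs=epochs, callbacks=[checkpoint])
    return model
```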
Second, both the mean validation AUC and the mean test AUC obtained by cropping to the upper pelvis are higher than those obtained by cropping to the lower abdomen. Although some features in the lower abdomen are related to ovarian cancer, the models appear to make decisions based on the organs in the pelvis. Moreover, the standard deviations obtained by cropping to the upper pelvis are lower than those obtained by cropping to the lower abdomen. Therefore, cropping to the upper pelvis gives more robust models, and we choose it as our preprocessing strategy.
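For reference, the per-fold test AUCs behind these means and standard deviations can be computed as in the sketch below, where the data structure holding each fold's labels and model outputs is a hypothetical stand-in.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fold_auc_summary(fold_predictions):
    """Aggregate per-fold test AUCs into a mean and standard deviation.

    `fold_predictions` is a list of (y_true, y_score) pairs, one pair per
    cross-validation fold, holding the benign/malignant labels and the model
    outputs on that fold's test set.
    """
    aucs = [roc_auc_score(y_true, y_score) for y_true, y_score in fold_predictions]
    return float(np.mean(aucs)), float(np.std(aucs))
```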
Chapter 4 Conclusion
In this thesis, we propose an analysis pipeline that does not need image annotations of ovaries and ovarian tumors. To avoid such annotations, we preprocess the CT images by cropping them to specific body parts. For this purpose, we develop a body parts partition algorithm that only needs a few body parts annotations. We use wavelet transforms to smooth the bone signals and air signals as well as to recognize patterns, and we align body parts between the reference image and the moving image by solving a minimization problem. Although the algorithm fails in some cases, such as images with artifacts or cases that do not contain the abdomen and chest parts, it still performs well enough as a preprocessing technique for deep learning: the errors of five body part breakpoints have means of approximately 2 cm, which is accurate enough for our preprocessing. We then test two cropping strategies, cropping to the pelvis and cropping to the union of the pelvis and lower abdomen, and train CNNs for both strategies. We find that cropping to the pelvis not only gives a higher mean test AUC but also a lower standard deviation, so we decide to crop the CT images to the pelvis for this task. The mean test AUC obtained by this strategy is 0.8129 with a standard deviation of 0.0154, which shows that the CNN together with our preprocessing approach has the potential to distinguish cancerous ovarian tumors from benign ones.
Some directions for future work include the following. The first is improving the preprocessing pipeline, including the bone segmentation methodology and the computational efficiency, so that the pipeline is capable of supporting medical practice. The second is finding an optimal parameter λ in the registration setting to minimize the registration error. The third is using a set of reference images for different subsets of patients, rather than a single reference image, in the registration step: people with different conditions tend to have different bone or other structures (for example, the bone structures of a young man and an old man differ), so we may divide patients into groups and prepare a reference image for each group to improve the registration performance. The fourth is improving the hyperparameter optimization.
Finally, adding more training data is another way to improve the model performance.
Appendix A — Outliers in Error Analysis
A.1 Introduction
In the error analysis of our partition algorithm, we find 4 outliers in our results. We remove these 4 outliers to obtain a robust mean and standard deviation. We also analyze the 4 outliers, give some explanation for them, and plot their partition results in this appendix.
The first outlier is a low-resolution image, as shown in Figure A.1. The ribs are not included in the bone segmentation, so the bone signal does not carry the information about the ribs. Moreover, only a small part of the lung is captured in this image, so the air signal is not informative enough. As a result, the registration result is poor.
Figure A.1: Outlier 1 removed in Table 3.2.

There are some artifacts in the second case, as shown in Figure A.2. Hence, a large number of spurious "bone pixels" are detected by the hysteresis bone segmentation, and the bone signal does not present the true patterns of the bone. Since the bone signal is the most important part of our body parts partition algorithm, the registration result is poor.
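The hysteresis thresholding idea behind the bone segmentation can be sketched as below; the Hounsfield-unit thresholds are illustrative placeholders rather than the thesis's actual settings. Bright streak artifacts easily exceed such thresholds, which is why the mask in this case is inflated.

```python
import numpy as np
from skimage.filters import apply_hysteresis_threshold

def hysteresis_bone_mask(ct_hu: np.ndarray, low: float = 150.0, high: float = 300.0) -> np.ndarray:
    """Hysteresis thresholding of a CT volume in Hounsfield units.

    Voxels above `high` seed the mask; voxels above `low` are kept only when
    they are connected to a seed. Bright streak artifacts can exceed both
    thresholds, which inflates the number of detected 'bone pixels'.
    """
    return apply_hysteresis_threshold(ct_hu, low, high)
```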
The third case, shown in Figure A.3, is not so unusual. However, some breakpoints, such as those in the chest, are not correct. One reason is that this image captures only a small part of the lung, so the registration of the air signals near the chest is unreliable.
The fourth case is a leg CT, as shown in Figure A.4, so the air signal cannot provide any useful information. Furthermore, it may disturb the registration, since the algorithm tries to align the abdomen of this case to the lung of the reference case. As a result, our algorithm aligns the pelvis of the reference case to the feet of this case and the lung of the reference case to the abdomen of this case.
Figure A.3: Outlier 3 removed in Table 3.2.
Appendix B — Cross-Validation Results
B.1 Introduction
B.2 Results
From Table B.1, we see that the metrics obtained by cropping to the upper pelvis are higher than or approximately equal to those obtained by cropping to the lower abdomen in many folds. Based on these results, cropping to the pelvis is the better strategy in our experiments.