
Department of Mathematics
College of Science
National Taiwan University
Master's Thesis

Body Parts Partition by Wavelet Transform and Ovarian Tumors Classification by Deep Learning in Computed Tomography

Jinglun Huang

Advisor: Weichung Wang, Ph.D.

August 2021

Acknowledgements

I thank Dr. Gigin Lin of Linkou Chang Gung Memorial Hospital for providing the dataset and for assisting with the body parts annotation, which made this thesis possible.

摘要

Ovarian cancer is the most dangerous of the gynecologic cancers. It is not only hard to detect early, but it also gives no warning signs. When a patient consults a gynecologist because of symptoms, the ovarian tumor has often already spread throughout the pelvis or even the abdomen. In recent years, thanks to advances in medicine and technology, the mortality of many cancers has been decreasing or holding steady, yet the mortality of ovarian cancer keeps rising. Computed tomography is a three-dimensional imaging modality frequently used to diagnose ovarian cancer and many other cancers, so an analysis pipeline and a machine learning classification model built on computed tomography images can be applied in many situations.

In medical image analysis, machine learning workflows usually require image annotations of organs and tumors, but ovaries and ovarian tumors are very difficult to annotate. This thesis therefore focuses on building an analysis pipeline on computed tomography that uses no annotations of ovaries or ovarian tumors. To this end, we develop a body parts partition algorithm based on the wavelet transform to find the body part breakpoints; the algorithm needs only a few body part annotations. In our experiments, the median errors of the six predicted body part breakpoints are about two centimeters, which is accurate enough for data preprocessing. We apply the algorithm to a dataset from Linkou Chang Gung Memorial Hospital to crop out the pelvis and lower abdomen, the regions related to ovarian cancer. The dataset contains 240 benign and 161 malignant tumor cases. We then train deep learning models and report cross-validation results. Overall, the mean and standard deviation of the area under the receiver operating characteristic curve on the test set are 0.8129 and 0.0154, which shows that our analysis pipeline has the potential to distinguish malignant ovarian tumors from benign ones.

Keywords: Computed Tomography, Wavelet Transform, Body Parts Partition, Deep Learning, Ovarian Tumors Classification

Abstract

Ovarian cancer is one of the most dangerous cancers for women. It is hard to detect early and has no warning signs. When a patient consults a gynecologist due to some symptoms, the ovarian tumor has usually already spread within the pelvis and even the abdomen. In recent years, the mortality of ovarian cancer has been increasing, while the mortality of some other kinds of cancer is either decreasing or holding steady due to improvements in medical science and techniques. Computed tomography (CT) is a kind of three-dimensional imaging and is used for the diagnosis of ovarian cancer as well as many other kinds of cancer. An analysis pipeline and a machine learning classification model based on CT images can therefore be widely used in many situations.

Image annotations of organs and tumors are usually needed in machine learning workflows for medical image analysis, but ovaries and ovarian tumors are hard to annotate. Therefore, this thesis aims to build a pipeline for distinguishing cancerous ovarian tumors from benign ones in CT images by deep learning models, without using image annotations of ovaries and ovarian tumors. For this purpose, we develop a body parts partition algorithm that finds the breakpoints of body parts by using the wavelet transform. Only a few body part annotations are needed by this algorithm. In our experiments, the prediction errors of six body part breakpoints have medians of approximately 2 cm, which is accurate enough for data preprocessing. We use our algorithm to crop images to the pelvis and lower abdomen, the regions related to ovarian cancer, on the dataset from Linkou Chang Gung Memorial Hospital. The dataset consists of 161 cancerous cases and 240 benign cases. We then train deep learning models and provide cross-validation results. Overall, the mean test ROC AUC is 0.8129 and the standard deviation is 0.0154, which shows the pipeline has the potential to distinguish cancerous ovarian tumors from benign ones.

Keywords: Computed Tomography, Wavelet Transform, Body Parts Partition, Deep Learning, Ovarian Tumors Classification

Contents

Page

Acknowledgements iii

摘要 v

Abstract vii

Contents ix

List of Figures xiii

List of Tables xv

Chapter 1 Introduction 1

Chapter 2 Method 5

2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Bone Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.1 Global Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2.2 The Largest Connected Component Method . . . . . . . . . . . . . 12

2.2.3 Spectral Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2.4 Hysteresis Thresholding . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3 Internal Air Segmentation . . . . . . . . . . . . . . . . . . . . . . . 16

2.4 Bone Sequences, Internal Air Sequences, and Transformations . . . . 17

2.4.1 Smoothing by Gaussian Filters . . . . . . . . . . . . . . . . . . . . 18

2.4.2 Feature Extraction by Wavelet Transforms . . . . . . . . . . . . . . 20

2.5 Curve Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.6 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.6.1 Machine Learning in Medical Image Analysis . . . . . . . . . . . . 25

2.6.2 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . 26

2.6.3 Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . 30

2.6.4 Dataset and Data Splitting . . . . . . . . . . . . . . . . . . . . . . 31

2.6.5 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Chapter 3 Results and Discussion 35

3.1 Bone Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.2 Internal Air Segmentation . . . . . . . . . . . . . . . . . . . . . . . 42

3.3 Registration and Partition of Body Parts . . . . . . . . . . . . . . . . 46

3.3.1 Preparation Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.3.2 Gaussian Filters as the Transformations . . . . . . . . . . . . . . . 48

3.3.3 Wavelet Transforms as the Transformations . . . . . . . . . . . . . 50

3.3.4 Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.4 Classification by Deep Learning . . . . . . . . . . . . . . . . . . . . 58

Chapter 4 Conclusion 61

References 63

Appendix A — Outliers in Error Analysis 67

A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

A.2 Body Parts Predicted by Our Algorithm . . . . . . . . . . . . . . . . 67

Appendix B — Cross-Validation Results 71

B.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

B.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

List of Figures

2.1 Three views in CT scan. . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3.1 An example of a CT slice in axial view and its bone segmentation obtained
by global thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 Bone segmentation and contrast-enhanced structure segmentation. In both plots, black pixels denote the background. White pixels denote the bone in the left plot and the contrast-enhanced structure in the right plot; gray pixels denote the contrast-enhanced structure in the left plot and the bone in the right plot. . . . . 37
3.3 Bone segmentation and contrast-enhanced structure segmentation. The white pixels denote the foreground and the gray and black pixels denote the background. . . . . 38
3.4 Original axial images of case 1. . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Bone segmentation in axial view of case 1. . . . . . . . . . . . . . . . . . 39
3.6 Bone signal and bone segmentation in coronal view of case 1. . . . . . . 39
3.7 Original axial images of case 2. . . . . . . . . . . . . . . . . . . . . . . . 40
3.8 Bone segmentation in axial view of case 2. . . . . . . . . . . . . . . . . . 41
3.9 Bone signal and bone segmentation in coronal view of case 2. . . . . . . 41
3.10 Original axial images of case 1. . . . . . . . . . . . . . . . . . . . . . . . 43
3.11 Internal air segmentation in axial view of case 1. . . . . . . . . . . . . . 43
3.12 Air segmentation in axial view of case 1. . . . . . . . . . . . . . . . . . . 44
3.13 Original axial images of case 2. . . . . . . . . . . . . . . . . . . . . . . . 44
3.14 Internal air segmentation in axial view of case 2. . . . . . . . . . . . . . 45
3.15 Air segmentation in axial view of case 2. . . . . . . . . . . . . . . . . . . 45

3.16 Bounds of body parts of the reference image. . . . . . . . . . . . . . . . 46
3.17 Air segmentation in axial view of the reference image. . . . . . . . . . . 47
3.18 Bone segmentation in axial view of the reference image. . . . . . . . . . 47
3.19 Signals transformed by Gaussian filter of the reference case. . . . . . . . 48
3.20 Signals transformed by Gaussian filter of case 1. . . . . . . . . . . . . . . 49
3.21 Signals transformed by Gaussian filter of case 2. . . . . . . . . . . . . . . 49
3.22 Bone signal and its features extracted by wavelet transforms. . . . . . . . 51
3.23 Time-scale plots of case 1. . . . . 52
3.24 Time-scale plots of case 2. . . . . 52
3.25 Registration results of case 1. The bone structure of the reference image
and the transformed bone structure of the moving image. . . . . . . . . . 53
3.26 Registration results of case 1. The air structure of the reference image and
the transformed air structure of the moving image. . . . . . . . . . . . . . 54
3.27 Registration results of case 2. The bone structure of the reference image
and the transformed bone structure of the moving image. . . . . . . . . . 55
3.28 Registration results of case 2. The air structure of the reference image and
the transformed air structure of the moving image. . . . . . . . . . . . . . 55
3.29 Body parts partition of case 1. . . . . . . . . . . . . . . . . . . . . . . . 56
3.30 Body parts partition of case 2. . . . . . . . . . . . . . . . . . . . . . . . 56

A.1 Outlier 1 removed in Table 3.2. . . . . . . . . . . . . . . . . . . . . . . 68


A.2 Outlier 2 removed in Table 3.2. . . . . . . . . . . . . . . . . . . . . . . 68
A.3 Outlier 3 removed in Table 3.2. . . . . . . . . . . . . . . . . . . . . . . 69
A.4 Outlier 4 removed in Table 3.2. . . . . . . . . . . . . . . . . . . . . . . 69

List of Tables

2.1 The numbers of patients and images of folds. . . . . . . . . . . . . . . . 31


2.2 The numbers of data in each fold. The test sets in all lists are identical.
We keep the test set unseen for final test. For i = 1, 2, 3, 4, 5, list i regards
fold i as the validation set and the others as the training set. . . . . . . . . 31

3.1 Error distribution of each body part. We denote lower, middle, upper, pelvis, and abdomen by l, m, u, pel, and abd. . . . . 57
3.2 Error distribution of each body part after excluding outliers. We denote lower, middle, upper, pelvis, and abdomen by l, m, u, pel, and abd. . . . . 58
3.3 Mean AUCs for different strategies. This table shows means and standard deviations of validation AUCs and test AUCs. The means and standard deviations are computed from the 5-fold cross-validation results. . . . . 58

B.1 This table shows the obtained metrics in detail. The threshold is simply chosen as 0.5 for computing accuracy. Here, we denote validation, accuracy, upper pelvis, and lower abdomen by val, acc, u-pel, and l-abd, respectively, for short. . . . . 71

Chapter 1 Introduction

Ovarian cancer is one of the most dangerous cancers for women. In Taiwan, ovarian cancer ranks seventh in the number of cancer deaths among women. In recent years, due to the improvement of medical science and techniques, the mortality of some other kinds of cancer is either decreasing or holding steady. However, the mortality of ovarian cancer is increasing. According to the statistics of the Ministry of Health and Welfare (R.O.C.), the mortality of ovarian cancer was even higher than that of cervical cancer in 2020 (updated 2021/06/18). Difficulty in early detection is one reason why the mortality of ovarian cancer is still increasing. Some other kinds of cancer often come with warning signs, such as pain, bleeding, changes in physical appearance, or other symptoms. For example, bladder cancer usually causes hematuria, and breast cancer usually causes lumps in the breast or underarm. However, ovarian cancer is often difficult to detect. When a patient notices some warning signs and consults a gynecologist, the ovarian cancerous tumor has often already spread within the pelvis or even the abdomen. Moreover, the five-year survival rate of an ovarian cancer patient diagnosed at an early stage is far higher than that at a late stage, and the medical cost of early detection of cancer is lower than that of late detection. Therefore, early detection of ovarian cancer is a crucial task.

Computed tomography (CT) is a three-dimensional imaging technique and is a common tool used to detect ovarian cancer as well as many other cancers. Moreover, CT is one of the most common types of medical images, so an analysis pipeline for CT images can be widely used in many situations. In this work, we use CT scans as the inputs of our analysis pipeline.


In medical science, there has been research on ovarian tumors. S. E. Jung et al. [10] aimed to find features in CT scans for use in differential diagnosis. U. R. Acharya et al. [1] extracted features, such as deviation and entropy, from ultrasound using computer-aided diagnostic (CAD) techniques and classified ovarian tumors by decision trees. A. Vlahou et al. [21] used clinical data to establish a decision tree for the diagnosis of ovarian cancer.

However, using CT images to establish machine learning models for classifying ovarian tumors is still not well investigated. In this thesis, we build an analysis pipeline for ovarian tumors by using CT images.


Although machine learning has become a popular topic in medical image analysis, classifying ovarian tumors by machine learning still seems challenging. An important reason is that ovaries and ovarian tumors are difficult to segment, even by manual segmentation. The locations of the ovaries as well as the pelvic organs in CT images vary from case to case. Moreover, the ovary is a small organ, and sometimes the boundary of an ovary in CT is not clear. Hence, it is hard to define the ground truth of annotations of ovaries and ovarian tumors. Also, labeling ovaries and ovarian tumors is time-consuming. Therefore, an analysis pipeline that does not depend on image annotations of ovaries and ovarian tumors is more applicable in medical practice.

Many machine learning algorithms are based on image annotation. One may say a convolutional neural network (CNN) does not strongly depend on image annotation; instead, a CNN learns to extract useful features from images during the training process. However, training a CNN for CT tasks usually needs image annotations as a mask or for locating the organs. Cropping to a region of interest (ROI) is a standard technique for locating the organs. Medical images often contain many organs or tissues and hence include a large amount of information, but many problems aim at only one or a few organs. Moreover, CT images may capture different body parts. Hence, using the whole CT image as the input of a model does not make sense. Even if a model trained on whole images achieves good performance, it may not be explainable and is hard to use in medical practice. Therefore, a proper preprocessing algorithm that crops meaningful parts from CT images is a crucial part of ovarian tumor classification.


In this thesis, we design an analysis pipeline that does not depend on image annotations of ovaries and ovarian tumors. Instead, we use bounding boxes of an ROI as the inputs of a deep learning model. The ROI can be the pelvis or the union of the pelvis and the lower abdomen, since the lower abdomen may provide some features related to ovarian cancer, such as ascites. Besides avoiding the difficulty of image annotation, cropping a bounding box is easier to achieve with a rule-based method than image segmentation, which nowadays is usually done by another deep learning model.

There are several types of ovarian tumors, such as benign, cancerous, and some other types, but in this work we aim at distinguishing cancerous ovarian tumors from benign ones by using CT images. We design an algorithm for the body parts partition by using the wavelet transform and crop the pelvis and lower abdomen from CT images. The algorithm is based on the following main steps.

1. Obtain a query case.

2. Segment bone and internal air.

3. Construct the bone sequence and air sequence from the segmentations of bone and internal air.

4. Register the body parts by aligning the bone sequence and air sequence to a reference case.

5. Propagate the body part breakpoints from the reference case to the query case.

6. Crop to the target parts.

With this algorithm, we can automatically preprocess CT images before model training and inference, requiring only a few body part labels rather than image annotations of ovaries and ovarian tumors.


We apply our algorithm to the dataset from Linkou Chang Gung Memorial Hospital, and we train and evaluate deep learning models for ovarian tumor classification by cross-validation and testing. We compare the body part breakpoints obtained from our body parts partition algorithm to the ground truth labeled by a radiologist (Dr. Gigin Lin). The mean absolute errors in the lower pelvis, upper pelvis, middle abdomen, lower chest, middle chest, and upper chest are approximately 2.48 cm, 1.73 cm, 1.73 cm, 1.25 cm, 2.37 cm, and 1.42 cm after we remove 4 outliers, which shows the body parts partition algorithm is accurate enough for data preprocessing. We use cross-validation to train and evaluate the deep learning models. By cropping CTs to the pelvis only, we obtain a mean test AUC of 0.8129 with a standard deviation of 0.0154, which shows the analysis pipeline has the potential to distinguish cancerous ovarian tumors from benign ones.

Chapter 2 Method

In this chapter, we introduce our body parts partition algorithm and the training details of the deep learning models. First, we give an overview of our algorithm. Then we discuss the details of each step of the algorithm. In this thesis, we define images as follows for convenience.

Definition 2.1. Let m, n, p ∈ N and let [k] = {0, 1, 2, ..., k − 1} for k ∈ N.

1. A function I : [m] × [n] × [p] → R is called a (three-dimensional) image of size m × n × p.

2. Suppose I is an image of size m × n × p. The spacing (of I) in the x-axis is the width between two voxels in the x-axis. The spacings in the y-axis and the z-axis are defined similarly.

3. Suppose I is an image of size m × n × p. If S is an image of the same size as I whose range is contained in {0, 1}, then we say S is a binary image.

Remark 2.2.

1. We may omit the size of an image if there is no confusion.

2. CT images are three-dimensional images, so we usually plot a cross section of the volume for visualization. Figure 2.1 shows three different views of a CT image.

3. The spacing is saved in the metadata of a CT image, and the size of a CT image is usually 512 × 512 × p, where p ∈ N is often determined by the scanned body parts.

4. We use a binary image S of size m × n × p to represent the segmentation of some ROI in a given image. More precisely, the preimages

{(x, y, z) ∈ [m] × [n] × [p] : S(x, y, z) = 1}

and

{(x, y, z) ∈ [m] × [n] × [p] : S(x, y, z) = 0}

are the sets of the ROI and the background, respectively.


Figure 2.1: Three views in CT scan.

2.1 Overview

We give an overview of our analysis pipeline in this section. Due to the difficulty of labeling image annotations of ovaries and ovarian tumors, using a bounding box of a proper region instead of an image segmentation is more applicable. In this work, the main idea of the body parts partition consists of the following three steps. First, we segment the bone and the internal air in a CT image. Second, we count the bone pixels and internal air pixels in each axial slice to define the bone sequence and the internal air sequence; the formal definitions are given in Definition 2.3. Note that the bone sequence and the internal air sequence (or simply the air sequence) are two signals that carry the body parts information of the CT image. Hence, the third step is to align these two signals between cases for the registration of body parts. Eventually, we only need a few body part annotations to automatically find the breakpoints of the body parts of any query CT image, by propagating the body part annotations from known cases to the query case.


Definition 2.3. Suppose m, n, p ∈ N and I is an image of size m × n × p. Let S be a bone segmentation of I and let the set of bone voxels be B = S^{-1}({1}). The bone sequence (of I) is defined as the sequence \{b_k\}_{k=1}^{p}, where

b_k = |\{(x, y, z) \in B : z = k\}|.

Here, |A| is the cardinality of a given set A. The air sequence \{a_k\}_{k=1}^{p} is defined in the same manner.
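
As an illustration, both sequences can be computed directly from the segmentation masks. The following is a minimal numpy sketch, assuming a mask is stored as a binary array of shape m × n × p with the axial slices indexed along the last axis; the names part_sequence, bone_mask, and air_mask are illustrative, not part of our implementation.

import numpy as np

def part_sequence(mask):
    """Count the foreground voxels in each axial slice: b_k = |{(x, y, z) in B : z = k}|."""
    return mask.reshape(-1, mask.shape[-1]).sum(axis=0)

# bone_seq = part_sequence(bone_mask)  # the bone sequence {b_k}
# air_seq = part_sequence(air_mask)    # the air sequence {a_k}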

Registration means aligning signals and is usually formulated as an optimization problem. Let f, g : [0, 1] → R be two functions with some proper assumptions, such as f, g ∈ L²([0, 1]), continuity, or smoothness. The registration problem is, roughly speaking, to find a one-to-one, increasing function h : [0, 1] → [0, 1] such that f ≈ g ∘ h in some sense, for example closeness in the L² distance. The function h is called a warping function and is usually assumed to have an integrable second derivative [16].

There are different formulations of the registration problem between f and g, for example, landmark-based matching [2]. However, since CT images usually scan different body parts, landmark-based matching is difficult in our case, and assuming that the codomain of h is [0, 1] is not applicable. Moreover, the bone sequence and the air sequence are two types of information for a single CT, so they shall be taken into consideration at the same time. Although bone sequences and air sequences are discrete data, it is convenient to consider them as continuous signals. Therefore, in this thesis, the registration problem between bone sequences and air sequences is formulated as follows.


Every CT image has its bone signal and its air signal, and we align these two signals at the same time, which is an important difference between our problem and other curve registration settings. Assume that f₁ and f₂ are bone signals and g₁ and g₂ are air signals. We transform f₁, f₂, g₁, and g₂ by a transformation T. T is expected to have the ability to denoise and to represent the patterns of the bone signals and air signals. The codomain of T is not necessarily a subset of the univariate functions. Then we define a distance between the pairs (T(f₁), T(g₁)) and (T(f₂ ∘ h), T(g₂ ∘ h)) by using the 2-norm or its generalization. Finally, we register body parts between CT images by minimizing this distance.


Now, we summarize the main ideas of the body parts partition in the following steps.

Preparation Stage

1. Choose a reference CT image and obtain the breakpoints of the body parts of interest.

2. Segment bone and internal air.

3. Compute the bone sequence and the air sequence.

4. Apply some proper transformations to the bone sequence and the air sequence and regard the results as the reference signal.

We obtain the reference signals and body part breakpoints from the preparation stage. The reference image is expected to include all body parts of interest and to represent the important patterns of the bone sequence and the air sequence.

In functional data registration, we usually have a reference signal and a moving signal, and we transform the moving signal into the coordinates of the reference signal. In our problem, we treat the transformation outputs of the bone signal and the air signal of the reference image as the reference signals. At the inference stage, given a query CT image, the transformation outputs of its bone signal and air signal are treated as the moving signals.

In the inference stage, we label the body parts for a query CT image in the following steps.

Inference Stage

1. Obtain a query CT image.

2. Segment bone and internal air.

3. Compute the bone sequence and the air sequence.

4. Apply some proper transformations to the bone sequence and the air sequence and regard the results as the moving signal.

5. Compute the optimal warping function by minimizing the distance between the moving signal and the reference signal.

6. Obtain the body part breakpoints by propagating the breakpoints from the reference signal to the moving signal.

By these steps, we can propagate the body part breakpoints from only a few labeled images to many query images. Finally, we use this body parts partition algorithm to preprocess CT images and then use deep learning techniques to establish models.


2.2 Bone Segmentation

Bone segmentation is an important topic. One use of bone segmentation is as reference information for surgery, and there is already some research on this topic. Moreover, bone segmentation is one step in our body parts partition algorithm, since the bone sequence relies on the bone segmentation. Here are some related works on bone segmentation.

A graph-based segmentation approach was proposed by Y. Boykov and G. Funka-Lea [3]. By minimizing an energy function consisting of a per-pixel term and a boundary term, Y. Boykov and G. Funka-Lea [3] find a segmentation in which pixels receive the same class label if they have high similarity. M. Krčah et al. [12] extend the work in [3] to another formulation that is more suitable for 3D images and more applicable to bone segmentation in CT.

By assuming that the intensities of bone and non-bone voxels are sampled from a mixture of Gaussian distributions, an iterative method for bone segmentation was proposed in [22]. Starting with an initial segmentation (obtained from global thresholding, for instance), this method reclassifies pixels of the bone class by a Bayesian decision rule to update the segmentation, and so on.

Moreover, H. Lamecker et al. [13] use a statistical shape model (SSM) to segment the pelvic bone. An SSM aims to find a deformation from one shape to another. Based on the deformation, the SSM can transform shapes into a base space and obtain the distance between surfaces. The distance can be used to estimate the average shape as well as the variation in shape.

Although bone segmentation is a step in our proposed analysis pipeline, we do not need an extremely accurate segmentation for the next step, and computational efficiency may be more important in medical practice. So we try some other methods for bone segmentation. In fact, if the bone sequence derived from a bone segmentation presents its essential patterns, then we may accept the segmentation. We have tried the following approaches to segment bone; they are good in different situations.


2.2.1 Global Thresholding

Global thresholding is a typical segmentation approach. It decides whether a voxel is classified into the object class by comparing the intensity of the voxel with a threshold t ∈ R. We formally define global thresholding in Definition 2.4.

Definition 2.4 (Global Thresholding). Let t ∈ R and m, n, p ∈ N. Suppose I is an image of size m × n × p. Global thresholding is a segmentation approach that obtains a segmentation S of I defined by

S(x, y, z) = \begin{cases} 1 & \text{if } I(x, y, z) > t \\ 0 & \text{if } I(x, y, z) \le t \end{cases}

for (x, y, z) ∈ [m] × [n] × [p]. In this case, t is called a threshold.

In our case, a voxel in CT is classified into the bone class by global thresholding if its gray-scale value is larger than t; otherwise it is classified into the non-bone class. Global thresholding is useful when the intensity of the object is very different from that of the background. Since bone voxels are very bright in CT (in other words, they have high gray-scale values), it is natural to consider global thresholding as our segmentation approach. In fact, a common threshold for bone in CT is a gray-scale value of 400 HU (Hounsfield units), due to the medical properties of bone. Hence, global thresholding is a common method to segment bone. However, since the CT images we use are venous phase CT images, this approach inevitably includes the contrast-enhanced structures, if any. Therefore, we also consider other methods in the following sections.

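A minimal numpy sketch of Definition 2.4 follows, assuming the CT volume is an array of Hounsfield-unit values; the function name and the default threshold of 400 HU follow the discussion above.

import numpy as np

def global_threshold(ct, t=400.0):
    """S(x, y, z) = 1 if I(x, y, z) > t, and 0 otherwise."""
    return (ct > t).astype(np.uint8)
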

2.2.2 The Largest Connected Component Method

Choosing the largest connected component from the foreground of a segmentation is an approach to retouch the segmentation if it is noisy. The concept of connected components depends on the connectivity in images, so we first introduce the connectivity in images as follows.

Definition 2.5 (Voxel Connectivity). Let (x, y, z) and (x′, y′, z′) be voxels and let Δx = |x − x′|, Δy = |y − y′|, and Δz = |z − z′|.

1. We say (x′, y′, z′) is 6-connected to (x, y, z) if Δx + Δy + Δz = 1.

2. We say (x′, y′, z′) is 26-connected to (x, y, z) if Δx ≤ 1, Δy ≤ 1, and Δz ≤ 1, but Δx, Δy, and Δz are not all zero.

Remark 2.6. Let S be a binary image and F = S^{-1}({1}). For voxels u and v, we say u ∼ v if either u = v or there exist v₁, v₂, ..., v_{k−1} ∈ F such that v_{i−1} is 6-connected to v_i for i = 1, 2, ..., k, where we denote v₀ = u and v_k = v. Then it is clear that ∼ defines an equivalence relation on F. A similar argument applies to 26-connectivity.

Definition 2.7 (Connected Components). Let S be a binary image and F = S^{-1}({1}). The equivalence classes of ∼ defined in Remark 2.6 are called the connected components of F (or of the foreground object).

A motivation for taking the largest component is that the contrast-enhanced structures are not connected to the bone. In fact, soft tissues are usually covered by fat, so they are not connected to the bone. Moreover, the volume of bone is larger than that of the contrast-enhanced structures in general. Hence, if we further assume that the bone is a single connected component of the segmentation obtained by global thresholding, then we may compute the largest component of the segmentation to obtain a bone segmentation that excludes the contrast-enhanced structures.
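
A sketch of this step with scipy.ndimage follows, assuming seg is a binary mask from global thresholding; by default, ndimage.label uses 6-connectivity for 3D arrays, matching Definition 2.5.

import numpy as np
from scipy import ndimage

def largest_component(seg):
    labels, num = ndimage.label(seg)        # 6-connected components
    if num == 0:
        return seg
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                            # ignore the background label
    return (labels == sizes.argmax()).astype(np.uint8)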

However, there may be some contrast-enhanced structures that are connected to the bone in images. They may merely be near each other, but in the sense of voxels they are connected. Therefore, we consider spectral clustering in the next section.


2.2.3 Spectral Clustering

Since contrast-enhanced structures may be connected to bone in the sense of voxels, we consider the segmentation approach by spectral clustering proposed in [18]. We use spectral clustering to revise the bone segmentation obtained from an initial segmentation.

Before we introduce the details of this approach, we assume the human bone is connected in CT images. Consider a CT image I and a segmentation S. Here, S can be the constant 1 but cannot be the constant 0. Let V = S^{-1}({1}) and let

E = \{uv : u, v \in V \text{ and } u \text{ is 6-connected to } v\}.

We further define a weight for each edge as follows. For each edge uv ∈ E, define the weight of uv to be e^{-|I(u) - I(v)|/\sigma}, where σ > 0 is a parameter; σ is usually chosen to be the standard deviation of |I(u) − I(v)| over all uv ∈ E. Then G(V, E) is a weighted undirected graph. By the assumption of the connectivity of human bone, G is connected. Let A be the adjacency matrix and D the degree matrix of G. The Laplacian matrix of G is given by L = D − A. J. Shi and J. Malik [18] propose a method for image segmentation by solving the generalized eigenvalue problem

Lx = λDx. (2.1)

In our situation, we obtain an initial bone segmentation S from global thresholding and use spectral clustering to revise it. We construct a graph as above and solve the generalized eigenvalue problem on the corresponding Laplacian matrix to obtain two clusters on the graph. By choosing the larger cluster, we obtain a revised segmentation. This approach takes more information into consideration and can remove some contrast-enhanced structures that connect to the bone in the sense of voxels.
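
The following sketch illustrates this revision step under the assumptions stated above (in particular, that every foreground voxel has at least one 6-connected neighbor, so the degree matrix D is positive definite). It builds the weighted graph on the initial mask, solves Eq. (2.1) for the Fiedler vector with a small negative shift for numerical stability, and keeps the larger cluster; the function name and parameter choices are illustrative, not our exact implementation.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def spectral_revision(ct, seg):
    idx = -np.ones(seg.shape, dtype=np.int64)
    vox = np.argwhere(seg == 1)
    idx[tuple(vox.T)] = np.arange(len(vox))
    rows, cols, diffs = [], [], []
    for axis in range(3):                   # 6-connected neighbor pairs
        shifted = np.roll(idx, -1, axis=axis)
        valid = np.ones(seg.shape, dtype=bool)
        valid[(slice(None),) * axis + (-1,)] = False   # drop wrap-around pairs
        m = (idx >= 0) & (shifted >= 0) & valid
        rows.append(idx[m])
        cols.append(shifted[m])
        diffs.append(np.abs(ct[m] - np.roll(ct, -1, axis=axis)[m]))
    rows, cols, diffs = map(np.concatenate, (rows, cols, diffs))
    std = diffs.std()
    w = np.exp(-diffs / (std if std > 0 else 1.0))     # edge weights e^{-|dI|/sigma}
    n = len(vox)
    A = sp.coo_matrix((np.r_[w, w], (np.r_[rows, cols], np.r_[cols, rows])), shape=(n, n)).tocsr()
    D = sp.diags(np.asarray(A.sum(axis=1)).ravel())
    L = D - A
    _, vecs = eigsh(L, k=2, M=D, sigma=-1e-5, which="LM")  # two smallest of Lx = lambda Dx
    cluster = vecs[:, 1] > np.median(vecs[:, 1])           # split by the Fiedler vector
    keep = cluster if cluster.sum() * 2 >= n else ~cluster # keep the larger cluster
    out = np.zeros_like(seg)
    out[tuple(vox[keep].T)] = 1
    return out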

However, this is still not good enough, since the bone itself may not be connected in the sense of voxels, which violates the basic assumption of graph connectivity in this approach. Therefore, we turn to the hysteresis thresholding introduced in the next section.


2.2.4 Hysteresis Thresholding

Since the bone may not be connected in the sense of voxels, we consider the hysteresis thresholding proposed in [4]. This approach performs two-stage thresholding to exclude non-object voxels. Before we introduce hysteresis thresholding, we define the connectivity between two sets for convenience.

Definition 2.8. Let F and G be two sets of voxels and let u be a voxel.

1. We say u is connected to G if there is v ∈ G such that u is connected to v.

2. We say F is connected to G if there is v ∈ F such that v is connected to G.

Now we introduce hysteresis thresholding. Let s, t ∈ R with s < t. At the first stage, we obtain segmentations S and T by applying global thresholding with respect to the thresholds s and t. Let F = S^{-1}({1}) and G = T^{-1}({1}). At the second stage, the segmentation obtained by hysteresis thresholding is defined as

\bigcup \{C : C \text{ is a connected component of } F \text{ and } C \text{ is connected to } G\}.

We apply hysteresis thresholding with the high threshold t = 800 HU and the low threshold s = 400 HU to obtain the bone segmentation. An observation about true bone and contrast-enhanced soft tissue is that a true bone voxel often has intensity greater than 1000 HU, while contrast-enhanced soft tissue often has intensity smaller than 800 HU. Although ribs usually have intensity smaller than 800 HU and are filtered out at the first stage, they can usually be recovered at the second stage due to the connectivity between the ribs and the spine. Moreover, 400 HU is a common threshold for bone segmentation. Therefore, these choices of s and t are suitable for our problem.
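
A sketch of the two-stage procedure with scipy.ndimage follows, assuming ct holds Hounsfield-unit values; scikit-image also ships an equivalent helper, skimage.filters.apply_hysteresis_threshold(ct, 400, 800).

import numpy as np
from scipy import ndimage

def hysteresis_threshold(ct, s=400.0, t=800.0):
    weak = ct > s                       # F = S^{-1}({1})
    strong = ct > t                     # G = T^{-1}({1})
    labels, _ = ndimage.label(weak)     # 6-connected components of F
    keep = np.unique(labels[strong])    # components of F that touch G
    keep = keep[keep > 0]
    return np.isin(labels, keep).astype(np.uint8)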

2.3 Internal Air Segmentation

Air is crucial information in a CT image. We can easily recognize the lung since it contains a lot of air, which has low intensity and appears black in the image. Moreover, the internal air signal is another important piece of information for aligning the body parts between CT images. Therefore, we introduce some image processing techniques to segment the internal air. In this section, we assume that (1) the external air is connected, (2) the external air and the internal air are not connected, and (3) the volume of the external air is greater than the volume of the internal air in CT.

Here are two notes about these assumptions. First, the internal air and the external air are not connected in CT in almost all cases, even though the respiratory system is a path connecting the internal air and the external air. Second, the 3D volume of the external air is greater than that of the internal air in CT, even though this may be false in a specific single axial slice.

The main idea for segmenting the internal air is to segment the external air first and obtain the internal air by removing the external air. We use the gray-scale value −700 HU as a threshold and apply global thresholding to segment air; in this case, the voxels with intensity smaller than −700 HU are what we need. By assumptions (1) and (2), the external air is a connected component of the air segmentation, and assumption (3) implies the external air is the largest component of the air segmentation. Hence, we compute the largest component of the air segmentation to obtain the external air. Finally, we obtain the internal air by removing the external air from the air segmentation.

The performance of this method is good in almost all cases. As discussed in the previous sections, contrast-enhanced structures may be confused with bone, in either intensity or connectivity, so we surveyed several methods to solve that problem. Fortunately, no other tissue would be confused with air. Therefore, the typical image processing approaches are good enough to segment the internal air in CT images.
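
A sketch of the whole procedure, combining global thresholding at −700 HU with the largest-connected-component step; the function name is illustrative.

import numpy as np
from scipy import ndimage

def internal_air(ct):
    air = ct < -700                      # all air voxels
    labels, num = ndimage.label(air)
    if num == 0:
        return air.astype(np.uint8)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                         # ignore the background label
    external = labels == sizes.argmax()  # assumption (3): external air is the largest component
    return (air & ~external).astype(np.uint8)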

2.4 Bone Sequences, Internal Air Sequences, and Transformations

Our goal is body parts registration by aligning bone signals and air signals. Hence, we introduce the bone sequences and the internal air sequences in this section. By counting the bone pixels and internal air pixels in each axial slice, we obtain two sequences indexed by the z-axis coordinate of the CT. The formal definitions of bone sequences and air sequences were given in Definition 2.3. The bone sequences and air sequences can be regarded as discrete versions of the bone signals and air signals, which are usually considered in a continuous manner. Unfortunately, the two sequences might be noisy for the following reasons.

• The segmentations are not perfect.

• We are using thick-cut CT scans, which means the spacing in the z-axis is 5 mm.

• We are using venous phase CT scans, which means the bone segmentation may include some non-bone contrast-enhanced structures.

• The number of bone pixels depends on the amount of calcium in the bones of a patient.

To deal with these two noisy signals, we apply transformations to them for denoising and pattern recognition. We introduce Gaussian filters and the wavelet transform in the following sections.


2.4.1 Smoothing by Gaussian Filters

Gaussian filters are widely used in signal processing. One of the most common uses of a Gaussian filter is denoising. We use Gaussian filters to smooth our signals.

Mathematically, we may define a Gaussian filter by convolution. We follow [17] to describe convolution transformations. Let f₁, f₂ : R → R be two measurable signals. The convolution of f₁ and f₂ is defined by

(f_1 * f_2)(x) = \int_{\mathbb{R}} f_1(t)\, f_2(x - t)\, dt, \quad x \in \mathbb{R},

provided the integral exists. Let K : R → R be an L¹ function and define T(f) = f ∗ K for f ∈ Lᵖ(R). In this case, we call T a convolution transformation and K the kernel. Then T is a linear transformation from Lᵖ(R) to Lᵖ(R). Furthermore, if K is a smooth function, then [T(f)]′ = f ∗ K′ for f ∈ Lᵖ(R). That is, the convolution transformation maps an Lᵖ function to a smooth function, provided the kernel is smooth.

For self-containment, we quote Theorem 9.9 in Section 9.2 of [17].

Theorem 2.9 (Zygmund [17]). Let K \in L^1(\mathbb{R}) \cap L^\infty(\mathbb{R}) with \int_{\mathbb{R}} K = 1, and suppose K(x) = o(1/|x|) as |x| → ∞. For ϵ > 0, define

K_\epsilon(x) = \frac{1}{\epsilon}\, K\!\left(\frac{x}{\epsilon}\right)

and let f_ϵ = f ∗ K_ϵ, where f ∈ L¹(R). Then f_ϵ → f as ϵ → 0⁺ at each point of continuity of f.

This theorem roughly says that a kernel function decaying rapidly at infinity induces an approximation of the identity for continuous functions via convolution transformations.

As a special case, let K(x) = \frac{1}{\sqrt{\pi}}\, e^{-x^2} be the Gaussian function; we call the convolution transformation induced by the kernel K_ϵ a Gaussian filter. Note that K ∈ L¹(R) with

\|K\|_1 = \int_{\mathbb{R}} |K| = \int_{\mathbb{R}} K = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-x^2}\, dx = 1.

Since K(x) \le \frac{1}{\sqrt{\pi}}, we have K ∈ L^∞(R). Finally, since

\frac{K(x)}{1/|x|} = \frac{|x|}{\sqrt{\pi}}\, e^{-x^2} \to 0 \quad \text{as } |x| \to \infty,

we conclude K(x) = o(1/|x|) as |x| → ∞. Hence, K satisfies all conditions of Theorem 2.9, and therefore Gaussian filters are approximations of the identity for continuous functions. Moreover, since K_ϵ is smooth, the convolution f ∗ K_ϵ is also smooth. Therefore, Gaussian filters precisely smooth signals.

To summarize the properties of Gaussian filters: f ∗ K_ϵ is a smooth approximation of a continuous function f with a lower noise level. In the sense of this convergence, we may say the Gaussian filter preserves some patterns of f. Smoothness is an important reason why we use Gaussian filters. The smooth approximations obtained by Gaussian filters make the L² distance between bone sequences (or air sequences) of different patients more robust, and as a result the distance minimization described in Section 2.1 is more robust. In applications, a discrete approximation of the Gaussian function is used, since the true Gaussian function has infinite support.
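
In practice the smoothing is a single library call; the sketch below uses scipy's truncated discrete Gaussian kernel on a stand-in bone sequence, and the choice sigma = 2.0 is illustrative.

import numpy as np
from scipy.ndimage import gaussian_filter1d

bone_seq = np.random.default_rng(0).poisson(2000, size=120).astype(float)  # stand-in signal
smoothed = gaussian_filter1d(bone_seq, sigma=2.0)  # convolution with a truncated Gaussian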

2.4.2 Feature Extraction by Wavelet Transforms

The wavelet transform is a technique in signal processing and is widely used in engineering. It is also used for edge detection in image processing. We introduce the wavelet transform in this section.

Let \psi \in L^2(\mathbb{R}) with \int_{-\infty}^{\infty} \psi = 0. Define \psi^{ab} by

\psi^{ab}(t) = a^{-1/2}\, \psi\!\left(\frac{t - b}{a}\right), \quad \text{for } a > 0,\ b \in \mathbb{R} \tag{2.2}

and \psi_{jk} by

\psi_{jk}(x) = 2^{j/2}\, \psi(2^j x - k), \quad \text{for } j, k \in \mathbb{Z}. \tag{2.3}

For a function f \in L^2(\mathbb{R}), the continuous wavelet transform is defined as

W(f)(a, b) = \int_{-\infty}^{\infty} f\, \psi^{ab}, \quad \text{for } a > 0,\ b \in \mathbb{R} \tag{2.4}

and the discrete wavelet transform is defined by

w_{jk} = \int_{-\infty}^{\infty} f\, \psi_{jk}, \quad \text{for } j, k \in \mathbb{Z}. \tag{2.5}

We say ψ is the mother wavelet and \psi^{ab} and \psi_{jk} are wavelets. If \{\psi_{jk}\}_{j,k \in \mathbb{Z}} forms an orthonormal basis for L^2(\mathbb{R}), then we say ψ is an orthonormal wavelet and w_{jk} is called a wavelet coefficient.

Although not every mother wavelet is an orthonormal wavelet, the wavelet transform still extracts some important features of a given function. Moreover, since the scale factor (a or j) gives the wavelets different widths, the wavelet transform can capture information at different scales. Due to this ability to handle multi-scale problems, the wavelet transform can be used to recognize the patterns of a signal. Applying the wavelet transform with a small-width wavelet tends to extract local features, which are often high-frequency and possibly noisy; applying it with a large-width wavelet tends to extract wide features, which are often low-frequency and recognize only rough patterns, such as the monotonicity of a function on some intervals.

Let a_1, a_2, \ldots, a_p > 0 be scale factors with p \in \mathbb{N}. For a signal f, define the transformation T mentioned in Section 2.1 by

T(f)(t) = (W(f)(a_1, t), W(f)(a_2, t), \ldots, W(f)(a_p, t)). \tag{2.6}

Note that the scale factors a_i are expected to extract features from the input signal and to filter out high-frequency waves for denoising.

In this work, we use the wavelet transform with the Mexican hat (or Ricker) mother wavelet, defined as

\psi(t) = \frac{2}{\sqrt{3}\, \pi^{1/4}} \left(1 - t^2\right) e^{-t^2/2},

to extract features from the observed bone sequences and internal air sequences. Based on observations from our experiments, we find that scale factors smaller than 3 are too noisy and scale factors larger than 10 tend to detect redundant features. Therefore, we choose the scale factors a_i = i + 2 for i = 1, 2, ..., 8; in this case, p = 8.
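
The transformation T of Eq. (2.6) then amounts to an eight-channel continuous wavelet transform. A sketch on a stand-in signal follows; scipy.signal.cwt and scipy.signal.ricker implement exactly this pairing in older SciPy releases (they were removed in SciPy 1.15, where pywt.cwt with the 'mexh' wavelet is an alternative).

import numpy as np
from scipy import signal

bone_seq = np.random.default_rng(0).poisson(2000, size=120).astype(float)  # stand-in signal
widths = np.arange(3, 11)                          # a_i = i + 2 for i = 1, ..., 8
T_f = signal.cwt(bone_seq, signal.ricker, widths)  # shape (8, len(bone_seq)), one row per scale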

2.5 Curve Registration

Curve registration is a crucial step in our pipeline. We align the body parts between cases by the registration between bone signals and air signals.

There are many studies in functional data analysis on curve registration. For example, [6] aligns curves by solving a global minimization problem and [2] matches curves by aligning landmarks.

However, there are two critical issues in our situation. First, each CT image may capture different body parts, and hence the bone sequences (and air sequences) usually contain different landmarks between images. Second, we have two kinds of information to use, i.e., bone sequences and air sequences, so the formulation in [6] cannot be used directly.

Even though all CT images are processed so that they are of size 512 × 512 in the axial view and of spacing 5 mm in the z-axis for thick-cut images, they still have their own essential scale. For example, heights and body proportions differ between cases, so CT images have various numbers of axial slices even if some of them capture the same body parts. Hence, we consider a one-dimensional affine transformation h(t) = at + b, t ∈ R, as our warping function, where a and b lie in some compact intervals, to match the essential scales of two cases.

Here are two notes about our method. First, an affine transformation cannot adjust the body proportion. However, finding a non-linear transformation usually needs some accurate landmarks, and landmarks are not easy to locate and recognize due to the different body parts between CT images, as discussed previously. More precisely, although [16] proposes a good approach to finding landmarks, the selection of landmarks for registration in our situation is still an issue. Therefore, we do not consider non-linear transformations in this work. Second, we have considered 2D or 3D bone registration, but such registration is hard to do. Although there is a standard brain in medical science, bone structures vary considerably between people; hence, the loss function of the corresponding 2D or 3D optimization problem is very non-convex. In contrast, standard bone signals and standard air signals are more convincing from the viewpoint of medical knowledge.


Assume that f₁ and f₂ are bone signals and g₁ and g₂ are air signals. The notation ∥·∥₂ denotes the 2-norm. Let F be a function space; for instance, F can be Lᵖ(R). If T : F → F is a transformation, then we align body parts by solving the following optimization problem:

\min_{h \in H}\ (1 - \lambda)\, \|T(f_1) - T(f_2 \circ h)\|_2^2 + \lambda\, \|T(g_1) - T(g_2 \circ h)\|_2^2 \tag{2.7}

where

• H = {h : [0, 1] → R | h(t) = at + b, where a ∈ I, b ∈ R}, and I is a compact interval contained in (0, ∞);

• λ ∈ [0, 1] is a parameter.

If T : F → Fᵖ for some p ∈ N, say

T(f_1) = (w_1^{(1)}, w_2^{(1)}, \ldots, w_p^{(1)}), \quad T(g_1) = (v_1^{(1)}, v_2^{(1)}, \ldots, v_p^{(1)}),
T(f_2 \circ h) = (w_1^{(2)}, w_2^{(2)}, \ldots, w_p^{(2)}), \quad T(g_2 \circ h) = (v_1^{(2)}, v_2^{(2)}, \ldots, v_p^{(2)}),

where w_i^{(j)} \in F and v_i^{(j)} \in F for i = 1, 2, ..., p and j = 1, 2, then we align body parts by solving the following optimization problem:

\min_{h \in H}\ \sum_{i=1}^{p} (1 - \lambda)\, \big\| w_i^{(1)} - w_i^{(2)} \big\|_2^2 + \lambda\, \big\| v_i^{(1)} - v_i^{(2)} \big\|_2^2 \tag{2.8}

with H and λ as above.


The interval I is chosen to be [0.8, 1.2] in this work. Note that for b with large absolute value |b|, the warping function h makes f₂ ∘ h = 0 and g₂ ∘ h = 0 on [0, 1], since the translation factor b moves the curves outside the window. Thus

supp(f₁) ∩ supp(f₂ ∘ h) = supp(g₁) ∩ supp(g₂ ∘ h) = ∅,

where supp(f) = {x ∈ R : f(x) ≠ 0} for f : R → R. Hence, b can also be restricted to a compact interval that depends on a. Since the heights of almost all people lie in some fixed range, we may solve the minimization problem in a reasonable time even by grid search. Moreover, grid search is easily parallelized, so we may accelerate the computation in practice. Therefore, we use grid search to solve the minimization problem in this thesis.
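
The sketch below illustrates the grid search for Eq. (2.8), assuming the reference and moving features are arrays of shape (p, length) sampled on a common z grid and that warped signals vanish outside the window; the grid resolutions and the range for b are illustrative choices, not the exact values used in our experiments.

import numpy as np

def resample(channels, a, b):
    """Evaluate each feature channel at the warped coordinates a*t + b (zero outside)."""
    t = np.arange(channels.shape[1], dtype=float)
    return np.stack([np.interp(a * t + b, t, c, left=0.0, right=0.0) for c in channels])

def register(ref_bone, ref_air, mov_bone, mov_air, lam=0.5):
    best, best_ab = np.inf, (1.0, 0.0)
    for a in np.linspace(0.8, 1.2, 21):         # I = [0.8, 1.2]
        for b in np.arange(-60.0, 60.5, 1.0):   # compact range for the shift
            cost = ((1 - lam) * ((ref_bone - resample(mov_bone, a, b)) ** 2).sum()
                    + lam * ((ref_air - resample(mov_air, a, b)) ** 2).sum())
            if cost < best:
                best, best_ab = cost, (a, b)
    return best_ab                              # the optimal warping h(t) = a*t + b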

2.6 Deep Learning

Deep learning has been one of the most popular topics in artificial intelligence. It is a kind of machine learning algorithm that establishes a model from data. Moreover, deep learning is a flexible framework, so it can be used in both supervised learning and unsupervised learning. In this thesis, we use deep learning techniques to establish models.


2.6.1 Machine Learning in Medical Image Analysis

Machine learning has been widely used in medical image analysis and has obtained good results on different problems. Classification, segmentation, and detection are the main topics in medical image analysis; classifying normal and abnormal CTs, segmenting organs and tumors in CT, and detecting lesions are examples of these topics, respectively. All three tasks need to extract features from images.

In traditional machine learning algorithms, feature extraction from images relies on image annotation. For example, radiomics is a feature extraction approach that relies on image annotation. Using both the image and its annotation, radiomics computes intensity-based and texture-based features in the ROI and analyzes shape-based features of the geometric properties of the ROI. With the radiomics features, one may train a machine learning model to classify images. Another way of using radiomics is to divide an image into patches and compute features for each patch. This approach is usually used in classification, and the image annotation is used to assign a label to each patch. One possibility is to label both the organ and the tumors. A machine learning algorithm is then used to build a model that distinguishes patches containing tumors from patches that do not. There are several possible choices of machine learning algorithms, such as k-nearest neighbors (KNN), support vector machines (SVM), random forests, and XGBoost. The outputs of this approach are patch-based results. One may summarize the patch-based results in a heat map and draw conclusions from the heat map. Heat maps not only provide explainable information but can also be used to generate patient-based predictions.

However, as we discussed previously, image annotation is not a good idea for our problem. Therefore, we use deep learning rather than traditional machine learning to establish models.


2.6.2 Artificial Neural Networks

Deep learning has shown its power in many fields by constructing artificial neural networks (ANNs) with multiple layers. We introduce ANNs and some related concepts in this section.

First, we formally define the ANN. A single-layer neural network is a parametrized function

F(x; A, b) = f(Ax + b) \quad \text{for } x \in \mathbb{R}^n,

where f is a non-linear continuous function, A \in \mathbb{R}^{m \times n}, and b \in \mathbb{R}^m. An ANN, or a multi-layer neural network, is a composition of single-layer neural networks. Suppose that L \in \mathbb{N} and n_0, n_1, n_2, \ldots, n_L \in \mathbb{N}. For i = 1, 2, \ldots, L, let f_i : \mathbb{R}^{n_i} \to \mathbb{R}^{n_i} be a non-linear continuous function, A_i \in \mathbb{R}^{n_i \times n_{i-1}}, and b_i \in \mathbb{R}^{n_i}, and define a single-layer neural network as

F_i(x; A_i, b_i) = f_i(A_i x + b_i) \quad \text{for } x \in \mathbb{R}^{n_{i-1}}.

Then an ANN is defined as the composition F_L \circ F_{L-1} \circ \cdots \circ F_1. For convenience, we denote the ANN by F(x; θ), where x \in \mathbb{R}^{n_0} and θ is the parameter consisting of the A_i's and b_i's. In this case, L is called the number of layers, n_i the number of neurons in the i-th layer, f_i an activation function, A_i a weight matrix (or simply weight), and b_i a bias vector (or simply bias), for i = 1, 2, \ldots, L.

A popular choice of activation function is the Rectified Linear Unit (ReLU), defined by

\mathrm{ReLU}(x) = \max\{x, 0\} = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \quad x \in \mathbb{R}.

A neural network that uses ReLU as its activation function at every layer is called a ReLU network.
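
A minimal numpy sketch of a two-layer network F₂ ∘ F₁ as defined above follows; the layer sizes and the sigmoid output activation (convenient for binary labels) are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n0, n1, n2 = 4, 8, 1                           # neurons per layer
A1, b1 = rng.standard_normal((n1, n0)), np.zeros(n1)
A2, b2 = rng.standard_normal((n2, n1)), np.zeros(n2)

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def F(x):
    h = relu(A1 @ x + b1)                      # F_1(x; A_1, b_1)
    return sigmoid(A2 @ h + b2)                # F_2(h; A_2, b_2)

print(F(rng.standard_normal(n0)))              # a value in (0, 1)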

The configuration of an ANN means the number of layers and the numbers of neurons of the ANN. With a fixed configuration, we would like to find good weights and biases for the dataset and the task. Loss functions are smooth functions that evaluate how good an ANN is; the choice of loss function depends on the task. Given a dataset and a task, we treat the loss function as the objective function and the weights and biases as the independent variables. By minimizing the loss function, we may find optimized weights and biases. Solving this minimization problem is called training.

The terms parameters and hyperparameters refer to variables that are determined by training and variables that are fixed beforehand, respectively. Note that the configuration is fixed beforehand, while the weights and biases are randomly initialized and optimized during training. As a result, the weights and biases are parameters, and the number of layers and the numbers of neurons are hyperparameters.

The loss function depends on the given task. In a classification task, we are given a dataset \{(x_i, y_i)\}_{i=1}^{N}, where N is the number of data points, x_i \in \mathbb{R}^n is an input, and y_i \in \{0, 1\} is the label of the corresponding x_i, for i = 1, 2, \ldots, N. Let F be an ANN and \hat{y}_i = F(x_i; \theta) for i = 1, 2, \ldots, N, where θ denotes the parameters. In this case, a typical loss function L is defined by

L(\hat{y}_i, y_i) = -\left( y_i \ln(\hat{y}_i) + (1 - y_i) \ln(1 - \hat{y}_i) \right) \quad \text{for } i = 1, 2, \ldots, N.

This loss function is called the binary cross-entropy loss. Note that the \hat{y}_i's are functions of θ. Let \mathcal{L}(\theta) = \sum_{i=1}^{N} L(\hat{y}_i, y_i). Then we solve the following minimization problem to find a good θ:

\min_{\theta}\ \mathcal{L}(\theta) \tag{2.9}

Here are some natural questions.

• What kinds of functions can an ANN approximate?

• How do we solve this minimization problem?

For the first question, the universal approximation theorem for width-bounded ReLU networks shows that ANNs with ReLU activations can approximate any Lᵖ function [15]. For the second question, gradient descent is a common method for minimizing differentiable objective functions. Gradient descent updates the parameters at each iteration by the recursive formula

\theta_{i+1} = \theta_i - \eta \nabla_\theta \mathcal{L}(\theta_i) \quad \text{for } i = 1, 2, \ldots

where \theta_j is the parameter at the j-th iteration for j \in \mathbb{N} and η > 0 is called the step size or the learning rate. However, since the number of data points N and the dimension of the input data n are usually large, it is difficult to compute \mathcal{L}(\theta) = \sum_{i=1}^{N} L(\hat{y}_i, y_i) as well as its gradient. Therefore, mini-batch stochastic gradient descent is widely used in deep learning algorithms; in this case, we update the parameters by using a batch of data instead of the whole dataset. It has been shown that mini-batch stochastic gradient descent with an adaptive learning rate η converges to some local minimum.
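
A sketch of one epoch of mini-batch stochastic gradient descent on the binary cross-entropy loss in PyTorch follows; the network, the random stand-in data, and the learning rate are all illustrative.

import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()               # binary cross-entropy on logits

X = torch.randn(256, 16)                       # stand-in dataset
y = torch.randint(0, 2, (256, 1)).float()

for i in range(0, len(X), 8):                  # mini-batches of size 8
    xb, yb = X[i:i + 8], y[i:i + 8]
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()                            # gradient of the batch loss
    opt.step()                                 # theta <- theta - eta * grad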

Although ReLU networks have the ability to approximate Lᵖ functions, other activation functions still play a role. In fact, the Parametric Rectified Linear Unit (PReLU) [8], defined by

\mathrm{PReLU}(x) = \max\{x, 0\} + a \min\{x, 0\} = \begin{cases} x & \text{if } x \ge 0 \\ ax & \text{if } x < 0 \end{cases} \quad x \in \mathbb{R},

where a > 0 is a parameter, has been shown to improve model fitting [8]. Note that the parameter a can also be trained during the training process, and [8] derives the update formulation.

The ANN is one of the most important concepts in deep learning. However, real-world applications do not use the plain ANN. The convolutional neural network is an extension of the ANN that is widely used in computer vision; we introduce convolutional neural networks in the following section.

2.6.3 Convolutional Neural Networks

An ANN can approximate many functions, but it is not practical for image analysis. Computational cost is a big issue in this situation, since images are represented by large arrays and so are the numbers of neurons. Moreover, using an ANN with dense weights would destroy the structure of the input being an image. Hence, convolutional neural networks (CNNs) are widely used in image tasks. Note that a single-layer neural network is a composition of a linear transformation, a translation, and an activation function, and a convolution transformation (with a kernel) is also a linear transformation. A CNN is a special case of an ANN that replaces the general linear transformations in an ANN by convolution transformations.

There are many advantages of using CNNs in image tasks. We list three of them as follows.

1. A convolution transformation is parametrized by its kernel, which is a k × k matrix or a k × k × k tensor, where k ∈ N. In practice, k is far less than the size of the images, for example k = 3. Hence, using a CNN reduces the number of parameters.

2. A convolution transformation extracts local features and takes location into consideration. Hence, using a CNN preserves the structure of images.

3. The matrix representation of a convolution transformation is a circulant matrix, which has many powerful properties for computation.

Due to these advantages, CNNs have become the state of the art in image tasks. In this thesis, we focus on applications of CNNs in medical image analysis. We use a configuration that has been validated on ImageNet and train it for the ovarian tumor classification task.

2.6.4 Dataset and Data Splitting

We use the dataset maintained by Linkou Chang-Gung Memorial Hospital to establish models. After rechecks and quality control by Dr. Gigin Lin, the dataset contains 411 CT images obtained from 401 patients, consisting of 161 cancerous cases and 240 benign cases.

In this thesis, we split the patient list into 5 training folds and an extra test set. First, we hold out a test set of 81 patients. This test set is not involved in any training or validation process; it is used only for testing. Second, we split the remaining 320 patients into 5 folds in a stratified manner, each containing 64 patients. We perform 5-fold cross-validation on these folds and finally test the models on the test set.

              fold 1  fold 2  fold 3  fold 4  fold 5  test set
    patients      64      64      64      64      64        81
    images        67      64      64      66      66        84

Table 2.1: The numbers of patients and images in each fold and in the test set.

             training  validation  test
    list 1        260          67    84
    list 2        263          64    84
    list 3        263          64    84
    list 4        261          66    84
    list 5        261          66    84

Table 2.2: The numbers of data in each list. The test sets in all lists are identical, and we keep the test set unseen for the final test. For i = 1, 2, 3, 4, 5, list i regards fold i as the validation set and the other folds as the training set.
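A sketch of this splitting scheme with scikit-learn; the label ordering is a placeholder and the random seed is illustrative.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, train_test_split

    # Placeholder patient-level labels: 1 = cancerous, 0 = benign (401 patients).
    patients = np.arange(401)
    labels = np.array([1] * 161 + [0] * 240)

    # Hold out a stratified test set of 81 patients, untouched during training.
    trainval, test = train_test_split(
        patients, test_size=81, stratify=labels, random_state=0)

    # Split the remaining 320 patients into 5 stratified folds of 64 patients each.
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for i, (train_idx, val_idx) in enumerate(skf.split(trainval, labels[trainval]), 1):
        print(f"list {i}: {len(train_idx)} training, {len(val_idx)} validation patients")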

2.6.5 Training

The training process of a deep learning model is precisely the solution of an optimization problem that aims to minimize the loss function. Note that the minimization problem has no analytic solution and its computational complexity is very high. A common approach to this optimization problem is stochastic gradient descent. However, the loss function is usually highly non-convex, so several studies aim to improve the convergence of stochastic gradient descent; Adam [11] is the most popular of these. Adam is a widely used optimizer that combines AdaGrad [5] and RMSProp [20], and it has been shown to have high stability and high efficiency. NovoGrad [7] uses layer-wise gradient normalization to improve performance and combines the advantages of stochastic gradient descent and Adam. In our experiments, we use NovoGrad as our optimizer.

By using the registration methods discussed in the previous sections, we can extract specific body parts from CT images. Two body parts selection strategies are used in preprocessing: using only the pelvis, or using the pelvis and the lower abdomen.

As discussed in Section 2.6.4, we perform 5-fold cross-validation and test the resulting models on the test set. At the i-th stage, we treat the i-th fold as the validation set and the other 4 folds as the training set. We train models on the training set and monitor the training process by evaluating the models on the validation set after each epoch. After the training process, we evaluate the models on the test set. Finally, we compare the performance between models as well as between body parts selection strategies.

We choose DenseNet121 [9] with dropout [19] rate 0.2 as the classification model in the experiments and, as discussed above, choose NovoGrad as the optimizer with learning rate 3 × 10⁻⁴, weight decay 10⁻⁴, and batch size 8. Also, we use the cosine annealing learning rate scheduler [14] to decay the learning rate during training, where the decay period is 50 epochs and the minimal learning rate is 10⁻⁶, and we train these models for 1000 epochs.
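A sketch of this configuration, assuming the MONAI implementations of DenseNet121 and NovoGrad together with PyTorch's cosine annealing scheduler; the thesis does not name the libraries, so the imports are assumptions.

    import torch
    from monai.networks.nets import DenseNet121
    from monai.optimizers import Novograd

    # 3D DenseNet121 binary classifier with dropout rate 0.2.
    model = DenseNet121(spatial_dims=3, in_channels=1, out_channels=2,
                        dropout_prob=0.2)

    # NovoGrad with the learning rate and weight decay reported above.
    optimizer = Novograd(model.parameters(), lr=3e-4, weight_decay=1e-4)

    # Cosine annealing: decay period 50 epochs down to a minimal learning rate 1e-6.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=50, eta_min=1e-6)

    loss_fn = torch.nn.CrossEntropyLoss()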

Each image is resampled to a spacing of 1 mm × 1 mm × 1 mm for a uniform spacing, and we use center cropping or padding to obtain an array from the resampled image. For the case of cropping to the pelvis, the size of the array is 224 × 224 × 192. For the case of cropping to the pelvis and lower abdomen, the size of the array is 224 × 224 × 256. Then we apply a random 3D affine transform and Gaussian noise for data augmentation. The parameters of the random 3D affine transform are a maximal rotation angle of π/12, a maximal shear range of 0.2, a maximal translation range of 0.1, and tri-linear interpolation.
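A sketch of this preprocessing and augmentation with MONAI transforms (an assumed library choice; parameter conventions, such as whether the translation range is fractional or in voxels, differ between libraries, so the values below simply mirror the text, and the application probabilities are illustrative).

    import math
    from monai import transforms as T

    # Resample to uniform 1 mm spacing, then center crop or pad to the
    # pelvis-only array size 224 x 224 x 192.
    preprocess = T.Compose([
        T.LoadImage(ensure_channel_first=True),
        T.Spacing(pixdim=(1.0, 1.0, 1.0), mode="bilinear"),
        T.ResizeWithPadOrCrop(spatial_size=(224, 224, 192)),
    ])

    # Random 3D affine transform and Gaussian noise for data augmentation.
    augment = T.Compose([
        T.RandAffine(
            prob=0.5,                          # application probability (illustrative)
            rotate_range=(math.pi / 12,) * 3,  # maximal rotation angle pi/12
            shear_range=0.2,                   # maximal shear range
            translate_range=0.1,               # maximal translation range
            mode="bilinear",                   # tri-linear interpolation on 3D volumes
        ),
        T.RandGaussianNoise(prob=0.5),
    ])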

Chapter 3 Results and Discussion

We discuss the results of the proposed pipeline in this chapter.

3.1 Bone Segmentation

We assume the bone is a connected component in Section 2.2.2 and Section 2.2.3, since the human skeleton is connected. However, due to certain CT imaging techniques, the assumption does not always hold. For example, older CT scans usually have a spacing of 1 mm along the z-axis; these are called thin-cut images. Nowadays, however, CT scans usually have a spacing of 5 mm along the z-axis, using compression techniques to save storage. Therefore, the bone in a CT scan may not be connected. Hence, in practice we perform a morphological closing to retouch the segmentation obtained from global thresholding. Although we expect the closed binary segmentation to be connected, the connectivity assumption may still fail.

Without the connectivity assumption, the bone segmentation obtained by the largest-component method may fail. In fact, the segmentation usually omits some parts of the bone, such as the mandible. In some cases, the segmentation may even omit the spine and ribs, which means the segmentation preserves the pelvis only. In that case, the bone sequence cannot represent the pattern of the amount of bone in the axial slices. Moreover, the graph-cut method assumes the graph is connected, which is equivalent to the connectivity assumption of the bone segmentation in some sense of connectivity. Hence, using the graph-cut method to segment the whole 3D bone does not make sense if the connectivity assumption fails.

Our main purpose is to obtain a rough approximation of the amount of bone in each axial slice and to recognize its pattern in the next step, registration by wavelet transform. Moreover, there are large numbers of CT images to be analyzed, so the computational performance of the pipeline should be efficient enough for medical practice. Hence, we decide to use hysteresis thresholding for its efficiency and good enough performance.

As previously discussed, the choice of the low threshold is based on medical knowledge. In fact, the CT value 400 HU is a common threshold for bone. The choice of the high threshold is based on experimental observation. We observe from the case shown in Figure 3.1 that the bone voxels are usually of intensities greater than 800 HU, while the contrast-enhanced structures are usually of intensities less than 650 HU.

Figure 3.1: An example of a CT slice in axial view and its bone segmentation obtained by
global thresholding

Note that the set of foreground pixels F in the segmentation consists of both the spine (at the center) and a contrast-enhanced structure (on the right). We can manually separate these two parts in this case and obtain the following results. In fact, there are 16 connected components C_1, C_2, ..., C_16 in the segmentation. By p_i, we denote the 90th percentile of the intensity histogram of C_i. Let

A = ∪ {C_i | p_i > 650} and B = ∪ {C_i | p_i ≤ 650}.

Then we divide F into A and B. More precisely, F = A ∪ B and A ∩ B = ∅. A is exactly the subset of bone pixels in F, and B is exactly the subset of contrast-enhanced structure pixels in F. The separation result is shown in Figure 3.2.

Figure 3.2: Bone segmentation and contrast-enhanced structure segmentation. In both plots, black pixels are the background. White pixels in the left plot are the bone pixels, and white pixels in the right plot are the contrast-enhanced structure. Gray pixels in the left plot are the contrast-enhanced structure, and gray pixels in the right plot are the bone pixels.

Based on the separation, we plot the histograms of the intensities in the bone set A and in the non-bone set B in Figure 3.3. Observe that the intensities of non-bone pixels rarely exceed a bound, say 650 HU, whereas the intensities of bone pixels can reach high values, such as 1000 HU. Therefore, we use 800 HU as the high threshold in hysteresis thresholding.
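As a sketch of this step, scikit-image provides apply_hysteresis_threshold, which keeps voxels above the high threshold together with voxels above the low threshold that are connected to them; the closing parameters are illustrative, and ct is assumed to be a 3D array of intensities in HU.

    from scipy import ndimage
    from skimage.filters import apply_hysteresis_threshold

    def segment_bone(ct, low=400, high=800):
        """Hysteresis bone segmentation: seeds above `high` HU are grown through
        all connected voxels above `low` HU, then small gaps are closed."""
        mask = apply_hysteresis_threshold(ct, low, high)
        return ndimage.binary_closing(mask, iterations=2)

    # Bone signal: the number of bone voxels in each axial slice (axis 0 = z).
    # bone_signal = segment_bone(ct).sum(axis=(1, 2))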

In summary, we use hysteresis thresholding as described in Section 2.2.4 to segment bone. Figure 3.4, Figure 3.5, and Figure 3.6 show the segmentation results of Case 1.

Figure 3.3: Intensity histograms of the bone set A and the non-bone set B.

Figure 3.4: Original axial images of case 1.

Figure 3.5: Bone segmentation in axial view of case 1.

Figure 3.6: Bone signal and bone segmentation in coronal view of case 1.

Case 1 is a regular case, which means there is no distinct contrast-enhanced structure interrupting the bone segmentation. Note that there is an M structure in the pelvis; this pattern is closely related to human anatomy. The other patterns in the bone signal also correspond to bone structures and body parts. For example, the number of bone pixels is small in the lower abdomen, since the only bone in that part is the spine, and the number of pixels increases as the ribs appear in the slices.

Figure 3.7: Original axial images of case 2.

Case 2 is an irregular case, which means there are distinct contrast-enhanced structures that interrupt the bone segmentation. CT slices of Case 2 are shown in Figure 3.7. In fact, there are several contrast-enhanced structures in the pelvis, and the segmentation obtained by hysteresis thresholding includes these non-bone pixels, as shown in Figure 3.8. One of our main purposes is to find the pelvis slices in the CT scan, but these contrast-enhanced structures interrupt the pattern in the bone signal. Accordingly, the M structure is imperfect and a third peak appears in the bone signal in Figure 3.9.

Figure 3.8: Bone segmentation in axial view of case 2.

Figure 3.9: Bone signal and bone segmentation in coronal view of case 2.

3.2 Internal Air Segmentation

We use the approach described in Section 2.3 to segment internal air. Fortunately, air signals are not ambiguous during the segmentation process. The most important feature of air is its low gray-scale value, around −1000 HU, while other structures in a CT scan usually have gray-scale values larger than −200 HU. For example, body fat is darker than many tissues in CT, and the gray-scale values of body fat usually lie in the interval [−70, −30]. Therefore, the performance of the approach described in Section 2.3 is good enough for further analysis.
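Section 2.3 gives the exact procedure; the following is only a plausible sketch under the stated intensity assumptions, where internal air is taken to be low-intensity voxels not connected to the border of the volume (the external air).

    from skimage.segmentation import clear_border

    def segment_internal_air(ct, threshold=-500):
        """Internal air: voxels well below tissue intensities (tissue > -200 HU)
        that are not connected to the volume border, where the external air lies."""
        air = ct < threshold      # air is near -1000 HU
        return clear_border(air)  # drop air components touching the border

    # Air signal: the number of internal-air voxels per axial slice.
    # air_signal = segment_internal_air(ct).sum(axis=(1, 2))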

Figures 3.10 to 3.15 show the segmentation results of Case 1 and Case 2. The slices chosen in this section are near the lung, the organ that contains the most air in the human body. We can see that the lung is segmented in Figure 3.12 and Figure 3.15. The gastrointestinal tract may also contain air, but far less than the lung does. As a result, a part of the gastrointestinal tract is segmented in Figure 3.12. However, the whole lung is included in Case 2, and hence the gastrointestinal tract is not clearly visible in Figure 3.15.

Figure 3.10: Original axial images of case 1.

Figure 3.11: Internal air segmentation in axial view of case 1.

Figure 3.12: Air segmentation in axial view of case 1.

Figure 3.13: Original axial images of case 2.

Figure 3.14: Internal air segmentation in axial view of case 2.

Figure 3.15: Air segmentation in axial view of case 2.

3.3 Registration and Partition of Body Parts

As mentioned before, the bone signal and the internal air signal are often noisy, so directly computing the Euclidean distance between the signals of two cases does not make sense. Therefore, we first apply a transformation to the signals for denoising and pattern recognition before minimizing the distances between them.

3.3.1 Preparation Stage

We use the following case as the reference image; the breakpoints of its body parts are shown in Figure 3.16. We then compute the bone segmentation and the internal air segmentation; the segmentation results, the bone signal, and the air signal are shown in Figures 3.17 and 3.18.

Figure 3.16: Bounds of body parts of the reference image.

Figure 3.17: Air segmentation in axial view of the reference image.

Figure 3.18: Bone segmentation in axial view of the reference image.

3.3.2 Gaussian Filters as the Transformations

We use the Gaussian filter with width ϵ = 2 (unit) from Section 2.4.1 for denoising. The results for the reference image, Case 1, and Case 2 are shown in Figures 3.19 to 3.21. Note that the air signals have a large maximum in the lung part, and the typical pattern of the bone signal is present in Case 1.
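A sketch of this smoothing with SciPy, reading the width ϵ as the standard deviation of the Gaussian kernel (an assumption; Section 2.4.1 fixes the precise definition).

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    rng = np.random.default_rng(0)
    noisy = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.3 * rng.normal(size=200)
    smoothed = gaussian_filter1d(noisy, sigma=2.0)  # width epsilon = 2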

Figure 3.19: Signals transformed by Gaussian filter of the reference case.

While the Gaussian filter denoises the signals, it also removes some important potential patterns. Notice that the M structure disappears in the reference case and in Case 2, although the M structure is an important pattern in the bone signal, as discussed above. Therefore, we do not use the Gaussian filter as the transformation.

Figure 3.20: Signals transformed by Gaussian filter of case 1.

Figure 3.21: Signals transformed by Gaussian filter of case 2.

3.3.3 Wavelet Transforms as the Transformations

Due to its ability to deal with multi-scale problems, the wavelet transform recognizes the patterns of the bone sequence and the internal air sequence. Choosing a suitable set of scale factors is an important part of this step. Besides, normalization is also a necessary step. Since people have different body types, the ranges of the bone sequences and the air sequences also differ. Hence, even if two people have bone structures differing only in scale, the reference bone signal and the warped moving bone signal may be far from each other in the sense of the L2 norm. However, using either the L2 norm or the L∞ norm for normalization can be misleading, since every CT covers different body parts. If a CT contains the lung, then the L∞ norm of the air signal can be large, for example 50000 HU; on the other hand, if a CT does not contain any part of the lung, then the L∞ norm is small, for example 4000 HU. A similar situation also occurs for the bone signal. So we use percentiles to normalize the signals. More precisely, we normalize the bone signal by the maximum of the 75th percentile of the bone sequence and 3000 HU, and we normalize the air signal by the maximum of the 97.5th percentile of the air sequence and 4000 HU. Taking the maximum protects the results from outliers. These percentiles are expected to emphasize some landmarks, such as the M structure in the bone signal and the peak in the air signal.
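A sketch of this percentile normalization; the floor constants 3000 and 4000 follow the text, and taking the maximum with the floor is what protects against outliers.

    import numpy as np

    def normalize_bone(bone_signal):
        # Normalize by max(75th percentile, 3000) to emphasize the M structure.
        return bone_signal / max(np.percentile(bone_signal, 75), 3000.0)

    def normalize_air(air_signal):
        # Normalize by max(97.5th percentile, 4000) to emphasize the lung peak.
        return air_signal / max(np.percentile(air_signal, 97.5), 4000.0)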

Figure 3.22 shows an example of a bone signal and its wavelet features with normalization. If we used many scale factors to form a basis, there would be many redundant vectors. By choosing a suitable set of scale factors, we can see some important patterns in the feature vectors. For example, the M structure and the local monotonicities appear in the feature vectors for suitable scale factors. As mentioned in Section 2.4, we use {3, 4, ..., 10} as the set of scale factors to perform the wavelet transforms.

Figure 3.22: Bone signal and its features extracted by wavelet transforms.
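A sketch of this feature extraction with PyWavelets; the Mexican-hat mother wavelet is an assumption, since the text does not name the wavelet used.

    import numpy as np
    import pywt

    signal = np.random.default_rng(0).random(300)  # a normalized bone or air signal
    scales = np.arange(3, 11)                      # scale factors {3, 4, ..., 10}
    coeffs, _ = pywt.cwt(signal, scales, "mexh")   # one feature vector per scale
    print(coeffs.shape)                            # (8, 300)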

The time-scale plots of Case 1 and Case 2 are shown in Figures 3.23 and 3.24. In Case 1, we can see that the M structures are captured by several transformed bone signals of different scale factors. In Case 2, the M structures and the third peak related to the contrast-enhanced structures are captured by the transformed bone signals. Moreover, the peaks of the air signals are captured by the transformed signals in both cases. For both the bone signals and the air signals, thanks to its ability to handle multi-scale problems, the wavelet transform captures more patterns than the Gaussian filter. Therefore, we use the transformation defined by the wavelet transforms in the next step.

3.3.4 Registration

By solving the optimization problem (8) described in Section 2.4, we can register the bone signals and the air signals between cases. The parameter λ is chosen to be 0.3, since the main information should come from the bone signals while the air signals regularize the registration results.
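Problem (8) is stated in Section 2.4; as a rough sketch of its structure only, the objective below combines the bone-feature mismatch with a λ-weighted air-feature mismatch for a candidate warping w. The warping model and the feature map (standing in for the wavelet transform) are placeholders, not the thesis's exact formulation.

    import numpy as np

    def registration_objective(w, bone_ref, bone_mov, air_ref, air_mov,
                               lam=0.3, features=lambda s: s):
        """Mismatch of warped moving features against reference features.
        `w` maps reference slice indices to (fractional) moving slice indices;
        `features` stands in for the wavelet feature map."""
        grid = np.arange(len(bone_ref))
        bone_w = np.interp(w(grid), np.arange(len(bone_mov)), bone_mov)
        air_w = np.interp(w(grid), np.arange(len(air_mov)), air_mov)
        bone_term = np.sum((features(bone_ref) - features(bone_w)) ** 2)
        air_term = np.sum((features(air_ref) - features(air_w)) ** 2)
        return bone_term + lam * air_term  # air signals act as a regularizer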

Figure 3.23: Time-scale plots of case 1.

Figure 3.24: Time-scale plots of case 2.

Figure 3.25: Registration results of case 1. The bone structure of the reference image and
the transformed bone structure of the moving image.

Recall that Case 1 is a regular case: the pattern of the bone structure is complete in both the segmentation and the bone signal. The registration result shown in Figure 3.25 is as good as we would expect.

On the other hand, since we use the air signal only for regularization, the registration result for the internal air of Case 1, shown in Figure 3.26, is not good. In fact, the maxima of the air signals align with each other due to our normalization approach. The registration result cannot be improved much, since the lung structure in Case 1 is not completely captured. Moreover, we do not attempt to improve this registration result, for the following reasons. First, the shape of the lung is not fixed; the lung is flexible, and its shape changes at every moment as we breathe. Hence, registration of lungs is difficult. Second, our purpose is the partition of body parts, and the air signal is used only to regularize the registration, so we do not focus on improving the internal air registration.

Figure 3.26: Registration results of case 1. The air structure of the reference image and the transformed air structure of the moving image.

Although Case 2 is less regular, the registration results are still good, as shown in Figures 3.27 and 3.28. Even though the transformed bone signals near the pelvis are not close enough to the reference case, we still obtain a good body parts partition thanks to the regularization by the air signals. In fact, the internal air registration of Case 2 is excellent, as shown in Figure 3.28.

As these examples show, the air segmentation is usually good enough, while the bone segmentation may be poor. However, even when the bone segmentation is irregular, as in Case 2, our body parts partition algorithm can still obtain a good registration, and hence a good body parts partition, as long as the bone segmentation is not totally broken. We show the final body parts partitions of Case 1 and Case 2 in Figures 3.29 and 3.30.

Figure 3.27: Registration results of case 2. The bone structure of the reference image and the transformed bone structure of the moving image.

Figure 3.28: Registration results of case 2. The air structure of the reference image and the transformed air structure of the moving image.

Figure 3.29: Body parts partition of case 1.

Figure 3.30: Body parts partition of case 2.

During the research, we listed 44 cases with difficulties in bone segmentation or other problems, such as wrong body parts and wrong phases, and we discussed these 44 CT images with Dr. Lin. Dr. Lin rechecked these cases and then labeled the body parts for them. After that, 8 images were excluded from the analysis: some were replaced by correct ones, and some were removed due to a wrong body part, such as chest CT images.

We apply the body parts partition algorithm to the remaining 36 cases and compare the partition obtained by the algorithm with the ground truth labeled by Dr. Lin. For each case, we compute the absolute error (in cm) between the partition from our algorithm and the ground truth. We summarize the corresponding statistics in Table 3.1.

It is clear that there are some outliers in these distributions. These cases generally have average errors larger than 10 slices. We leave their segmentation and partition results to Appendix A.2. These cases include a low-resolution image, a leg CT, and an image with artifacts.

             l-pel   u-pel   m-abd  l-chest  m-chest  u-chest
    count       36      36      34       34       34       17
    mean      5.53    5.07    2.42     2.07     3.13     1.42
    std      16.10   16.05    2.67     3.01     2.82     1.45
    min       0.06    0.07    0.07     0.04     0.02     0.09
    Q1        1.05    0.52    0.81     0.50     1.56     0.51
    Q2        2.39    1.52    1.53     1.25     2.63     1.01
    Q3        3.75    3.10    2.80     1.77     3.75     2.08
    max      98.24   97.07   11.01    12.89    12.07     6.14

Table 3.1: Error distribution (cm) for each body part breakpoint. We denote lower, middle, upper, pelvis, and abdomen by l, m, u, pel, and abd.

We exclude the outliers described above to obtain the robust statistics shown in Table 3.2; the body parts partition results of the outliers are shown in Appendix A.2. We are mainly interested in l-pel, u-pel, and m-abd, since these three breakpoints are used in the preprocessing of our deep learning models. The mean errors in l-pel, u-pel, and m-abd are 2.48 cm, 1.73 cm, and 1.73 cm, and the respective standard deviations are 1.96 cm, 1.87 cm, and 1.46 cm. These results show that the algorithm performs well enough for the preprocessing of our deep learning models.

             l-pel   u-pel   m-abd  l-chest  m-chest  u-chest
    count       32      32      31       31       31       17
    mean      2.48    1.73    1.73     1.25     2.37     1.42
    std       1.96    1.87    1.46     1.00     1.40     1.45
    min       0.06    0.07    0.07     0.04     0.02     0.09
    Q1        1.02    0.49    0.78     0.47     1.43     0.51
    Q2        2.23    1.20    1.33     1.02     2.37     1.01
    Q3        3.29    2.17    2.51     1.73     3.34     2.08
    max       7.27    9.33    6.91     4.48     5.50     6.14

Table 3.2: Error distribution (cm) for each body part breakpoint after excluding outliers. We denote lower, middle, upper, pelvis, and abdomen by l, m, u, pel, and abd.

3.4 Classification by Deep Learning

We use two cropping strategies to preprocess the CT images. The first is cropping to the pelvis, that is, cropping the CT image to the region between l-pel and u-pel determined by our registration results. The second is cropping to the pelvis and the lower abdomen, that is, cropping the CT image to the region between l-pel and m-abd determined by our registration results.

    cropping strategy    validation         test
    upper pelvis         0.8601 ± 0.0414    0.8129 ± 0.0154
    lower abdomen        0.8486 ± 0.0526    0.7891 ± 0.0300

Table 3.3: Mean AUCs for the two strategies. This table shows the means and standard deviations of the validation AUCs and test AUCs, computed from the 5-fold cross-validation results.

Table 3.3 shows the means and standard deviations of the AUCs over the 5 folds; the details are given in Table B.1. By cropping to the upper pelvis, we obtain a mean validation AUC of 0.8601 and a mean test AUC of 0.8129. On the other hand, by cropping to the lower abdomen, we obtain a mean validation AUC of 0.8486 and a mean test AUC of 0.7891.
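For reference, each AUC entry can be computed from the model's class-1 probabilities on a held-out set; a minimal sketch with scikit-learn and toy values follows.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # y_true: ground-truth labels (1 = cancerous); y_score: predicted probabilities.
    y_true = np.array([0, 0, 1, 1, 1])
    y_score = np.array([0.2, 0.4, 0.35, 0.8, 0.7])
    print(roc_auc_score(y_true, y_score))  # 0.8333...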

The first observation is that the validation performance is better than the test performance in both cases. Note that the validation set is used to monitor the training process: we evaluate the validation performance at the end of each epoch, and the model checkpoint is selected to have the highest validation AUC. Hence, the model may overfit the validation set when the dataset is not large enough. In principle, we could reduce this phenomenon by adding more data.

Second, both the mean validation AUC and the mean test AUC obtained by cropping to the upper pelvis are higher than those obtained by cropping to the lower abdomen. Although some features in the lower abdomen are related to ovarian cancer, the models appear to make their decisions based on the organs in the pelvis. Moreover, the standard deviations obtained by cropping to the upper pelvis are lower than those obtained by cropping to the lower abdomen. Therefore, cropping to the upper pelvis gives more robust models, and we choose it as our preprocessing strategy.

Chapter 4 Conclusion

In this thesis, we propose an analysis pipeline that does not need image annotations of ovaries and ovarian tumors. To avoid image annotations, we preprocess the CT images by cropping them to specific body parts. For this purpose, we develop a body parts partition algorithm which needs only a few body parts annotations to automatically obtain the body part breakpoints in a query image. In our algorithm, we use the wavelet transform to smooth the bone signals and air signals as well as to recognize their patterns, and we align body parts between the reference image and the moving image by solving a minimization problem. Although the algorithm fails in some cases, such as images with artifacts or cases that do not contain the abdomen and chest, our body parts partition algorithm still performs well enough as a preprocessing technique for deep learning. The errors of the 6 body part breakpoints have means of approximately 2 cm, which is at a controllable level.

For the ovarian tumor classification, we preprocess the CT images by two strategies, cropping to the pelvis and cropping to the union of the pelvis and the lower abdomen, and train CNNs for classification in a cross-validation manner. Comparing the performance of these two strategies, we find that cropping to the pelvis not only gives a higher mean test AUC but also a lower standard deviation. Therefore, we decide to crop the CT images to the pelvis for this task. Moreover, the mean test AUC obtained by cropping to the upper pelvis is 0.8129 with standard deviation 0.0154, which shows that the CNN together with our preprocessing approach has the potential to classify ovarian tumors.

Future work includes the following. First, improve the preprocessing pipeline, including the bone segmentation methodology and the computational efficiency, so that the pipeline can support medical practice. Second, find an optimal parameter λ in the registration setting to minimize the error of the registration results. Third, use a set of reference images for different subsets of patients rather than a single reference image in the registration step. People with different conditions tend to have different types of bone or other structures; for example, the bone structures of a young man and an old man are different. Therefore, we may divide people into groups and prepare a reference image for each group to improve the registration performance. Fourth, improve the model performance; in theory, we may improve the classification performance by hyperparameter optimization. Also, adding more data for training is another approach to improve the model performance.

References

[1] U. R. Acharya, S. V. Sree, L. Saba, F. Molinari, S. Guerriero, and J. S. Suri. Ovarian tumor characterization and classification using ultrasound: a new online paradigm. Journal of Digital Imaging, 26(3):544–553, 2013.

[2] J. Bigot. Landmark-based registration of curves via the continuous wavelet transform. Journal of Computational and Graphical Statistics, 15(3):542–564, 2006.

[3] Y. Boykov and G. Funka-Lea. Graph cuts and efficient N-D image segmentation. International Journal of Computer Vision, 70(2):109–131, 2006.

[4] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.

[5] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(7), 2011.

[6] T. Gasser and K. Wang. Synchronizing sample curves nonparametrically. The Annals of Statistics, 27(2):439–460, 1999.

[7] B. Ginsburg, P. Castonguay, O. Hrinchuk, O. Kuchaiev, V. Lavrukhin, R. Leary, J. Li, H. Nguyen, Y. Zhang, and J. M. Cohen. Stochastic gradient methods with layer-wise adaptive moments for training of deep networks. arXiv preprint arXiv:1905.11286, 2019.

[8] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.

[9] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.

[10] S. E. Jung, J. M. Lee, S. E. Rha, J. Y. Byun, J. I. Jung, and S. T. Hahn. CT and MR imaging of ovarian tumors with emphasis on differential diagnosis. RadioGraphics, 22(6):1305–1325, 2002.

[11] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[12] M. Krčah, G. Székely, and R. Blanc. Fully automatic and fast segmentation of the femur bone from 3D CT images with no shape prior. In 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pages 2087–2090. IEEE, 2011.

[13] H. Lamecker, M. Seebass, H.-C. Hege, and P. Deuflhard. A 3D statistical shape model of the pelvic bone for segmentation. In Medical Imaging 2004: Image Processing, volume 5370, pages 1341–1351. International Society for Optics and Photonics, 2004.

[14] I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.

[15] Z. Lu, H. Pu, F. Wang, Z. Hu, and L. Wang. The expressive power of neural networks: A view from the width. arXiv preprint arXiv:1709.02540, 2017.

[16] J. O. Ramsay and X. Li. Curve registration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(2):351–363, 1998.

[17] R. L. Wheeden and A. Zygmund. Measure and Integral: An Introduction to Real Analysis. CRC Press, 2015.

[18] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

[19] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.

[20] T. Tieleman and G. Hinton. Lecture 6.5 - RMSProp. COURSERA: Neural Networks for Machine Learning. Technical report, 2012.

[21] A. Vlahou, J. O. Schorge, B. W. Gregory, and R. L. Coleman. Diagnosis of ovarian cancer using decision tree classification of mass spectral data. Journal of Biomedicine and Biotechnology, 2003:308–319, 2003.

[22] J. Zhang, C.-H. Yan, C.-K. Chui, and S.-H. Ong. Fast segmentation of bone in CT images using 3D adaptive thresholding. Computers in Biology and Medicine, 40(2):231–236, 2010.

Appendix A — Outliers in Error Analysis

A.1 Introduction

In the error analysis of our partition algorithm, we find 4 outliers in our results. We remove these 4 outliers to obtain a robust mean and standard deviation. We also analyze the 4 outliers and give some explanations for them. We plot the partition results of these 4 cases in this appendix.

A.2 Body Parts Predicted by Our Algorithm

The first one is a low-resolution image, as shown in Figure A.1. The ribs are not included in the bone segmentation, so the bone signal does not carry information about the ribs. Moreover, only a small part of the lung is captured in this image, so the air signal cannot regularize the registration as expected. In summary, the bone segmentation is not good enough and only a small part of the lung is captured, so the registration result is poor.

Figure A.1: Outlier 1 removed in Table 3.2.

There are some artifacts in the second case, as shown in Figure A.2. Hence, we see a large number of "bone pixels" detected by the hysteresis bone segmentation, and therefore the bone signal does not present the true pattern of the bone. Since the bone signal is the most important part of our body parts partition algorithm, the registration result is poor.

Figure A.2: Outlier 2 removed in Table 3.2.

The third case, shown in Figure A.3, is not so strange. However, some breakpoints, such as those in the chest, are not correct. One reason is that this image captures only a small part of the lung, so the registration of the air sequences somewhat interferes with the result.

Figure A.3: Outlier 3 removed in Table 3.2.

The fourth case is a leg CT, as shown in Figure A.4, so the air signal cannot provide any information. Furthermore, it may even interfere with the registration, since the algorithm tries to align the abdomen of this case to the lung of the reference case. Therefore, our algorithm aligns the pelvis of the reference case to the feet of this case and the lung of the reference case to the lower abdomen of this case.

Figure A.4: Outlier 4 removed in Table 3.2.

Appendix B — Cross-Validation Results

B.1 Introduction

We show the details of the cross-validation results in this appendix.

B.2 Results

    metric     strategy  list 1  list 2  list 3  list 4  list 5    mean     std
    val AUC    u-pel     0.8902  0.8103  0.9124  0.8533  0.8343  0.8601  0.0414
    val AUC    l-abd     0.9325  0.7964  0.8527  0.8485  0.8129  0.8486  0.0526
    test AUC   u-pel     0.8157  0.8303  0.8236  0.7927  0.8020  0.8129  0.0154
    test AUC   l-abd     0.7901  0.8347  0.7720  0.7942  0.7545  0.7891  0.0300
    val acc    u-pel     0.8209  0.7344  0.8281  0.8030  0.7273  0.7827  0.0483
    val acc    l-abd     0.8358  0.7031  0.7813  0.6212  0.7576  0.7398  0.0816
    test acc   u-pel     0.7500  0.7143  0.7143  0.7143  0.7262  0.7238  0.0155
    test acc   l-abd     0.7024  0.7381  0.6905  0.6429  0.7143  0.6976  0.0353

Table B.1: Detailed metrics. The threshold for computing accuracy is simply chosen as 0.5. Here we denote validation, accuracy, upper pelvis, and lower abdomen by val, acc, u-pel, and l-abd, respectively, for short.

In view of Table B.1, the metrics obtained by cropping to the upper pelvis are higher than or approximately equal to those obtained by cropping to the lower abdomen in most folds. Based on these results, cropping to the pelvis is the better strategy in our experiments.

