
Department of Mathematics
College of Science
National Taiwan University
Master's Thesis

Body Parts Partition by Wavelet Transform and Ovarian Tumors Classification by Deep Learning in Computed Tomography

Jinglun Huang

Advisor: Weichung Wang, Ph.D.

August 2021

Acknowledgements

I thank Dr. Gigin Lin of Linkou Chang Gung Memorial Hospital for providing the dataset and for assisting with the body parts annotation, which made this thesis possible.

摘要

Ovarian cancer is the most dangerous of the gynecologic cancers. It is not only hard to detect early, but it also gives no warning signs. When a patient consults a gynecologist because of symptoms, the ovarian tumor has often already spread throughout the pelvis or even the abdomen. In recent years, thanks to advances in medicine and technology, the mortality of many cancers has been decreasing or holding steady, yet the mortality of ovarian cancer keeps rising. Computed tomography is a three-dimensional imaging modality frequently used to diagnose ovarian cancer and many other cancers, so an analysis pipeline and a machine learning classification model built on computed tomography images can be applied in many situations.

In medical image analysis, machine learning workflows usually require image annotations of organs and tumors, but ovaries and ovarian tumors are very difficult to annotate. This thesis therefore focuses on building an analysis pipeline on computed tomography that uses no annotations of ovaries or ovarian tumors. To this end, we develop a body parts partition algorithm based on the wavelet transform to find the body part breakpoints; the algorithm needs only a few body part annotations. In our experiments, the median errors of the six predicted body part breakpoints are about two centimeters, which is accurate enough for data preprocessing. We apply the algorithm to a dataset from Linkou Chang Gung Memorial Hospital to crop out the pelvis and lower abdomen, the regions related to ovarian cancer. The dataset contains 240 benign and 161 malignant tumor cases. We then train deep learning models and report cross-validation results. Overall, the mean and standard deviation of the area under the receiver operating characteristic curve on the test set are 0.8129 and 0.0154, which shows that our analysis pipeline has the potential to distinguish malignant ovarian tumors from benign ones.

Keywords: Computed Tomography, Wavelet Transform, Body Parts Partition, Deep Learning, Ovarian Tumors Classification

Abstract

Ovarian cancer is one of the most dangerous cancers for women. It is hard to detect early and has no warning signs. When a patient consults a gynecologist due to some symptoms, the ovarian tumor has usually already spread within the pelvis and even the abdomen. In recent years, the mortality of ovarian cancer has been increasing, while the mortality of some other kinds of cancer is either decreasing or holding steady due to improvements in medical science and techniques. Computed tomography (CT) is a kind of three-dimensional imaging and is used for the diagnosis of ovarian cancer as well as many other kinds of cancer. An analysis pipeline and a machine learning classification model based on CT images can therefore be widely used in many situations.

Image annotations of organs and tumors are usually needed in machine learning workflows for medical image analysis, but ovaries and ovarian tumors are hard to annotate. Therefore, this thesis aims to build a pipeline for distinguishing cancerous ovarian tumors from benign ones in CT images by deep learning models, without using image annotations of ovaries and ovarian tumors. For this purpose, we develop a body parts partition algorithm that finds the breakpoints of body parts by using the wavelet transform. Only a few body part annotations are needed by this algorithm. In our experiments, the prediction errors of six body part breakpoints have medians of approximately 2 cm, which is accurate enough for data preprocessing. We use our algorithm to crop images to the pelvis and lower abdomen, the regions related to ovarian cancer, on the dataset from Linkou Chang Gung Memorial Hospital. The dataset consists of 161 cancerous cases and 240 benign cases. We then train deep learning models and provide cross-validation results. Overall, the mean test ROC AUC is 0.8129 and the standard deviation is 0.0154, which shows the pipeline has the potential to distinguish cancerous ovarian tumors from benign ones.

Keywords: Computed Tomography, Wavelet Transform, Body Parts Partition, Deep Learning, Ovarian Tumors Classification

Contents

Page

Acknowledgements iii

摘要 v

Abstract vii

Contents ix

List of Figures xiii

List of Tables xv

Chapter 1 Introduction 1

Chapter 2 Method 5

2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Bone Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.1 Global Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2.2 The Largest Connected Component Method . . . . . . . . . . . . . 12

2.2.3 Spectral Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2.4 Hysteresis Thresholding . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3 Internal Air Segmentation . . . . . . . . . . . . . . . . . . . . . . . 16

2.4 Bone Sequences, Internal Air Sequences, and Transformations . . . . 17

2.4.1 Smoothing by Gaussian Filters . . . . . . . . . . . . . . . . . . . . 18

2.4.2 Feature Extraction by Wavelet Transforms . . . . . . . . . . . . . . 20

2.5 Curve Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.6 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.6.1 Machine Learning in Medical Image Analysis . . . . . . . . . . . . 25

2.6.2 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . 26

2.6.3 Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . 30

2.6.4 Dataset and Data Splitting . . . . . . . . . . . . . . . . . . . . . . 31

2.6.5 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Chapter 3 Results and Discussion 35

3.1 Bone Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.2 Internal Air Segmentation . . . . . . . . . . . . . . . . . . . . . . . 42

3.3 Registration and Partition of Body Parts . . . . . . . . . . . . . . . . 46

3.3.1 Preparation Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.3.2 Gaussian Filters as the Transformations . . . . . . . . . . . . . . . 48

3.3.3 Wavelet Transforms as the Transformations . . . . . . . . . . . . . 50

3.3.4 Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.4 Classification by Deep Learning . . . . . . . . . . . . . . . . . . . . 58

Chapter 4 Conclusion 61

References 63

Appendix A — Outliers in Error Analysis 67

A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

A.2 Body Parts Predicted by Our Algorithm . . . . . . . . . . . . . . . . 67

Appendix B — Cross-Validation Results 71

B.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

B.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

List of Figures

2.1 Three views in CT scan. . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3.1 An example of a CT slice in axial view and its bone segmentation obtained
by global thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 Bone segmentation and contrast-enhanced structure segmentation. In both plots, black pixels denote the background. White pixels denote the bone in the left plot and the contrast-enhanced structure in the right plot; gray pixels denote the contrast-enhanced structure in the left plot and the bone in the right plot. . . . . 37
3.3 Bone segmentation and contrast-enhanced structure segmentation. The white pixels denote the foreground and the gray and black pixels denote the background. . . . . 38
3.4 Original axial images of case 1. . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Bone segmentation in axial view of case 1. . . . . . . . . . . . . . . . . . 39
3.6 Bone signal and bone segmentation in coronal view of case 1. . . . . . . 39
3.7 Original axial images of case 2. . . . . . . . . . . . . . . . . . . . . . . . 40
3.8 Bone segmentation in axial view of case 2. . . . . . . . . . . . . . . . . . 41
3.9 Bone signal and bone segmentation in coronal view of case 2. . . . . . . 41
3.10 Original axial images of case 1. . . . . . . . . . . . . . . . . . . . . . . . 43
3.11 Internal air segmentation in axial view of case 1. . . . . . . . . . . . . . 43
3.12 Air segmentation in axial view of case 1. . . . . . . . . . . . . . . . . . . 44
3.13 Original axial images of case 2. . . . . . . . . . . . . . . . . . . . . . . . 44
3.14 Internal air segmentation in axial view of case 2. . . . . . . . . . . . . . 45
3.15 Air segmentation in axial view of case 2. . . . . . . . . . . . . . . . . . . 45

3.16 Bounds of body parts of the reference image. . . . . . . . . . . . . . . . 46
3.17 Air segmentation in axial view of the reference image. . . . . . . . . . . 47
3.18 Bone segmentation in axial view of the reference image. . . . . . . . . . 47
3.19 Signals transformed by Gaussian filter of the reference case. . . . . . . . 48
3.20 Signals transformed by Gaussian filter of case 1. . . . . . . . . . . . . . . 49
3.21 Signals transformed by Gaussian filter of case 2. . . . . . . . . . . . . . . 49
3.22 Bone signal and its features extracted by wavelet transforms. . . . . . . . 51
3.23 Time-scale plots of case 1. . . . . 52
3.24 Time-scale plots of case 2. . . . . 52
3.25 Registration results of case 1. The bone structure of the reference image
and the transformed bone structure of the moving image. . . . . . . . . . 53
3.26 Registration results of case 1. The air structure of the reference image and
the transformed air structure of the moving image. . . . . . . . . . . . . . 54
3.27 Registration results of case 2. The bone structure of the reference image
and the transformed bone structure of the moving image. . . . . . . . . . 55
3.28 Registration results of case 2. The air structure of the reference image and
the transformed air structure of the moving image. . . . . . . . . . . . . . 55
3.29 Body parts partition of case 1. . . . . . . . . . . . . . . . . . . . . . . . 56
3.30 Body parts partition of case 2. . . . . . . . . . . . . . . . . . . . . . . . 56

A.1 Outlier 1 removed in Table 3.2. . . . . . . . . . . . . . . . . . . . . . . 68


A.2 Outlier 2 removed in Table 3.2. . . . . . . . . . . . . . . . . . . . . . . 68
A.3 Outlier 3 removed in Table 3.2. . . . . . . . . . . . . . . . . . . . . . . 69
A.4 Outlier 4 removed in Table 3.2. . . . . . . . . . . . . . . . . . . . . . . 69

List of Tables

2.1 The numbers of patients and images of folds. . . . . . . . . . . . . . . . 31


2.2 The numbers of data in each fold. The test sets in all lists are identical.
We keep the test set unseen for final test. For i = 1, 2, 3, 4, 5, list i regards
fold i as the validation set and the others as the training set. . . . . . . . . 31

3.1 Error distribution of each body part. We denote lower, middle, upper, pelvis, and abdomen by l, m, u, pel, and abd. . . . . 57
3.2 Error distribution of each body part after excluding outliers. We denote lower, middle, upper, pelvis, and abdomen by l, m, u, pel, and abd. . . . . 58
3.3 Mean AUCs for different strategies. This table shows means and standard deviations of validation AUCs and test AUCs. The means and standard deviations are computed from the 5-fold cross-validation results. . . . . 58

B.1 This table shows the obtained metrics in detail. The threshold is simply chosen as 0.5 for computing accuracy. Here, we denote validation, accuracy, upper pelvis, and lower abdomen by val, acc, u-pel, and l-abd, respectively, for short. . . . . 71

Chapter 1 Introduction

Ovarian cancer is one of the most dangerous cancers for women. In Taiwan, ovarian cancer ranks seventh in the number of cancer deaths among women. In recent years, due to the improvement of medical science and techniques, the mortality of some other kinds of cancer is either decreasing or holding steady. However, the mortality of ovarian cancer is increasing. According to the statistics of the Ministry of Health and Welfare (R.O.C.), the mortality of ovarian cancer was even higher than that of cervical cancer in 2020 (updated 2021/06/18). Difficulty in early detection is one reason why the mortality of ovarian cancer is still increasing. Some other kinds of cancer often come with warning signs, such as pain, bleeding, changes in physical appearance, or other symptoms. For example, bladder cancer usually causes hematuria, and breast cancer usually causes lumps in the breast or underarm. However, ovarian cancer is often difficult to detect. When a patient notices some warning signs and consults a gynecologist, the ovarian cancerous tumor has often already spread within the pelvis or even the abdomen. Moreover, the five-year survival rate of an ovarian cancer patient diagnosed at an early stage is far higher than that at a late stage, and the medical cost of early detection of cancer is lower than that of late detection. Therefore, early detection of ovarian cancer is a crucial task.

Computed tomography (CT) is a three-dimensional imaging technique and is a common tool used to detect ovarian cancer as well as many other cancers. Moreover, CT is one of the most common types of medical images, so an analysis pipeline for CT images can be widely used in many situations. In this work, we use CT scans as the inputs of our analysis pipeline.


In medical science, there has been research on ovarian tumors. S. E. Jung et al. [10] aimed to find features in CT scans for use in differential diagnosis. U. R. Acharya et al. [1] extracted features, such as deviation and entropy, from ultrasound using computer-aided diagnostic (CAD) techniques and classified ovarian tumors by decision trees. A. Vlahou et al. [21] used clinical data to establish a decision tree for the diagnosis of ovarian cancer.

However, using CT images to establish machine learning models for classifying ovarian tumors is still not well investigated. In this thesis, we build an analysis pipeline for ovarian tumors by using CT images.


Although machine learning has become a popular topic in medical image analysis, classifying ovarian tumors by machine learning still seems challenging. An important reason is that ovaries and ovarian tumors are difficult to segment, even by manual segmentation. The locations of the ovaries as well as the pelvic organs in CT images vary from case to case. Moreover, the ovary is a small organ, and sometimes the boundary of an ovary in CT is not clear. Hence, it is hard to define the ground truth of annotations of ovaries and ovarian tumors. Also, labeling ovaries and ovarian tumors is time-consuming. Therefore, an analysis pipeline that does not depend on image annotations of ovaries and ovarian tumors is more applicable in medical practice.

Many machine learning algorithms are based on image annotation. One may say a convolutional neural network (CNN) does not strongly depend on image annotation; instead, a CNN learns to extract useful features from images during the training process. However, training a CNN for CT tasks usually needs image annotations as a mask or for locating the organs. Cropping to a region of interest (ROI) is a standard technique for locating the organs. Medical images often contain many organs or tissues and hence include a large amount of information, but many problems aim at only one or a few organs. Moreover, CT images may capture different body parts. Hence, using the whole CT image as the input of a model does not make sense. Even if a model trained on whole images achieves good performance, it may not be explainable and is hard to use in medical practice. Therefore, a proper preprocessing algorithm that crops meaningful parts from CT images is a crucial part of ovarian tumor classification.


In this thesis, we design an analysis pipeline that does not depend on image annotations of ovaries and ovarian tumors. Instead, we use bounding boxes of an ROI as the inputs of a deep learning model. The ROI can be the pelvis or the union of the pelvis and the lower abdomen, since the lower abdomen may provide some features related to ovarian cancer, such as ascites. Besides avoiding the difficulty of image annotation, cropping a bounding box is easier to achieve with a rule-based method than image segmentation, which nowadays is usually done by another deep learning model.

There are several types of ovarian tumors, such as benign, cancerous, and some other types, but in this work we aim at distinguishing cancerous ovarian tumors from benign ones by using CT images. We design an algorithm for the body parts partition by using the wavelet transform and crop the pelvis and lower abdomen from CT images. The algorithm is based on the following main steps.

1. Obtain a query case.

2. Segment bone and internal air.

3. Construct the bone sequence and air sequence from the segmentations of bone and internal air.

4. Register the body parts by aligning the bone sequence and air sequence to a reference case.

5. Propagate the body part breakpoints from the reference case to the query case.

6. Crop to the target parts.

With this algorithm, we can automatically preprocess CT images before model training and inference, requiring only a few body part labels rather than image annotations of ovaries and ovarian tumors.


We apply our algorithm to the dataset from Linkou Chang Gung Memorial Hospital, and we train and evaluate deep learning models for ovarian tumor classification by cross-validation and testing. We compare the body part breakpoints obtained from our body parts partition algorithm to the ground truth labeled by a radiologist (Dr. Gigin Lin). The mean absolute errors in the lower pelvis, upper pelvis, middle abdomen, lower chest, middle chest, and upper chest are approximately 2.48 cm, 1.73 cm, 1.73 cm, 1.25 cm, 2.37 cm, and 1.42 cm after we remove 4 outliers, which shows the body parts partition algorithm is accurate enough for data preprocessing. We use cross-validation to train and evaluate the deep learning models. By cropping CTs to the pelvis only, we obtain a mean test AUC of 0.8129 with a standard deviation of 0.0154, which shows the analysis pipeline has the potential to distinguish cancerous ovarian tumors from benign ones.

Chapter 2 Method

In this chapter, we introduce our body parts partition algorithm and the training details of the deep learning models. First, we give an overview of our algorithm. Then we discuss the details of each step of the algorithm. In this thesis, we define images as follows for convenience.

Definition 2.1. Let m, n, p ∈ N and let [k] = {0, 1, 2, ..., k − 1} for k ∈ N.

1. A function I : [m] × [n] × [p] → R is called a (three-dimensional) image of size m × n × p.

2. Suppose I is an image of size m × n × p. The spacing (of I) in the x-axis is the width between two voxels in the x-axis. The spacings in the y-axis and the z-axis are defined similarly.

3. Suppose I is an image of size m × n × p. If S is an image of the same size as I whose range is contained in {0, 1}, then we say S is a binary image.

Remark 2.2.

1. We may omit the size of an image if there is no confusion.

2. CT images are three-dimensional images, so we usually plot a cross section of the volume for visualization. Figure 2.1 shows three different views of a CT image.

3. The spacing is saved in the metadata of a CT image, and the size of a CT image is usually 512 × 512 × p, where p ∈ N is often determined by the scanned body parts.

4. We use a binary image S of size m × n × p to represent the segmentation of some ROI in a given image. More precisely, the preimages

{(x, y, z) ∈ [m] × [n] × [p] : S(x, y, z) = 1}

and

{(x, y, z) ∈ [m] × [n] × [p] : S(x, y, z) = 0}

are the sets of the ROI and the background, respectively.


Figure 2.1: Three views in CT scan.

2.1 Overview

We give an overview of our analysis pipeline in this section. Due to the difficulty of labeling image annotations of ovaries and ovarian tumors, using a bounding box of a proper region instead of an image segmentation is more applicable. In this work, the main idea of the body parts partition consists of the following three steps. First, we segment the bone and the internal air in a CT image. Second, we count the bone pixels and internal air pixels in each axial slice to define the bone sequence and the internal air sequence; the formal definitions are given in Definition 2.3. Note that the bone sequence and the internal air sequence (or simply the air sequence) are two signals that carry the body parts information of the CT image. Hence, the third step is to align these two signals between cases for the registration of body parts. Eventually, we only need a few body part annotations to automatically find the breakpoints of the body parts of any query CT image, by propagating the body part annotations from known cases to the query case.


Definition 2.3. Suppose m, n, p ∈ N and I is an image of size m × n × p. Let S be a bone segmentation of I and let the set of bone voxels be B = S^{-1}({1}). The bone sequence (of I) is defined as the sequence \{b_k\}_{k=1}^{p}, where

b_k = |\{(x, y, z) \in B : z = k\}|.

Here, |A| is the cardinality of a given set A. The air sequence \{a_k\}_{k=1}^{p} is defined in the same manner.
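
As an illustration, both sequences can be computed directly from the segmentation masks. The following is a minimal numpy sketch, assuming a mask is stored as a binary array of shape m × n × p with the axial slices indexed along the last axis; the names part_sequence, bone_mask, and air_mask are illustrative, not part of our implementation.

import numpy as np

def part_sequence(mask):
    """Count the foreground voxels in each axial slice: b_k = |{(x, y, z) in B : z = k}|."""
    return mask.reshape(-1, mask.shape[-1]).sum(axis=0)

# bone_seq = part_sequence(bone_mask)  # the bone sequence {b_k}
# air_seq = part_sequence(air_mask)    # the air sequence {a_k}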

Registration means aligning signals and is usually formulated as an optimization problem. Let f, g : [0, 1] → R be two functions with some proper assumptions, such as f, g ∈ L²([0, 1]), continuity, or smoothness. The registration problem is, roughly speaking, to find a one-to-one, increasing function h : [0, 1] → [0, 1] such that f ≈ g ∘ h in some sense, for example closeness in the L² distance. The function h is called a warping function and is usually assumed to have an integrable second derivative [16].

There are different formulations of the registration problem between f and g, for example, landmark-based matching [2]. However, since CT images usually scan different body parts, landmark-based matching is difficult in our case, and assuming that the codomain of h is [0, 1] is not applicable. Moreover, the bone sequence and the air sequence are two types of information for a single CT, so they shall be taken into consideration at the same time. Although bone sequences and air sequences are discrete data, it is convenient to consider them as continuous signals. Therefore, in this thesis, the registration problem between bone sequences and air sequences is formulated as follows.


Every CT image has its bone signal and its air signal, and we align these two signals at the same time, which is an important difference between our problem and other curve registration settings. Assume that f₁ and f₂ are bone signals and g₁ and g₂ are air signals. We transform f₁, f₂, g₁, and g₂ by a transformation T. T is expected to have the ability to denoise and to represent the patterns of the bone signals and air signals. The codomain of T is not necessarily a subset of the univariate functions. Then we define a distance between the pairs (T(f₁), T(g₁)) and (T(f₂ ∘ h), T(g₂ ∘ h)) by using the 2-norm or its generalization. Finally, we register body parts between CT images by minimizing this distance.


Now, we summarize the main ideas of the body parts partition in the following steps.

Preparation Stage

1. Choose a reference CT image and obtain the breakpoints of the body parts of interest.

2. Segment bone and internal air.

3. Compute the bone sequence and the air sequence.

4. Apply some proper transformations to the bone sequence and the air sequence and regard the results as the reference signal.

We obtain the reference signals and body part breakpoints from the preparation stage. The reference image is expected to include all body parts of interest and to represent the important patterns of the bone sequence and the air sequence.

In functional data registration, we usually have a reference signal and a moving signal, and we transform the moving signal into the coordinates of the reference signal. In our problem, we treat the transformation outputs of the bone signal and the air signal of the reference image as the reference signals. At the inference stage, given a query CT image, the transformation outputs of its bone signal and air signal are treated as the moving signals.

In the inference stage, we label the body parts for a query CT image in the following steps.

Inference Stage

1. Obtain a query CT image.

2. Segment bone and internal air.

3. Compute the bone sequence and the air sequence.

4. Apply some proper transformations to the bone sequence and the air sequence and regard the results as the moving signal.

5. Compute the optimal warping function by minimizing the distance between the moving signal and the reference signal.

6. Obtain the body part breakpoints by propagating the breakpoints from the reference signal to the moving signal.

By these steps, we can propagate the body part breakpoints from only a few labeled images to many query images. Finally, we use this body parts partition algorithm to preprocess CT images and then use deep learning techniques to establish models.


2.2 Bone Segmentation

Bone segmentation is an important topic. One use of bone segmentation is as reference information for surgery, and there is already some research on this topic. Moreover, bone segmentation is one step in our body parts partition algorithm, since the bone sequence relies on the bone segmentation. Here are some related works on bone segmentation.

A graph-based segmentation approach was proposed by Y. Boykov and G. Funka-Lea [3]. By minimizing an energy function consisting of a per-pixel term and a boundary term, Y. Boykov and G. Funka-Lea [3] find a segmentation in which pixels receive the same class label if they have high similarity. M. Krčah et al. [12] extend the work in [3] to another formulation that is more suitable for 3D images and more applicable to bone segmentation in CT.

By assuming that the intensities of bone and non-bone voxels are sampled from a mixture of Gaussian distributions, an iterative method for bone segmentation was proposed in [22]. Starting with an initial segmentation (obtained from global thresholding, for instance), this method reclassifies pixels of the bone class by a Bayesian decision rule to update the segmentation, and so on.

Moreover, H. Lamecker et al. [13] use a statistical shape model (SSM) to segment the pelvic bone. An SSM aims to find a deformation from one shape to another. Based on the deformation, the SSM can transform shapes into a base space and obtain the distance between surfaces. The distance can be used to estimate the average shape as well as the variation in shape.

Although bone segmentation is a step in our proposed analysis pipeline, we do not need an extremely accurate segmentation for the next step, and computational efficiency may be more important in medical practice. So we try some other methods for bone segmentation. In fact, if the bone sequence derived from a bone segmentation presents its essential patterns, then we may accept the segmentation. We have tried the following approaches to segment bone; they are good in different situations.


2.2.1 Global Thresholding

Global thresholding is a typical segmentation approach. It decides whether a voxel is classified into the object class by comparing the intensity of the voxel with a threshold t ∈ R. We formally define global thresholding in Definition 2.4.

Definition 2.4 (Global Thresholding). Let t ∈ R and m, n, p ∈ N. Suppose I is an image of size m × n × p. Global thresholding is a segmentation approach that obtains a segmentation S of I defined by

S(x, y, z) = \begin{cases} 1 & \text{if } I(x, y, z) > t \\ 0 & \text{if } I(x, y, z) \le t \end{cases}

for (x, y, z) ∈ [m] × [n] × [p]. In this case, t is called a threshold.

In our case, a voxel in CT is classified into the bone class by global thresholding if its gray-scale value is larger than t; otherwise it is classified into the non-bone class. Global thresholding is useful when the intensity of the object is very different from that of the background. Since bone voxels are very bright in CT (in other words, they have high gray-scale values), it is natural to consider global thresholding as our segmentation approach. In fact, a common threshold for bone in CT is a gray-scale value of 400 HU (Hounsfield units), due to the medical properties of bone. Hence, global thresholding is a common method to segment bone. However, since the CT images we use are venous phase CT images, this approach inevitably includes the contrast-enhanced structures, if any. Therefore, we also consider other methods in the following sections.

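A minimal numpy sketch of Definition 2.4 follows, assuming the CT volume is an array of Hounsfield-unit values; the function name and the default threshold of 400 HU follow the discussion above.

import numpy as np

def global_threshold(ct, t=400.0):
    """S(x, y, z) = 1 if I(x, y, z) > t, and 0 otherwise."""
    return (ct > t).astype(np.uint8)
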

2.2.2 The Largest Connected Component Method

Choosing the largest connected component from the foreground of a segmentation is an approach to retouch the segmentation if it is noisy. The concept of connected components depends on the connectivity in images, so we first introduce the connectivity in images as follows.

Definition 2.5 (Voxel Connectivity). Let (x, y, z) and (x′, y′, z′) be voxels and let Δx = |x − x′|, Δy = |y − y′|, and Δz = |z − z′|.

1. We say (x′, y′, z′) is 6-connected to (x, y, z) if Δx + Δy + Δz = 1.

2. We say (x′, y′, z′) is 26-connected to (x, y, z) if Δx ≤ 1, Δy ≤ 1, and Δz ≤ 1, but Δx, Δy, and Δz are not all zero.

Remark 2.6. Let S be a binary image and F = S^{-1}({1}). For voxels u and v, we say u ∼ v if either u = v or there exist v₁, v₂, ..., v_{k−1} ∈ F such that v_{i−1} is 6-connected to v_i for i = 1, 2, ..., k, where we denote v₀ = u and v_k = v. Then it is clear that ∼ defines an equivalence relation on F. A similar argument applies to 26-connectivity.

Definition 2.7 (Connected Components). Let S be a binary image and F = S^{-1}({1}). The equivalence classes of ∼ defined in Remark 2.6 are called the connected components of F (or of the foreground object).

A motivation for taking the largest component is that the contrast-enhanced structures are not connected to the bone. In fact, soft tissues are usually covered by fat, so they are not connected to the bone. Moreover, the volume of bone is larger than that of the contrast-enhanced structures in general. Hence, if we further assume that the bone is a single connected component of the segmentation obtained by global thresholding, then we may compute the largest component of the segmentation to obtain a bone segmentation that excludes the contrast-enhanced structures.
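
A sketch of this step with scipy.ndimage follows, assuming seg is a binary mask from global thresholding; by default, ndimage.label uses 6-connectivity for 3D arrays, matching Definition 2.5.

import numpy as np
from scipy import ndimage

def largest_component(seg):
    labels, num = ndimage.label(seg)        # 6-connected components
    if num == 0:
        return seg
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                            # ignore the background label
    return (labels == sizes.argmax()).astype(np.uint8)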

However, there may be some contrast-enhanced structures that are connected to the bone in images. They may merely be near each other, but in the sense of voxels they are connected. Therefore, we consider spectral clustering in the next section.


2.2.3 Spectral Clustering

Since contrast-enhanced structures may be connected to bone in the sense of voxels, we consider the segmentation approach by spectral clustering proposed in [18]. We use spectral clustering to revise the bone segmentation obtained from an initial segmentation.

Before we introduce the details of this approach, we assume the human bone is connected in CT images. Consider a CT image I and a segmentation S. Here, S can be the constant 1 but cannot be the constant 0. Let V = S^{-1}({1}) and let

E = \{uv : u, v \in V \text{ and } u \text{ is 6-connected to } v\}.

We further define a weight for each edge as follows. For each edge uv ∈ E, define the weight of uv to be e^{-|I(u) - I(v)|/\sigma}, where σ > 0 is a parameter; σ is usually chosen to be the standard deviation of |I(u) − I(v)| over all uv ∈ E. Then G(V, E) is a weighted undirected graph. By the assumption of the connectivity of human bone, G is connected. Let A be the adjacency matrix and D the degree matrix of G. The Laplacian matrix of G is given by L = D − A. J. Shi and J. Malik [18] propose a method for image segmentation by solving the generalized eigenvalue problem

Lx = λDx. (2.1)

In our situation, we obtain an initial bone segmentation S from global thresholding and use spectral clustering to revise it. We construct a graph as above and solve the generalized eigenvalue problem on the corresponding Laplacian matrix to obtain two clusters on the graph. By choosing the larger cluster, we obtain a revised segmentation. This approach takes more information into consideration and can remove some contrast-enhanced structures that connect to the bone in the sense of voxels.
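
The following sketch illustrates this revision step under the assumptions stated above (in particular, that every foreground voxel has at least one 6-connected neighbor, so the degree matrix D is positive definite). It builds the weighted graph on the initial mask, solves Eq. (2.1) for the Fiedler vector with a small negative shift for numerical stability, and keeps the larger cluster; the function name and parameter choices are illustrative, not our exact implementation.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def spectral_revision(ct, seg):
    idx = -np.ones(seg.shape, dtype=np.int64)
    vox = np.argwhere(seg == 1)
    idx[tuple(vox.T)] = np.arange(len(vox))
    rows, cols, diffs = [], [], []
    for axis in range(3):                   # 6-connected neighbor pairs
        shifted = np.roll(idx, -1, axis=axis)
        valid = np.ones(seg.shape, dtype=bool)
        valid[(slice(None),) * axis + (-1,)] = False   # drop wrap-around pairs
        m = (idx >= 0) & (shifted >= 0) & valid
        rows.append(idx[m])
        cols.append(shifted[m])
        diffs.append(np.abs(ct[m] - np.roll(ct, -1, axis=axis)[m]))
    rows, cols, diffs = map(np.concatenate, (rows, cols, diffs))
    std = diffs.std()
    w = np.exp(-diffs / (std if std > 0 else 1.0))     # edge weights e^{-|dI|/sigma}
    n = len(vox)
    A = sp.coo_matrix((np.r_[w, w], (np.r_[rows, cols], np.r_[cols, rows])), shape=(n, n)).tocsr()
    D = sp.diags(np.asarray(A.sum(axis=1)).ravel())
    L = D - A
    _, vecs = eigsh(L, k=2, M=D, sigma=-1e-5, which="LM")  # two smallest of Lx = lambda Dx
    cluster = vecs[:, 1] > np.median(vecs[:, 1])           # split by the Fiedler vector
    keep = cluster if cluster.sum() * 2 >= n else ~cluster # keep the larger cluster
    out = np.zeros_like(seg)
    out[tuple(vox[keep].T)] = 1
    return out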

However, this is still not good enough, since the bone itself may not be connected in the sense of voxels, which violates the basic assumption of graph connectivity in this approach. Therefore, we turn to the hysteresis thresholding introduced in the next section.


2.2.4 Hysteresis Thresholding

Since the bone may not be connected in the sense of voxels, we consider the hysteresis thresholding proposed in [4]. This approach performs two-stage thresholding to exclude non-object voxels. Before we introduce hysteresis thresholding, we define the connectivity between two sets for convenience.

Definition 2.8. Let F and G be two sets of voxels and let u be a voxel.

1. We say u is connected to G if there is v ∈ G such that u is connected to v.

2. We say F is connected to G if there is v ∈ F such that v is connected to G.

Now we introduce hysteresis thresholding. Let s, t ∈ R with s < t. At the first stage, we obtain segmentations S and T by applying global thresholding with respect to the thresholds s and t. Let F = S^{-1}({1}) and G = T^{-1}({1}). At the second stage, the segmentation obtained by hysteresis thresholding is defined as

\bigcup \{C : C \text{ is a connected component of } F \text{ and } C \text{ is connected to } G\}.

We apply hysteresis thresholding with the high threshold t = 800 HU and the low threshold s = 400 HU to obtain the bone segmentation. An observation about true bone and contrast-enhanced soft tissue is that a true bone voxel often has intensity greater than 1000 HU, while contrast-enhanced soft tissue often has intensity smaller than 800 HU. Although ribs usually have intensity smaller than 800 HU and are filtered out at the first stage, they can usually be recovered at the second stage due to the connectivity between the ribs and the spine. Moreover, 400 HU is a common threshold for bone segmentation. Therefore, these choices of s and t are suitable for our problem.
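
A sketch of the two-stage procedure with scipy.ndimage follows, assuming ct holds Hounsfield-unit values; scikit-image also ships an equivalent helper, skimage.filters.apply_hysteresis_threshold(ct, 400, 800).

import numpy as np
from scipy import ndimage

def hysteresis_threshold(ct, s=400.0, t=800.0):
    weak = ct > s                       # F = S^{-1}({1})
    strong = ct > t                     # G = T^{-1}({1})
    labels, _ = ndimage.label(weak)     # 6-connected components of F
    keep = np.unique(labels[strong])    # components of F that touch G
    keep = keep[keep > 0]
    return np.isin(labels, keep).astype(np.uint8)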

2.3 Internal Air Segmentation

Air is crucial information in a CT image. We can easily recognize the lung since it contains a lot of air, which has low intensity and appears black in the image. Moreover, the internal air signal is another important piece of information for aligning the body parts between CT images. Therefore, we introduce some image processing techniques to segment the internal air. In this section, we assume that (1) the external air is connected, (2) the external air and the internal air are not connected, and (3) the volume of the external air is greater than the volume of the internal air in CT.

Here are two notes about these assumptions. First, the internal air and the external air are not connected in CT in almost all cases, even though the respiratory system is a path connecting the internal air and the external air. Second, the 3D volume of the external air is greater than that of the internal air in CT, even though this may be false in a specific single axial slice.

The main idea for segmenting the internal air is to segment the external air first and obtain the internal air by removing the external air. We use the gray-scale value −700 HU as a threshold and apply global thresholding to segment air; in this case, the voxels with intensity smaller than −700 HU are what we need. By assumptions (1) and (2), the external air is a connected component of the air segmentation, and assumption (3) implies the external air is the largest component of the air segmentation. Hence, we compute the largest component of the air segmentation to obtain the external air. Finally, we obtain the internal air by removing the external air from the air segmentation.

The performance of this method is good in almost all cases. As discussed in the previous sections, contrast-enhanced structures may be confused with bone, in either intensity or connectivity, so we surveyed several methods to solve that problem. Fortunately, no other tissue would be confused with air. Therefore, the typical image processing approaches are good enough to segment the internal air in CT images.
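
A sketch of the whole procedure, combining global thresholding at −700 HU with the largest-connected-component step; the function name is illustrative.

import numpy as np
from scipy import ndimage

def internal_air(ct):
    air = ct < -700                      # all air voxels
    labels, num = ndimage.label(air)
    if num == 0:
        return air.astype(np.uint8)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                         # ignore the background label
    external = labels == sizes.argmax()  # assumption (3): external air is the largest component
    return (air & ~external).astype(np.uint8)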

2.4 Bone Sequences, Internal Air Sequences, and Transformations

Our goal is body parts registration by aligning bone signals and air signals. Hence, we introduce the bone sequences and the internal air sequences in this section. By counting the bone pixels and internal air pixels in each axial slice, we obtain two sequences indexed by the z-axis coordinate of the CT. The formal definitions of bone sequences and air sequences were given in Definition 2.3. The bone sequences and air sequences can be regarded as discrete versions of the bone signals and air signals, which are usually considered in a continuous manner. Unfortunately, the two sequences might be noisy for the following reasons.

• The segmentations are not perfect.

• We are using thick-cut CT scans, which means the spacing in the z-axis is 5 mm.

• We are using venous phase CT scans, which means the bone segmentation may include some non-bone contrast-enhanced structures.

• The number of bone pixels depends on the amount of calcium in the bones of a patient.

To deal with these two noisy signals, we apply transformations to them for denoising and pattern recognition. We introduce Gaussian filters and the wavelet transform in the following sections.


2.4.1 Smoothing by Gaussian Filters

Gaussian filters are widely used in signal processing. One of the most common uses of a Gaussian filter is denoising. We use Gaussian filters to smooth our signals.

Mathematically, we may define a Gaussian filter by convolution. We follow [17] to describe convolution transformations. Let f₁, f₂ : R → R be two measurable signals. The convolution of f₁ and f₂ is defined by

(f_1 * f_2)(x) = \int_{\mathbb{R}} f_1(t)\, f_2(x - t)\, dt, \quad x \in \mathbb{R},

provided the integral exists. Let K : R → R be an L¹ function and define T(f) = f ∗ K for f ∈ Lᵖ(R). In this case, we call T a convolution transformation and K the kernel. Then T is a linear transformation from Lᵖ(R) to Lᵖ(R). Furthermore, if K is a smooth function, then [T(f)]′ = f ∗ K′ for f ∈ Lᵖ(R). That is, the convolution transformation maps an Lᵖ function to a smooth function, provided the kernel is smooth.

For self-containment, we quote Theorem 9.9 in Section 9.2 of [17].

Theorem 2.9 (Zygmund [17]). Let K \in L^1(\mathbb{R}) \cap L^\infty(\mathbb{R}) with \int_{\mathbb{R}} K = 1, and suppose K(x) = o(1/|x|) as |x| → ∞. For ϵ > 0, define

K_\epsilon(x) = \frac{1}{\epsilon}\, K\!\left(\frac{x}{\epsilon}\right)

and let f_ϵ = f ∗ K_ϵ, where f ∈ L¹(R). Then f_ϵ → f as ϵ → 0⁺ at each point of continuity of f.

This theorem roughly says that a kernel function decaying rapidly at infinity induces an approximation of the identity for continuous functions via convolution transformations.

As a special case, let K(x) = \frac{1}{\sqrt{\pi}}\, e^{-x^2} be the Gaussian function; we call the convolution transformation induced by the kernel K_ϵ a Gaussian filter. Note that K ∈ L¹(R) with

\|K\|_1 = \int_{\mathbb{R}} |K| = \int_{\mathbb{R}} K = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-x^2}\, dx = 1.

Since K(x) \le \frac{1}{\sqrt{\pi}}, we have K ∈ L^∞(R). Finally, since

\frac{K(x)}{1/|x|} = \frac{|x|}{\sqrt{\pi}}\, e^{-x^2} \to 0 \quad \text{as } |x| \to \infty,

we conclude K(x) = o(1/|x|) as |x| → ∞. Hence, K satisfies all conditions of Theorem 2.9, and therefore Gaussian filters are approximations of the identity for continuous functions. Moreover, since K_ϵ is smooth, the convolution f ∗ K_ϵ is also smooth. Therefore, Gaussian filters precisely smooth signals.

To summarize the properties of Gaussian filters: f ∗ K_ϵ is a smooth approximation of a continuous function f with a lower noise level. In the sense of this convergence, we may say the Gaussian filter preserves some patterns of f. Smoothness is an important reason why we use Gaussian filters. The smooth approximations obtained by Gaussian filters make the L² distance between bone sequences (or air sequences) of different patients more robust, and as a result the distance minimization described in Section 2.1 is more robust. In applications, a discrete approximation of the Gaussian function is used, since the true Gaussian function has infinite support.
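
In practice the smoothing is a single library call; the sketch below uses scipy's truncated discrete Gaussian kernel on a stand-in bone sequence, and the choice sigma = 2.0 is illustrative.

import numpy as np
from scipy.ndimage import gaussian_filter1d

bone_seq = np.random.default_rng(0).poisson(2000, size=120).astype(float)  # stand-in signal
smoothed = gaussian_filter1d(bone_seq, sigma=2.0)  # convolution with a truncated Gaussian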

2.4.2 Feature Extraction by Wavelet Transforms

The wavelet transform is a technique in signal processing and is widely used in engineering. It is also used for edge detection in image processing. We introduce the wavelet transform in this section.

Let \psi \in L^2(\mathbb{R}) with \int_{-\infty}^{\infty} \psi = 0. Define \psi^{ab} by

\psi^{ab}(t) = a^{-1/2}\, \psi\!\left(\frac{t - b}{a}\right), \quad \text{for } a > 0,\ b \in \mathbb{R} \tag{2.2}

and \psi_{jk} by

\psi_{jk}(x) = 2^{j/2}\, \psi(2^j x - k), \quad \text{for } j, k \in \mathbb{Z}. \tag{2.3}

For a function f \in L^2(\mathbb{R}), the continuous wavelet transform is defined as

W(f)(a, b) = \int_{-\infty}^{\infty} f\, \psi^{ab}, \quad \text{for } a > 0,\ b \in \mathbb{R} \tag{2.4}

and the discrete wavelet transform is defined by

w_{jk} = \int_{-\infty}^{\infty} f\, \psi_{jk}, \quad \text{for } j, k \in \mathbb{Z}. \tag{2.5}

We say ψ is the mother wavelet and \psi^{ab} and \psi_{jk} are wavelets. If \{\psi_{jk}\}_{j,k \in \mathbb{Z}} forms an orthonormal basis for L^2(\mathbb{R}), then we say ψ is an orthonormal wavelet and w_{jk} is called a wavelet coefficient.

Although not every mother wavelet is an orthonormal wavelet, the wavelet transform still extracts some important features of a given function. Moreover, since the scale factor (a or j) gives the wavelets different widths, the wavelet transform can capture information at different scales. Due to this ability to handle multi-scale problems, the wavelet transform can be used to recognize the patterns of a signal. Applying the wavelet transform with a small-width wavelet tends to extract local features, which are often high-frequency and possibly noisy; applying it with a large-width wavelet tends to extract wide features, which are often low-frequency and recognize only rough patterns, such as the monotonicity of a function on some intervals.

Let a_1, a_2, \ldots, a_p > 0 be scale factors with p \in \mathbb{N}. For a signal f, define the transformation T mentioned in Section 2.1 by

T(f)(t) = (W(f)(a_1, t), W(f)(a_2, t), \ldots, W(f)(a_p, t)). \tag{2.6}

Note that the scale factors a_i are expected to extract features from the input signal and to filter out high-frequency waves for denoising.

In this work, we use the wavelet transform with the Mexican hat (or Ricker) mother wavelet, defined as

\psi(t) = \frac{2}{\sqrt{3}\, \pi^{1/4}} \left(1 - t^2\right) e^{-t^2/2},

to extract features from the observed bone sequences and internal air sequences. Based on observations from our experiments, we find that scale factors smaller than 3 are too noisy and scale factors larger than 10 tend to detect redundant features. Therefore, we choose the scale factors a_i = i + 2 for i = 1, 2, ..., 8; in this case, p = 8.
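
The transformation T of Eq. (2.6) then amounts to an eight-channel continuous wavelet transform. A sketch on a stand-in signal follows; scipy.signal.cwt and scipy.signal.ricker implement exactly this pairing in older SciPy releases (they were removed in SciPy 1.15, where pywt.cwt with the 'mexh' wavelet is an alternative).

import numpy as np
from scipy import signal

bone_seq = np.random.default_rng(0).poisson(2000, size=120).astype(float)  # stand-in signal
widths = np.arange(3, 11)                          # a_i = i + 2 for i = 1, ..., 8
T_f = signal.cwt(bone_seq, signal.ricker, widths)  # shape (8, len(bone_seq)), one row per scale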

2.5 Curve Registration

Curve registration is a crucial step in our pipeline. We align the body parts between cases by the registration between bone signals and air signals.

There are many studies in functional data analysis on curve registration. For example, [6] aligns curves by solving a global minimization problem and [2] matches curves by aligning landmarks.

However, there are two critical issues in our situation. First, each CT image may capture different body parts, and hence the bone sequences (and air sequences) usually contain different landmarks between images. Second, we have two kinds of information to use, i.e., bone sequences and air sequences, so the formulation in [6] cannot be used directly.

Even though all CT images are processed so that they are of size 512 × 512 in the axial view and of spacing 5 mm in the z-axis for thick-cut images, they still have their own essential scale. For example, heights and body proportions differ between cases, so CT images have various numbers of axial slices even if some of them capture the same body parts. Hence, we consider a one-dimensional affine transformation h(t) = at + b, t ∈ R, as our warping function, where a and b lie in some compact intervals, to match the essential scales of two cases.

Here are two notes about our method. First, an affine transformation cannot adjust the body proportion. However, finding a non-linear transformation usually needs some accurate landmarks, and landmarks are not easy to locate and recognize due to the different body parts between CT images, as discussed previously. More precisely, although [16] proposes a good approach to finding landmarks, the selection of landmarks for registration in our situation is still an issue. Therefore, we do not consider non-linear transformations in this work. Second, we have considered 2D or 3D bone registration, but such registration is hard to do. Although there is a standard brain in medical science, bone structures vary considerably between people; hence, the loss function of the corresponding 2D or 3D optimization problem is very non-convex. In contrast, standard bone signals and standard air signals are more convincing from the viewpoint of medical knowledge.


Assume that f₁ and f₂ are bone signals and g₁ and g₂ are air signals. The notation ∥·∥₂ denotes the 2-norm. Let F be a function space; for instance, F can be Lᵖ(R). If T : F → F is a transformation, then we align body parts by solving the following optimization problem:

\min_{h \in H}\ (1 - \lambda)\, \|T(f_1) - T(f_2 \circ h)\|_2^2 + \lambda\, \|T(g_1) - T(g_2 \circ h)\|_2^2 \tag{2.7}

where

• H = {h : [0, 1] → R | h(t) = at + b, where a ∈ I, b ∈ R}, and I is a compact interval contained in (0, ∞);

• λ ∈ [0, 1] is a parameter.

If T : F → Fᵖ for some p ∈ N, say

T(f_1) = (w_1^{(1)}, w_2^{(1)}, \ldots, w_p^{(1)}), \quad T(g_1) = (v_1^{(1)}, v_2^{(1)}, \ldots, v_p^{(1)}),
T(f_2 \circ h) = (w_1^{(2)}, w_2^{(2)}, \ldots, w_p^{(2)}), \quad T(g_2 \circ h) = (v_1^{(2)}, v_2^{(2)}, \ldots, v_p^{(2)}),

where w_i^{(j)} \in F and v_i^{(j)} \in F for i = 1, 2, ..., p and j = 1, 2, then we align body parts by solving the following optimization problem:

\min_{h \in H}\ \sum_{i=1}^{p} (1 - \lambda)\, \big\| w_i^{(1)} - w_i^{(2)} \big\|_2^2 + \lambda\, \big\| v_i^{(1)} - v_i^{(2)} \big\|_2^2 \tag{2.8}

with H and λ as above.


The interval I is chosen to be [0.8, 1.2] in this work. Note that for b with large absolute value |b|, the warping function h makes f₂ ∘ h = 0 and g₂ ∘ h = 0 on [0, 1], since the translation factor b moves the curves outside the window. Thus

supp(f₁) ∩ supp(f₂ ∘ h) = supp(g₁) ∩ supp(g₂ ∘ h) = ∅,

where supp(f) = {x ∈ R : f(x) ≠ 0} for f : R → R. Hence, b can also be restricted to a compact interval that depends on a. Since the heights of almost all people lie in some fixed range, we may solve the minimization problem in a reasonable time even by grid search. Moreover, grid search is easily parallelized, so we may accelerate the computation in practice. Therefore, we use grid search to solve the minimization problem in this thesis.
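
The sketch below illustrates the grid search for Eq. (2.8), assuming the reference and moving features are arrays of shape (p, length) sampled on a common z grid and that warped signals vanish outside the window; the grid resolutions and the range for b are illustrative choices, not the exact values used in our experiments.

import numpy as np

def resample(channels, a, b):
    """Evaluate each feature channel at the warped coordinates a*t + b (zero outside)."""
    t = np.arange(channels.shape[1], dtype=float)
    return np.stack([np.interp(a * t + b, t, c, left=0.0, right=0.0) for c in channels])

def register(ref_bone, ref_air, mov_bone, mov_air, lam=0.5):
    best, best_ab = np.inf, (1.0, 0.0)
    for a in np.linspace(0.8, 1.2, 21):         # I = [0.8, 1.2]
        for b in np.arange(-60.0, 60.5, 1.0):   # compact range for the shift
            cost = ((1 - lam) * ((ref_bone - resample(mov_bone, a, b)) ** 2).sum()
                    + lam * ((ref_air - resample(mov_air, a, b)) ** 2).sum())
            if cost < best:
                best, best_ab = cost, (a, b)
    return best_ab                              # the optimal warping h(t) = a*t + b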

2.6 Deep Learning

Deep learning has been one of the most popular topics in artificial intelligence. It is a kind of machine learning algorithm that establishes a model from data. Moreover, deep learning is a flexible framework, so it can be used in both supervised learning and unsupervised learning. In this thesis, we use deep learning techniques to establish models.


2.6.1 Machine Learning in Medical Image Analysis

Machine learning has been widely used in medical image analysis and has obtained good results on different problems. Classification, segmentation, and detection are the main topics in medical image analysis; classifying normal and abnormal CTs, segmenting organs and tumors in CT, and detecting lesions are examples of these topics, respectively. All three tasks need to extract features from images.

In traditional machine learning algorithms, feature extraction from images relies on image annotation. For example, radiomics is a feature extraction approach that relies on image annotation. Using both the image and its annotation, radiomics computes intensity-based and texture-based features in the ROI and analyzes shape-based features of the geometric properties of the ROI. With the radiomics features, one may train a machine learning model to classify images. Another way of using radiomics is to divide an image into patches and compute features for each patch. This approach is usually used in classification, and the image annotation is used to assign a label to each patch. One possibility is to label both the organ and the tumors. A machine learning algorithm is then used to build a model that distinguishes patches containing tumors from patches that do not. There are several possible choices of machine learning algorithms, such as k-nearest neighbors (KNN), support vector machines (SVM), random forests, and XGBoost. The outputs of this approach are patch-based results. One may summarize the patch-based results in a heat map and draw conclusions from the heat map. Heat maps not only provide explainable information but can also be used to generate patient-based predictions.

However, as we discussed previously, image annotation is not a good idea for our problem. Therefore, we use deep learning rather than traditional machine learning to establish models.


2.6.2 Artificial Neural Networks

Deep learning has shown its power in many fields by constructing artificial neural networks (ANNs) with multiple layers. We introduce ANNs and some related concepts in this section.

First, we formally define the ANN. A single-layer neural network is a parametrized function

F(x; A, b) = f(Ax + b) \quad \text{for } x \in \mathbb{R}^n,

where f is a non-linear continuous function, A \in \mathbb{R}^{m \times n}, and b \in \mathbb{R}^m. An ANN, or a multi-layer neural network, is a composition of single-layer neural networks. Suppose that L \in \mathbb{N} and n_0, n_1, n_2, \ldots, n_L \in \mathbb{N}. For i = 1, 2, \ldots, L, let f_i : \mathbb{R}^{n_i} \to \mathbb{R}^{n_i} be a non-linear continuous function, A_i \in \mathbb{R}^{n_i \times n_{i-1}}, and b_i \in \mathbb{R}^{n_i}, and define a single-layer neural network as

F_i(x; A_i, b_i) = f_i(A_i x + b_i) \quad \text{for } x \in \mathbb{R}^{n_{i-1}}.

Then an ANN is defined as the composition F_L \circ F_{L-1} \circ \cdots \circ F_1. For convenience, we denote the ANN by F(x; θ), where x \in \mathbb{R}^{n_0} and θ is the parameter consisting of the A_i's and b_i's. In this case, L is called the number of layers, n_i the number of neurons in the i-th layer, f_i an activation function, A_i a weight matrix (or simply weight), and b_i a bias vector (or simply bias), for i = 1, 2, \ldots, L.

A popular choice of activation function is the Rectified Linear Unit (ReLU), defined by

\mathrm{ReLU}(x) = \max\{x, 0\} = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \quad x \in \mathbb{R}.

A neural network that uses ReLU as its activation function at every layer is called a ReLU network.
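
A minimal numpy sketch of a two-layer network F₂ ∘ F₁ as defined above follows; the layer sizes and the sigmoid output activation (convenient for binary labels) are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n0, n1, n2 = 4, 8, 1                           # neurons per layer
A1, b1 = rng.standard_normal((n1, n0)), np.zeros(n1)
A2, b2 = rng.standard_normal((n2, n1)), np.zeros(n2)

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def F(x):
    h = relu(A1 @ x + b1)                      # F_1(x; A_1, b_1)
    return sigmoid(A2 @ h + b2)                # F_2(h; A_2, b_2)

print(F(rng.standard_normal(n0)))              # a value in (0, 1)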

The configuration of an ANN means the number of layers and the numbers of neurons of the ANN. With a fixed configuration, we would like to find good weights and biases for the dataset and the task. Loss functions are smooth functions that evaluate how good an ANN is; the choice of loss function depends on the task. Given a dataset and a task, we treat the loss function as the objective function and the weights and biases as the independent variables. By minimizing the loss function, we may find optimized weights and biases. Solving this minimization problem is called training.

The terms parameters and hyperparameters refer to variables that are determined by training and variables that are fixed beforehand, respectively. Note that the configuration is fixed beforehand, while the weights and biases are randomly initialized and optimized during training. As a result, the weights and biases are parameters, and the number of layers and the numbers of neurons are hyperparameters.

The loss function depends on the given task. In a classification task, we are given a dataset \{(x_i, y_i)\}_{i=1}^{N}, where N is the number of data points, x_i \in \mathbb{R}^n is an input, and y_i \in \{0, 1\} is the label of the corresponding x_i, for i = 1, 2, \ldots, N. Let F be an ANN and \hat{y}_i = F(x_i; \theta) for i = 1, 2, \ldots, N, where θ denotes the parameters. In this case, a typical loss function L is defined by

L(\hat{y}_i, y_i) = -\left( y_i \ln(\hat{y}_i) + (1 - y_i) \ln(1 - \hat{y}_i) \right) \quad \text{for } i = 1, 2, \ldots, N.

This loss function is called the binary cross-entropy loss. Note that the \hat{y}_i's are functions of θ. Let \mathcal{L}(\theta) = \sum_{i=1}^{N} L(\hat{y}_i, y_i). Then we solve the following minimization problem to find a good θ:

\min_{\theta}\ \mathcal{L}(\theta) \tag{2.9}

Here are some natural questions.

• What kinds of functions can an ANN approximate?

• How do we solve this minimization problem?

For the first question, the universal approximation theorem for width-bounded ReLU networks shows that ANNs with ReLU activations can approximate any Lᵖ function [15]. For the second question, gradient descent is a common method for minimizing differentiable objective functions. Gradient descent updates the parameters at each iteration by the recursive formula

\theta_{i+1} = \theta_i - \eta \nabla_\theta \mathcal{L}(\theta_i) \quad \text{for } i = 1, 2, \ldots

where \theta_j is the parameter at the j-th iteration for j \in \mathbb{N} and η > 0 is called the step size or the learning rate. However, since the number of data points N and the dimension of the input data n are usually large, it is difficult to compute \mathcal{L}(\theta) = \sum_{i=1}^{N} L(\hat{y}_i, y_i) as well as its gradient. Therefore, mini-batch stochastic gradient descent is widely used in deep learning algorithms; in this case, we update the parameters by using a batch of data instead of the whole dataset. It has been shown that mini-batch stochastic gradient descent with an adaptive learning rate η converges to some local minimum.
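
A sketch of one epoch of mini-batch stochastic gradient descent on the binary cross-entropy loss in PyTorch follows; the network, the random stand-in data, and the learning rate are all illustrative.

import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()               # binary cross-entropy on logits

X = torch.randn(256, 16)                       # stand-in dataset
y = torch.randint(0, 2, (256, 1)).float()

for i in range(0, len(X), 8):                  # mini-batches of size 8
    xb, yb = X[i:i + 8], y[i:i + 8]
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()                            # gradient of the batch loss
    opt.step()                                 # theta <- theta - eta * grad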

Although ReLU networks have the ability to approximate Lᵖ functions, other activation functions still play a role. In fact, the Parametric Rectified Linear Unit (PReLU) [8], defined by

\mathrm{PReLU}(x) = \max\{x, 0\} + a \min\{x, 0\} = \begin{cases} x & \text{if } x \ge 0 \\ ax & \text{if } x < 0 \end{cases} \quad x \in \mathbb{R},

where a > 0 is a parameter, has been shown to improve model fitting [8]. Note that the parameter a can also be trained during the training process, and [8] derives the update formulation.

The ANN is one of the most important concepts in deep learning. However, real-world applications do not use the plain ANN. The convolutional neural network is an extension of the ANN that is widely used in computer vision; we introduce convolutional neural networks in the following section.

2.6.3 Convolutional Neural Networks

An ANN can approximate many functions, but it is not practical for image analysis. Computational cost is a big issue in this situation, since images are represented by large arrays and so are the numbers of neurons. Moreover, using an ANN with dense weights would destroy the structure of the input being an image. Hence, convolutional neural networks (CNNs) are widely used in image tasks. Note that a single-layer neural network is a composition of a linear transformation, a translation, and an activation function, and a convolution transformation (with a kernel) is also a linear transformation. A CNN is a special case of an ANN that replaces the general linear transformations in an ANN by convolution transformations.

There are many advantages of using CNNs in image tasks. We list three of them as follows.

1. A convolution transformation is parametrized by its kernel, which is a k × k matrix or a k × k × k tensor, where k ∈ N. In practice, k is far less than the size of the images, for example k = 3. Hence, using a CNN reduces the number of parameters.

2. A convolution transformation extracts local features and takes location into consideration. Hence, using a CNN preserves the structure of images.

3. The matrix representation of a convolution transformation is a circulant matrix, which has many powerful properties for computation.

Due to these advantages, CNNs have become the state of the art in image tasks. In this thesis, we focus on applications of CNNs in medical image analysis. We use a configuration that has been validated on ImageNet and train it for the ovarian tumor classification task.

2.6.4 Dataset and Data Splitting

We use the dataset maintained by Linkou Chang-Gung Memorial Hospital to establish models. After rechecks and quality control by Dr. Gigin Lin, the dataset contains 411 CT images obtained from 401 patients, consisting of 161 cancerous cases and 240 benign cases.

In this thesis, we split the patient list into 5 training folds and an extra test set. First, we hold out a test set of 81 patients. This test set is not involved in any training or validation process; it is used only for testing. Second, we split the remaining 320 patients into 5 folds in a stratified manner, each containing 64 patients. We perform 5-fold cross-validation on these folds and finally test the models on the test set.

              fold 1  fold 2  fold 3  fold 4  fold 5  test set
    patients      64      64      64      64      64        81
    images        67      64      64      66      66        84

Table 2.1: The numbers of patients and images in each fold and in the test set.

             training  validation  test
    list 1        260          67    84
    list 2        263          64    84
    list 3        263          64    84
    list 4        261          66    84
    list 5        261          66    84

Table 2.2: The numbers of data in each list. The test sets in all lists are identical, and we keep the test set unseen for the final test. For i = 1, 2, 3, 4, 5, list i regards fold i as the validation set and the other folds as the training set.
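A sketch of this splitting scheme with scikit-learn; the label ordering is a placeholder and the random seed is illustrative.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, train_test_split

    # Placeholder patient-level labels: 1 = cancerous, 0 = benign (401 patients).
    patients = np.arange(401)
    labels = np.array([1] * 161 + [0] * 240)

    # Hold out a stratified test set of 81 patients, untouched during training.
    trainval, test = train_test_split(
        patients, test_size=81, stratify=labels, random_state=0)

    # Split the remaining 320 patients into 5 stratified folds of 64 patients each.
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for i, (train_idx, val_idx) in enumerate(skf.split(trainval, labels[trainval]), 1):
        print(f"list {i}: {len(train_idx)} training, {len(val_idx)} validation patients")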

2.6.5 Training

The training process of a deep learning model is precisely the solution of an optimization problem that aims to minimize the loss function. Note that the minimization problem has no analytic solution and its computational complexity is very high. A common approach to this optimization problem is stochastic gradient descent. However, the loss function is usually highly non-convex, so several studies aim to improve the convergence of stochastic gradient descent; Adam [11] is the most popular of these. Adam is a widely used optimizer that combines AdaGrad [5] and RMSProp [20], and it has been shown to have high stability and high efficiency. NovoGrad [7] uses layer-wise gradient normalization to improve performance and combines the advantages of stochastic gradient descent and Adam. In our experiments, we use NovoGrad as our optimizer.

By using the registration methods discussed in the previous sections, we can extract specific body parts from CT images. Two body parts selection strategies are used in preprocessing: using only the pelvis, or using the pelvis and the lower abdomen.

As discussed in Section 2.6.4, we perform 5-fold cross-validation and test the resulting models on the test set. At the i-th stage, we treat the i-th fold as the validation set and the other 4 folds as the training set. We train models on the training set and monitor the training process by evaluating the models on the validation set after each epoch. After the training process, we evaluate the models on the test set. Finally, we compare the performance between models as well as between body parts selection strategies.

We choose DenseNet121 [9] with dropout [19] rate 0.2 as the classification model in the experiments and, as discussed above, choose NovoGrad as the optimizer with learning rate 3 × 10⁻⁴, weight decay 10⁻⁴, and batch size 8. Also, we use the cosine annealing learning rate scheduler [14] to decay the learning rate during training, where the decay period is 50 epochs and the minimal learning rate is 10⁻⁶, and we train these models for 1000 epochs.
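A sketch of this configuration, assuming the MONAI implementations of DenseNet121 and NovoGrad together with PyTorch's cosine annealing scheduler; the thesis does not name the libraries, so the imports are assumptions.

    import torch
    from monai.networks.nets import DenseNet121
    from monai.optimizers import Novograd

    # 3D DenseNet121 binary classifier with dropout rate 0.2.
    model = DenseNet121(spatial_dims=3, in_channels=1, out_channels=2,
                        dropout_prob=0.2)

    # NovoGrad with the learning rate and weight decay reported above.
    optimizer = Novograd(model.parameters(), lr=3e-4, weight_decay=1e-4)

    # Cosine annealing: decay period 50 epochs down to a minimal learning rate 1e-6.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=50, eta_min=1e-6)

    loss_fn = torch.nn.CrossEntropyLoss()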

Each image is resampled to a spacing of 1 mm × 1 mm × 1 mm for a uniform spacing, and we use center cropping or padding to obtain an array from the resampled image. For the case of cropping to the pelvis, the size of the array is 224 × 224 × 192. For the case of cropping to the pelvis and lower abdomen, the size of the array is 224 × 224 × 256. Then we apply a random 3D affine transform and Gaussian noise for data augmentation. The parameters of the random 3D affine transform are a maximal rotation angle of π/12, a maximal shear range of 0.2, a maximal translation range of 0.1, and tri-linear interpolation.
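A sketch of this preprocessing and augmentation with MONAI transforms (an assumed library choice; parameter conventions, such as whether the translation range is fractional or in voxels, differ between libraries, so the values below simply mirror the text, and the application probabilities are illustrative).

    import math
    from monai import transforms as T

    # Resample to uniform 1 mm spacing, then center crop or pad to the
    # pelvis-only array size 224 x 224 x 192.
    preprocess = T.Compose([
        T.LoadImage(ensure_channel_first=True),
        T.Spacing(pixdim=(1.0, 1.0, 1.0), mode="bilinear"),
        T.ResizeWithPadOrCrop(spatial_size=(224, 224, 192)),
    ])

    # Random 3D affine transform and Gaussian noise for data augmentation.
    augment = T.Compose([
        T.RandAffine(
            prob=0.5,                          # application probability (illustrative)
            rotate_range=(math.pi / 12,) * 3,  # maximal rotation angle pi/12
            shear_range=0.2,                   # maximal shear range
            translate_range=0.1,               # maximal translation range
            mode="bilinear",                   # tri-linear interpolation on 3D volumes
        ),
        T.RandGaussianNoise(prob=0.5),
    ])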

Chapter 3 Results and Discussion

We discuss the results of the proposed pipeline in this chapter.

3.1 Bone Segmentation

We assume the bone is a connected component in Section 2.2.2 and Section 2.2.3, since the human skeleton is connected. However, due to certain CT imaging techniques, the assumption does not always hold. For example, older CT scans usually have a spacing of 1 mm along the z-axis; these are called thin-cut images. Nowadays, however, CT scans usually have a spacing of 5 mm along the z-axis, using compression techniques to save storage. Therefore, the bone in a CT scan may not be connected. Hence, in practice we perform a morphological closing to retouch the segmentation obtained from global thresholding. Although we expect the closed binary segmentation to be connected, the connectivity assumption may still fail.

Without the connectivity assumption, the bone segmentation obtained by the largest-component method may fail. In fact, the segmentation usually omits some parts of the bone, such as the mandible. In some cases, the segmentation may even omit the spine and ribs, which means the segmentation preserves the pelvis only. In that case, the bone sequence cannot represent the pattern of the amount of bone in the axial slices. Moreover, the graph-cut method assumes the graph is connected, which is equivalent to the connectivity assumption of the bone segmentation in some sense of connectivity. Hence, using the graph-cut method to segment the whole 3D bone does not make sense if the connectivity assumption fails.

Our main purpose is to obtain a rough approximation of the amount of bone in each axial slice and to recognize its pattern in the next step, registration by wavelet transform. Moreover, there are large numbers of CT images to be analyzed, so the computational performance of the pipeline should be efficient enough for medical practice. Hence, we decide to use hysteresis thresholding for its efficiency and good enough performance.

As previously discussed, the choice of the low threshold is based on medical knowledge. In fact, the CT value 400 HU is a common threshold for bone. The choice of the high threshold is based on experimental observation. We observe from the case shown in Figure 3.1 that the bone voxels are usually of intensities greater than 800 HU, while the contrast-enhanced structures are usually of intensities less than 650 HU.

Figure 3.1: An example of a CT slice in axial view and its bone segmentation obtained by
global thresholding

Note that the set of foreground pixels F in the segmentation consists of both the spine (at the center) and a contrast-enhanced structure (on the right). We can manually separate these two parts in this case and obtain the following results. In fact, there are 16 connected components C_1, C_2, ..., C_16 in the segmentation. By p_i, we denote the 90th percentile of the intensity histogram of C_i. Let

A = ∪ {C_i | p_i > 650} and B = ∪ {C_i | p_i ≤ 650}.

Then we divide F into A and B. More precisely, F = A ∪ B and A ∩ B = ∅. A is exactly the subset of bone pixels in F, and B is exactly the subset of contrast-enhanced structure pixels in F. The separation result is shown in Figure 3.2.

Figure 3.2: Bone segmentation and contrast-enhanced structure segmentation. In both plots, black pixels are the background. White pixels in the left plot are the bone pixels, and white pixels in the right plot are the contrast-enhanced structure. Gray pixels in the left plot are the contrast-enhanced structure, and gray pixels in the right plot are the bone pixels.

Based on the separation, we plot the histograms of the intensities in the bone set A and in the non-bone set B in Figure 3.3. Observe that the intensities of non-bone pixels rarely exceed a bound, say 650 HU, whereas the intensities of bone pixels can reach high values, such as 1000 HU. Therefore, we use 800 HU as the high threshold in hysteresis thresholding.
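As a sketch of this step, scikit-image provides apply_hysteresis_threshold, which keeps voxels above the high threshold together with voxels above the low threshold that are connected to them; the closing parameters are illustrative, and ct is assumed to be a 3D array of intensities in HU.

    from scipy import ndimage
    from skimage.filters import apply_hysteresis_threshold

    def segment_bone(ct, low=400, high=800):
        """Hysteresis bone segmentation: seeds above `high` HU are grown through
        all connected voxels above `low` HU, then small gaps are closed."""
        mask = apply_hysteresis_threshold(ct, low, high)
        return ndimage.binary_closing(mask, iterations=2)

    # Bone signal: the number of bone voxels in each axial slice (axis 0 = z).
    # bone_signal = segment_bone(ct).sum(axis=(1, 2))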

In summary, we use hysteresis thresholding as described in Section 2.2.4 to segment bone. Figure 3.4, Figure 3.5, and Figure 3.6 show the segmentation results of Case 1.

Figure 3.3: Intensity histograms of the bone set A and the non-bone set B.

Figure 3.4: Original axial images of case 1.

Figure 3.5: Bone segmentation in axial view of case 1.

Figure 3.6: Bone signal and bone segmentation in coronal view of case 1.

Case 1 is a regular case, which means there is no distinct contrast-enhanced structure interrupting the bone segmentation. Note that there is an M structure in the pelvis; this pattern is closely related to human anatomy. The other patterns in the bone signal also correspond to bone structures and body parts. For example, the number of bone pixels is small in the lower abdomen, since the only bone in that part is the spine, and the number of pixels increases as the ribs appear in the slices.

Figure 3.7: Original axial images of case 2.

Case 2 is an irregular case, which means there are distinct contrast-enhanced structures that interrupt the bone segmentation. CT slices of Case 2 are shown in Figure 3.7. In fact, there are several contrast-enhanced structures in the pelvis, and the segmentation obtained by hysteresis thresholding includes these non-bone pixels, as shown in Figure 3.8. One of our main purposes is to find the pelvis slices in the CT scan, but these contrast-enhanced structures interrupt the pattern in the bone signal. Accordingly, the M structure is imperfect and a third peak appears in the bone signal in Figure 3.9.

Figure 3.8: Bone segmentation in axial view of case 2.

Figure 3.9: Bone signal and bone segmentation in coronal view of case 2.

3.2 Internal Air Segmentation

We use the approach described in Section 2.3 to segment internal air. Fortunately, air signals are not ambiguous during the segmentation process. The most important feature of air is its low gray-scale value, around −1000 HU, while other structures in a CT scan usually have gray-scale values larger than −200 HU. For example, body fat is darker than many tissues in CT, and the gray-scale values of body fat usually lie in the interval [−70, −30]. Therefore, the performance of the approach described in Section 2.3 is good enough for further analysis.
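Section 2.3 gives the exact procedure; the following is only a plausible sketch under the stated intensity assumptions, where internal air is taken to be low-intensity voxels not connected to the border of the volume (the external air).

    from skimage.segmentation import clear_border

    def segment_internal_air(ct, threshold=-500):
        """Internal air: voxels well below tissue intensities (tissue > -200 HU)
        that are not connected to the volume border, where the external air lies."""
        air = ct < threshold      # air is near -1000 HU
        return clear_border(air)  # drop air components touching the border

    # Air signal: the number of internal-air voxels per axial slice.
    # air_signal = segment_internal_air(ct).sum(axis=(1, 2))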

Figures 3.10 to 3.15 show the segmentation results of Case 1 and Case 2. The slices chosen in this section are near the lung, the organ that contains the most air in the human body. We can see that the lung is segmented in Figure 3.12 and Figure 3.15. The gastrointestinal tract may also contain air, but far less than the lung does. As a result, a part of the gastrointestinal tract is segmented in Figure 3.12. However, the whole lung is included in Case 2, and hence the gastrointestinal tract is not clearly visible in Figure 3.15.

Figure 3.10: Original axial images of case 1.

Figure 3.11: Internal air segmentation in axial view of case 1.

Figure 3.12: Air segmentation in axial view of case 1.

Figure 3.13: Original axial images of case 2.

Figure 3.14: Internal air segmentation in axial view of case 2.

Figure 3.15: Air segmentation in axial view of case 2.

3.3 Registration and Partition of Body Parts

As mentioned before, the bone signal and the internal air signal are often noisy, so directly computing the Euclidean distance between the signals of two cases does not make sense. Therefore, we first apply a transformation to the signals for denoising and pattern recognition before minimizing the distances between them.

3.3.1 Preparation Stage

We use the following case as the reference image; the breakpoints of its body parts are shown in Figure 3.16. We then compute the bone segmentation and the internal air segmentation; the segmentation results, the bone signal, and the air signal are shown in Figures 3.17 and 3.18.

Figure 3.16: Bounds of body parts of the reference image.

Figure 3.17: Air segmentation in axial view of the reference image.

Figure 3.18: Bone segmentation in axial view of the reference image.

3.3.2 Gaussian Filters as the Transformations

We use the Gaussian filter with width ϵ = 2 (unit) from Section 2.4.1 for denoising. The results for the reference image, Case 1, and Case 2 are shown in Figures 3.19 to 3.21. Note that the air signals have a large maximum in the lung part, and the typical pattern of the bone signal is present in Case 1.
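A sketch of this smoothing with SciPy, reading the width ϵ as the standard deviation of the Gaussian kernel (an assumption; Section 2.4.1 fixes the precise definition).

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    rng = np.random.default_rng(0)
    noisy = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.3 * rng.normal(size=200)
    smoothed = gaussian_filter1d(noisy, sigma=2.0)  # width epsilon = 2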

Figure 3.19: Signals transformed by Gaussian filter of the reference case.

While the Gaussian filter denoises the signals, it also removes some important potential patterns. Notice that the M structure disappears in the reference case and in Case 2, although the M structure is an important pattern in the bone signal, as discussed above. Therefore, we do not use the Gaussian filter as the transformation.

Figure 3.20: Signals transformed by Gaussian filter of case 1.

Figure 3.21: Signals transformed by Gaussian filter of case 2.

3.3.3 Wavelet Transforms as the Transformations

Due to its ability to deal with multi-scale problems, the wavelet transform recognizes the patterns of the bone sequence and the internal air sequence. Choosing a suitable set of scale factors is an important part of this step. Besides, normalization is also a necessary step. Since people have different body types, the ranges of the bone sequences and the air sequences also differ. Hence, even if two people have bone structures differing only in scale, the reference bone signal and the warped moving bone signal may be far from each other in the sense of the L2 norm. However, using either the L2 norm or the L∞ norm for normalization can be misleading, since every CT covers different body parts. If a CT contains the lung, then the L∞ norm of the air signal can be large, for example 50000 HU; on the other hand, if a CT does not contain any part of the lung, then the L∞ norm is small, for example 4000 HU. A similar situation also occurs for the bone signal. So we use percentiles to normalize the signals. More precisely, we normalize the bone signal by the maximum of the 75th percentile of the bone sequence and 3000 HU, and we normalize the air signal by the maximum of the 97.5th percentile of the air sequence and 4000 HU. Taking the maximum protects the results from outliers. These percentiles are expected to emphasize some landmarks, such as the M structure in the bone signal and the peak in the air signal.
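A sketch of this percentile normalization; the floor constants 3000 and 4000 follow the text, and taking the maximum with the floor is what protects against outliers.

    import numpy as np

    def normalize_bone(bone_signal):
        # Normalize by max(75th percentile, 3000) to emphasize the M structure.
        return bone_signal / max(np.percentile(bone_signal, 75), 3000.0)

    def normalize_air(air_signal):
        # Normalize by max(97.5th percentile, 4000) to emphasize the lung peak.
        return air_signal / max(np.percentile(air_signal, 97.5), 4000.0)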

Figure 3.22 shows an example of a bone signal and its wavelet features with normalization. If we used many scale factors to form a basis, there would be many redundant vectors. By choosing a suitable set of scale factors, we can see some important patterns in the feature vectors. For example, the M structure and the local monotonicities appear in the feature vectors for suitable scale factors. As mentioned in Section 2.4, we use {3, 4, ..., 10} as the set of scale factors to perform the wavelet transforms.

Figure 3.22: Bone signal and its features extracted by wavelet transforms.
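A sketch of this feature extraction with PyWavelets; the Mexican-hat mother wavelet is an assumption, since the text does not name the wavelet used.

    import numpy as np
    import pywt

    signal = np.random.default_rng(0).random(300)  # a normalized bone or air signal
    scales = np.arange(3, 11)                      # scale factors {3, 4, ..., 10}
    coeffs, _ = pywt.cwt(signal, scales, "mexh")   # one feature vector per scale
    print(coeffs.shape)                            # (8, 300)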

The time-scale plots of Case 1 and Case 2 are shown in Figures 3.23 and 3.24. In Case 1, we can see that the M structures are captured by several transformed bone signals of different scale factors. In Case 2, the M structures and the third peak related to the contrast-enhanced structures are captured by the transformed bone signals. Moreover, the peaks of the air signals are captured by the transformed signals in both cases. For both the bone signals and the air signals, thanks to its ability to handle multi-scale problems, the wavelet transform captures more patterns than the Gaussian filter. Therefore, we use the transformation defined by the wavelet transforms in the next step.

3.3.4 Registration

By solving the optimization problem (8) described in Section 2.4, we can register the bone signals and the air signals between cases. The parameter λ is chosen to be 0.3, since the main information should come from the bone signals while the air signals regularize the registration results.
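Problem (8) is stated in Section 2.4; as a rough sketch of its structure only, the objective below combines the bone-feature mismatch with a λ-weighted air-feature mismatch for a candidate warping w. The warping model and the feature map (standing in for the wavelet transform) are placeholders, not the thesis's exact formulation.

    import numpy as np

    def registration_objective(w, bone_ref, bone_mov, air_ref, air_mov,
                               lam=0.3, features=lambda s: s):
        """Mismatch of warped moving features against reference features.
        `w` maps reference slice indices to (fractional) moving slice indices;
        `features` stands in for the wavelet feature map."""
        grid = np.arange(len(bone_ref))
        bone_w = np.interp(w(grid), np.arange(len(bone_mov)), bone_mov)
        air_w = np.interp(w(grid), np.arange(len(air_mov)), air_mov)
        bone_term = np.sum((features(bone_ref) - features(bone_w)) ** 2)
        air_term = np.sum((features(air_ref) - features(air_w)) ** 2)
        return bone_term + lam * air_term  # air signals act as a regularizer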

Figure 3.23: Time-scale plots of case 1.

Figure 3.24: Time-scale plots of case 2.

Figure 3.25: Registration results of case 1. The bone structure of the reference image and
the transformed bone structure of the moving image.

Recall that Case 1 is a regular case: the pattern of the bone structure is complete in both the segmentation and the bone signal. The registration result shown in Figure 3.25 is as good as we would expect.

On the other hand, since we use the air signal only for regularization, the registration result for the internal air of Case 1, shown in Figure 3.26, is not good. In fact, the maxima of the air signals align with each other due to our normalization approach. The registration result cannot be improved much, since the lung structure in Case 1 is not completely captured. Moreover, we do not attempt to improve this registration result, for the following reasons. First, the shape of the lung is not fixed; the lung is flexible, and its shape changes at every moment as we breathe. Hence, registration of lungs is difficult. Second, our purpose is the partition of body parts, and the air signal is used only to regularize the registration, so we do not focus on improving the internal air registration.

Figure 3.26: Registration results of case 1. The air structure of the reference image and the transformed air structure of the moving image.

Although Case 2 is less regular, the registration results are still good, as shown in Figures 3.27 and 3.28. Even though the transformed bone signals near the pelvis are not close enough to the reference case, we still obtain a good body parts partition thanks to the regularization by the air signals. In fact, the internal air registration of Case 2 is excellent, as shown in Figure 3.28.

As these examples show, the air segmentation is usually good enough, while the bone segmentation may be poor. However, even when the bone segmentation is irregular, as in Case 2, our body parts partition algorithm can still obtain a good registration, and hence a good body parts partition, as long as the bone segmentation is not totally broken. We show the final body parts partitions of Case 1 and Case 2 in Figures 3.29 and 3.30.

Figure 3.27: Registration results of case 2. The bone structure of the reference image and the transformed bone structure of the moving image.

Figure 3.28: Registration results of case 2. The air structure of the reference image and the transformed air structure of the moving image.

Figure 3.29: Body parts partition of case 1.

Figure 3.30: Body parts partition of case 2.

During the research, we listed 44 cases with difficulties in bone segmentation or other problems, such as wrong body parts and wrong phases, and we discussed these 44 CT images with Dr. Lin. Dr. Lin rechecked these cases and then labeled the body parts for them. After that, 8 images were excluded from the analysis: some were replaced by correct ones, and some were removed due to a wrong body part, such as chest CT images.

We apply the body parts partition algorithm to the remaining 36 cases and compare the partition obtained by the algorithm with the ground truth labeled by Dr. Lin. For each case, we compute the absolute error (in cm) between the partition from our algorithm and the ground truth. We summarize the corresponding statistics in Table 3.1.

It is clear that there are some outliers in these distributions. These cases generally have average errors larger than 10 slices. We leave their segmentation and partition results to Appendix A.2. These cases include a low-resolution image, a leg CT, and an image with artifacts.

             l-pel   u-pel   m-abd  l-chest  m-chest  u-chest
    count       36      36      34       34       34       17
    mean      5.53    5.07    2.42     2.07     3.13     1.42
    std      16.10   16.05    2.67     3.01     2.82     1.45
    min       0.06    0.07    0.07     0.04     0.02     0.09
    Q1        1.05    0.52    0.81     0.50     1.56     0.51
    Q2        2.39    1.52    1.53     1.25     2.63     1.01
    Q3        3.75    3.10    2.80     1.77     3.75     2.08
    max      98.24   97.07   11.01    12.89    12.07     6.14

Table 3.1: Error distribution (cm) for each body part breakpoint. We denote lower, middle, upper, pelvis, and abdomen by l, m, u, pel, and abd.

We exclude the outliers described above to obtain the robust statistics shown in Table 3.2; the body parts partition results of the outliers are shown in Appendix A.2. We are mainly interested in l-pel, u-pel, and m-abd, since these three breakpoints are used in the preprocessing of our deep learning models. The mean errors in l-pel, u-pel, and m-abd are 2.48 cm, 1.73 cm, and 1.73 cm, and the respective standard deviations are 1.96 cm, 1.87 cm, and 1.46 cm. These results show that the algorithm performs well enough for the preprocessing of our deep learning models.

             l-pel   u-pel   m-abd  l-chest  m-chest  u-chest
    count       32      32      31       31       31       17
    mean      2.48    1.73    1.73     1.25     2.37     1.42
    std       1.96    1.87    1.46     1.00     1.40     1.45
    min       0.06    0.07    0.07     0.04     0.02     0.09
    Q1        1.02    0.49    0.78     0.47     1.43     0.51
    Q2        2.23    1.20    1.33     1.02     2.37     1.01
    Q3        3.29    2.17    2.51     1.73     3.34     2.08
    max       7.27    9.33    6.91     4.48     5.50     6.14

Table 3.2: Error distribution (cm) for each body part breakpoint after excluding outliers. We denote lower, middle, upper, pelvis, and abdomen by l, m, u, pel, and abd.

3.4 Classification by Deep Learning

We use two cropping strategies to preprocess the CT images. The first is cropping to the pelvis, that is, cropping the CT image to the region between l-pel and u-pel determined by our registration results. The second is cropping to the pelvis and the lower abdomen, that is, cropping the CT image to the region between l-pel and m-abd determined by our registration results.

    cropping strategy    validation         test
    upper pelvis         0.8601 ± 0.0414    0.8129 ± 0.0154
    lower abdomen        0.8486 ± 0.0526    0.7891 ± 0.0300

Table 3.3: Mean AUCs for the two strategies. This table shows the means and standard deviations of the validation AUCs and test AUCs, computed from the 5-fold cross-validation results.

Table 3.3 shows the means and standard deviations of the AUCs over the 5 folds; the details are given in Table B.1. By cropping to the upper pelvis, we obtain a mean validation AUC of 0.8601 and a mean test AUC of 0.8129. On the other hand, by cropping to the lower abdomen, we obtain a mean validation AUC of 0.8486 and a mean test AUC of 0.7891.
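For reference, each AUC entry can be computed from the model's class-1 probabilities on a held-out set; a minimal sketch with scikit-learn and toy values follows.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # y_true: ground-truth labels (1 = cancerous); y_score: predicted probabilities.
    y_true = np.array([0, 0, 1, 1, 1])
    y_score = np.array([0.2, 0.4, 0.35, 0.8, 0.7])
    print(roc_auc_score(y_true, y_score))  # 0.8333...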

The first observation is that the validation performance is better than the test performance in both cases. Note that the validation set is used to monitor the training process: we evaluate the validation performance at the end of each epoch, and the model checkpoint is selected to have the highest validation AUC. Hence, the model may overfit the validation set when the dataset is not large enough. In principle, we could reduce this phenomenon by adding more data.

Second, both the mean validation AUC and the mean test AUC obtained by cropping to the upper pelvis are higher than those obtained by cropping to the lower abdomen. Although some features in the lower abdomen are related to ovarian cancer, the models appear to make their decisions based on the organs in the pelvis. Moreover, the standard deviations obtained by cropping to the upper pelvis are lower than those obtained by cropping to the lower abdomen. Therefore, cropping to the upper pelvis gives more robust models, and we choose it as our preprocessing strategy.

Chapter 4 Conclusion

In this thesis, we propose an analysis pipeline that does not need image annotations of ovaries and ovarian tumors. To avoid image annotations, we preprocess the CT images by cropping them to specific body parts. For this purpose, we develop a body parts partition algorithm which needs only a few body parts annotations to automatically obtain the body part breakpoints in a query image. In our algorithm, we use the wavelet transform to smooth the bone signals and air signals as well as to recognize their patterns, and we align body parts between the reference image and the moving image by solving a minimization problem. Although the algorithm fails in some cases, such as images with artifacts or cases that do not contain the abdomen and chest, our body parts partition algorithm still performs well enough as a preprocessing technique for deep learning. The errors of the 6 body part breakpoints have means of approximately 2 cm, which is at a controllable level.

For the ovarian tumor classification, we preprocess the CT images by two strategies, cropping to the pelvis and cropping to the union of the pelvis and the lower abdomen, and train CNNs for classification in a cross-validation manner. Comparing the performance of these two strategies, we find that cropping to the pelvis not only gives a higher mean test AUC but also a lower standard deviation. Therefore, we decide to crop the CT images to the pelvis for this task. Moreover, the mean test AUC obtained by cropping to the upper pelvis is 0.8129 with standard deviation 0.0154, which shows that the CNN together with our preprocessing approach has the potential to classify ovarian tumors.

Future work includes the following. First, improve the preprocessing pipeline, including the bone segmentation methodology and the computational efficiency, so that the pipeline can support medical practice. Second, find an optimal parameter λ in the registration setting to minimize the error of the registration results. Third, use a set of reference images for different subsets of patients rather than a single reference image in the registration step. People with different conditions tend to have different types of bone or other structures; for example, the bone structures of a young man and an old man are different. Therefore, we may divide people into groups and prepare a reference image for each group to improve the registration performance. Fourth, improve the model performance; in theory, we may improve the classification performance by hyperparameter optimization. Also, adding more data for training is another approach to improve the model performance.

References

[1] U. R. Acharya, S. V. Sree, L. Saba, F. Molinari, S. Guerriero, and J. S. Suri. Ovarian tumor characterization and classification using ultrasound: a new online paradigm. Journal of Digital Imaging, 26(3):544–553, 2013.

[2] J. Bigot. Landmark-based registration of curves via the continuous wavelet transform. Journal of Computational and Graphical Statistics, 15(3):542–564, 2006.

[3] Y. Boykov and G. Funka-Lea. Graph cuts and efficient N-D image segmentation. International Journal of Computer Vision, 70(2):109–131, 2006.

[4] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.

[5] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(7), 2011.

[6] T. Gasser and K. Wang. Synchronizing sample curves nonparametrically. The Annals of Statistics, 27(2):439–460, 1999.

[7] B. Ginsburg, P. Castonguay, O. Hrinchuk, O. Kuchaiev, V. Lavrukhin, R. Leary, J. Li, H. Nguyen, Y. Zhang, and J. M. Cohen. Stochastic gradient methods with layer-wise adaptive moments for training of deep networks. arXiv preprint arXiv:1905.11286, 2019.

[8] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.

[9] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.

[10] S. E. Jung, J. M. Lee, S. E. Rha, J. Y. Byun, J. I. Jung, and S. T. Hahn. CT and MR imaging of ovarian tumors with emphasis on differential diagnosis. RadioGraphics, 22(6):1305–1325, 2002.

[11] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[12] M. Krčah, G. Székely, and R. Blanc. Fully automatic and fast segmentation of the femur bone from 3D CT images with no shape prior. In 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pages 2087–2090. IEEE, 2011.

[13] H. Lamecker, M. Seebass, H.-C. Hege, and P. Deuflhard. A 3D statistical shape model of the pelvic bone for segmentation. In Medical Imaging 2004: Image Processing, volume 5370, pages 1341–1351. International Society for Optics and Photonics, 2004.

[14] I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.

[15] Z. Lu, H. Pu, F. Wang, Z. Hu, and L. Wang. The expressive power of neural networks: A view from the width. arXiv preprint arXiv:1709.02540, 2017.

[16] J. O. Ramsay and X. Li. Curve registration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(2):351–363, 1998.

[17] R. L. Wheeden and A. Zygmund. Measure and Integral: An Introduction to Real Analysis. CRC Press, 2015.

[18] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

[19] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.

[20] T. Tieleman and G. Hinton. Lecture 6.5 - RMSProp. COURSERA: Neural Networks for Machine Learning. Technical report, 2012.

[21] A. Vlahou, J. O. Schorge, B. W. Gregory, and R. L. Coleman. Diagnosis of ovarian cancer using decision tree classification of mass spectral data. Journal of Biomedicine and Biotechnology, 2003:308–319, 2003.

[22] J. Zhang, C.-H. Yan, C.-K. Chui, and S.-H. Ong. Fast segmentation of bone in CT images using 3D adaptive thresholding. Computers in Biology and Medicine, 40(2):231–236, 2010.

Appendix A — Outliers in Error Analysis

A.1 Introduction

In the error analysis of our partition algorithm, we find 4 outliers in our results. We remove these 4 outliers to obtain a robust mean and standard deviation. We also analyze the 4 outliers and give some explanations for them. We plot the partition results of these 4 cases in this appendix.

A.2 Body Parts Predicted by Our Algorithm

The first one is a low-resolution image, as shown in Figure A.1. The ribs are not included in the bone segmentation, so the bone signal does not carry information about the ribs. Moreover, only a small part of the lung is captured in this image, so the air signal cannot regularize the registration as expected. In summary, the bone segmentation is not good enough and only a small part of the lung is captured, so the registration result is poor.

Figure A.1: Outlier 1 removed in Table 3.2.

There are some artifacts in the second case, as shown in Figure A.2. Hence, we see a large number of "bone pixels" detected by the hysteresis bone segmentation, and therefore the bone signal does not present the true pattern of the bone. Since the bone signal is the most important part of our body parts partition algorithm, the registration result is poor.

Figure A.2: Outlier 2 removed in Table 3.2.

The third case, shown in Figure A.3, is not so strange. However, some breakpoints, such as those in the chest, are not correct. One reason is that this image captures only a small part of the lung, so the registration of the air sequences somewhat interferes with the result.

Figure A.3: Outlier 3 removed in Table 3.2.

The fourth case is a leg CT, as shown in Figure A.4, so the air signal cannot provide any information. Furthermore, it may even interfere with the registration, since the algorithm tries to align the abdomen of this case to the lung of the reference case. Therefore, our algorithm aligns the pelvis of the reference case to the feet of this case and the lung of the reference case to the lower abdomen of this case.

Figure A.4: Outlier 4 removed in Table 3.2.

Appendix B — Cross-Validation Results

B.1 Introduction

We show the details of the cross-validation results in this appendix.

B.2 Results

    metric     strategy  list 1  list 2  list 3  list 4  list 5    mean     std
    val AUC    u-pel     0.8902  0.8103  0.9124  0.8533  0.8343  0.8601  0.0414
    val AUC    l-abd     0.9325  0.7964  0.8527  0.8485  0.8129  0.8486  0.0526
    test AUC   u-pel     0.8157  0.8303  0.8236  0.7927  0.8020  0.8129  0.0154
    test AUC   l-abd     0.7901  0.8347  0.7720  0.7942  0.7545  0.7891  0.0300
    val acc    u-pel     0.8209  0.7344  0.8281  0.8030  0.7273  0.7827  0.0483
    val acc    l-abd     0.8358  0.7031  0.7813  0.6212  0.7576  0.7398  0.0816
    test acc   u-pel     0.7500  0.7143  0.7143  0.7143  0.7262  0.7238  0.0155
    test acc   l-abd     0.7024  0.7381  0.6905  0.6429  0.7143  0.6976  0.0353

Table B.1: Detailed metrics. The threshold for computing accuracy is simply chosen as 0.5. Here we denote validation, accuracy, upper pelvis, and lower abdomen by val, acc, u-pel, and l-abd, respectively, for short.

In view of Table B.1, the metrics obtained by cropping to the upper pelvis are higher than or approximately equal to those obtained by cropping to the lower abdomen in most folds. Based on these results, cropping to the pelvis is the better strategy in our experiments.

