BREAST CANCER CLASSIFICATION USING THE SPATIAL DISTRIBUTION OF ULTRASOUND-BASED FEATURES
By
Tariq Bdair
Thesis submitted in partial fulfillment of the requirements for the degree of M.Sc. in Computer Engineering
July, 2016
DEDICATIONS
ACKNOWLEDGMENTS
TABLE OF CONTENTS
DEDICATIONS
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
REFERENCES
ARABIC ABSTRACT
LIST OF FIGURES
Figure 3.1
Figure 3.2
Figure 3.3
Figure 3.4
Figure 3.5: The minor and major axes of the best fit ellipse and Theta
Figure 3.6
Figure 3.7
Figure 3.8
Figure 3.9
Figure 3.10
Figure 3.11: GLCM calculation
Figure 3.12
Figure 3.13: The ROIs in malignant tumors that voted correctly with tumor class
Figure 3.14: The ROIs in benign tumors that voted correctly with tumor class
LIST OF TABLES
Table I
Table II
Table III
Table IV
ABSTRACT
BREAST CANCER CLASSIFICATION USING THE SPATIAL
DISTRIBUTION OF ULTRASOUND-BASED FEATURES
By
Tariq Bdair
Thesis supervisor: Dr. Mohammad Daoud
Breast cancer is one of the major causes of death in women across the globe. According
to the statistics from the Jordan National Cancer Registry (JNCR), 864 Jordanian women and
9 Jordanian men were diagnosed with breast cancer in 2008. Detection of breast cancer in
earlier stages can reduce the treatment cost and minimize the death rate. Many
technologies have been used to classify breast cancer in early stages, among which is
ultrasound. In fact, there is an increasing interest in using ultrasound imaging to detect
abnormalities in dense breasts due to its safety, portability and low cost. Several studies
have been reported in the literature about the detection and classification of breast tumors
based on texture and morphological features extracted from ultrasound images. However,
to the best of our knowledge, previous studies did not analyze the spatial distribution of the
ultrasound-based texture features with the goal of improving breast cancer classification.
This thesis investigates the use of an improved texture analysis technique for increasing
the accuracy and specificity of breast tumor classification using two-dimensional (2D)
ultrasound images. Most conventional texture analyses classify the tumor using texture
features extracted from the entire tumor region. Instead of analyzing the entire tumor, our
improved technique divides the tumor into non-overlapping ROIs and extracts texture
features from the individual ROIs. The spatial distribution of the ROIs is analyzed to
improve the accuracy of tumor classification. In particular, the analysis indicates that the
ROIs located in bright regions of the image, which are usually close to the tumor
boundaries, are more sensitive to the class of the tumor than the ROIs located
in the dark regions. Therefore, our improved classification technique selects the ROIs in
the tumor that can improve the classification accuracy. The texture features of each
selected ROI were analyzed using a classifier to determine the tumor class. The
classification indicators obtained from the individual ROIs were combined using a voting
mechanism to classify the tumor as benign or malignant.
An extensive study was employed to evaluate the performance of the improved texture-based classification technique. In particular, the study employed an extended set of 19
morphological features, 5 texture features, and 25 Gray Level Co-occurrence Matrix
(GLCM) features, which were reported in several previous studies, for classifying breast
tumors. The morphological features quantify the shape of the tumor as well as the
characteristics of the area surrounding the tumor boundaries. The texture features quantify
the statistics of image pixels to classify the tumor. The texture features were extracted for
both the entire tumor, as reported in previous studies, and the individual ROIs, as
suggested in this thesis. The improved texture analysis technique as well as the
CHAPTER ONE
INTRODUCTION
1.1 Problem Description
Breast cancer is the most common cause of death in women across the globe [1]. In fact,
10% of women in Europe and 12.5% of women in the United States suffer from this
disease [2]. According to the statistics from the Jordan National Cancer Registry (JNCR), 864
Jordanian women and 9 Jordanian men were diagnosed with breast cancer in 2008, with a
percentage of 18.8% of the total new cancer cases [3]. Moreover, according to JNCR,
breast cancer is ranked first among all cancers in women, accounting for 36.7% of all
cancers in females. The disease is considered the leading cause of cancer deaths among
women in Jordan [3]. Early detection of breast cancer increases the survival rates, and
improves the treatment outcome [3].
The most common imaging modalities for diagnosing breast cancer are mammography
and ultrasound. Mammography provides an accurate modality for breast screening. The
study reported in [4] indicated that mammography enables 18% to 30% mortality
reduction rates. However, mammography is characterized by low specificity, which leads
to a high rate (65-85%) of unnecessary biopsy operations [5]. It also has low accuracy when used to
detect breast cancers in women with dense breasts. Moreover,
mammography can increase the health risk for both the patients and physicians [5].
Ultrasound imaging has been one of the most successful diagnostic methods for breast
cancer detection, but due to its operator-dependency, the interpretation of ultrasound
images depends on the experience of the radiologist [5]. In fact, the interpretation of
breast cancer images is operator-dependent due to several factors: breast cancer
masses have complex shapes and appearances; their patterns can change from patient to
patient; and ultrasound images usually have low contrast and contain substantial noise and speckle
[5].
To improve the accuracy of breast cancer diagnosis and reduce the operator dependency,
computer-aided diagnostic (CAD) systems have been proposed to analyze breast
ultrasound images and provide a second opinion to the radiologists. In general, CAD
systems use four steps to detect and classify breast cancer: (1) Image preprocessing:
because of the low contrast and high speckle in ultrasound images, preprocessing
techniques are used to enhance the quality of the image and decrease ultrasound speckles
without degrading the important features of the image. (2) Image segmentation: divide
the image into non-overlapping regions, separate the tumor from the background tissue,
and then define a region of interest (ROI) in the image to extract cancer-detection features. (3)
Feature extraction and selection: extract a group of features that can be used to classify
benign and malignant tumors accurately. Typically, a large number of features can be
extracted from the ultrasound image, and hence selecting a set of features that enables
accurate tumor classification is an important task. (4) Classification: after
extracting and selecting the ultrasound-based features, intelligent classification
algorithms are applied to analyze the features and classify breast tumors as benign or
malignant [5].
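As a rough illustration, the four CAD steps above can be sketched in Python. The filter choice, threshold segmentation, feature set, classifier, and all helper names (preprocess, segment, extract_features) are placeholder assumptions for this sketch, not the methods evaluated in this thesis:

```python
import numpy as np
from scipy.ndimage import median_filter, label
from sklearn.svm import SVC

def preprocess(image):
    """Step 1: reduce speckle-like noise with a simple median filter."""
    return median_filter(image, size=3)

def segment(image):
    """Step 2: crude intensity-threshold segmentation; keep the largest blob as the ROI."""
    mask = image > image.mean()
    labels, n = label(mask)
    if n == 0:
        return mask
    sizes = np.bincount(labels.ravel())[1:]       # component sizes (skip background)
    return labels == (np.argmax(sizes) + 1)

def extract_features(image, mask):
    """Step 3: a placeholder feature vector (ROI mean, ROI std, area fraction)."""
    roi = image[mask]
    return np.array([roi.mean(), roi.std(), mask.mean()])

# Step 4: train a classifier on feature vectors from labelled synthetic "images"
rng = np.random.default_rng(0)
benign = [rng.normal(0.3, 0.05, (32, 32)) for _ in range(10)]
malignant = [rng.normal(0.7, 0.05, (32, 32)) for _ in range(10)]
X = [extract_features(preprocess(im), segment(preprocess(im)))
     for im in benign + malignant]
y = [0] * 10 + [1] * 10
clf = SVC().fit(X, y)
```

A real CAD system would replace each step with the methods of Chapters 2 and 3, but the data flow between the four stages is the same.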
1.2 Motivation
Ultrasound imaging is an important complement to mammography for breast cancer
screening. Due to its low cost, portability, safety, and real-time operation,
ultrasound has been widely used for breast cancer detection [5-8]. Recent studies
show that the use of ultrasound images can distinguish between benign and
malignant tumors with high accuracy [5, 9, 10]. In addition, ultrasound images
can increase the accuracy of breast cancer detection by 17% [5, 11] and decrease the
unneeded biopsies by 40%, which saves $1 billion in the United States every year [5,
12]. Breast ultrasound (BUS) images provide an important complement to
mammography for many reasons: BUS is safe, fast, and more accurate than
mammography in detecting abnormalities in dense breasts for women younger than
35 years [13, 14].
1.3 Contribution
CHAPTER TWO
LITERATURE REVIEW
Noise can have an additive nature, such as thermal and electronic noise, or a
multiplicative nature. Speckle is a form of noise with a multiplicative nature, and its
pattern is formed by constructive and destructive interference of the backscattered
echoes. It is known that additive noise, such as thermal and electronic
noise, is trivial and can be neglected compared to the multiplicative speckle noise
[16]. The common distribution model used to model speckle is the Rayleigh
distribution [19]. Due to the limited dynamic range of monitors, the envelope-detected
echo signal is compressed with a logarithmic transformation. After the logarithmic
transformation, the multiplicative speckle noise model is converted to an additive noise
model, so that the filters used for additive noise can be applied to the logarithm-transformed
ultrasound images [19].
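This conversion is easy to verify numerically: if the observed signal is g = f · s with multiplicative Rayleigh speckle s, then log(g) = log(f) + log(s), so the speckle becomes an additive term. A small sketch on synthetic data (not real ultrasound):

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.full(100_000, 50.0)                # noise-free signal (constant for clarity)
s = rng.rayleigh(scale=1.0, size=f.size)  # multiplicative Rayleigh speckle
g = f * s                                 # observed envelope-detected signal

# After log compression the speckle is an additive, signal-independent term:
log_g = np.log(g)
additive_noise = log_g - np.log(f)        # equals log(s) exactly
assert np.allclose(additive_noise, np.log(s))
```

The variance of log(s) no longer depends on f, which is why filters designed for additive noise become applicable after log compression.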
Speckle reduction might cause the loss of important details in the ultrasound image.
Powerful speckle reduction techniques are designed to suppress speckle and preserve the
important information and edges in the image [20]. Speckle reduction methods are
divided into three groups: (1) filtering methods; (2) wavelet methods; and (3)
compounding approaches [6].
Filtering methods are divided into linear and nonlinear filters. The linear filters include: (1) the mean filter and (2) the adaptive mean
filter. In the mean filter, each pixel is replaced by the average intensity of its
neighborhood pixels; this filter smooths and blurs the image. The mean
filter is suitable for additive Gaussian noise. Because speckle noise is
multiplicative, the mean filter is not a good approach in such cases [6]. On the other
hand, the nonlinear filters include the median filter, the Wiener filter and the anisotropic
diffusion filter. The median filter is used when impulsive noise affects the image [6].
The median filter preserves the edges and produces a less blurred image. However, the
drawback of using the median filter is the extra computation time needed to sort
the intensity values in each filtering window [18]. The Wiener filter aims to decrease the amount of
noise by comparing the noisy signal with an estimate of the noiseless signal. It smooths the image based on a
local variance computation: the smoothing is weaker when the local variance is large, and
stronger when the variance is small. The result of the
Wiener filter is better than that of the linear filters, as it preserves the edges and the important
information of the image, but it needs more computation time than a linear filter [18].
The anisotropic diffusion filter was first proposed by Perona and Malik [20]. It has been
considered one of the most popular filtering techniques used in imaging [20]. It
removes the speckles and enhances the edges at the same time. Anisotropic diffusion is
suitable for additive Gaussian noise but has difficulty with multiplicative noise. The filtering
techniques are simple and fast, but they are sensitive to the size and shape of the filtering
window. If the window size is too small, the smoothing of the filter decreases and the speckle noise cannot be
reduced effectively, while over-smoothing occurs if the
window size is too large. Considering the shape of the window, the square window,
which is the most common window shape, leads to corner rounding of the rectangular
features in the image [6].
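The mean, median, and Wiener filters discussed above are all available in SciPy; a minimal comparison on a synthetic noisy edge image (illustrative only, not the preprocessing used in this thesis):

```python
import numpy as np
from scipy.ndimage import uniform_filter, median_filter
from scipy.signal import wiener

rng = np.random.default_rng(2)
clean = np.zeros((64, 64))
clean[:, 32:] = 1.0                          # a vertical edge
noisy = clean + rng.normal(0.0, 0.2, clean.shape)

mean_out = uniform_filter(noisy, size=5)     # linear mean filter: smooths but blurs edges
median_out = median_filter(noisy, size=5)    # nonlinear: better edge preservation
wiener_out = wiener(noisy, mysize=5)         # adapts smoothing to the local variance

# All three reduce the error energy relative to the clean image
for out in (mean_out, median_out, wiener_out):
    assert np.mean((out - clean) ** 2) < np.mean((noisy - clean) ** 2)
```

Inspecting the column profiles near column 32 shows the trade-off described in the text: the mean filter smears the edge over the window width, while the median filter keeps it sharp.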
The active contour model (ACM), also known as the snake, is a segmentation algorithm that
works by minimizing the energy associated with the current contour. The contour energy
is composed of internal and external energies. The snake model modifies its shape
iteratively to approximate the desired contour. During the deformation process, the force
is calculated from the internal and external energies. The external energy is
derived from image features and aims to drive the contour toward the desired object
boundary. The internal energy is derived from the contour model and aims to control the
shape and the regularity of the contour [5]. The ACM algorithms can be categorized into
three classes: edge-based, region-based, and hybrid models.
Edge-based approaches achieve stable segmentation results when image gradients are
given with small variations. However, if the target boundary is not well-defined or
contains weak parts, the edge-based active contours could converge to wrong solutions.
To solve the above problem, region-based ACM algorithms have been proposed. The
target of these algorithms is to represent the region inside and outside the evolving curve
with region descriptors, such as probability distributions and local mean intensities. Active
contour without edges (ACWE), which was proposed by Chan and Vese [21], is an
example of region-based ACMs. It works well with blurry boundaries and homogeneous
intensity. However, this algorithm achieves limited performance when the image contains
inhomogeneous intensities. To overcome this problem, active contour models based on
local statistics have been proposed. However, region-based models with local statistics are usually
very sensitive to the size of the local window.
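The core of the ACWE idea can be sketched in a few lines of numpy: the region descriptors are simply the mean intensities c1 and c2 inside and outside the current contour, and each pixel is pushed toward the region whose mean it matches better. This is a deliberately simplified sketch; the curvature/regularization term of the full Chan-Vese model is omitted:

```python
import numpy as np

def acwe_step(image, phi, dt=0.5):
    """One simplified ACWE (Chan-Vese) update without the curvature term.
    phi > 0 marks the region currently inside the contour."""
    inside = phi > 0
    c1 = image[inside].mean() if inside.any() else 0.0      # mean inside
    c2 = image[~inside].mean() if (~inside).any() else 0.0  # mean outside
    # Pixels closer in intensity to c1 get a positive force (move inside)
    force = (image - c2) ** 2 - (image - c1) ** 2
    return phi + dt * force

# Segment a bright square on a dark background
image = np.zeros((20, 20))
image[5:15, 5:15] = 1.0
phi = np.full(image.shape, -1.0)
phi[8:12, 8:12] = 1.0           # small seed region inside the bright square
for _ in range(30):
    phi = acwe_step(image, phi)
```

After a few iterations the contour (phi > 0) grows from the seed to cover exactly the bright square, because the homogeneous region descriptors, not edges, drive the evolution.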
Hybrid approaches, such as the phase-based active contour (PBAC) model, have been proposed to
overcome the problems produced by the two previous approaches [1]. The PBAC model
integrates both the boundary and region information. Such a hybrid model can attract the
curve to strong edge points owing to the presence of the edge-based term while evolving
the curve by using the region-based term when the edge information is weak. In the study
presented in [1], the region-scalable fitting (RSF) energy model is used to model the
region-based energy in the image. Such an approach is robust against intensity
inhomogeneity, and unlike the traditional edge indicators that are based on the intensity
gradient magnitude, the RSF energy employs phase asymmetry to form a new phase-based edge indicator that is invariant to intensity and independent of image contrast [1].
In the multi-scale approach reported in [22], the contour obtained at a coarse scale is
interpolated and passed to the next finer scale as the initial contour. In addition, the
boundary shape similarity between different scales is used as an additional constraint to
guide the contour evolution. By incorporating this boundary shape similarity constraint
into the traditional geodesic active contour (GAC) model within a multi-scale framework,
the authors can successfully segment the target objects in ultrasound images with both high
speckle noise and low image contrast [22].
The artificial neural network (ANN) has been widely used in medical image
segmentation. Examples of the segmentation approaches that are based on ANN include
the Multilayer perceptron (MLP), self-organizing maps (SOM), Hopfield, and pulse
coupled neural networks [25]. The SOM is one of the best neural networks used for the
segmentation task. This neural network is unsupervised and employs competitive learning
to discover the topological structures hidden in the input image. Two advantages of the
SOM-based segmentation methods are the unsupervised training and fast learning. The
disadvantages of the segmentation methods that use this neural network are: (1) they need
high dimensional input space with empirical features for the best performance, and (2)
they cannot segment images with heavy noise successfully [25].
A Bayesian neural network (BNN) has also been used for image segmentation. At first, a radial
gradient index filtering technique was used to locate the ROIs in the image, where the
centers of the ROIs are recorded as points of interest. A region growing algorithm was
then used to determine candidate lesion margins. The lesion candidates were segmented
and detected by the BNN. However, the algorithm would fail if the lesion was not
compact and round-like [5].
In artificial neural network (ANN) and support vector machine (SVM) algorithms, the
training time of the algorithm and the performance of the classification are highly
affected by the dimensionality of the feature space. Thus, the selection and the extraction
of a suitable feature combination is a crucial task in computer aided diagnosis (CAD)
systems.
Most of the texture features are calculated from the entire image or from ROIs based on pixel
gray-level values [5]. The auto-covariance coefficient is a basic and traditional texture
feature that can reflect the inter-pixel correlation within an image. The variance, auto-correlation, or average contrast feature is defined as the ratio of the variance, auto-correlation coefficients, or intensity average inside the lesion to that outside the lesion.
The larger the ratio is, the lower the possibility of the tumor being malignant. The distribution
distortion of wavelet coefficients feature is defined as the summation of the differences
between the real distribution of the wavelet coefficients in each high-frequency sub-band and
the expected Laplacian distribution; this feature can reflect the margin
smoothness [5].
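As an illustration, the normalized auto-covariance of an image patch at a lag (Δm, Δn) can be computed directly from its definition. This is a generic sketch of the concept, not the exact formulation used in the cited studies:

```python
import numpy as np

def auto_covariance(patch, dm, dn):
    """Normalized auto-covariance of a 2-D patch at lag (dm, dn).
    A value near 1 means pixels that far apart are strongly correlated."""
    mu = patch.mean()
    a = patch[: patch.shape[0] - dm, : patch.shape[1] - dn] - mu
    b = patch[dm:, dn:] - mu
    cov = (a * b).mean()
    return cov / patch.var()

# A smooth gradient patch is highly correlated at small lags
patch = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
print(auto_covariance(patch, 0, 1))  # close to 1 for this smooth patch
```

At zero lag the measure equals 1 by construction; textured or noisy patches decay toward 0 much faster with increasing lag than smooth ones, which is what makes the coefficient a texture descriptor.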
The posterior acoustic behavior, minimum side difference (MSD), or posterior acoustic
shadow feature is defined as the difference between the gray scale histograms of the
regions inside the lesion and posterior to the lesion. Homogeneity of the lesion feature is
the Boltzmann/Gibbs entropy over the gray scale histogram relative to the maximum
entropy. The higher the entropy is, the more homogeneous the lesion is [5].
Unlike texture features which are extracted from the rough ROIs, the morphologic
features focus on some global characteristics of the lesion, such as the shape and margin
[5]. The spiculation feature is the ratio of the low-frequency component to the high-frequency
component; the larger the value, the lower the possibility of the tumor being malignant.
The depth-to-width ratio (or width-to-depth ratio) feature is one of the most effective
distinguishing features mentioned in many papers. Malignant lesions tend to have a
width-to-depth ratio bigger than 1, while benign lesions usually have this ratio
smaller than 1. The branch pattern feature is the number of local extremes in the low-pass-filtered radial distance graph. Malignant lesions tend to have high values of the
branch pattern feature. The margin sharpness, margin echogenicity, and angular variance
in margin features are computed by dividing the lesion into N sectors and comparing the
mean gray levels of the pixels in the inner and outer shells in each sector. The margin
echogenicity feature is the mean gray level difference of the inside and outside of the
sector. The three features described above (margin sharpness, margin echogenicity,
angular variance) have been proven to be significantly different by Student's t-test when
they are used to distinguish benign and malignant lesions. To get the number of
substantial protuberances and depressions (NSPD) feature, the breast lesion is delineated
by a convex hull and a concave polygon. Ideally, a malignant breast lesion has a larger
NSPD value. The elliptic-normalized circumference (ENC) feature is defined as the
circumference ratio of the lesion to its equivalent ellipse, and it represents the
anfractuosity of a lesion, which is a characteristic of malignant lesions. Malignant lesions
tend to have a higher value of the ENC feature. The elliptic-normalized skeleton (ENS)
feature is defined as the number of skeleton points normalized by the circumference of
the equivalent ellipse of the lesion. The computation cost of this feature is relatively high.
Malignant lesions tend to have a higher value of ENS. The NSPD, LI, ENC and ENS
features capture mainly the contour and shape characteristics of the lesion. The long axis
to short axis ratio (L:S) feature is the ratio of the long- to short-axes, where the long and
short axes are determined by the major and minor axes of the equivalent ellipse.
Therefore, the L:S ratio is different from the traditional depth/width ratio because it is
An interesting study about breast tumor classification is presented in [22]. In this study, a
hybrid segmentation method, which combined the gradient vector flow (GVF) and
geodesic active contour (GAC) models, is used to outline the tumor region in the
ultrasound image. A set of six novel features are extracted from the ultrasound image to
quantify the texture, region, and shape of the tumor. These six features, which are the
eccentricity, solidity, difference area hull rectangular, difference area mass rectangular,
cross correlation left, and cross correlation right, were analyzed using an SVM classifier
to characterize the tumor as benign or malignant. The experimental results show that the
classification method achieved an accuracy of 95%, sensitivity of 90.91%, specificity of
97.87%, positive predictive value of 96.77%, negative predictive value of 93.88%, and
Matthews correlation coefficient of 89.71%.
In another study [26], a genetic algorithm was combined with SVM to classify ultrasound
breast images. The ultrasound images were preprocessed by applying an anisotropic
diffusion filter and binary thresholding to obtain a binary image. The binary image was
combined with the original image and processed using a level set algorithm to segment
the tumor. A set of auto-covariance texture features and morphologic features were
extracted from the combined image. These features were analyzed using a genetic
algorithm to determine the significant features and the near-optimal parameters for an
SVM, which was used to identify the tumor as benign or malignant. According to the
experimental results reported in the study, the accuracy of the proposed system for
classifying breast tumors was 95.24%.
A novel ultrasound-based diagnostic method for breast cancer classification was proposed
in [27]. The method, which is called a fuzzy cerebellar model neural network (FCMNN),
is based on intelligent classification to distinguish benign and malignant breast tumors. In
this method, the auto-covariance textural features were extracted from the ultrasound
image. The FCMNN is a fuzzy neural network that incorporates a learning mechanism imitating the cerebellum of
a human being. In contrast to a conventional fuzzy neural network, the FCMNN is structured with
layers in the input space. It is often referred to as an associative neural network, where
only a small subset addressed by the input vector determines instantaneous output. The
FCMNN has several advantages such as good generalization and rapid learning speed and
convergence. The accuracy of the proposed FCMNN classifier was between 90% and
92%.
Huang et al. [28] applied shearlet theory for breast cancer classification. Shearlet theory
is a composite-wavelet version of the traditional wavelet transform, which is designed to
identify anisotropic and directional information at different scales. The shearlet transform has strong
localization properties and directional sensitivity. After decomposing the ultrasound
image by the shearlet transform, the shearlet-based texture features were extracted from the horizontal
and vertical cones as follows: (1) the entropy, correlation, contrast and angular second
moment calculated from the first-layer horizontal shearlet coefficients; (2) the mean,
variance, and energy from both the first- and third-layer horizontal and vertical coefficients; and (3)
the maximal shearlet coefficients in each column of the coefficient matrix. The
Adaptive Boosting (AdaBoost) algorithm was then used to distinguish the benign tumors from the
malignant tumors. The results show that the classification accuracy, sensitivity,
specificity and precision of the shearlet-based method are 88.0%, 83.9%, 93.2% and 94.0%,
respectively.
In another study, the tumors were segmented using a level set method as follows. At the beginning, a sigmoid filter was
applied to the image to improve its contrast, and then the gradient magnitude filter
was applied to compute the gradient of the image. Then, the sigmoid filter was applied
again to the gradient magnitude image for contrast enhancement. Finally, the level set
method, which was proposed to model a complex shape with changing topology, was
applied to outline the contour of the tumor. After image segmentation, six
quantitative feature sets, including shape, orientation, margins, lesion boundary, echo
pattern, and posterior acoustic features, were extracted from the ultrasound image. In this
CAD system, any tumor with one or more malignant findings was classified as malignant.
Only the tumors that had no malignant findings and had at least one benign finding were
classified as benign. Based on the experimental results, the area under the curve of the
proposed CAD system was 0.96.
Gomez et al. [31] combined the co-occurrence texture statistic features with gray-level
quantization to classify breast ultrasound images. In their study, 22 texture features
were extracted from the gray level co-occurrence matrix (GLCM) at four different
orientations (0, 45, 90 and 135 degrees) and ten distances (1 pixel, 2 pixels, ..., 10 pixels). All
these calculations were performed for six gray-level quantizations (8, 16, 32, 64, 128,
and 256) of the ultrasound image. The study included 436 breast ultrasound images. To
reduce the dimensionality of the feature space, the texture descriptors were averaged over
all orientations for the same distance. Moreover, the feature space was ranked using the
mutual information with minimal redundancy maximum relevance (MI-mRMR)
technique. The classification of the features was performed using the Fisher linear
discriminant analysis (FLDA). The study investigated the effect of quantization on the
classification performance, with the goal of determining the most useful texture features
and the effect of averaging the GLCM features to reduce the feature space dimensionality.
The results show that: (1) the averaging decreases the performance; (2) the results
obtained using the GLCM features without averaging indicate that the orientation of 90
degrees and distances larger than 5 pixels achieve good classification of breast lesions;
(3) among the 22 texture features, nine features appeared repeatedly,
independently of the GLCM orientations and distances and over all quantization levels, where
these nine features quantify the contrast, correlation, cluster prominence, cluster shade,
difference variance, and inverse difference moment; and (4) the gray-level quantization does
not improve or worsen the discrimination power of the texture features. The achieved area
under the ROC curve (AUC), accuracy, sensitivity, specificity, positive predictive value,
and negative predictive value were equal to 0.87, 83.05%, 87.02%, 88.11%, 86.82%,
and 80.10%, respectively.
The study reported in [32] proposed a CAD system based on gray-scale invariant features
via the ranklet transform. The ranklet transform is an image processing method that is
characterized by a multi-resolution and orientation-selective approach, which can be used
as a rank descriptor of the pixels within a local region. It deals with the ranks of the pixels
rather than their gray-scale intensity values, which are used in the wavelet transform. In this study, the
ultrasound image is decomposed into ranklets based on the multi-resolution and orientation-selective properties of the ranklet transform. The gray-scale invariant GLCM texture
features were extracted from the transformed image. These features were calculated at
different resolutions (scales) and different orientations. Finally, an SVM classifier was
used to classify the tumor as benign or malignant. To evaluate the robustness and
effectiveness of the proposed method, an extensive experimental evaluation was
performed, in which three ultrasound image databases acquired with three different
sonographic breast ultrasound platforms were classified. The achieved area under the
curve for the three ultrasound image databases obtained via ranklet transform are 0.918,
0.943, and 0.934, respectively.
Shankar et al. [6] applied spectral analysis for breast cancer classification. In particular,
the breast ultrasound images were analyzed to estimate the compound probability density
functions (PDFs) that model the healthy and malignant regions in the ultrasound image. A
combination of the Nakagami parameter with the K distribution (which is sensitive to the
presence of spiculations) was used to model the breast lesions in the ultrasound image.
The estimated PDFs were then employed to classify the tumors in breast ultrasound
images. Using this technique, good tumor classification was achieved with an area under
the ROC curve of 0.955.
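The Nakagami shape parameter mentioned above is commonly estimated from the envelope amplitude with the standard moments estimator m = (E[R²])² / Var(R²); this sketch uses that generic estimator, which is not necessarily the exact procedure of [6]:

```python
import numpy as np

def nakagami_m(envelope):
    """Moment-based estimate of the Nakagami shape parameter m:
    m = (E[R^2])^2 / Var(R^2), where R is the envelope amplitude."""
    r2 = np.asarray(envelope, dtype=float) ** 2
    return r2.mean() ** 2 / r2.var()

# Rayleigh-distributed envelope = fully developed speckle, for which m = 1
rng = np.random.default_rng(4)
rayleigh = rng.rayleigh(scale=1.0, size=200_000)
m_hat = nakagami_m(rayleigh)
```

For fully developed speckle (Rayleigh statistics) the estimate is close to m = 1; pre-Rayleigh conditions, associated with fewer or more organized scatterers, give m < 1, which is what makes the parameter a tissue-characterization feature.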
Alam et al. [33] employed a multi-feature approach for breast tumor classification. In
particular, a combination of quantitative acoustic parameters, calculated using spectrum
analysis of ultrasound radio-frequency (RF) echo signals, and morphometric features,
computed based on the lesion shape, has been used to classify the solid breast lesions in
breast ultrasound images. The acoustic features contained echogenicity, heterogeneity
and shadowing. The morphometric features included area, location, aspect ratio and
boundary roughness of the lesions. The classification analysis reported in the study was
based on logistic regression (LR). For 130 patient cases, the LR-based analysis
achieved an area under the ROC curve of 0.947 ± 0.045.
An innovative approach for breast cancer classification has been proposed in [34] based
on time series analysis of the ultrasound radio-frequency (RF) signals received from the
tumor. In this approach, a set of features were extracted from the time series of ultrasound
RF signals. These features were then classified using a machine learning framework. The
features extracted using the RF time series analysis, single-frame RF spectral analysis,
and B-mode texture were ranked based on their importance using two different feature
ranking approaches. In both feature-ranking algorithms, all the three best performing
features were from the RF time series group. The RF time series features were processed
using two classifiers: the SVM and the Random Forests. Using the best three RF time
series features, accurate breast tumor classification can be achieved with areas under the
curve of 0.86 and 0.81 using the SVM and Random Forests classifiers, respectively. The
study indicated that the ultrasound RF time series features can enable breast cancer
classification using a small set of raw ultrasound data without the need for additional
instrumentation.
CHAPTER THREE
MATERIAL AND METHODS
In this chapter, we describe our contributions in the field of breast cancer classification
using ultrasound images. Section 3.1 describes the data collection process and the
ultrasound image database used in the analysis. Sections 3.2 and 3.3 provide a description
of extended sets of morphological features and texture features, respectively, that have
been suggested in previous studies for breast cancer classification. These features are
implemented and applied in this thesis to enable breast cancer classification as described
in Section 3.4. In Section 3.5, our improved breast cancer classification algorithm is
proposed. In this algorithm, the tumor is divided into non-overlapping ROIs. The spatial
distribution of the ROIs is analyzed and employed to select the ROIs for tumor
classification. The selected ROIs are then classified individually and combined using a
voting mechanism to determine the class of the tumor.
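The ROI-voting idea described above can be sketched as follows. The ROI grid size, the per-ROI classifier, and the majority-vote rule are simplified assumptions for illustration, not the exact configuration evaluated in this thesis:

```python
import numpy as np

def split_into_rois(tumor, roi_size):
    """Divide a 2-D tumor region into non-overlapping square ROIs."""
    h, w = tumor.shape
    return [tumor[i:i + roi_size, j:j + roi_size]
            for i in range(0, h - roi_size + 1, roi_size)
            for j in range(0, w - roi_size + 1, roi_size)]

def classify_tumor(tumor, roi_size, roi_classifier):
    """Classify each ROI individually, then combine the votes:
    a majority of ROI votes decides benign (0) vs malignant (1)."""
    votes = [roi_classifier(roi) for roi in split_into_rois(tumor, roi_size)]
    return int(np.mean(votes) >= 0.5)

# Toy per-ROI rule standing in for a trained classifier: bright ROIs vote malignant
toy_classifier = lambda roi: int(roi.mean() > 0.5)
tumor = np.full((16, 16), 0.8)   # uniformly bright toy "tumor"
label = classify_tumor(tumor, 4, toy_classifier)
```

In the actual algorithm the per-ROI decision comes from a classifier trained on the texture features of Section 3.3, and only the selected ROIs participate in the vote; the combination step, however, has exactly this shape.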
Form Factor = 4πA / P²   (1)

where A is the tumor area and P is the tumor perimeter.
3.2.3 Roundness
The Roundness is defined as the ratio between the tumor area and the square of its maximum diameter, and can be expressed as [26]:

Roundness = 4A / (π × MaxDiameter²)   (2)
The maximum diameter is defined as the maximum distance between two pixels on the tumor boundary along a line that passes through the center of the tumor, as shown in Figure 3.1.
Aspect_Ratio = MaxDiameter / MinDiameter   (3)
The minimum diameter is defined as the minimum distance between two pixels on the tumor boundary along a line that passes through the center of the tumor, as shown in Figure 3.1.
3.2.5 Convexity
The Convexity is defined as the ratio between the convex hull perimeter and the tumor perimeter, and is computed using the following formula [26]:

Convexity = Convex_Perimeter / P   (4)
The convex hull is the smallest convex polygon that encloses the tumor, as illustrated in Figure 3.2.
3.2.6 Solidity
The Solidity is a scalar quantifying the proportion of the pixels in the convex hull that are also in the tumor area [26], and it can be computed using the following formula:

Solidity = A / Convex_Area   (5)

where Convex_Area is the area of the convex hull and A is the tumor area.
3.2.7 Extent
The Extent feature is a scalar that quantifies the ratio of the tumor area to the area of the smallest bounding box that includes the tumor [26]. This feature is computed as follows:

Extent = A / Bounding_Box_Area   (6)

where Bounding_Box_Area is the area of the smallest rectangle that contains the tumor.
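The descriptors of Eqs. 1-6 can be sketched together as below. This is a minimal sketch assuming the standard definitions of these descriptors in [26]; the function name and the scalar inputs (measured from the segmented mask) are illustrative.

```python
import math

def morphological_features(area, perimeter, d_max, d_min,
                           convex_perimeter, convex_area, bbox_area):
    """Basic morphological descriptors (Eqs. 1-6) of a segmented tumor."""
    return {
        "form_factor":  4 * math.pi * area / perimeter ** 2,   # Eq. 1
        "roundness":    4 * area / (math.pi * d_max ** 2),     # Eq. 2
        "aspect_ratio": d_max / d_min,                         # Eq. 3
        "convexity":    convex_perimeter / perimeter,          # Eq. 4
        "solidity":     area / convex_area,                    # Eq. 5
        "extent":       area / bbox_area,                      # Eq. 6
    }
```

For a circle of radius 1 (area π, perimeter 2π, both diameters 2, and a 2x2 bounding box), Form Factor and Roundness both evaluate to 1, reflecting a perfectly regular shape.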
Many features can be extracted from the Best Fit Ellipse, as listed below:
The length of the major axis of the ellipse, as shown in Figure 3.5.
The length of the minor axis of the ellipse, as shown in Figure 3.5.
Ellipse compactness: the overlap region between the tumor and the ellipse.
Ellipse theta: the angle of the major axis of the ellipse, as illustrated in Figure 3.5.
Figure 3.5. The major and minor axes of the Best Fit Ellipse and Theta.
where

NRL(i) = d(i) / max{d(j)}   (7)

and d(i) is the Euclidean distance from the tumor center to the i-th boundary pixel.

NRL entropy:

E_NRL = − Σ_{k=1..H} p_k × log2(p_k)   (8)

where p_k is the probability that an NRL value falls in the k-th histogram bin:

p_k = (number of NRL values in bin k) / N   (9)

NRL variance:

V_NRL = (1/N) Σ_{i=1..N} (NRL(i) − NRL_mean)²   (10)

where NRL_mean is the mean of the NRL values, N is the number of pixels located at the tumor boundary, and H is the number of NRL probability values.
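The NRL features of Eqs. 7-10 can be sketched as follows. This is a minimal sketch: the boundary is assumed to be given as an array of pixel coordinates, and the number of histogram bins H is an assumption (the text does not specify it).

```python
import numpy as np

def nrl_features(boundary, center, bins=10):
    """Normalized radial length (NRL) features of a tumor boundary.
    boundary: (N, 2) array of boundary pixel coordinates."""
    d = np.linalg.norm(np.asarray(boundary, float) - center, axis=1)
    nrl = d / d.max()                              # Eq. 7
    hist, _ = np.histogram(nrl, bins=bins)
    p = hist[hist > 0] / len(nrl)                  # Eq. 9: bin probabilities
    entropy = -np.sum(p * np.log2(p))              # Eq. 8
    variance = np.mean((nrl - nrl.mean()) ** 2)    # Eq. 10
    return entropy, variance
```

A perfectly circular boundary gives zero entropy and zero variance, while an elongated boundary spreads the NRL values over several bins and raises both features.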
Compactness = 1 − 4πA / P²   (11)

The compactness of a circle is equal to 0. The maximum compactness value, which is computed for complex shapes, approaches 1.
(12)
The value of the distance map at a pixel P is iteratively computed as shown in the
following equation:
Distance (P) = Min {Distance (N8 (P))} + 1
(13)
Figure 3.7 (b) and Figure 3.7 (c) illustrate the distance map computation for the tumor
shown in Figure 3.7 (a) [37]. After computing the distance map, the maximum inscribed
circle located inside the tumor is identified. By computing the maximum inscribed circle,
the lobulate areas included in the tumor and excluded from the inscribed circle are
identified, as shown in Figure 3.7 (d). However, only the large lobulate areas should be considered; therefore, if the maximum distance within a lobulate area is less than four pixels, the area is ignored. Finally, the number of significant lobulate areas defines the undulation characteristics feature (MU), as shown in Figure 3.7 (f).
These figures are adapted from [37]. Figure 3.7 (a) A malignant lesion. (b) The distance map of
(a). (c) The distance map of mass is represented by the gray scales. The black color denotes the
farthest distance to the lesion boundary. (d) Five undulation characteristics are explored. (e) The
local maxima could be used to detect the angular characteristics. (f) Three angular characteristics
are explored.
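The iterative update of Eq. 13 can be sketched as below: every tumor pixel repeatedly takes the minimum distance of its 8 neighbours plus one until the map stabilizes, with the background fixed at zero.

```python
import numpy as np

def distance_map(mask):
    """Iterative distance map of Eq. 13: every tumor pixel gets
    min(distance of its 8 neighbours) + 1; background stays 0."""
    h, w = mask.shape
    dist = np.where(mask, np.inf, 0.0)
    changed = True
    while changed:
        padded = np.pad(dist, 1, constant_values=0.0)
        # minimum over the 8-neighbourhood of every pixel
        neigh = np.minimum.reduce([padded[i:i + h, j:j + w]
                                   for i in range(3) for j in range(3)
                                   if (i, j) != (1, 1)])
        updated = np.where(mask, neigh + 1, 0.0)
        changed = not np.array_equal(updated, dist)
        dist = updated
    return dist
```

The maximum of the resulting map gives the centre and radius of the maximum inscribed circle used to isolate the lobulate areas.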
outside the tumor and located within a distance k from the tumor boundary. In this thesis,
the inner and outer bands are computed using a distance k of 3 pixels. This value of k has
also been employed by a previous study [30].
Figure 3.8. The Inner and Outer bands of a tumor computed at a distance k = 3.
The average intensity for the inner and outer bands can be calculated as below:

avg_inner = (1 / N_inner) Σ_{P ∈ inner band} I(P)   (14)

avg_outer = (1 / N_outer) Σ_{P ∈ outer band} I(P)   (15)

where N_inner and N_outer are the numbers of pixels in the inner and outer bands, respectively. Finally, the lesion boundary (LB) feature is defined as below:

LB = avg_outer − avg_inner   (16)

The average gray-level intensity of the tumor, EP, is defined as:

EP = (1 / NR) Σ_{P ∈ tumor} I(P)   (17)

where NR is the number of the pixels in the tumor, and I(P) is the gray-level intensity of a
pixel P located inside the tumor. To find the posterior acoustic features (PS), the region
under the tumor is found. To find the area under the tumor, we find the posterior area width (pw), which is equal to two thirds of the tumor mass width (mw), and the posterior area height (ph), which is equal to the tumor height (mh) but should not exceed 100 pixels [30]. The posterior area is shown in Figure 3.9. Then, the average gray-level intensity of
the posterior acoustic area is defined as:
PA = (1 / NPA) Σ_{P ∈ posterior area} I(P)   (18)

where NPA is the number of the pixels in the posterior acoustic area, and I(P) is the gray-level intensity of posterior acoustic pixel P.

The difference between Eq. 17 and Eq. 18 is used to calculate the posterior acoustic characteristic:

PS = EP − PA   (19)
Figure 3.9. A tumor with width mw and height mh and the posterior acoustic area with height ph.
The average intensity of the group of brighter pixels inside the tumor is defined as:

EP_B = (1 / NBP) Σ_{P ∈ brighter group} I(P)   (20)

where I(P) is the gray-level intensity of a pixel P located inside the tumor and NBP is the number of the 25% brighter pixels. k is a dynamic threshold; for example, when the threshold is set to k = 51, the group of brighter pixels contains 28.22% of the tumor pixels. The contrast feature, EPC, is defined as:

EPC = EP_B − EP   (21)
STissue = (1 / NTissue) Σ_{P ∈ surrounding tissues} I(P)   (22)

where I(P) is the gray-level intensity of pixel P, and NTissue is the number of pixels in the surrounding tissues at a distance k = 10.
The average intensity difference between the surrounding tissues and the region under the tumor is calculated as below:

PSDiff = STissue − PA   (23)

where PA is defined by Eq. 18. The average intensity difference between the tumor and the surrounding tissues is defined by:

EPDiff = EP − STissue   (24)
shown in Figure 3.10. The number of gray levels is another parameter that is used to calculate the GLCM. It determines the number of gray levels used when scaling the grayscale values of the input image I. For example, if the number of levels is set to 8, then the intensity values of the image I are scaled to be between 1 and 8. In addition, the number of gray levels determines the size of the gray-level co-occurrence matrix that will be generated, so if we set the number of levels to 8, an 8x8 matrix will be generated. Finally, the intensity limit parameter determines the maximum and minimum intensity used when dividing the intensities between levels. The default values for the intensity limits are the maximum and minimum intensity in the image. A simple example of computing the GLCM is presented in Figure 3.11 [39]. Element (1,1) in the GLCM contains the value 1 because there is only one instance in the image where two horizontally adjacent pixels have the values 1 and 1. Element (1,2) in the GLCM contains the value 2 because there are two instances in the image where two horizontally adjacent pixels have the values 1 and 2.
Figure 3.10. A 3x3 matrix showing the four directions used in the GLCM at a distance of 1 pixel.
Figure 3.11 shows how GLCM is calculated for the 4-by-5 image I.
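The counting rule described above can be reproduced in a few lines. The image values below are assumed to be the standard 4-by-5 example of [39] that Figure 3.11 illustrates; the function only handles the horizontal direction (d = 1, θ = 0°) for clarity.

```python
import numpy as np

def glcm_horizontal(image, levels):
    """Count, for every gray-level pair (i, j), how many times two
    horizontally adjacent pixels take the values i and j (d = 1, theta = 0)."""
    glcm = np.zeros((levels, levels), dtype=int)
    for row in image:
        for a, b in zip(row[:-1], row[1:]):
            glcm[a - 1, b - 1] += 1   # gray levels are 1-based here
    return glcm

# The 4-by-5 example image of Figure 3.11 (gray levels 1..8)
I = np.array([[1, 1, 5, 6, 8],
              [2, 3, 5, 7, 1],
              [4, 5, 7, 1, 2],
              [8, 5, 1, 2, 5]])
G = glcm_horizontal(I, 8)
```

Element (1,1) of G is 1 and element (1,2) is 2, matching the counts described in the text.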
In fact, the GLCM represents the joint frequencies of all combinations of gray levels i and j that are separated by a distance d along the direction θ. The GLCM can be defined as below [31]:

P(i, j | d, θ) = #{[(x1, y1), (x2, y2)] : I(x1, y1) = i, I(x2, y2) = j, (x2, y2) = (x1, y1) + (d, θ)}   (25)

where #{.} is the number of the pixel pairs that satisfy the conditions.
The GLCM is used to calculate texture features that quantify the pixel statistics. In fact, a set of GLCM texture features can be extracted from each GLCM computed at each combination of direction and distance, using the expressions given in Table I and Table II [31]:
Table I
TEXTURE FEATURES EXTRACTED FROM GLCM

Feature
Autocorrelation
Contrast
Correlation I
Correlation II
Cluster Prominence
Cluster Shade
Dissimilarity
Energy
Entropy
Homogeneity I
Homogeneity II
Maximum probability
Sum of squares
Sum average
Sum entropy
Sum variance
Difference variance
Difference entropy
Table II
NOTATION AND EXPRESSIONS USED FOR CALCULATING THE GLCM FEATURES

Notation / Meaning
p(i, j): the (i, j)th entry of the co-occurrence probability matrix.
L: the number of gray levels used in the quantization.
μ: the mean value of p(i, j).
HX, HY: the entropies of the marginal probabilities px and py.
HXY: the entropy of p(i, j).
HXY1, HXY2: entropy expressions computed from p(i, j) and the marginal probabilities px(i) and py(j).
The GLCM texture features listed in Table I, with the notation defined in Table II, were calculated for each ROI at four different directions (θ = 0°, 45°, 90°, 135°), four distances (d = 1, 2, 3, 4 pixels), and one quantization level of L = 32. Using this configuration, a set of 400 GLCM features was extracted from each ROI. At ROI boundaries, the distance used can affect the resulting GLCM, as the GLCM excludes the pixels outside the ROI boundaries from the calculation.
To generate the texture features listed in Table I and Table II, we first calculated the GLCM at four distances, d = (1, 2, 3 and 4) pixels. For every distance, four directions, θ = (0°, 45°, 90° and 135°), were used. The number of levels is set to 32, so the intensity values within the image I are scaled to be between 1 and 32. Now, suppose that we have two images t and s; image t has maximum and minimum intensity values of 194 and 2, respectively, and image s has maximum and minimum intensity values of 234 and 10, respectively. In the first image t, every level will contain 6 intensity values; for example, the intensities from 194 to 189 will be at level number 32, level number 31 will contain the intensities from 188 to 183, and so on until the first level, which will contain the intensities from 7 to 2. For the second image s, every level will contain 7 intensity values; for example, the intensities from 234 to 228 will be at level number 32, level number 31 will contain the intensities from 227 to 221, and so on until the first level, which will contain the intensities from 16 to 10. To prevent such variations in the intensity values assigned to each level, we set the intensity limits used in our calculation to 255 and 0. Thereafter, the extracted ROI is normalized to the gray-level range [0, 255] to stretch the dynamic range of all images to the same scale, so that every one of the 32 levels contains 8 gray intensity values: the first level has the values from 0 to 7, and the last level has the values from 248 to 255.
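The fixed-limit quantization described above can be sketched as follows; with limits 0 and 255 and 32 levels, every level covers exactly 8 intensity values.

```python
import numpy as np

def quantize(image, levels=32, lo=0, hi=255):
    """Scale intensities in [lo, hi] to integer levels 1..levels; with the
    fixed limits 0 and 255 every level covers 256/32 = 8 intensity values."""
    bin_size = (hi - lo + 1) / levels
    q = (np.asarray(image, float) - lo) // bin_size + 1
    return np.clip(q.astype(int), 1, levels)
```

Intensities 0-7 map to level 1 and intensities 248-255 map to level 32, as stated in the text.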
Based on the above settings, we calculate the 25 GLCM features listed in Table I and Table II. These features were calculated at four distances, d = (1, 2, 3 and 4), and four different directions, θ = (0°, 45°, 90° and 135°). A total of 400 (25x4x4) texture features were extracted for every ROI in the ultrasound image.
3.3.6 Summary of the features
Based on reference [29], malignant tumors tend to have irregular shapes, while benign tumors tend to be oval in shape. Malignant tumors are also not parallel in their orientation, while benign tumors have a parallel orientation. Angular, spiculated margins are usually found in malignant tumors, whereas circumscribed, microlobulated margins are found in benign tumors. The lesion boundary appears as an echogenic halo in malignant tumors and as an abrupt interface in benign tumors. Malignant tumors have a hypo-echoic appearance, while benign tumors have a hyper-echoic or anechoic appearance. Finally, shadowing is found below malignant tumors, while there is no shadowing below benign tumors. Table III summarizes all the features used in this thesis: 19 morphological features, 5 texture features, and 25 GLCM features.
TABLE III
SUMMARY OF ALL EXTRACTED FEATURES IN THIS THESIS

Category: Morphology
Tumor area
Tumor perimeter
Form Factor
Roundness
Aspect Ratio
Convexity
Solidity
Extent
The length of the major axis of the ellipse
The length of the minor axis of the ellipse
The ratio between the major and minor axis
The ratio of the ellipse perimeter and the tumor perimeter
Ellipse Compactness: the overlap between the ellipse and the tumor
Ellipse theta
NRL entropy
NRL variance
Compactness
Undulation characteristics (MU)
Angular characteristics

Category: Texture
Lesion boundary (LB)
Posterior acoustic characteristic (PS)
Echo pattern contrast (EPC)
Posterior-tissue intensity difference (PSDiff)
Tumor-tissue intensity difference (EPDiff)

Category: GLCM Features
Autocorrelation
Contrast
Correlation I
Correlation II
Cluster Prominence
Cluster Shade
Dissimilarity
Energy
Entropy
Homogeneity I
Homogeneity II
Maximum probability
Sum of squares
Sum average
Sum entropy
Sum variance
Difference variance
Difference entropy
Information measure of correlation I
Information measure of correlation II
Inverse difference normalized
Inverse difference moment normalized
used. The advantage of using manual segmentation is that the classification will be more accurate. In this thesis, manual segmentation was used: each tumor in our study was segmented manually by an expert radiologist. After segmenting the tumor, the best fit ellipse and the bounding rectangle that includes the tumor were determined. After that, the morphological features were extracted.
To extract the GLCM texture features, a region of interest (ROI), which is a small image region that includes the tumor, is selected. The bounding rectangle that includes the tumor can be used as the ROI [26] and was cropped for computing the GLCM features. The size of the ROI depends on the width and height of the tumor, so that the window used to calculate the GLCM is equal to the width and height of the tumor. Finally, the GLCM texture features were extracted using the expressions given in Table I and Table II.
After feature extraction, a support vector machine (SVM) classifier is used to classify the tumor as either benign or malignant. The SVM is a robust data classification algorithm and has been used in many fields in recent years [26]. The aim of an SVM is to find a hyperplane that separates the training data with a maximal margin using a kernel function, such as the radial basis function (RBF) kernel [26]. The RBF is the most widely used kernel. Redundant features can increase the computation time and affect the classification accuracy. Thus, an efficient and robust feature selection method that reduces the effect of noisy as well as irrelevant and redundant data is required [26].
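The RBF kernel that the SVM relies on can be written in a few lines; this is a minimal sketch of the kernel function only, with an illustrative value of the gamma parameter.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2), measuring the
    similarity between two feature vectors for the SVM."""
    diff = np.asarray(x, float) - np.asarray(z, float)
    return float(np.exp(-gamma * np.dot(diff, diff)))
```

Identical feature vectors give a similarity of 1, and the similarity decays toward 0 as the vectors move apart, which is what lets the SVM separate classes that are not linearly separable in the original feature space.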
Figure 3.12 A diagram showing the main steps of the proposed tumor classification method.
The large number of features extracted from the tumor could have a large degree of irrelevant and redundant information, which may reduce the accuracy of tumor classification. An irrelevant feature does not contribute to distinguishing data of different classes and can be deleted without affecting the classification accuracy. On the other hand, a redundant feature implies the co-existence of another feature with the same relevant content, and hence the removal of one of them will not affect the classification performance. To eliminate the irrelevant and redundant features, a feature selection phase is employed to remove them while maintaining acceptable classification accuracy [31]. One of the most commonly used algorithms for feature selection is mutual information (MI), which ranks the features in a manner that meets the minimal-Redundancy-Maximal-Relevance (mRMR) criterion [40]. Pereira et al. [41] have shown that MI is very helpful for ranking the features extracted from breast ultrasound images.
In this thesis, MI is used to rank all the features based on the mRMR method, where the first feature in the ranked set has the maximum relevancy to the target class and the last feature in the set has the minimum relevancy. Mutual information (MI) measures the degree of dependency between two variables, such as a feature and the target class or a pair of features. The minimal redundancy condition selects the features such that they are mutually exclusive of each other [31]. In this study, we have selected the best 49 features to carry out the classification.
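A simple histogram-based MI estimate for ranking can be sketched as below. This is a minimal sketch, not the mRMR implementation used in the thesis: it covers only the relevance term (feature vs. class), and the number of bins is an assumption.

```python
import numpy as np

def mutual_information(feature, labels, bins=8):
    """Estimate I(X; Y) in bits between one (discretized) feature and the
    class labels; features can then be ranked by decreasing MI."""
    edges = np.histogram_bin_edges(feature, bins=bins)
    xd = np.digitize(feature, edges[1:-1])
    mi = 0.0
    for xv in np.unique(xd):
        for yv in np.unique(labels):
            pxy = np.mean((xd == xv) & (labels == yv))
            px, py = np.mean(xd == xv), np.mean(labels == yv)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi
```

A feature that perfectly mirrors balanced binary labels yields 1 bit of mutual information, while a constant feature yields 0, so sorting by this score places the most relevant features first.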
Bergstra et al. [38] proposed a voting-based algorithm that classifies musical genre and artist from an audio waveform. Their work can be summarized as follows: for each iteration t, the algorithm invokes a weak learning algorithm that returns a classifier h(t) and computes its coefficient α(t). The output of h(t) is a vector containing values of 1 and −1 over the k classes. If the entry of h(t) for a class is 1, the classifier votes for that class, whereas an entry of −1 means that the classifier votes against that class. For all classes, the values of h(t) are then multiplied by α(t), and the results from all iterations are added together. Finally, the class that receives the maximum number of votes is selected as the outcome of classification.
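The weighted vote combination just described can be sketched in one step; the vote matrix and coefficient vector below are illustrative placeholders, not data from [38].

```python
import numpy as np

def combine_votes(h, alpha):
    """Weighted voting over T weak classifiers: h is a (T, K) array of
    +1/-1 votes over K classes, alpha the (T,) coefficient vector.
    Returns the index of the class with the largest weighted vote."""
    scores = alpha @ h            # sum over t of alpha(t) * h(t), per class
    return int(np.argmax(scores))
```

For example, two weak classifiers voting for class 0 with weight 0.5 each outvote one classifier voting against it with weight 0.3.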
In this thesis, we used the concept of voting along with the spatial distribution of 1-mm² ROIs to improve the classification accuracy. In particular, each tumor was divided into a set of non-overlapping 1-mm² ROIs. The ROIs that contribute to the tumor classification were selected based on their spatial location. The selected ROIs are then combined using a voting mechanism to accurately determine the class of the entire tumor. The steps of the proposed algorithm are as follows. First, the texture features are extracted from all the individual ROIs inside the tumor region in the ultrasound image. Then, the MI algorithm was applied to reduce the feature space from the 400 GLCM texture features extracted in the previous steps to 49 GLCM texture features representing each ROI. Then, we perform a ten-fold cross-validation by dividing the ROI features into two sub-datasets. The first subset contains 90% of the data and is used as the training dataset, and the second (which contains the remaining 10% of the data) is used for validation. Both datasets include benign and malignant cases chosen by random selection. Next, we trained the SVM using the training dataset. After the training phase, we used the validation dataset to evaluate the performance of the classifier. For every tumor in the validation dataset, we classified the ROIs of that tumor using the trained SVM, and we studied the ROIs that predict the correct tumor class and those that predict the wrong class. The process of selecting 90% of the tumors for training and the remaining 10% for validation was repeated for 10 trials. As a result, we have noticed that the ROIs that produced incorrect predictions are located in the dark regions of the tumor, close to the center of the tumor, whereas the ROIs that correctly predict the class of the tumor are located in the bright regions of the tumor, close to the tumor boundaries, as shown in Figures 3.13 and 3.14.
Figure 3.13. The distribution of ROIs in a malignant tumor that voted for the correct tumor class.

Figure 3.14. The distribution of ROIs in a benign tumor that voted for the correct tumor class.
To evaluate our findings, we calculated the average intensity of each tumor, and then we excluded the ROIs that have an average intensity less than 2% of the average tumor intensity. The remaining ROIs were then divided into training and validation data. Finally, we used the trained SVM and the voting technique to classify the tumors into benign and malignant. The SVM and voting technique are applied by classifying each ROI individually and then counting the number of ROIs that voted correctly. If the number of ROIs that correctly classify the tumor is greater than half the total number of ROIs in the tumor, then the SVM classifier is considered to predict the tumor class correctly. Otherwise, the SVM is considered to misclassify the tumor.
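The ROI reduction and voting steps can be sketched as below. This is a minimal sketch with hypothetical names: the per-ROI SVM predictions are assumed to be given, and the tumor average intensity is passed in as a scalar.

```python
import numpy as np

def classify_tumor(roi_mean_intensity, roi_prediction, tumor_mean, frac=0.02):
    """ROI reduction followed by majority voting.
    roi_prediction holds the SVM output per ROI (0 = benign, 1 = malignant);
    ROIs darker than frac * tumor_mean are discarded before voting."""
    keep = roi_mean_intensity >= frac * tumor_mean   # ROI reduction step
    votes = roi_prediction[keep]
    # malignant only if more than half of the retained ROIs vote malignant
    return int(np.sum(votes == 1) > len(votes) / 2)
```

Dark ROIs near the tumor center are thus removed before the vote, so the decision rests on the brighter ROIs near the boundary that were observed to vote correctly.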
Our contributions in the field of ultrasound breast cancer classification can be summarized by the following points. First, the spatial distribution of ROIs was studied. In particular, the ROIs located in bright regions, which are usually close to the tumor boundaries, are more accurate than the ROIs located in the dark regions. Second, a new technique, called ROI reduction, is proposed to reduce the number of ROIs used in the classification phase. Third, a voting technique is employed to classify the breast tumors into benign and malignant based on the classification results of the individual 1-mm² ROIs.
CHAPTER FOUR
SIMULATION AND EXPERIMENT RESULTS
In this chapter, the experimental results of this thesis are presented. Section 4.1 describes the data acquisition. Section 4.2 presents the classification results obtained using the conventional morphological features. Section 4.3 provides the results obtained using a combination of conventional morphological and texture features. To increase the classification accuracy, we also combined the gray-level co-occurrence matrix (GLCM) features with the morphological and texture features in Section 4.4. In Section 4.5, the GLCM features from multiple ROIs of each tumor were extracted and the classification results are reported. The classification results obtained using the proposed method are presented in Section 4.6, where we show that the proposed method outperforms the previous studies. Finally, the summary of this chapter is presented in Section 4.7.
The ultrasound image database used in this study is composed of 105 BUS images. These images were acquired during routine breast diagnostic procedures at the Jordan University Hospital, Jordan, between 2012 and 2014. The image dataset was acquired by our medical collaborator, Dr. Mahsen Al-Najar, and provided to me by my supervisor, Dr. Mohammad Daoud. The image set was composed of 41 malignant tumors and 64 benign cases. All breast ultrasound images had known ground-truth class labels obtained from biopsy analysis. Each breast tumor was manually segmented by our medical collaborator. All images were resampled to have the same pixel size of 0.1 mm x 0.1 mm.
In particular, in each fold the data was distributed randomly and then divided into two datasets: a training dataset that contains 90% of the data and a validation dataset that contains the remaining 10%. The results showed that, of the 19 morphological features extracted for each case in this study, 10 features can represent the data with high accuracy. The ten selected features are: Form Factor, Roundness, Aspect Ratio, Convexity, Solidity, Extent, Tumor area, the ratio between the major and minor axis of the best fit ellipse (Ellipse_ab), the ratio of the best fit ellipse perimeter to the tumor perimeter (Ep_Tp), and the normalized radial length variance (NRL_Variance). The results obtained using the selected features are as follows: accuracy = 92.56%, specificity = 94.80%, sensitivity = 89.40%, positive predictive value (PPV) = 95.48%, negative predictive value (NPV) = 94.04%, and Matthews correlation coefficient = 87.53%.
90 and 135). The above settings generate a GLCM feature matrix that consists of 400 texture features (25x4x4). The generated matrix was then concatenated with the 19 morphological and 5 texture features for each tumor in the database, for a total of 424 features representing each tumor. The MI algorithm was applied to rank the features, and the best 29 features were selected. Then, an SVM classifier was used. We ran the SVM classifier for 50 trials; in each trial the ten-fold cross-validation procedure was applied. The classification results are as follows: accuracy = 94.46%, specificity = 95.20%, sensitivity = 93.35%, PPV = 94.06%, NPV = 96.11%, and Matthews correlation coefficient = 89.30%.
4.5 The Classification Results Obtained by Dividing the Tumor into 1-mm² ROIs without Applying the ROI Reduction and Voting Mechanism
In the previous experiments, we used the conventional minimum bounding box to define the ROI. The bounding box is the minimum rectangle that contains the tumor, as shown in Figure 3.3. In fact, the GLCM features were extracted from this ROI, which approximately matches the size of the tumor.

In this section, however, we divided the tumor into a set of non-overlapping ROIs. The size of each ROI is 1x1 mm (10x10 pixels); hence, every tumor contained a set of ROIs. These ROIs were used to calculate the GLCM features. As in the previous experiments, the ROIs are normalized to the gray-level range [0, 255] to stretch the dynamic range to the same scale, and the quantization level is set to 32 levels. Then, the 25 GLCM features defined in Section 3.3.5 were extracted from each ROI inside the tumor at four distances, d = (1, 2, 3, and 4), and four directions, θ = (0°, 45°, 90° and 135°). The above settings generate a GLCM feature matrix consisting of 400 texture features (25x4x4), and every tumor is represented by a matrix of size Nx400, where N is the number of ROIs inside the tumor and 400 is the number of GLCM features. Then, the MI algorithm was applied to rank the features, and the best 49 features were selected to represent the data in the next steps. Then, an SVM classifier was used. We ran the SVM classifier for 50 trials; in each trial the ten-fold cross-validation procedure was applied. The classification results obtained by classifying the individual ROIs are as follows: accuracy = 90.05%, specificity = 92.43%, sensitivity = 88.13%, positive predictive value (PPV) = 89.92%, negative predictive value (NPV) = 90.46%, and Matthews correlation coefficient = 80.40%.
4.6 Results obtained using the proposed ROIs reduction and Voting-based
Method
In this section, the tumor classification performance obtained using the proposed method is evaluated. As in the last section, we divided each tumor into a set of non-overlapping ROIs with a size of 1x1 mm (10x10 pixels). The ROIs were normalized to the gray-level range [0, 255], and the quantization level was set to 32 levels. Then, the GLCM features for each ROI inside the tumor were calculated at four distances, d = (1, 2, 3, and 4), and four directions, θ = (0°, 45°, 90° and 135°). This yields a GLCM feature matrix consisting of 400 texture features (25x4x4), and every tumor is represented by a matrix of size Nx400, where N is the number of ROIs and 400 is the number of GLCM features. Then, the MI algorithm was applied to rank the features, and the best 49 features were selected to represent the data. Up to this point, we have applied the same steps used in previous studies; the following steps are modified to implement the proposed method. Based on our study of the spatial distribution of the ROIs, we have noticed that the ROIs that incorrectly predict the tumor class are located in the dark regions, while the ROIs that correctly predict the tumor class are located in the brightest regions. Therefore, we calculated the average intensity of each tumor and then excluded the ROIs that have an average intensity less than 2% of the average tumor intensity. This step is called ROI reduction. Then, the 105 tumors were divided into training and validation datasets. During the training phase, the ROIs of the tumors in the training set are used to train the SVM. In the testing phase, the ROIs of each tumor were classified using the trained SVM classifier, and the voting technique was applied to determine the class of the tumor. In particular, we counted the ROIs that voted correctly and incorrectly; if the number of correctly-classified ROIs is greater than 50% of the total number of tumor ROIs, then the classifier is considered to correctly predict the tumor class. Otherwise, the tumor is considered to be incorrectly classified. We ran the SVM classifier for 50 trials, and in each trial the ten-fold cross-validation method was used. In each fold, the tumors were distributed randomly and then divided into two datasets: a training dataset containing 90% of the tumors and a validation dataset containing the remaining 10%. The tumor classification results are as follows: accuracy = 96.89%, specificity = 98.28%, sensitivity = 94.72%, positive predictive value = 97.92%, negative predictive value = 97.11%, and Matthews correlation coefficient = 93.95%.
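The random 90%/10% folding used in each trial can be sketched as follows; the function name and the fixed seed are illustrative, not part of the thesis implementation.

```python
import numpy as np

def ten_fold_splits(n_tumors, seed=0):
    """One round of ten-fold cross-validation: shuffle the tumor indices
    and yield (train, validation) index pairs with a ~90%/10% split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_tumors), 10)
    for k in range(10):
        train = np.concatenate([folds[j] for j in range(10) if j != k])
        yield train, folds[k]
```

Each of the 105 tumors appears in the validation set exactly once per round, and repeating the round with different shuffles gives the 50 trials reported above.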
4.7 Summary
To evaluate our proposed method, we compared its performance with the conventional morphological, texture, and GLCM analyses reported in previous studies. First, we evaluated the classification performance obtained using the morphological features only. Then we added the texture features and noticed that the classification accuracy increased. Then the GLCM features were added to the feature set, and the classification results showed that the accuracy is better than that of the previous methods. In the fourth experiment, we divided the tumor into multiple ROIs, extracted the GLCM features, and used an SVM to classify the individual ROIs; the accuracy in this experiment is worse than before. Finally, we evaluated the performance of our proposed method, which can be summarized by two main steps: ROI reduction and ROI voting. The experimental results showed that our proposed algorithm outperforms the conventional morphological, texture, and GLCM analyses in accuracy, specificity, and sensitivity, as summarized in Table IV.
Table IV
SUMMARY OF THE EXPERIMENT RESULTS
(Method: Accuracy / Specificity / Sensitivity / PPV / NPV / MCC, all in %)

Conventional Morphological Features: 92.56 / 94.80 / 89.60 / 93.41 / 94.12 / 85.70
Combined Conventional Morphological and Texture Features: 93.60 / 96.40 / 89.40 / 95.48 / 94.04 / 87.53
Combined Conventional Morphological, Texture, and GLCM Features: 94.46 / 95.20 / 93.35 / 94.06 / 96.11 / 89.30
Classification results obtained by dividing the tumor into ROIs and classifying each ROI using the GLCM Features: 90.05 / 92.43 / 88.13 / 89.92 / 90.46 / 80.40
Tumor classification results obtained using the proposed Method: 96.89 / 98.28 / 94.72 / 97.92 / 97.11 / 93.95
CHAPTER FIVE
DISCUSSION, CONCLUSIONS, AND FUTURE WORKS
In this chapter, we conclude the material of this thesis. A discussion of what we have proposed is given in Section 5.1, the conclusions are provided in Section 5.2, and suggestions for future work are summarized in Section 5.3.

5.1 Discussion

Breast cancer is a major cause of death in women all around the world, and especially in Jordan. The mortality rate caused by this disease can be reduced by early detection of breast cancer. Ultrasound imaging is one of the most widely used technologies for detecting abnormalities in dense breasts.
eliminate the ROIs that are located in the dark regions of the tumor. The individual ROIs are classified using a trained SVM, and a voting technique is used to combine the votes of the individual ROIs and determine the tumor class. The experimental results showed that the accuracy of the proposed method outperforms the conventional analyses reported in previous studies.

One important contribution of the thesis is the analysis of the spatial distribution of the ROIs. Our analyses indicate that the ROIs located in dark regions are less accurate in predicting the tumor class than the ROIs in the brightest regions.
5.2 Conclusions
This thesis presented a study of the spatial distribution of features inside the tumor and showed that the distribution of ROI features inside the tumor can affect the classification accuracy. We also proposed a new method for tumor classification that divides the tumor into ROIs, performs ROI reduction based on the spatial location of the ROIs, and combines the votes of the individual ROIs to determine the class of the tumor. It has been shown in this thesis that the proposed method outperforms the conventional morphological and texture analyses reported in previous studies.
REFERENCES
1.
Cai, L. and Wang Y., A Phase-Based Active Contour Model for Segmentation of
Breast Ultrasound Images, IEEE 6th International Conference on Biomedical
Engineering and Informatics, pp. 91-95, 2013.
2.
Bothorel, S., Meunier, B. B., and Muller, S. A., Fuzzy logic based approach for
semi logical analysis of micro calcification in mammographic images, Intell.
Syst., vol. 12, pp. 819848, 1997.
3.
4.
5.
Cheng, H. D., Shan, J., Ju, W., Guo, Y., and Zhang, L., Automated breast cancer
detection and classification using ultrasound images: A survey, Elsevier Pattern
Recognition, vol. 43, no.1, pp. 299 - 317, 2010.
6.
Shankar, P. M., Piccoli , C. W., Reid, J. M., Forsbergand F., and Goldberg, B. B.,
Application of the compound probability density function for characterization of
breast masses in ultrasound B scans, Phys. Med. Biol., vol. 50, no.10, pp. 2241
2248, 2005.
7.
Taylor, K., Merritt, C., Piccoli, C., Schmidt, R., Rouse, G., Fornage, B., Rubin, E.,
Georgian-Smith, D., Winsberg, F., Goldberg, B., and Mendelson, E., Ultrasound
as a complement to mammography and breast examination to characterize breast
masses, Ultrasound in Medicine and Biology, vol. 28, no. 1, pp. 19-26, 2002.
8.
Zhi, H., Ou, B., Luo, B., Feng, X., Wen, Y., and Yang, H., Comparison of
ultrasound elastography, mammography, and sonography in the diagnosis of solid
breast lesions, Journal of Ultrasound in Medicine, vol. 26, no. 6, pp. 807-815,
2007.
9.
Sahiner, B., Chan, H. P., Roubidoux, M. A., Hadjiiski, L. M., Helvie, M. A.,
Paramagul, C., and Blane, C., Malignant and benign breast masses on 3D US
volumetric images: effect of computer-aided diagnosis on radiologist accuracy,
Radiology, vol. 242, no. 3, pp. 716-724, 2007.
10.
Chen, C. M., Chou, Y. H., Han, K. C., Hung, G. S., Tiu, C. M., Chiou, H. J., and
Chiou, S. Y., Breast lesions on sonograms: computer-aided diagnosis with nearly
setting-independent features and artificial neural networks, Radiology, vol. 226,
no. 2, pp. 504-514, 2003.
11.
Drukker, K., Giger, M. L., Horsch, K., Kupinski, M. A., Vyborny, C. J., and
Mendelson, E. B., Computerized lesion detection on breast ultrasound, Medical
Physics, vol. 29, no. 7, pp. 1438-1446, 2002.
12.
Huang, Y. L., Chen, D. R., and Liu, Y. K., Breast cancer diagnosis using image
retrieval for different ultrasonic systems, IEEE International Conference on
Image Processing, vol. 5, pp. 2598-2960, 2004.
13.
Zakeri, F. S., Behnam, H., and Ahmadinejad, N., Classification of Benign and
Malignant Breast Masses Based on Shape and Texture Features in Sonography
Images, Springer Journal of Medical Systems, vol. 36, no. 3, pp. 1621-1627,
2012.
14.
Chen, D. R., Chang, R. F., Kuo, W. J., Chen, M. C., and Huang, Y. L., Diagnosis
of breast tumors with sonographic texture analysis using wavelet transform and
neural networks, Ultrasound Med. Biol., vol. 28, no. 10, pp. 1301-1310, 2002.
15.
Jiang, P., Peng, J., Zhang, G., Cheng, E., Megalooikonomou, V., and Ling, H.,
Learning-Based Automatic Breast Tumor Detection and Segmentation in
Ultrasound Image, IEEE 9th International Symposium on Biomedical Imaging
(ISBI), pp. 1587-1590, 2012.
16.
Park, J., Kang, J. B., Chang, J. H., and Yoo, Y., Speckle Reduction Techniques
in Medical Ultrasound Imaging, Biomedical Engineering Letters, vol. 4, no.1, pp.
32-40, 2014.
17.
Uddin, M. S., Tahtali, M., Lambert, A. J., and Pickering, M. R., Speckle
Reduction for Ultrasound Images Using Nonlinear Multi-Scale Complex Wavelet
Diffusion, IEEE International Conference on Signal and Image Processing
Applications (ICSIPA), pp. 31-36, 2013.
18.
19.
Zhang, J., Wang, C., and Cheng, Y., Comparison of Despeckle Filters for Breast
Ultrasound Images, Springer, Circuits, Systems, and Signal Processing, pp. 1-24,
2014.
20.
Mittal, D., Kumar, V., Saxena, S. C., Khandelwal, N., and Kalra, N.,
Enhancement of the ultrasound images by modified anisotropic diffusion
method, Springer, Medical and biological engineering and computing, vol. 48,
no.12, pp. 12811291, 2010.
21.
Chan, T. and Vese, L., Active contours without edges, IEEE Transactions on
Image Processing, vol. 10, no. 2, pp. 266-277, 2001.
22.
Wang, W., Zhu, L., Qin, J., Chui, Y. P., Li, B. N., and Heng, P. A., Multiscale
geodesic active contours for ultrasound image segmentation using speckle
reducing anisotropic diffusion, Elsevier, Optics and Lasers in Engineering, vol.
54, pp. 105-116, 2014.
23.
Yu, Y. and Acton, S. T., Speckle reducing anisotropic diffusion, IEEE Trans.
Image Process., vol. 11, no. 11, pp. 1260-1270, 2002.
24.
Huang, Q., Bai, X., Li, Y., Jin, L., and Li, X., Optimized graph-based
segmentation for ultrasound images, Elsevier, Neurocomputing, vol. 129, pp.
216-224, 2014.
25.
Torbati, N., Ayatollahi, A., and Kermani, A., An efficient neural network based
method for medical image segmentation, Computers in Biology and Medicine,
Elsevier, vol. 44, pp. 76-87, 2014.
26.
Wu, W. J., Lin, S. W., and Moon, W. K., Combining support vector machine
with genetic algorithm to classify ultrasound breast tumor images, Elsevier,
Computerized Medical Imaging and Graphics, vol. 36, no. 8, pp. 627-633, 2012.
27.
Lin, C. M., Hou, Y. L., Chen, T. Y., and Chen, K. H., Breast Nodules Computer-
Aided Diagnostic System Design Using Fuzzy Cerebellar Model Neural
Networks, IEEE Trans. on Fuzzy Systems, vol. 22, no. 3, pp. 693-699, 2014.
28.
Huang, L., Shi, J., Wang, R., and Zhou, S., Shearlet-based Ultrasound Texture
Features for Classification of Breast Tumor, IEEE 7th International Conference
on Internet Computing for Engineering and Science (ICICSE), pp. 116-121, 2013.
29.
American College of Radiology, Breast Imaging Reporting and Data System, 5th
ed., Reston, VA,
https://shop.acr.org/Default.aspx?TabID=55&ProductId=66931383, 2013.
30.
Moon, W. K., Lo, C. M., Cho, N., Chang, J. M., Huang, C. S., Chen, J. H., and
Chang, R. F., Computer-aided diagnosis of breast masses using quantified
BI-RADS findings, Elsevier Computer Methods and Programs in Biomedicine,
vol. 111, no. 1, pp. 84-92, 2013.
31.
32.
Yang, M. C., Moon, W. K., Wang, Y. C. F., Bae, M. S., Huang, C. S., Chen, J.
H., and Chang, R. F., Robust Texture Analysis Using Multi-Resolution
Gray-Scale Invariant Features for Breast Sonographic Tumor Diagnosis, IEEE
Trans. Med. Imag., vol. 32, no. 12, pp. 2262-2272, 2013.
33.
Alam, S., Feleppa, E., and Rondeau, M., Ultrasonic multi-feature analysis
procedure for computer-aided diagnosis of solid breast lesions, Ultrasonic
Imaging, vol. 38, no. 1, pp. 17-38, 2011.
34.
Uniyal, N., Eskandari, H., Abolmaesumi, P., Sojoudi, S., Gordon, P., Warren, L.,
Rohling, R. N., Salcudean, S. E., and Moradi, M., Ultrasound RF Time Series for
Classification of Breast Lesions, IEEE Transactions on Medical Imaging, vol. 34,
no. 2, pp. 652-661, 2015.
35.
Nie, K., Chen, J. H., Yu, H. J., Chu, Y., Nalcioglu, O., and Su, M. Y.,
Quantitative analysis of lesion morphology and texture features for diagnostic
prediction in breast MRI, Academic Radiology, vol. 15, pp. 1513-1525, 2008.
36.
37.
Shen, W. C., Chang, R. F., Moon, W. K., Chou, Y. H., and Huang, C. S., Breast
ultrasound computer-aided diagnosis using BI-RADS features, Academic
Radiology, vol. 14, pp. 928-939, 2007.
38.
39.
Create gray-level co-occurrence matrix from image,
http://www.mathworks.com/help/images/ref/graycomatrix.html, Retrieved April
15, 2016.
40.
Gómez, W., Leija, L., and Díaz-Pérez, A., Mutual information and intrinsic
dimensionality for feature selection, in Proc. 7th Int. Conf. Elect. Eng., Comput.
Sci. Automatic Control, Tuxtla Gutiérrez, Mexico, pp. 339-344, 2010.
41.
Pereira, W. C., Alvarenga, A. V., Infantosi, A. F., Macrini, L., and Pedreira, C.
E., A non-linear morphometric feature selection approach for breast tumor
contour from ultrasonic images, Comput. Biol. Med., vol. 40, no. 11-12, pp.
912-918, 2010.