BREAST CANCER CLASSIFICATION USING THE SPATIAL DISTRIBUTION OF ULTRASOUND-BASED FEATURES
By
Tariq Bdair
Thesis submitted in partial fulfillment of the requirements for the degree of M.Sc. in Computer Engineering
July, 2016
DEDICATIONS
ACKNOWLEDGMENTS
TABLE OF CONTENTS
DEDICATIONS
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
REFERENCES
ARABIC ABSTRACT
LIST OF FIGURES
Figure 3.1
Figure 3.2
Figure 3.3
Figure 3.4
Figure 3.5: The minor and major axes of the best fit ellipse and Theta
Figure 3.6
Figure 3.7
Figure 3.8
Figure 3.9
Figure 3.10
Figure 3.11: GLCM calculation
Figure 3.12
Figure 3.13: The ROIs in malignant tumors that voted correctly with tumor class
Figure 3.14: The ROIs in benign tumors that voted correctly with tumor class
LIST OF TABLES
Table I
Table II
Table III
Table IV
ABSTRACT
BREAST CANCER CLASSIFICATION USING THE SPATIAL
DISTRIBUTION OF ULTRASOUND-BASED FEATURES
By
Tariq Bdair
Thesis supervisor: Dr. Mohammad Daoud
Breast cancer is one of the major causes of death in women across the globe. According
to the statistics from the Jordan National Cancer Registry (JNCR), 864 Jordanian women and
9 Jordanian men were diagnosed with breast cancer in 2008. Detection of breast cancer in
earlier stages can reduce the treatment cost and minimize the death rate. Many
technologies have been used to classify breast cancer in early stages, among which is
ultrasound. In fact, there is an increasing interest in using ultrasound imaging to detect
abnormalities in dense breasts due to its safety, portability and low cost. Several studies
have been reported in the literature about the detection and classification of breast tumors
based on texture and morphological features extracted from ultrasound images. However,
to the best of our knowledge, previous studies did not analyze the spatial distribution of the
ultrasound-based texture features with the goal of improving breast cancer classification.
This thesis investigates the use of an improved texture analysis technique for increasing
the accuracy and specificity of breast tumor classification using two-dimensional (2D)
ultrasound images. Most conventional texture analyses classify the tumor using texture
features extracted from the entire tumor region. Instead of analyzing the entire tumor, our
improved technique divides the tumor into non-overlapping ROIs and extracts texture
features from the individual ROIs. The spatial distribution of the ROIs is analyzed to
improve the accuracy of tumor classification. In particular, the analysis indicates that the
ROIs located in bright regions of the image, which are usually close to the tumor
boundaries, are more sensitive to the class of the tumor than the ROIs located
in the dark regions. Therefore, our improved classification technique selects the ROIs in
the tumor that can improve the classification accuracy. The texture features of each
selected ROI were analyzed using a classifier to determine the tumor class. The
classification indicators obtained from the individual ROIs were combined using a voting
mechanism to classify the tumor as benign or malignant.
An extensive study was employed to evaluate the performance of the improved texture-based classification technique. In particular, the study employed an extended set of 19
morphological features, 5 texture features, and 25 Gray Level Co-occurrence Matrix
(GLCM) features, which were reported in several previous studies, for classifying breast
tumors. The morphological features quantify the shape of the tumor as well as the
characteristics of the area surrounding the tumor boundaries. The texture features quantify
the statistics of image pixels to classify the tumor. The texture features were extracted for
both the entire tumor, as reported in previous studies, and the individual ROIs, as
suggested in this thesis. The improved texture analysis technique as well as the
CHAPTER ONE
INTRODUCTION
1.1 Problem Description
Breast cancer is the most common cause of death in women across the globe [1]. In fact,
10% of women in Europe and 12.5% of women in the United States suffer from this
disease [2]. According to the statistics from the Jordan National Cancer Registry (JNCR), 864
Jordanian women and 9 Jordanian men were diagnosed with breast cancer in 2008, with a
percentage of 18.8% of the total new cancer cases [3]. Moreover, according to JNCR,
breast cancer is ranked first among all cancers in women, accounting for 36.7% of all
cancers in females. The disease is considered the leading cause of cancer deaths among
women in Jordan [3]. Early detection of breast cancer increases the survival rates, and
improves the treatment outcome [3].
The most common imaging modalities for diagnosing breast cancer are mammography
and ultrasound. Mammography provides an accurate modality for breast screening. The
study reported in [4] indicated that mammography enables 18% to 30% mortality
reduction rates. However, mammography is characterized by low specificity, which leads
to a high rate (65-85%) of unnecessary biopsy operations [5]. It also has low accuracy when used to
detect breast cancers in women with dense breasts. Moreover,
mammography can increase the health risk for both the patients and physicians [5].
Ultrasound imaging has been one of the most successful diagnostic methods for breast
cancer detection, but due to its operator-dependency, the interpretation of ultrasound
images depends on the experience of the radiologist [5]. In fact, the interpretation of
breast cancer images is operator-dependent due to several factors: breast cancer
masses have complex shapes and appearances; their patterns can change from patient to
patient; and ultrasound images usually have low contrast and contain substantial noise and speckle
[5].
To improve the accuracy of breast cancer diagnosis and reduce the operator dependency,
computer-aided diagnostic (CAD) systems have been proposed to analyze breast
ultrasound images and provide a second opinion to the radiologists. In general, CAD
systems use four steps to detect and classify breast cancer: (1) Image preprocessing:
because of the low contrast and high speckle in ultrasound images, preprocessing
techniques are used to enhance the quality of the image and decrease ultrasound speckles
without degrading the important features of the image. (2) Image segmentation: divide
the image into non-overlapping regions, separate the tumor from the background tissue,
and then define a region of interest (ROI) in the image to extract cancer-detection features. (3)
Feature extraction and selection: extract a group of features that can be used to classify
benign and malignant tumors accurately. Typically, a large number of features can be
extracted from the ultrasound image, and hence selecting a set of features that enables
accurate tumor classification is an important task. (4) Classification: after
extracting and selecting the ultrasound-based features, intelligent classification
algorithms are applied to analyze the features and classify breast tumors as benign or
malignant [5].
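As a rough illustration, the four CAD steps above can be sketched in Python. The filter choice, threshold segmentation, feature set, classifier, and all helper names (preprocess, segment, extract_features) are placeholder assumptions for this sketch, not the methods evaluated in this thesis:

```python
import numpy as np
from scipy.ndimage import median_filter, label
from sklearn.svm import SVC

def preprocess(image):
    """Step 1: reduce speckle-like noise with a simple median filter."""
    return median_filter(image, size=3)

def segment(image):
    """Step 2: crude intensity-threshold segmentation; keep the largest blob as the ROI."""
    mask = image > image.mean()
    labels, n = label(mask)
    if n == 0:
        return mask
    sizes = np.bincount(labels.ravel())[1:]       # component sizes (skip background)
    return labels == (np.argmax(sizes) + 1)

def extract_features(image, mask):
    """Step 3: a placeholder feature vector (ROI mean, ROI std, area fraction)."""
    roi = image[mask]
    return np.array([roi.mean(), roi.std(), mask.mean()])

# Step 4: train a classifier on feature vectors from labelled synthetic "images"
rng = np.random.default_rng(0)
benign = [rng.normal(0.3, 0.05, (32, 32)) for _ in range(10)]
malignant = [rng.normal(0.7, 0.05, (32, 32)) for _ in range(10)]
X = [extract_features(preprocess(im), segment(preprocess(im)))
     for im in benign + malignant]
y = [0] * 10 + [1] * 10
clf = SVC().fit(X, y)
```

A real CAD system would replace each step with the methods of Chapters 2 and 3, but the data flow between the four stages is the same.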
1.2 Motivation
Ultrasound imaging is an important complement to mammography for breast cancer
screening. Due to its low cost, portability, safety, and real-time operation,
ultrasound has been widely used for breast cancer detection [5-8]. Recent studies
show that the use of ultrasound images can distinguish between benign and
malignant tumors with high accuracy [5, 9, 10]. In addition, ultrasound images
can increase the accuracy of breast cancer detection by 17% [5, 11] and decrease the
unneeded biopsies by 40%, which saves $1 billion in the United States every year [5,
12]. Breast ultrasound (BUS) images provide an important complement to
mammography for many reasons: BUS is safe, fast, and more accurate than
mammography in detecting abnormalities in dense breasts for women younger than
35 years [13, 14].
1.3 Contribution
CHAPTER TWO
LITERATURE REVIEW
Noise can have an additive nature, such as thermal and electronic noise, or a
multiplicative nature. Speckle is a form of noise with a multiplicative nature, and its
pattern is formed by constructive and destructive interference of the backscattered
echoes. It is known that additive noise, such as thermal and electronic
noise, is trivial and can be neglected compared to the multiplicative speckle noise
[16]. The common distribution model used to model speckle is the Rayleigh
distribution [19]. Due to the limited dynamic range of monitors, the envelope-detected
echo signal is compressed with a logarithmic transformation. After the logarithmic
transformation, the multiplicative speckle noise model is converted to an additive noise
model, so that the filters used for additive noise can be applied to the logarithm-transformed
ultrasound images [19].
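This conversion is easy to verify numerically: if the observed signal is g = f · s with multiplicative Rayleigh speckle s, then log(g) = log(f) + log(s), so the speckle becomes an additive term. A small sketch on synthetic data (not real ultrasound):

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.full(100_000, 50.0)                # noise-free signal (constant for clarity)
s = rng.rayleigh(scale=1.0, size=f.size)  # multiplicative Rayleigh speckle
g = f * s                                 # observed envelope-detected signal

# After log compression the speckle is an additive, signal-independent term:
log_g = np.log(g)
additive_noise = log_g - np.log(f)        # equals log(s) exactly
assert np.allclose(additive_noise, np.log(s))
```

The variance of log(s) no longer depends on f, which is why filters designed for additive noise become applicable after log compression.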
Speckle reduction might cause the loss of important details in the ultrasound image.
Powerful speckle reduction techniques are designed to suppress speckle and preserve the
important information and edges in the image [20]. Speckle reduction methods are
divided into three groups: (1) filtering methods; (2) wavelet methods; and (3)
compounding approaches [6].
Filtering methods are divided into linear and nonlinear filters. The linear filters include: (1) the mean filter and (2) the adaptive mean
filter. In the mean filter, each pixel is replaced by the average intensity of its
neighborhood pixels; this filter smooths and blurs the image. The mean
filter is suitable for additive Gaussian noise. Because speckle noise is
multiplicative, the mean filter is not a good approach in such cases [6]. On the other
hand, the nonlinear filters include the median filter, the Wiener filter and the anisotropic
diffusion filter. The median filter is used when impulsive noise affects the image [6].
The median filter preserves the edges and produces a less blurred image. However, the
drawback of using the median filter is the extra computation time needed to sort
the intensity values in each filtering window [18]. The Wiener filter aims to decrease the amount of
noise by comparing the noisy signal with an estimate of the noiseless signal. It smooths the image based on a
local variance computation: the smoothing is weaker when the local variance is large, and
stronger when the variance is small. The result of the
Wiener filter is better than that of the linear filters, as it preserves the edges and the important
information of the image, but it needs more computation time than a linear filter [18].
The anisotropic diffusion filter was first proposed by Perona and Malik [20]. It has been
considered one of the most popular filtering techniques used in imaging [20]. It
removes the speckles and enhances the edges at the same time. Anisotropic diffusion is
suitable for additive Gaussian noise but has difficulty with multiplicative noise. The filtering
techniques are simple and fast, but they are sensitive to the size and shape of the filtering
window. If the window size is too small, the smoothing of the filter decreases and the speckle noise cannot be
reduced effectively, while over-smoothing occurs if the
window size is too large. Considering the shape of the window, the square window,
which is the most common window shape, leads to corner rounding of the rectangular
features in the image [6].
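The mean, median, and Wiener filters discussed above are all available in SciPy; a minimal comparison on a synthetic noisy edge image (illustrative only, not the preprocessing used in this thesis):

```python
import numpy as np
from scipy.ndimage import uniform_filter, median_filter
from scipy.signal import wiener

rng = np.random.default_rng(2)
clean = np.zeros((64, 64))
clean[:, 32:] = 1.0                          # a vertical edge
noisy = clean + rng.normal(0.0, 0.2, clean.shape)

mean_out = uniform_filter(noisy, size=5)     # linear mean filter: smooths but blurs edges
median_out = median_filter(noisy, size=5)    # nonlinear: better edge preservation
wiener_out = wiener(noisy, mysize=5)         # adapts smoothing to the local variance

# All three reduce the error energy relative to the clean image
for out in (mean_out, median_out, wiener_out):
    assert np.mean((out - clean) ** 2) < np.mean((noisy - clean) ** 2)
```

Inspecting the column profiles near column 32 shows the trade-off described in the text: the mean filter smears the edge over the window width, while the median filter keeps it sharp.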
The active contour model (ACM), also known as the snake, is a segmentation algorithm that
works by minimizing the energy associated with the current contour. The contour energy
is composed of internal and external energies. The snake model modifies its shape
iteratively to approximate the desired contour. During the deformation process, the force
is calculated from the internal and external energies. The external energy is
derived from image features and aims to drive the contour toward the desired object
boundary. The internal energy is derived from the contour model and aims to control the
shape and the regularity of the contour [5]. The ACM algorithms can be categorized into
three classes: edge-based, region-based, and hybrid models.
Edge-based approaches achieve stable segmentation results when image gradients are
given with small variations. However, if the target boundary is not well-defined or
contains weak parts, the edge-based active contours could converge to wrong solutions.
To solve the above problem, region-based ACM algorithms have been proposed. The
target of these algorithms is to represent the region inside and outside the evolving curve
with region descriptors, such as probability distributions and local mean intensities. Active
contour without edges (ACWE), which was proposed by Chan and Vese [21], is an
example of region-based ACMs. It works well with blurry boundaries and homogeneous
intensity. However, this algorithm achieves limited performance when the image contains
inhomogeneous intensities. To overcome this problem, active contour models based on
local statistics have been proposed. However, region-based models with local statistics are usually
very sensitive to the size of the local window.
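The core of the ACWE idea can be sketched in a few lines of numpy: the region descriptors are simply the mean intensities c1 and c2 inside and outside the current contour, and each pixel is pushed toward the region whose mean it matches better. This is a deliberately simplified sketch; the curvature/regularization term of the full Chan-Vese model is omitted:

```python
import numpy as np

def acwe_step(image, phi, dt=0.5):
    """One simplified ACWE (Chan-Vese) update without the curvature term.
    phi > 0 marks the region currently inside the contour."""
    inside = phi > 0
    c1 = image[inside].mean() if inside.any() else 0.0      # mean inside
    c2 = image[~inside].mean() if (~inside).any() else 0.0  # mean outside
    # Pixels closer in intensity to c1 get a positive force (move inside)
    force = (image - c2) ** 2 - (image - c1) ** 2
    return phi + dt * force

# Segment a bright square on a dark background
image = np.zeros((20, 20))
image[5:15, 5:15] = 1.0
phi = np.full(image.shape, -1.0)
phi[8:12, 8:12] = 1.0           # small seed region inside the bright square
for _ in range(30):
    phi = acwe_step(image, phi)
```

After a few iterations the contour (phi > 0) grows from the seed to cover exactly the bright square, because the homogeneous region descriptors, not edges, drive the evolution.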
Hybrid approaches, such as the phase-based active contour (PBAC) model, have been proposed to
overcome the problems produced by the two previous approaches [1]. The PBAC model
integrates both the boundary and region information. Such a hybrid model can attract the
curve to strong edge points owing to the presence of the edge-based term while evolving
the curve by using the region-based term when the edge information is weak. In the study
presented in [1], the region-scalable fitting (RSF) energy model is used to model the
region-based energy in the image. Such an approach is robust against intensity
inhomogeneity, and unlike the traditional edge indicators that are based on the intensity
gradient magnitude, the RSF energy employs phase asymmetry to form a new phase-based edge indicator that is invariant to intensity and independent of image contrast [1].
In the multi-scale approach reported in [22], the contour obtained at a coarse scale is
interpolated and passed to the next finer scale as the initial contour. In addition, the
boundary shape similarity between different scales is used as an additional constraint to
guide the contour evolution. By incorporating this boundary shape similarity constraint
into the traditional geodesic active contour (GAC) model within a multi-scale framework,
the authors can successfully segment the target objects in ultrasound images with both high
speckle noise and low image contrast [22].
The artificial neural network (ANN) has been widely used in medical image
segmentation. Examples of the segmentation approaches that are based on ANN include
the Multilayer perceptron (MLP), self-organizing maps (SOM), Hopfield, and pulse
coupled neural networks [25]. The SOM is one of the best neural networks used for the
segmentation task. This neural network is unsupervised and employs competitive learning
to discover the topological structures hidden in the input image. Two advantages of the
SOM-based segmentation methods are the unsupervised training and fast learning. The
disadvantages of the segmentation methods that use this neural network are: (1) they need
high dimensional input space with empirical features for the best performance, and (2)
they cannot segment images with heavy noise successfully [25].
A Bayesian neural network (BNN) has also been used for image segmentation. At first, a radial
gradient index filtering technique was used to locate the ROIs in the image, where the
centers of the ROIs are recorded as points of interest. A region growing algorithm was
then used to determine candidate lesion margins. The lesion candidates were segmented
and detected by the BNN. However, the algorithm would fail if the lesion was not
compact and round-like [5].
In artificial neural network (ANN) and support vector machine (SVM) algorithms, the
training time of the algorithm and the performance of the classification are highly
affected by the dimensionality of the feature space. Thus, the selection and the extraction
of a suitable feature combination is a crucial task in computer aided diagnosis (CAD)
systems.
Most of the texture features are calculated from the entire image or from ROIs based on pixel
gray-level values [5]. The auto-covariance coefficient is a basic and traditional texture
feature that can reflect the inter-pixel correlation within an image. The variance, auto-correlation, or average contrast feature is defined as the ratio of the variance, auto-correlation coefficients, or intensity average inside the lesion to that outside the lesion.
The larger the ratio is, the lower the possibility of the tumor being malignant. The distribution
distortion of wavelet coefficients feature is defined as the summation of the differences
between the real distribution of the wavelet coefficients in each high-frequency sub-band and
the expected Laplacian distribution; this feature can reflect the margin
smoothness [5].
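As an illustration, the normalized auto-covariance of an image patch at a lag (Δm, Δn) can be computed directly from its definition. This is a generic sketch of the concept, not the exact formulation used in the cited studies:

```python
import numpy as np

def auto_covariance(patch, dm, dn):
    """Normalized auto-covariance of a 2-D patch at lag (dm, dn).
    A value near 1 means pixels that far apart are strongly correlated."""
    mu = patch.mean()
    a = patch[: patch.shape[0] - dm, : patch.shape[1] - dn] - mu
    b = patch[dm:, dn:] - mu
    cov = (a * b).mean()
    return cov / patch.var()

# A smooth gradient patch is highly correlated at small lags
patch = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
print(auto_covariance(patch, 0, 1))  # close to 1 for this smooth patch
```

At zero lag the measure equals 1 by construction; textured or noisy patches decay toward 0 much faster with increasing lag than smooth ones, which is what makes the coefficient a texture descriptor.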
The posterior acoustic behavior, minimum side difference (MSD), or posterior acoustic
shadow feature is defined as the difference between the gray scale histograms of the
regions inside the lesion and posterior to the lesion. Homogeneity of the lesion feature is
the Boltzmann/Gibbs entropy over the gray scale histogram relative to the maximum
entropy. The higher the entropy is, the more homogeneous the lesion is [5].
Unlike texture features which are extracted from the rough ROIs, the morphologic
features focus on some global characteristics of the lesion, such as the shape and margin
[5]. The spiculation feature is the ratio of the low-frequency component to the high-frequency
component; the larger the value, the lower the possibility of the tumor being malignant.
The depth-to-width ratio (or width-to-depth ratio) feature is one of the most effective
distinguishing features mentioned in many papers. Malignant lesions tend to have a
width-to-depth ratio bigger than 1, while benign lesions usually have this ratio
smaller than 1. The branch pattern feature is the number of local extremes in the low-pass-filtered radial distance graph. Malignant lesions tend to have high values of the
branch pattern feature. The margin sharpness, margin echogenicity, and angular variance
in margin features are computed by dividing the lesion into N sectors and comparing the
mean gray levels of the pixels in the inner and outer shells in each sector. The margin
echogenicity feature is the mean gray level difference of the inside and outside of the
sector. The three features described above (margin sharpness, margin echogenicity,
angular variance) have been proven to be significantly different by Student's t-test when
they are used to distinguish benign and malignant lesions. To get the number of
substantial protuberances and depressions (NSPD) feature, the breast lesion is delineated
by a convex hull and a concave polygon. Ideally, a malignant breast lesion has a larger
NSPD value. The elliptic-normalized circumference (ENC) feature is defined as the
circumference ratio of the lesion to its equivalent ellipse, and it represents the
anfractuosity of a lesion, which is a characteristic of malignant lesions. Malignant lesions
tend to have a higher value of the ENC feature. The elliptic-normalized skeleton (ENS)
feature is defined as the number of skeleton points normalized by the circumference of
the equivalent ellipse of the lesion. The computation cost of this feature is relatively high.
Malignant lesions tend to have a higher value of ENS. The NSPD, LI, ENC and ENS
features capture mainly the contour and shape characteristics of the lesion. The long axis
to short axis ratio (L:S) feature is the ratio of the long- to short-axes, where the long and
short axes are determined by the major and minor axes of the equivalent ellipse.
Therefore, the L:S ratio is different from the traditional depth/width ratio because it is
An interesting study about breast tumor classification is presented in [22]. In this study, a
hybrid segmentation method, which combined the gradient vector flow (GVF) and
geodesic active contour (GAC) models, is used to outline the tumor region in the
ultrasound image. A set of six novel features are extracted from the ultrasound image to
quantify the texture, region, and shape of the tumor. These six features, which are the
eccentricity, solidity, difference area hull rectangular, difference area mass rectangular,
cross correlation left, and cross correlation right, were analyzed using an SVM classifier
to characterize the tumor as benign or malignant. The experimental results show that the
classification method achieved an accuracy of 95%, sensitivity of 90.91%, specificity of
97.87%, positive predictive value of 96.77%, negative predictive value of 93.88%, and
Matthews correlation coefficient of 89.71%.
In another study [26], a genetic algorithm was combined with SVM to classify ultrasound
breast images. The ultrasound images were preprocessed by applying an anisotropic
diffusion filter and binary thresholding to obtain a binary image. The binary image was
combined with the original image and processed using a level set algorithm to segment
the tumor. A set of auto-covariance texture features and morphologic features were
extracted from the combined image. These features were analyzed using a genetic
algorithm to determine the significant features and the near-optimal parameters for an
SVM, which was used to identify the tumor as benign or malignant. According to the
experimental results reported in the study, the accuracy of the proposed system for
classifying breast tumors was 95.24%.
A novel ultrasound-based diagnostic method for breast cancer classification was proposed
in [27]. The method, which is called a fuzzy cerebellar model neural network (FCMNN),
is based on intelligent classification to distinguish benign and malignant breast tumors. In
this method, the auto-covariance textural features were extracted from the ultrasound
image. The FCMNN is a fuzzy neural network that incorporates a learning mechanism imitating the cerebellum of
a human being. In contrast to a conventional fuzzy neural network, the FCMNN is structured with
layers in the input space. It is often referred to as an associative neural network, where
only a small subset addressed by the input vector determines instantaneous output. The
FCMNN has several advantages such as good generalization and rapid learning speed and
convergence. The accuracy of the proposed FCMNN classifier was between 90% and
92%.
Huang et al. [28] applied shearlet theory for breast cancer classification. Shearlet theory
is a composite-wavelet version of the traditional wavelet transform, which is designed to
identify anisotropic and directional information at different scales. The shearlet transform has strong
localization properties and directional sensitivity. After decomposing the ultrasound
image by the shearlet transform, the shearlet-based texture features were extracted from the horizontal
and vertical cones as follows: (1) the entropy, correlation, contrast and angular second
moment calculated from the first-layer horizontal shearlet coefficients; (2) the mean,
variance, and energy from both the first- and third-layer horizontal and vertical coefficients; and (3)
the maximal shearlet coefficients in each column of the coefficient matrix. The
Adaptive Boosting (AdaBoost) algorithm was then used to distinguish the benign tumors from the
malignant tumors. The results show that the classification accuracy, sensitivity,
specificity and precision of the shearlet-based method are 88.0%, 83.9%, 93.2% and 94.0%,
respectively.
In another study, the tumors were segmented using a level set method as follows. At the beginning, a sigmoid filter was
applied to the image to improve its contrast, and then the gradient magnitude filter
was applied to compute the gradient of the image. Then, the sigmoid filter was applied
again to the gradient magnitude image for contrast enhancement. Finally, the level set
method, which was proposed to model a complex shape with changing topology, was
applied to outline the contour of the tumor. After image segmentation, six
quantitative feature sets, including shape, orientation, margins, lesion boundary, echo
pattern, and posterior acoustic features, were extracted from the ultrasound image. In this
CAD system, any tumor with one or more malignant findings was classified as malignant.
Only the tumors that had no malignant findings and had at least one benign finding were
classified as benign. Based on the experimental results, the area under the curve of the
proposed CAD system was 0.96.
Gomez et al. [31] combined the co-occurrence texture statistic features with gray-level
quantization to classify breast ultrasound images. In their study, 22 texture features
were extracted from the gray level co-occurrence matrix (GLCM) at four different
orientations (0, 45, 90 and 135 degrees) and ten distances (1 pixel, 2 pixels, ..., 10 pixels). All
these calculations were performed for six gray-level quantizations (8, 16, 32, 64, 128,
and 256) of the ultrasound image. The study included 436 breast ultrasound images. To
reduce the dimensionality of the feature space, the texture descriptors were averaged over
all orientations for the same distance. Moreover, the feature space was ranked using the
mutual information with minimal redundancy maximum relevance (MI-mRMR)
technique. The classification of the features was performed using the Fisher linear
discriminant analysis (FLDA). The study investigated the effect of quantization on the
classification performance, with the goal of determining the most useful texture features
and the effect of averaging the GLCM features to reduce the feature space dimensionality.
The results show that: (1) the averaging decreases the performance; (2) the results
obtained using the GLCM features without averaging indicate that the orientation of 90
degrees and distances larger than 5 pixels achieve good classification of breast lesions;
(3) among the 22 texture features, nine features appeared repeatedly,
independently of the GLCM orientations and distances and over all quantization levels, where
these nine features quantify the contrast, correlation, cluster prominence, cluster shade,
difference variance, and inverse difference moment; and (4) the gray-level quantization does
not improve or worsen the discrimination power of the texture features. The achieved area
under the ROC curve (AUC), accuracy, sensitivity, specificity, positive predictive value,
and negative predictive value were equal to 0.87, 83.05%, 87.02%, 88.11%, 86.82%,
and 80.10%, respectively.
The study reported in [32] proposed a CAD system based on gray-scale invariant features
via the ranklet transform. The ranklet transform is an image processing method that is
characterized by a multi-resolution and orientation-selective approach, which can be used
as a rank descriptor of the pixels within a local region. It deals with the ranks of the pixels
rather than their gray-scale intensity values, which are used in the wavelet transform. In this study, the
ultrasound image is decomposed into ranklets based on the multi-resolution and orientation-selective properties of the ranklet transform. The gray-scale invariant GLCM texture
features were extracted from the transformed image. These features were calculated at
different resolutions (scales) and different orientations. Finally, an SVM classifier was
used to classify the tumor as benign or malignant. To evaluate the robustness and
effectiveness of the proposed method, an extensive experimental evaluation was
performed, in which three ultrasound image databases acquired with three different
sonographic breast ultrasound platforms were classified. The achieved area under the
curve for the three ultrasound image databases obtained via ranklet transform are 0.918,
0.943, and 0.934, respectively.
Shankar et al. [6] applied spectral analysis for breast cancer classification. In particular,
the breast ultrasound images were analyzed to estimate the compound probability density
functions (PDFs) that model the healthy and malignant regions in the ultrasound image. A
combination of the Nakagami parameter with the K distribution (which is sensitive to the
presence of spiculations) was used to model the breast lesions in the ultrasound image.
The estimated PDFs were then employed to classify the tumors in breast ultrasound
images. Using this technique, good tumor classification was achieved with an area under
the ROC curve of 0.955.
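The Nakagami shape parameter mentioned above is commonly estimated from the envelope amplitude with the standard moments estimator m = (E[R²])² / Var(R²); this sketch uses that generic estimator, which is not necessarily the exact procedure of [6]:

```python
import numpy as np

def nakagami_m(envelope):
    """Moment-based estimate of the Nakagami shape parameter m:
    m = (E[R^2])^2 / Var(R^2), where R is the envelope amplitude."""
    r2 = np.asarray(envelope, dtype=float) ** 2
    return r2.mean() ** 2 / r2.var()

# Rayleigh-distributed envelope = fully developed speckle, for which m = 1
rng = np.random.default_rng(4)
rayleigh = rng.rayleigh(scale=1.0, size=200_000)
m_hat = nakagami_m(rayleigh)
```

For fully developed speckle (Rayleigh statistics) the estimate is close to m = 1; pre-Rayleigh conditions, associated with fewer or more organized scatterers, give m < 1, which is what makes the parameter a tissue-characterization feature.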
Alam et al. [33] employed a multi-feature approach for breast tumor classification. In
particular, a combination of quantitative acoustic parameters, calculated using spectrum
analysis of ultrasound radio-frequency (RF) echo signals, and morphometric features,
computed based on the lesion shape, has been used to classify the solid breast lesions in
breast ultrasound images. The acoustic features contained echogenicity, heterogeneity
and shadowing. The morphometric features included area, location, aspect ratio and
boundary roughness of the lesions. The classification analysis reported in the study was
based on logistic regression (LR). For 130 patient cases, the LR-based analysis
achieved an area under the ROC curve of 0.947 ± 0.045.
An innovative approach for breast cancer classification has been proposed in [34] based
on time series analysis of the ultrasound radio-frequency (RF) signals received from the
tumor. In this approach, a set of features were extracted from the time series of ultrasound
RF signals. These features were then classified using a machine learning framework. The
features extracted using the RF time series analysis, single-frame RF spectral analysis,
and B-mode texture were ranked based on their importance using two different feature
ranking approaches. In both feature-ranking algorithms, all the three best performing
features were from the RF time series group. The RF time series features were processed
using two classifiers: the SVM and the Random Forests. Using the best three RF time
series features, accurate breast tumor classification can be achieved with areas under the
curve of 0.86 and 0.81 using the SVM and Random Forests classifiers, respectively. The
study indicated that the ultrasound RF time series features can enable breast cancer
classification using a small set of raw ultrasound data without the need for additional
instrumentation.
CHAPTER THREE
MATERIAL AND METHODS
In this chapter, we describe our contributions in the field of breast cancer classification
using ultrasound images. Section 3.1 describes the data collection process and the
ultrasound image database used in the analysis. Sections 3.2 and 3.3 provide a description
of extended sets of morphological features and texture features, respectively, that have
been suggested in previous studies for breast cancer classification. These features are
implemented and applied in this thesis to enable breast cancer classification as described
in Section 3.4. In Section 3.5, our improved breast cancer classification algorithm is
proposed. In this algorithm, the tumor is divided into non-overlapping ROIs. The spatial
distribution of the ROIs is analyzed and employed to select the ROIs for tumor
classification. The selected ROIs are then classified individually and combined using a
voting mechanism to determine the class of the tumor.
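The ROI-voting idea described above can be sketched as follows. The ROI grid size, the per-ROI classifier, and the majority-vote rule are simplified assumptions for illustration, not the exact configuration evaluated in this thesis:

```python
import numpy as np

def split_into_rois(tumor, roi_size):
    """Divide a 2-D tumor region into non-overlapping square ROIs."""
    h, w = tumor.shape
    return [tumor[i:i + roi_size, j:j + roi_size]
            for i in range(0, h - roi_size + 1, roi_size)
            for j in range(0, w - roi_size + 1, roi_size)]

def classify_tumor(tumor, roi_size, roi_classifier):
    """Classify each ROI individually, then combine the votes:
    a majority of ROI votes decides benign (0) vs malignant (1)."""
    votes = [roi_classifier(roi) for roi in split_into_rois(tumor, roi_size)]
    return int(np.mean(votes) >= 0.5)

# Toy per-ROI rule standing in for a trained classifier: bright ROIs vote malignant
toy_classifier = lambda roi: int(roi.mean() > 0.5)
tumor = np.full((16, 16), 0.8)   # uniformly bright toy "tumor"
label = classify_tumor(tumor, 4, toy_classifier)
```

In the actual algorithm the per-ROI decision comes from a classifier trained on the texture features of Section 3.3, and only the selected ROIs participate in the vote; the combination step, however, has exactly this shape.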
Form Factor = 4πA / P²   (1)

where A is the tumor area and P is the tumor perimeter.
3.2.3 Roundness
The Roundness is defined as the ratio between the tumor area and the square of its maximum diameter, and can be expressed as [26]:

Roundness = 4A / (π × MaxDiameter²)   (2)
The maximum diameter is defined as the maximum distance between two pixels on the tumor boundary along a line that passes through the center of the tumor, as shown in Figure 3.1.
Aspect_Ratio = MaxDiameter / MinDiameter   (3)
The minimum diameter is defined as the minimum distance between two pixels on the tumor boundary along a line that passes through the center of the tumor, as shown in Figure 3.1.
3.2.5 Convexity
The Convexity is defined as the ratio between the convex hull perimeter and the tumor perimeter, and is computed using the following formula [26]:

Convexity = Convex_Perimeter / P   (4)
The convex hull is the smallest convex polygon that encloses the tumor, as illustrated in Figure 3.2.
3.2.6 Solidity
The Solidity is a scalar quantifying the proportion of the pixels in the convex hull that are also in the tumor area [26], and it can be computed using the following formula:

Solidity = A / Convex_Area   (5)

where Convex_Area is the area of the convex hull and A is the tumor area.
3.2.7 Extent
The Extent feature is a scalar that quantifies the ratio of the tumor area to the area of the smallest bounding box that includes the tumor [26]. This feature is computed as follows:

Extent = A / Bounding_Box_Area   (6)

where Bounding_Box_Area is the area of the smallest rectangle that contains the tumor.
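The descriptors of Eqs. 1-6 can be sketched together as below. This is a minimal sketch assuming the standard definitions of these descriptors in [26]; the function name and the scalar inputs (measured from the segmented mask) are illustrative.

```python
import math

def morphological_features(area, perimeter, d_max, d_min,
                           convex_perimeter, convex_area, bbox_area):
    """Basic morphological descriptors (Eqs. 1-6) of a segmented tumor."""
    return {
        "form_factor":  4 * math.pi * area / perimeter ** 2,   # Eq. 1
        "roundness":    4 * area / (math.pi * d_max ** 2),     # Eq. 2
        "aspect_ratio": d_max / d_min,                         # Eq. 3
        "convexity":    convex_perimeter / perimeter,          # Eq. 4
        "solidity":     area / convex_area,                    # Eq. 5
        "extent":       area / bbox_area,                      # Eq. 6
    }
```

For a circle of radius 1 (area π, perimeter 2π, both diameters 2, and a 2x2 bounding box), Form Factor and Roundness both evaluate to 1, reflecting a perfectly regular shape.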
Many features can be extracted from the Best Fit Ellipse, as listed below:
The length of the major axis of the ellipse, as shown in Figure 3.5.
The length of the minor axis of the ellipse, as shown in Figure 3.5.
Ellipse compactness: the overlap region between the tumor and the ellipse.
Ellipse theta: the angle of the major axis of the ellipse, as illustrated in Figure 3.5.
Figure 3.5. The major and minor axes of the Best Fit Ellipse and Theta.
where

NRL(i) = d(i) / max{d(j)}   (7)

and d(i) is the Euclidean distance from the tumor center to the i-th boundary pixel.

NRL entropy:

E_NRL = − Σ_{k=1..H} p_k × log2(p_k)   (8)

where p_k is the probability that an NRL value falls in the k-th histogram bin:

p_k = (number of NRL values in bin k) / N   (9)

NRL variance:

V_NRL = (1/N) Σ_{i=1..N} (NRL(i) − NRL_mean)²   (10)

where NRL_mean is the mean of the NRL values, N is the number of pixels located at the tumor boundary, and H is the number of NRL probability values.
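The NRL features of Eqs. 7-10 can be sketched as follows. This is a minimal sketch: the boundary is assumed to be given as an array of pixel coordinates, and the number of histogram bins H is an assumption (the text does not specify it).

```python
import numpy as np

def nrl_features(boundary, center, bins=10):
    """Normalized radial length (NRL) features of a tumor boundary.
    boundary: (N, 2) array of boundary pixel coordinates."""
    d = np.linalg.norm(np.asarray(boundary, float) - center, axis=1)
    nrl = d / d.max()                              # Eq. 7
    hist, _ = np.histogram(nrl, bins=bins)
    p = hist[hist > 0] / len(nrl)                  # Eq. 9: bin probabilities
    entropy = -np.sum(p * np.log2(p))              # Eq. 8
    variance = np.mean((nrl - nrl.mean()) ** 2)    # Eq. 10
    return entropy, variance
```

A perfectly circular boundary gives zero entropy and zero variance, while an elongated boundary spreads the NRL values over several bins and raises both features.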
Compactness = 1 − 4πA / P²   (11)

The compactness of a circle is equal to 0. The maximum compactness value, which is computed for complex shapes, approaches 1.
(12)
The value of the distance map at a pixel P is iteratively computed as shown in the
following equation:
Distance (P) = Min {Distance (N8 (P))} + 1
(13)
Figure 3.7 (b) and Figure 3.7 (c) illustrate the distance map computation for the tumor
shown in Figure 3.7 (a) [37]. After computing the distance map, the maximum inscribed
circle located inside the tumor is identified. By computing the maximum inscribed circle,
the lobulate areas included in the tumor and excluded from the inscribed circle are
identified, as shown in Figure 3.7 (d). However, only the large lobulate areas should be considered; therefore, if the maximum distance within a lobulate area is less than four pixels, the area is ignored. Finally, the number of significant lobulate areas defines the undulation characteristics feature (MU), as shown in Figure 3.7 (f).
These figures are adapted from [37]. Figure 3.7 (a) A malignant lesion. (b) The distance map of
(a). (c) The distance map of mass is represented by the gray scales. The black color denotes the
farthest distance to the lesion boundary. (d) Five undulation characteristics are explored. (e) The
local maxima could be used to detect the angular characteristics. (f) Three angular characteristics
are explored.
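The iterative update of Eq. 13 can be sketched as below: every tumor pixel repeatedly takes the minimum distance of its 8 neighbours plus one until the map stabilizes, with the background fixed at zero.

```python
import numpy as np

def distance_map(mask):
    """Iterative distance map of Eq. 13: every tumor pixel gets
    min(distance of its 8 neighbours) + 1; background stays 0."""
    h, w = mask.shape
    dist = np.where(mask, np.inf, 0.0)
    changed = True
    while changed:
        padded = np.pad(dist, 1, constant_values=0.0)
        # minimum over the 8-neighbourhood of every pixel
        neigh = np.minimum.reduce([padded[i:i + h, j:j + w]
                                   for i in range(3) for j in range(3)
                                   if (i, j) != (1, 1)])
        updated = np.where(mask, neigh + 1, 0.0)
        changed = not np.array_equal(updated, dist)
        dist = updated
    return dist
```

The maximum of the resulting map gives the centre and radius of the maximum inscribed circle used to isolate the lobulate areas.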
outside the tumor and located within a distance k from the tumor boundary. In this thesis,
the inner and outer bands are computed using a distance k of 3 pixels. This value of k has
also been employed by a previous study [30].
Figure 3.8. The Inner and Outer bands of a tumor computed at a distance k = 3.
The average intensity for the inner and outer bands can be calculated as below:

avg_inner = (1 / N_inner) Σ_{P ∈ inner band} I(P)   (14)

avg_outer = (1 / N_outer) Σ_{P ∈ outer band} I(P)   (15)

where N_inner and N_outer are the numbers of pixels in the inner and outer bands, respectively. Finally, the lesion boundary (LB) feature is defined as below:

LB = avg_outer − avg_inner   (16)

The average gray-level intensity of the tumor, EP, is defined as:

EP = (1 / NR) Σ_{P ∈ tumor} I(P)   (17)

where NR is the number of the pixels in the tumor, and I(P) is the gray-level intensity of a
pixel P located inside the tumor. To find the posterior acoustic features (PS), the region
under the tumor is found. To find the area under the tumor, we find the posterior area width (pw), which is equal to two thirds of the tumor mass width (mw), and the posterior area height (ph), which is equal to the tumor height (mh) but should not exceed 100 pixels [30]. The posterior area is shown in Figure 3.9. Then, the average gray-level intensity of
the posterior acoustic area is defined as:
PA = (1 / NPA) Σ_{P ∈ posterior area} I(P)   (18)

where NPA is the number of the pixels in the posterior acoustic area, and I(P) is the gray-level intensity of posterior acoustic pixel P.

The difference between Eq. 17 and Eq. 18 is used to calculate the posterior acoustic characteristic:

PS = EP − PA   (19)
Figure 3.9. A tumor with width mw and height mh and the posterior acoustic area with height ph.
The average intensity of the group of brighter pixels inside the tumor is defined as:

EP_B = (1 / NBP) Σ_{P ∈ brighter group} I(P)   (20)

where I(P) is the gray-level intensity of a pixel P located inside the tumor and NBP is the number of the 25% brighter pixels. k is a dynamic threshold; for example, when the threshold is set to k = 51, the group of brighter pixels contains 28.22% of the tumor pixels. The contrast feature, EPC, is defined as:

EPC = EP_B − EP   (21)
STissue = (1 / NTissue) Σ_{P ∈ surrounding tissues} I(P)   (22)

where I(P) is the gray-level intensity of pixel P, and NTissue is the number of pixels in the surrounding tissues at a distance k = 10.
The average intensity difference between the surrounding tissues and the region under the tumor is calculated as below:

PSDiff = STissue − PA   (23)

where PA is defined by Eq. 18. The average intensity difference between the tumor and the surrounding tissues is defined by:

EPDiff = EP − STissue   (24)
shown in Figure 3.10. The number of gray levels is another parameter that is used to calculate the GLCM. It determines the number of gray levels used when scaling the grayscale values of the input image I. For example, if the number of levels is set to 8, then the intensity values of the image I are scaled to be between 1 and 8. In addition, the number of gray levels determines the size of the gray-level co-occurrence matrix that will be generated, so if we set the number of levels to 8, an 8x8 matrix will be generated. Finally, the intensity limit parameter determines the maximum and minimum intensity used when dividing the intensities between levels. The default values for the intensity limits are the maximum and minimum intensity in the image. A simple example of computing the GLCM is presented in Figure 3.11 [39]. Element (1,1) in the GLCM contains the value 1 because there is only one instance in the image where two horizontally adjacent pixels have the values 1 and 1. Element (1,2) in the GLCM contains the value 2 because there are two instances in the image where two horizontally adjacent pixels have the values 1 and 2.
Figure 3.10. A 3x3 matrix showing the four directions used in the GLCM at a distance of 1 pixel.
Figure 3.11 shows how GLCM is calculated for the 4-by-5 image I.
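The counting rule described above can be reproduced in a few lines. The image values below are assumed to be the standard 4-by-5 example of [39] that Figure 3.11 illustrates; the function only handles the horizontal direction (d = 1, θ = 0°) for clarity.

```python
import numpy as np

def glcm_horizontal(image, levels):
    """Count, for every gray-level pair (i, j), how many times two
    horizontally adjacent pixels take the values i and j (d = 1, theta = 0)."""
    glcm = np.zeros((levels, levels), dtype=int)
    for row in image:
        for a, b in zip(row[:-1], row[1:]):
            glcm[a - 1, b - 1] += 1   # gray levels are 1-based here
    return glcm

# The 4-by-5 example image of Figure 3.11 (gray levels 1..8)
I = np.array([[1, 1, 5, 6, 8],
              [2, 3, 5, 7, 1],
              [4, 5, 7, 1, 2],
              [8, 5, 1, 2, 5]])
G = glcm_horizontal(I, 8)
```

Element (1,1) of G is 1 and element (1,2) is 2, matching the counts described in the text.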
In fact, the GLCM represents the joint frequencies of all combinations of gray levels i and j that are separated by a distance d along the direction θ. The GLCM can be defined as below [31]:

P(i, j | d, θ) = #{[(x1, y1), (x2, y2)] : I(x1, y1) = i, I(x2, y2) = j, (x2, y2) = (x1, y1) + (d, θ)}   (25)

where #{.} is the number of the pixel pairs that satisfy the conditions.
The GLCM is used to calculate texture features that quantify the pixel statistics. In fact, a set of GLCM texture features can be extracted from each GLCM computed at each combination of direction and distance, using the expressions given in Table I and Table II [31]:
Table I
TEXTURE FEATURES EXTRACTED FROM GLCM

Feature
Autocorrelation
Contrast
Correlation I
Correlation II
Cluster Prominence
Cluster Shade
Dissimilarity
Energy
Entropy
Homogeneity I
Homogeneity II
Maximum probability
Sum of squares
Sum average
Sum entropy
Sum variance
Difference variance
Difference entropy
Table II
NOTATION AND EXPRESSIONS USED FOR CALCULATING THE GLCM FEATURES

Notation / Meaning
p(i, j): the (i, j)th entry of the co-occurrence probability matrix.
L: the number of gray levels used in the quantization.
μ: the mean value of p(i, j).
HX, HY: the entropies of the marginal probabilities px and py.
HXY: the entropy of p(i, j).
HXY1, HXY2: entropy expressions computed from p(i, j) and the marginal probabilities px(i) and py(j).
The GLCM texture features listed in Table I, with the notation defined in Table II, were calculated for each ROI at four different directions (θ = 0°, 45°, 90°, 135°), four distances (d = 1, 2, 3, 4 pixels), and one quantization level of L = 32. Using this configuration, a set of 400 GLCM features was extracted from each ROI. At ROI boundaries, the distance used can affect the resulting GLCM, as the GLCM excludes the pixels outside the ROI boundaries from the calculation.
To generate the texture features listed in Table I and Table II, we first calculated the GLCM at four distances, d = (1, 2, 3 and 4) pixels. For every distance, four directions, θ = (0°, 45°, 90° and 135°), were used. The number of levels is set to 32, so the intensity values within the image I are scaled to be between 1 and 32. Now, suppose that we have two images t and s; image t has maximum and minimum intensity values of 194 and 2, respectively, and image s has maximum and minimum intensity values of 234 and 10, respectively. In the first image t, every level will contain 6 intensity values; for example, the intensities from 194 to 189 will be at level number 32, level number 31 will contain the intensities from 188 to 183, and so on until the first level, which will contain the intensities from 7 to 2. For the second image s, every level will contain 7 intensity values; for example, the intensities from 234 to 228 will be at level number 32, level number 31 will contain the intensities from 227 to 221, and so on until the first level, which will contain the intensities from 16 to 10. To prevent such variations in the intensity values assigned to each level, we set the intensity limits used in our calculation to 255 and 0. Thereafter, the extracted ROI is normalized to the gray-level range [0, 255] to stretch the dynamic range of all images to the same scale, so that every one of the 32 levels contains 8 gray intensity values: the first level has the values from 0 to 7, and the last level has the values from 248 to 255.
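The fixed-limit quantization described above can be sketched as follows; with limits 0 and 255 and 32 levels, every level covers exactly 8 intensity values.

```python
import numpy as np

def quantize(image, levels=32, lo=0, hi=255):
    """Scale intensities in [lo, hi] to integer levels 1..levels; with the
    fixed limits 0 and 255 every level covers 256/32 = 8 intensity values."""
    bin_size = (hi - lo + 1) / levels
    q = (np.asarray(image, float) - lo) // bin_size + 1
    return np.clip(q.astype(int), 1, levels)
```

Intensities 0-7 map to level 1 and intensities 248-255 map to level 32, as stated in the text.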
Based on the above settings, we calculate the 25 GLCM features listed in Table I and Table II. These features were calculated at four distances, d = (1, 2, 3 and 4), and four different directions, θ = (0°, 45°, 90° and 135°). A total of 400 (25x4x4) texture features were extracted for every ROI in the ultrasound image.
3.3.6 Summary of the features
Based on reference [29], malignant tumors tend to have irregular shapes, while benign tumors tend to be oval in shape. Malignant tumors are also not parallel in their orientation, while benign tumors have a parallel orientation. Angular, spiculated margins are usually found in malignant tumors, whereas circumscribed, microlobulated margins are found in benign tumors. The lesion boundary appears as an echogenic halo in malignant tumors and as an abrupt interface in benign tumors. Malignant tumors have a hypo-echoic appearance, while benign tumors have a hyper-echoic or anechoic appearance. Finally, shadowing is found below malignant tumors, while there is no shadowing below benign tumors. Table III summarizes all the features used in this thesis: 19 morphological features, 5 texture features, and 25 GLCM features.
TABLE III
SUMMARY OF ALL EXTRACTED FEATURES IN THIS THESIS

Category: Morphology
Tumor area
Tumor perimeter
Form Factor
Roundness
Aspect Ratio
Convexity
Solidity
Extent
The length of the major axis of the ellipse
The length of the minor axis of the ellipse
The ratio between the major and minor axis
The ratio of the ellipse perimeter and the tumor perimeter
Ellipse Compactness: the overlap between the ellipse and the tumor
Ellipse theta
NRL entropy
NRL variance
Compactness
Undulation characteristics (MU)
Angular characteristics

Category: Texture
Lesion boundary (LB)
Posterior acoustic characteristic (PS)
Echo pattern contrast (EPC)
Posterior-tissue intensity difference (PSDiff)
Tumor-tissue intensity difference (EPDiff)

Category: GLCM Features
Autocorrelation
Contrast
Correlation I
Correlation II
Cluster Prominence
Cluster Shade
Dissimilarity
Energy
Entropy
Homogeneity I
Homogeneity II
Maximum probability
Sum of squares
Sum average
Sum entropy
Sum variance
Difference variance
Difference entropy
Information measure of correlation I
Information measure of correlation II
Inverse difference normalized
Inverse difference moment normalized
used. The advantage of using manual segmentation is that the classification will be more accurate. In this thesis, manual segmentation was used: each tumor in our study was segmented manually by an expert radiologist. After segmenting the tumor, the best fit ellipse and the bounding rectangle that includes the tumor were determined. After that, the morphological features were extracted.
To extract the GLCM texture features, a region of interest (ROI), which is a small image region that includes the tumor, is selected. The bounding rectangle that includes the tumor can be used as the ROI [26] and was cropped for computing the GLCM features. The size of the ROI depends on the width and height of the tumor, so that the window used to calculate the GLCM is equal to the width and height of the tumor. Finally, the GLCM texture features were extracted using the expressions given in Table I and Table II.
After feature extraction, a support vector machine (SVM) classifier is used to classify the tumor as either benign or malignant. The SVM is a robust data classification algorithm and has been used in many fields in recent years [26]. The aim of an SVM is to find a hyperplane that separates the training data with a maximal margin using a kernel function, such as the radial basis function (RBF) kernel [26]. The RBF is the most widely used kernel. Redundant features can increase the computation time and affect the classification accuracy. Thus, an efficient and robust feature selection method that reduces the effect of noisy as well as irrelevant and redundant data is required [26].
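The RBF kernel that the SVM relies on can be written in a few lines; this is a minimal sketch of the kernel function only, with an illustrative value of the gamma parameter.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2), measuring the
    similarity between two feature vectors for the SVM."""
    diff = np.asarray(x, float) - np.asarray(z, float)
    return float(np.exp(-gamma * np.dot(diff, diff)))
```

Identical feature vectors give a similarity of 1, and the similarity decays toward 0 as the vectors move apart, which is what lets the SVM separate classes that are not linearly separable in the original feature space.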
Figure 3.12 A diagram showing the main steps of the proposed tumor classification method.
The large number of features extracted from the tumor could have a large degree of irrelevant and redundant information, which may reduce the accuracy of tumor classification. An irrelevant feature does not contribute to distinguishing data of different classes and can be deleted without affecting the classification accuracy. On the other hand, a redundant feature implies the co-existence of another feature with the same relevant content, and hence the removal of one of them will not affect the classification performance. To eliminate the irrelevant and redundant features, a feature selection phase is employed to remove them while maintaining acceptable classification accuracy [31]. One of the most commonly used algorithms for feature selection is mutual information (MI), which ranks the features in a manner that meets the minimal-Redundancy-Maximal-Relevance (mRMR) criterion [40]. Pereira et al. [41] have shown that MI is very helpful for ranking the features extracted from breast ultrasound images.
In this thesis, MI is used to rank all the features based on the mRMR method, where the first feature in the ranked set has the maximum relevancy to the target class and the last feature in the set has the minimum relevancy. Mutual information (MI) measures the degree of dependency between two variables, such as a feature and the target class or a pair of features. The minimal redundancy condition selects the features such that they are mutually exclusive of each other [31]. In this study, we have selected the best 49 features to carry out the classification.
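A simple histogram-based MI estimate for ranking can be sketched as below. This is a minimal sketch, not the mRMR implementation used in the thesis: it covers only the relevance term (feature vs. class), and the number of bins is an assumption.

```python
import numpy as np

def mutual_information(feature, labels, bins=8):
    """Estimate I(X; Y) in bits between one (discretized) feature and the
    class labels; features can then be ranked by decreasing MI."""
    edges = np.histogram_bin_edges(feature, bins=bins)
    xd = np.digitize(feature, edges[1:-1])
    mi = 0.0
    for xv in np.unique(xd):
        for yv in np.unique(labels):
            pxy = np.mean((xd == xv) & (labels == yv))
            px, py = np.mean(xd == xv), np.mean(labels == yv)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi
```

A feature that perfectly mirrors balanced binary labels yields 1 bit of mutual information, while a constant feature yields 0, so sorting by this score places the most relevant features first.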
Bergstra et al. [38] proposed a voting-based algorithm that classifies musical genre and artist from an audio waveform. Their work can be summarized as follows: for each iteration t, the algorithm invokes a weak learning algorithm that returns a classifier h(t) and computes its coefficient α(t). The output of h(t) is a vector containing values of 1 and −1 over the k classes. If the entry of h(t) for a class is 1, the classifier votes for that class, whereas an entry of −1 means that the classifier votes against that class. For all classes, the values of h(t) are then multiplied by α(t), and the results from all iterations are added together. Finally, the class that receives the maximum number of votes is selected as the outcome of classification.
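The weighted vote combination just described can be sketched in one step; the vote matrix and coefficient vector below are illustrative placeholders, not data from [38].

```python
import numpy as np

def combine_votes(h, alpha):
    """Weighted voting over T weak classifiers: h is a (T, K) array of
    +1/-1 votes over K classes, alpha the (T,) coefficient vector.
    Returns the index of the class with the largest weighted vote."""
    scores = alpha @ h            # sum over t of alpha(t) * h(t), per class
    return int(np.argmax(scores))
```

For example, two weak classifiers voting for class 0 with weight 0.5 each outvote one classifier voting against it with weight 0.3.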
In this thesis, we used the concept of voting along with the spatial distribution of 1-mm² ROIs to improve the classification accuracy. In particular, each tumor was divided into a set of non-overlapping 1-mm² ROIs. The ROIs that contribute to the tumor classification were selected based on their spatial location. The selected ROIs are then combined using a voting mechanism to accurately determine the class of the entire tumor. The steps of the proposed algorithm are as follows. First, the texture features are extracted from all the individual ROIs inside the tumor region in the ultrasound image. Then, the MI algorithm was applied to reduce the feature space from the 400 GLCM texture features extracted in the previous steps to 49 GLCM texture features representing each ROI. Then, we perform a ten-fold cross-validation by dividing the ROI features into two sub-datasets. The first subset contains 90% of the data and is used as the training dataset, and the second (which contains the remaining 10% of the data) is used for validation. Both datasets include benign and malignant cases chosen by random selection. Next, we trained the SVM using the training dataset. After the training phase, we used the validation dataset to evaluate the performance of the classifier. For every tumor in the validation dataset, we classified the ROIs of that tumor using the trained SVM, and we studied the ROIs that predict the correct tumor class and those that predict the wrong class. The process of selecting 90% of the tumors for training and the remaining 10% for validation was repeated for 10 trials. As a result, we have noticed that the ROIs that produced incorrect predictions are located in the dark regions of the tumor, close to the center of the tumor, whereas the ROIs that correctly predict the class of the tumor are located in the bright regions of the tumor, close to the tumor boundaries, as shown in Figures 3.13 and 3.14.
Figure 3.13. The distribution of ROIs in a malignant tumor that voted for the correct tumor class.

Figure 3.14. The distribution of ROIs in a benign tumor that voted for the correct tumor class.
To evaluate our findings, we calculated the average intensity of each tumor, and then we excluded the ROIs that have an average intensity less than 2% of the average tumor intensity. The remaining ROIs were then divided into training and validation data. Finally, we used the trained SVM and the voting technique to classify the tumors into benign and malignant. The SVM and voting technique are applied by classifying each ROI individually and then counting the number of ROIs that voted correctly. If the number of ROIs that correctly classify the tumor is greater than half the total number of ROIs in the tumor, then the SVM classifier is considered to predict the tumor class correctly. Otherwise, the SVM is considered to misclassify the tumor.
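The ROI reduction and voting steps can be sketched as below. This is a minimal sketch with hypothetical names: the per-ROI SVM predictions are assumed to be given, and the tumor average intensity is passed in as a scalar.

```python
import numpy as np

def classify_tumor(roi_mean_intensity, roi_prediction, tumor_mean, frac=0.02):
    """ROI reduction followed by majority voting.
    roi_prediction holds the SVM output per ROI (0 = benign, 1 = malignant);
    ROIs darker than frac * tumor_mean are discarded before voting."""
    keep = roi_mean_intensity >= frac * tumor_mean   # ROI reduction step
    votes = roi_prediction[keep]
    # malignant only if more than half of the retained ROIs vote malignant
    return int(np.sum(votes == 1) > len(votes) / 2)
```

Dark ROIs near the tumor center are thus removed before the vote, so the decision rests on the brighter ROIs near the boundary that were observed to vote correctly.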
Our contributions in the field of ultrasound breast cancer classification can be summarized by the following points. First, the spatial distribution of ROIs was studied. In particular, the ROIs located in bright regions, which are usually close to the tumor boundaries, are more accurate than the ROIs located in the dark regions. Second, a new technique, called ROI reduction, is proposed to reduce the number of ROIs used in the classification phase. Third, a voting technique is employed to classify the breast tumors into benign and malignant based on the classification results of the individual 1-mm² ROIs.
CHAPTER FOUR
SIMULATION AND EXPERIMENT RESULTS
In this chapter, the experimental results of this thesis are presented. Section 4.1 describes the data acquisition. Section 4.2 presents the classification results obtained using the conventional morphological features. Section 4.3 provides the results obtained using a combination of conventional morphological and texture features. To increase the classification accuracy, we also combined the gray-level co-occurrence matrix (GLCM) features with the morphological and texture features in Section 4.4. In Section 4.5, the GLCM features from multiple ROIs of each tumor were extracted and the classification results are reported. The classification results obtained using the proposed method are presented in Section 4.6, where we show that the proposed method outperforms the previous studies. Finally, the summary of this chapter is presented in Section 4.7.
The ultrasound image database used in this study is composed of 105 BUS images. These images were acquired during routine breast diagnostic procedures at the Jordan University Hospital, Jordan, between 2012 and 2014. The image dataset was acquired by our medical collaborator, Dr. Mahsen Al-Najar, and provided to me by my supervisor, Dr. Mohammad Daoud. The image set was composed of 41 malignant tumors and 64 benign cases. All breast ultrasound images had known ground-truth class labels obtained from biopsy analysis. Each breast tumor was manually segmented by our medical collaborator. All images were resampled to have the same pixel size of 0.1 mm x 0.1 mm.
In particular, in each fold the data was distributed randomly and then divided into two datasets: a training dataset that contains 90% of the data and a validation dataset that contains the remaining 10%. The results showed that, of the 19 morphological features extracted for each case in this study, 10 features can represent the data with high accuracy. The ten selected features are: Form Factor, Roundness, Aspect Ratio, Convexity, Solidity, Extent, Tumor area, the ratio between the major and minor axis of the best fit ellipse (Ellipse_ab), the ratio of the best fit ellipse perimeter to the tumor perimeter (Ep_Tp), and the normalized radial length variance (NRL_Variance). The results obtained using the selected features are as follows: accuracy = 92.56%, specificity = 94.80%, sensitivity = 89.40%, positive predictive value (PPV) = 95.48%, negative predictive value (NPV) = 94.04%, and Matthews correlation coefficient = 87.53%.
90 and 135). The above settings generate a GLCM feature matrix that consists of 400 texture features (25x4x4). The generated matrix was then concatenated with the 19 morphological and 5 texture features for each tumor in the database, for a total of 424 features representing each tumor. The MI algorithm was applied to rank the features, and the best 29 features were selected. Then, an SVM classifier was used. We ran the SVM classifier for 50 trials; in each trial the ten-fold cross-validation procedure was applied. The classification results are as follows: accuracy = 94.46%, specificity = 95.20%, sensitivity = 93.35%, PPV = 94.06%, NPV = 96.11%, and Matthews correlation coefficient = 89.30%.
4.5 The Classification Results Obtained by Dividing the Tumor into 1-mm² ROIs without Applying the ROI Reduction and Voting Mechanism
In the previous experiments, we used the conventional minimum bounding box to define the ROI. The bounding box is the minimum rectangle that contains the tumor, as shown in Figure 3.3. In fact, the GLCM features were extracted from this ROI, which approximately matches the size of the tumor.

In this section, however, we divided the tumor into a set of non-overlapping ROIs. The size of each ROI is 1x1 mm (10x10 pixels); hence, every tumor contained a set of ROIs. These ROIs were used to calculate the GLCM features. As in the previous experiments, the ROIs are normalized to the gray-level range [0, 255] to stretch the dynamic range to the same scale, and the quantization level is set to 32 levels. Then, the 25 GLCM features defined in Section 3.3.5 were extracted from each ROI inside the tumor at four distances, d = (1, 2, 3, and 4), and four directions, θ = (0°, 45°, 90° and 135°). The above settings generate a GLCM feature matrix consisting of 400 texture features (25x4x4), and every tumor is represented by a matrix of size Nx400, where N is the number of ROIs inside the tumor and 400 is the number of GLCM features. Then, the MI algorithm was applied to rank the features, and the best 49 features were selected to represent the data in the next steps. Then, an SVM classifier was used. We ran the SVM classifier for 50 trials; in each trial the ten-fold cross-validation procedure was applied. The classification results obtained by classifying the individual ROIs are as follows: accuracy = 90.05%, specificity = 92.43%, sensitivity = 88.13%, positive predictive value (PPV) = 89.92%, negative predictive value (NPV) = 90.46%, and Matthews correlation coefficient = 80.40%.
4.6 Results obtained using the proposed ROIs reduction and Voting-based
Method
In this section, the tumor classification performance obtained using the proposed method is evaluated. As in the last section, we divided each tumor into a set of non-overlapping ROIs with a size of 1x1 mm (10x10 pixels). The ROIs were normalized to the gray-level range [0, 255], and the quantization level was set to 32 levels. Then, the GLCM features for each ROI inside the tumor were calculated at four distances, d = (1, 2, 3, and 4), and four directions, θ = (0°, 45°, 90° and 135°). This yields a GLCM feature matrix consisting of 400 texture features (25x4x4), and every tumor is represented by a matrix of size Nx400, where N is the number of ROIs and 400 is the number of GLCM features. Then, the MI algorithm was applied to rank the features, and the best 49 features were selected to represent the data. Up to this point, we have applied the same steps used in previous studies; the following steps are modified to implement the proposed method. Based on our study of the spatial distribution of the ROIs, we have noticed that the ROIs that incorrectly predict the tumor class are located in the dark regions, while the ROIs that correctly predict the tumor class are located in the brightest regions. Therefore, we calculated the average intensity of each tumor and then excluded the ROIs that have an average intensity less than 2% of the average tumor intensity. This step is called ROI reduction. Then, the 105 tumors were divided into training and validation datasets. During the training phase, the ROIs of the tumors in the training set are used to train the SVM. In the testing phase, the ROIs of each tumor were classified using the trained SVM classifier, and the voting technique was applied to determine the class of the tumor. In particular, we counted the ROIs that voted correctly and incorrectly; if the number of correctly-classified ROIs is greater than 50% of the total number of tumor ROIs, then the classifier is considered to correctly predict the tumor class. Otherwise, the tumor is considered to be incorrectly classified. We ran the SVM classifier for 50 trials, and in each trial the ten-fold cross-validation method was used. In each fold, the tumors were distributed randomly and then divided into two datasets: a training dataset containing 90% of the tumors and a validation dataset containing the remaining 10%. The tumor classification results are as follows: accuracy = 96.89%, specificity = 98.28%, sensitivity = 94.72%, positive predictive value = 97.92%, negative predictive value = 97.11%, and Matthews correlation coefficient = 93.95%.
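The random 90%/10% folding used in each trial can be sketched as follows; the function name and the fixed seed are illustrative, not part of the thesis implementation.

```python
import numpy as np

def ten_fold_splits(n_tumors, seed=0):
    """One round of ten-fold cross-validation: shuffle the tumor indices
    and yield (train, validation) index pairs with a ~90%/10% split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_tumors), 10)
    for k in range(10):
        train = np.concatenate([folds[j] for j in range(10) if j != k])
        yield train, folds[k]
```

Each of the 105 tumors appears in the validation set exactly once per round, and repeating the round with different shuffles gives the 50 trials reported above.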
4.7 Summary
To evaluate our proposed method, we compared its performance with the conventional morphological, texture, and GLCM analyses reported in previous studies. First, we evaluated the classification performance obtained using the morphological features only. Then we added the texture features and noticed that the classification accuracy increased. Then the GLCM features were added to the feature set, and the classification results showed that the accuracy is better than that of the previous methods. In the fourth experiment, we divided the tumor into multiple ROIs, extracted the GLCM features, and used an SVM to classify the individual ROIs; the accuracy in this experiment is worse than before. Finally, we evaluated the performance of our proposed method, which can be summarized by two main steps: ROI reduction and ROI voting. The experimental results showed that our proposed algorithm outperforms the conventional morphological, texture, and GLCM analyses in accuracy, specificity, and sensitivity, as summarized in Table IV.
Table IV
SUMMARY OF THE EXPERIMENT RESULTS
(Method: Accuracy / Specificity / Sensitivity / PPV / NPV / MCC, all in %)

Conventional Morphological Features: 92.56 / 94.80 / 89.60 / 93.41 / 94.12 / 85.70
Combined Conventional Morphological and Texture Features: 93.60 / 96.40 / 89.40 / 95.48 / 94.04 / 87.53
Combined Conventional Morphological, Texture, and GLCM Features: 94.46 / 95.20 / 93.35 / 94.06 / 96.11 / 89.30
Classification results obtained by dividing the tumor into ROIs and classifying each ROI using the GLCM Features: 90.05 / 92.43 / 88.13 / 89.92 / 90.46 / 80.40
Tumor classification results obtained using the proposed Method: 96.89 / 98.28 / 94.72 / 97.92 / 97.11 / 93.95
CHAPTER FIVE
DISCUSSION, CONCLUSIONS, AND FUTURE WORKS
In this chapter, we conclude the material of this thesis. A discussion of what we have proposed is given in Section 5.1, the conclusions are provided in Section 5.2, and suggestions for future work are summarized in Section 5.3.

5.1 Discussion

Breast cancer is a major cause of death in women all around the world, and especially in Jordan. The mortality rate caused by this disease can be reduced by early detection of breast cancer. Ultrasound imaging is one of the most widely used technologies for detecting abnormalities in dense breasts.
eliminate the ROIs that are located in the dark regions of the tumor. The individual ROIs are classified using a trained SVM, and a voting technique is used to combine the votes of the individual ROIs and determine the tumor class. The experimental results showed that the accuracy of the proposed method outperforms the conventional analyses reported in previous studies.

One important contribution of the thesis is the analysis of the spatial distribution of the ROIs. Our analyses indicate that the ROIs located in dark regions are less accurate in predicting the tumor class than the ROIs in the brightest regions.
5.2 Conclusions
This thesis presented a study of the spatial distribution of features inside the tumor and showed that the distribution of ROI features inside the tumor can affect the classification accuracy. We also proposed a new method for tumor classification that divides the tumor into ROIs, performs ROI reduction based on the spatial location of the ROIs, and combines the votes of the individual ROIs to determine the class of the tumor. It has been shown in this thesis that the proposed method outperforms the conventional morphological and texture analyses reported in previous studies.
REFERENCES
1.
Cai, L. and Wang Y., A Phase-Based Active Contour Model for Segmentation of
Breast Ultrasound Images, IEEE 6th International Conference on Biomedical
Engineering and Informatics, pp. 91-95, 2013.
2.
Bothorel, S., Meunier, B. B., and Muller, S. A., Fuzzy logic based approach for
semi logical analysis of micro calcification in mammographic images, Intell.
Syst., vol. 12, pp. 819848, 1997.
3.
4.
5.
Cheng, H. D., Shan, J., Ju, W., Guo, Y., and Zhang, L., Automated breast cancer
detection and classification using ultrasound images: A survey, Elsevier Pattern
Recognition, vol. 43, no.1, pp. 299 - 317, 2010.
6.
Shankar, P. M., Piccoli , C. W., Reid, J. M., Forsbergand F., and Goldberg, B. B.,
Application of the compound probability density function for characterization of
breast masses in ultrasound B scans, Phys. Med. Biol., vol. 50, no.10, pp. 2241
2248, 2005.
7.
Taylor, K., Merritt, C., Piccoli, C., Schmidt, R., Rouse, G., Fornage, B., Rubin, E.,
Georgian-Smith, D., Winsberg, F., Goldberg, B., and Mendelson, E., Ultrasound
as a complement to mammography and breast examination to characterize breast
masses, Ultrasound in Medicine and Biology, vol. 28, no. 1, pp. 19-26, 2002.
8.
Zhi, H., Ou, B., Luo, B., Feng, X., Wen, Y., and Yang, H., Comparison of
ultrasound elastography, mammography, and sonography in the diagnosis of solid
breast lesions, Journal of Ultrasound in Medicine, vol. 26, no. 6, pp. 807-815,
2007.
9.
Sahiner, B., Chan, H. P., Roubidoux, M. A., Hadjiiski, L. M., Helvie, M. A.,
Paramagul, C., and Blane, C., Malignant and benign breast masses on 3D US
volumetric images: effect of computer-aided diagnosis on radiologist accuracy,
Radiology, vol. 242, no. 3, pp. 716-724, 2007.
10.
Chen, C. M., Chou, Y. H., Han, K. C., Hung, G. S., Tiu, C. M., Chiou, H. J., and
Chiou, S. Y., Breast lesions on sonograms: computer-aided diagnosis with nearly
setting-independent features and artificial neural networks, Radiology, vol. 226,
no. 2, pp. 504-514, 2003.
11.
Drukker, K., Giger, M. L., Horsch, K., Kupinski, M. A., Vyborny, C. J., and
Mendelson, E. B., Computerized lesion detection on breast ultrasound, Medical
Physics, vol. 29, no. 7, pp. 1438-1446, 2002.
12.
Huang, Y. L., Chen, D. R., and Liu, Y. K., Breast cancer diagnosis using image
retrieval for different ultrasonic systems, IEEE International Conference on
Image Processing, vol. 5, pp. 2598-2960, 2004.
13.
Zakeri, F. S., Behnam, H., and Ahmadinejad, N., Classification of Benign and
Malignant Breast Masses Based on Shape and Texture Features in Sonography
Images, Springer Journal of Medical Systems, vol. 36, no. 3, pp. 1621-1627,
2012.
14.
Chen, D. R., Chang, R. F., Kuo, W. J., Chen, M. C., and Huang, Y. L., Diagnosis
of breast tumors with sonographic texture analysis using wavelet transform and
neural networks, Ultrasound Med. Biol., vol. 28, no. 10, pp. 1301-1310, 2002.
15.
Jiang, P., Peng, J., Zhang, G., Cheng, E., Megalooikonomou, V., and Ling, H.,
Learning-Based Automatic Breast Tumor Detection and Segmentation in
Ultrasound Image, IEEE 9th International Symposium on Biomedical Imaging
(ISBI), pp. 1587-1590, 2012.
16.
Park, J., Kang, J. B., Chang, J. H., and Yoo, Y., Speckle Reduction Techniques
in Medical Ultrasound Imaging, Biomedical Engineering Letters, vol. 4, no.1, pp.
32-40, 2014.
17.
Uddin, M. S., Tahtali, M., Lambert, A. J., and Pickering, M. R., Speckle
Reduction for Ultrasound Images Using Nonlinear Multi-Scale Complex Wavelet
Diffusion, IEEE International Conference on Signal and Image Processing
Applications (ICSIPA), pp. 31-36, 2013.
18.
19.
Zhang, J., Wang, C., and Cheng, Y., Comparison of Despeckle Filters for Breast
Ultrasound Images, Springer, Circuits, Systems, and Signal Processing, pp. 1-24,
2014.
20.
Mittal, D., Kumar, V., Saxena, S. C., Khandelwal, N., and Kalra, N.,
Enhancement of the ultrasound images by modified anisotropic diffusion
method, Springer, Medical and biological engineering and computing, vol. 48,
no.12, pp. 12811291, 2010.
21.
Chan, T. and Vese, L., Active contours without edges, IEEE Transactions on
Image Processing, vol. 10, no. 2, pp. 266-277, 2001.
22.
Wang, W., Zhu, L., Qin, J., Chui, Y. P., Li, B. N., and Heng, P. A., Multiscale
geodesic active contours for ultrasound image segmentation using speckle
reducing anisotropic diffusion, Elsevier, Optics and Lasers in Engineering, vol.
54, pp. 105-116, 2014.
23.
Yu, Y. and Acton, S. T., Speckle reducing anisotropic diffusion, IEEE Trans.
Image Process., vol. 11, no. 11, pp. 1260-1270, 2002.
24.
Huang, Q., Bai, X., Li, Y., Jin, L., and Li, X., Optimized graph-based
segmentation for ultrasound images, Elsevier, Neurocomputing, vol. 129, pp.
216-224, 2014.
25.
Torbati, N., Ayatollahi, A., and Kermani, A., An efficient neural network based
method for medical image segmentation, Computers in Biology and Medicine,
Elsevier, vol. 44, pp. 76-87, 2014.
26.
Wu, W. J., Lin, S. W., and Moon, W. K., Combining support vector machine
with genetic algorithm to classify ultrasound breast tumor images, Elsevier,
Computerized Medical Imaging and Graphics, vol. 36, no. 8, pp. 627-633, 2012.
27.
Lin, C. M., Hou, Y. L., Chen, T. Y., and Chen, K. H., Breast Nodules Computer-
Aided Diagnostic System Design Using Fuzzy Cerebellar Model Neural
Networks, IEEE Trans. on Fuzzy Systems, vol. 22, no. 3, pp. 693-699, 2014.
28.
Huang, L., Shi, J., Wang, R., and Zhou, S., Shearlet-based Ultrasound Texture
Features for Classification of Breast Tumor, IEEE 7th International Conference
on Internet Computing for Engineering and Science (ICICSE), pp. 116-121, 2013.
29.
American College of Radiology, Breast Imaging Reporting and Data System, 5th
ed., Reston, VA,
https://shop.acr.org/Default.aspx?TabID=55&ProductId=66931383, 2013.
30.
Moon, W. K., Lo, C. M., Cho, N., Chang, J. M., Huang, C. S., Chen, J. H., and
Chang, R. F., Computer-aided diagnosis of breast masses using quantified
BI-RADS findings, Elsevier Computer Methods and Programs in Biomedicine,
vol. 111, no. 1, pp. 84-92, 2013.
31.
32.
Yang, M. C., Moon, W. K., Wang, Y. C. F., Bae, M. S., Huang, C. S., Chen, J.
H., and Chang, R. F., Robust Texture Analysis Using Multi-Resolution
Gray-Scale Invariant Features for Breast Sonographic Tumor Diagnosis, IEEE
Trans. Med. Imag., vol. 32, no. 12, pp. 2262-2272, 2013.
33.
Alam, S., Feleppa, E., and Rondeau, M., Ultrasonic multi-feature analysis
procedure for computer-aided diagnosis of solid breast lesions, Ultrasonic
Imaging, vol. 38, no. 1, pp. 17-38, 2011.
34.
Uniyal, N., Eskandari, H., Abolmaesumi, P., Sojoudi, S., Gordon, P., Warren, L.,
Rohling, R. N., Salcudean, S. E., and Moradi, M., Ultrasound RF Time Series for
Classification of Breast Lesions, IEEE Transactions on Medical Imaging, vol. 34,
no. 2, pp. 652-661, 2015.
35.
Nie, K., Chen, J. H., Yu, H. J., Chu, Y., Nalcioglu, O., and Su, M. Y.,
Quantitative analysis of lesion morphology and texture features for diagnostic
prediction in breast MRI, Academic Radiology, vol. 15, pp. 1513-1525, 2008.
36.
37.
Shen, W. C., Chang, R. F., Moon, W. K., Chou, Y. H., and Huang, C. S., Breast
ultrasound computer-aided diagnosis using BI-RADS features, Academic
Radiology, vol. 14, pp. 928-939, 2007.
38.
39.
Create gray-level co-occurrence matrix from image,
http://www.mathworks.com/help/images/ref/graycomatrix.html, Retrieved April
15, 2016.
40.
Gómez, W., Leija, L., and Díaz-Pérez, A., Mutual information and intrinsic
dimensionality for feature selection, in Proc. 7th Int. Conf. Elect. Eng., Comput.
Sci. Automatic Control, Tuxtla Gutiérrez, Mexico, pp. 339-344, 2010.
41.
Pereira, W. C., Alvarenga, A. V., Infantosi, A. F., Macrini, L., and Pedreira, C.
E., A non-linear morphometric feature selection approach for breast tumor
contour from ultrasonic images, Comput. Biol. Med., vol. 40, no. 11-12, pp.
912-918, 2010.