
ABSTRACT

Biomedical image processing is a growing and demanding field. It comprises many different imaging methods, such as CT scans, X-ray and MRI. These techniques allow us to identify even the smallest abnormalities in the human body. The primary goal of medical imaging is to extract meaningful and accurate information from these images with the least error possible. Among the various medical imaging processes available to us, MRI is the most reliable and safe. Brain tumors are the most common malignant neurologic tumors, with the highest mortality and disability rates. Because of the delicate structure of the brain, the clinical use of several commonly used biopsy-based diagnoses is limited for brain tumors. In this project, SVM classification is used to classify brain tumors as malignant or benign.


ABSTRACT (IN TAMIL)

Biomedical image processing is a growing and demanding field. It includes many different imaging methods such as CT scans, X-ray and MRI. These techniques allow even the smallest abnormalities in the human body to be identified. The primary aim of medical imaging is to extract meaningful and accurate information from these images with the fewest possible errors. Among the various medical imaging processes available to us, MRI is the most reliable and safe. Brain tumors are the most common malignant neurological tumors, with high mortality and disability rates. Because of the delicate structure of the brain, the clinical use of several commonly used biopsy diagnoses is limited for brain tumors. This is a growing technique for addressing the imprecise detection otherwise obtained from radiology alone.
CHAPTER 1

INTRODUCTION

1.1 Brain

The brain is an organ that serves as the center of the nervous system in all
vertebrate and most invertebrate animals. The brain is located in the head, usually
close to the sensory organs for senses such as vision. The brain is the most
complex organ in a vertebrate's body. In a human, the cerebral cortex contains
approximately 10–20 billion neurons, and the estimated number of neurons in the
cerebellum is 55–70 billion. Each neuron is connected by synapses to several
thousand other neurons. These neurons communicate with one another by means of
long protoplasmic fibers called axons, which carry trains of signal pulses called
action potentials to distant parts of the brain or body targeting specific recipient
cells.

Physiologically, the function of the brain is to exert centralized control over the
other organs of the body. The brain acts on the rest of the body both by generating
patterns of muscle activity and by driving the secretion of chemicals called
hormones. This centralized control allows rapid and coordinated responses to
changes in the environment. Some basic types of responsiveness such as reflexes
can be mediated by the spinal cord or peripheral ganglia, but sophisticated
purposeful control of behavior based on complex sensory input requires the
information integrating capabilities of a centralized brain.

A brain tumor occurs when abnormal cells form within the brain. There are two
main types of tumors: malignant or cancerous tumors and benign tumors.
Cancerous tumors can be divided into primary tumors, which start within the brain,
and secondary tumors, which have spread from elsewhere, known as brain
metastasis tumors. All types of brain tumors may produce symptoms that vary
depending on the part of the brain involved. These symptoms may include
headaches, seizures, problems with vision, vomiting and mental changes. The
headache is classically worse in the morning and goes away with vomiting. Other
symptoms may include difficulty walking, speaking or with sensations. As the
disease progresses, unconsciousness may occur.

1.2 MRI

Magnetic resonance imaging (MRI) is a medical imaging technique used in


radiology to form pictures of the anatomy and the physiological processes of the
body in both health and disease. MRI scanners use strong magnetic fields,
magnetic field gradients, and radio waves to generate images of the organs in the
body. MRI does not involve X-rays or the use of ionizing radiation, which
distinguishes it from CT or CAT scans and PET scans. Magnetic resonance
imaging is a medical application of nuclear magnetic resonance (NMR). NMR can
also be used for imaging in other NMR applications such as NMR spectroscopy.

While the hazards of X-rays are now well-controlled in most medical


contexts, MRI may still be seen as a better choice than a CT scan. MRI is widely
used in hospitals and clinics for medical diagnosis, staging of disease and follow-
up without exposing the body to radiation. However, MRI may often yield
different diagnostic information compared with CT. There may be risks and
discomfort associated with MRI scans. Compared with CT scans, MRI scans
typically take longer and are louder, and they usually need the subject to enter a
narrow, confining tube. In addition, people with some medical implants or other
non-removable metal inside the body may be unable to undergo an MRI
examination safely.

MRI was originally called NMRI (nuclear magnetic resonance imaging) and is
a form of NMR, though the use of 'nuclear' in the acronym was dropped to avoid
negative associations with the word. Certain atomic nuclei are able to absorb and
emit radio frequency energy when placed in an external magnetic field. In clinical
and research MRI, hydrogen atoms are most often used to generate a detectable
radio-frequency signal that is received by antennas in close proximity to the
anatomy being examined. Hydrogen atoms are naturally abundant in people and
other biological organisms, particularly in water and fat. For this reason, most MRI
scans essentially map the location of water and fat in the body. Pulses of radio
waves excite the nuclear spin energy transition, and magnetic field gradients
localize the signal in space. By varying the parameters of the pulse sequence,
different contrasts may be generated between tissues based on the relaxation
properties of the hydrogen atoms therein.

1.3 Overview of the Project

Recent advances in image acquisition, standardization and image analysis have


facilitated the development of radiomics, which converts medical images into
mineable high-throughput features for use in diagnoses. In one study, 635 radiomics features
extracted from computed tomography (CT) were successfully used to predict
distant metastasis (DM) for lung adenocarcinoma patients. A radiomics descriptor
called the Co-occurrence of Local Anisotropic Gradient Orientations (CoLlAGe)
has also been exploited to distinguish different molecular sub-types of breast
cancer. In another study, 671 radiomics features were extracted to noninvasively estimate
isocitrate dehydrogenase 1 (IDH1) mutations in gliomas. In yet another, high-throughput
texture features extracted from dynamic contrast-enhanced magnetic resonance
imaging (MRI) were used to predict the response of breast cancer to neoadjuvant
chemotherapy. During disease diagnosis, these methods not only use some features
that can be directly observed from images, such as location and shape, but also
extract various deep image features to make decisions, achieving encouraging
diagnostic performance.

Though radiomics methods have achieved interesting results in the analysis of


medical images, several challenges still exist in their key steps, including feature
extraction, feature selection, and classification. First, gray-level co-occurrence
matrix (GLCM)-based and gray-level run-length matrix (GLRLM)-based texture
features are often used for radiomics analysis. These texture features represent the
statistical relations between a collection of adjacent pixels in a particular direction
(0°, 45°, 90° and 135°), but they do not consider the relations among multiple
neighboring pixels in different directions simultaneously; i.e., they neglect the
local image structures that play a critical role in the subsequent classification. More
importantly, the extraction of radiomics features currently depends on individual
contexts and varies from disease to disease. Features designed for one specific
disease are generally not applicable to the diagnosis of other diseases.

Second, there is a large amount of redundancy in the extracted high-throughput


features, which increases the risk of overfitting in the subsequent classification.
The p-value comparison and correlation analysis methods have been widely used
for feature selection in radiomics. These methods analyze the effect of each feature
on the classification result and then select the most discriminative features to
employ in performing classifications. However, since most of these methods
evaluate each feature independently, the effect of combinations of features on the
final classification is ignored. Hence, it is difficult to select the most effective
features. Third, because different medical image modalities provide
complementary information on lesions, a good classification model should be able
to effectively combine features from multi-modal images to further improve the
classification accuracy. Some existing methods directly sum multi-modal features
and then input them into the classifier; as a result, the intrinsic relationships among
the features are ignored.

Although many efforts have been made to independently improve each step in
radiomics analyses, a complete framework that can systematically address the
challenges in radiomics is still lacking. In recent years, using sparse representation
as the driving force for addressing image processing, data analysis and pattern
recognition problems has received growing research attention. In these
applications, sparse representation has the following advantages. First, sparse
representation normally exploits adaptive learning dictionaries rather than the
traditional analytically designed dictionaries with a fixed basis (e.g., the discrete
cosine transform or wavelets) to represent images, and it therefore provides the ability to
extract or represent various small textures and details, especially for singular lines
and surfaces. Furthermore, these singular features in images usually play decisive
roles in image classification. Second, sparse representation considers the natural
signal as a linear combination of a few atoms from an overcomplete dictionary.
Therefore, based on effective sparse coding algorithms, sparse representation can
accurately and efficiently select the most essential features used to express data
and delete redundant information.
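The selection behavior described above can be sketched with orthogonal matching pursuit, a standard sparse coding algorithm. The example below uses scikit-learn's OrthogonalMatchingPursuit on synthetic data; it is an illustrative sketch only, not the dictionary learning pipeline of this project:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)

# Overcomplete dictionary: 50 atoms for 20-dimensional signals
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms

# Build a signal that is an exact combination of 3 atoms
true_support = [5, 17, 42]
x = D[:, true_support] @ np.array([1.5, -2.0, 0.7])

# Sparse coding: represent x using at most 3 atoms, discarding the rest
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3, fit_intercept=False)
omp.fit(D, x)
code = omp.coef_

print(np.flatnonzero(code))   # indices of the few selected atoms
```

The resulting code has at most three nonzero entries: the most essential atoms are kept and the redundant ones are zeroed out, which is the selection property the text relies on.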
CHAPTER 2

LITERATURE REVIEW

2.1 R. J. Gillies, P. E. Kinahan, and H. Hricak, “Radiomics: images are more


than pictures, they are data,” Radiology, vol. 278, no. 2, pp. 565-580, 2015.

In the past decade, the field of medical image analysis has grown
exponentially, with an increased number of pattern recognition tools and an
increase in data set sizes. These advances have facilitated the development of
processes for high-throughput extraction of quantitative features that result in the
conversion of images into mineable data and the subsequent analysis of these data
for decision support; this practice is termed radiomics. This is in contrast to the
traditional practice of treating medical images as pictures intended solely for visual
interpretation. Radiomic data contain first-, second-, and higher-order statistics.
These data are combined with other patient data and are mined with sophisticated
bioinformatics tools to develop models that may potentially improve diagnostic,
prognostic, and predictive accuracy. Because radiomics analyses are intended to be
conducted with standard of care images, it is conceivable that conversion of digital
images to mineable data will eventually become routine practice. This report
describes the process of radiomics, its challenges, and its potential power to
facilitate better clinical decision making, particularly in the care of patients with
cancer.

2.2 H. J. Aerts, E. R. Velazquez, R. T. Leijenaar, C. Parmar, P. Grossmann, S.


Carvalho, J. Bussink, R. Monshouwer, B. Haibe-Kains, D. Rietveld et al.,
“Decoding tumour phenotype by noninvasive imaging using a quantitative
radiomics approach,” Nat. Commun, vol. 5, pp. 4006, 2014.
Human cancers exhibit strong phenotypic differences that can be visualized
noninvasively by medical imaging. Radiomics refers to the comprehensive
quantification of tumour phenotypes by applying a large number of quantitative
image features. Here we present a radiomic analysis of 440 features quantifying
tumour image intensity, shape and texture, which are extracted from computed
tomography data of 1,019 patients with lung or head-and-neck cancer. We find that
a large number of radiomic features have prognostic power in independent data
sets of lung and head-and-neck cancer patients, many of which were not identified
as significant before. Radiogenomics analysis reveals that a prognostic radiomic
signature, capturing intratumour heterogeneity, is associated with underlying gene-
expression patterns. These data suggest that radiomics identifies a general
prognostic phenotype existing in both lung and head-and-neck cancer. This may
have a clinical impact as imaging is routinely used in clinical practice, providing
an unprecedented opportunity to improve decision-support in cancer treatment at
low cost.

2.3 T. P. Coroller, P. Grossmann, Y. Hou, E. R. Velazquez, R. T. Leijenaar, G.


Hermann, P. Lambin, B. Haibe-Kains, R. H. Mak, and H. J. Aerts, “CT-based
radiomic signature predicts distant metastasis in lung adenocarcinoma,”
Radiother Oncol, vol. 114, no. 3, pp. 345-350, 2015.

Radiomics provides opportunities to quantify the tumor phenotype non-


invasively by applying a large number of quantitative imaging features. This study
evaluates computed-tomography (CT) radiomic features for their capability to
predict distant metastasis (DM) for lung adenocarcinoma patients. Although only
basic metrics are routinely quantified, this study shows that radiomic features
capturing detailed information of the tumor phenotype can be used as a prognostic
biomarker for clinically-relevant factors such as DM. Moreover, the radiomic-
signature provided additional information to clinical data. With high-throughput
computing, it is now possible to rapidly extract innumerable quantitative features
from tomographic images (computed tomography [CT], magnetic resonance [MR],
or positron emission tomography [PET] images). The conversion of digital medical
images into mineable high-dimensional data, a process that is known as radiomics,
is motivated by the concept that biomedical images contain information that
reflects underlying pathophysiology and that these relationships can be revealed
via quantitative image analyses. Although radiomics is a natural extension of
computer-aided diagnosis and detection (CAD) systems, it is significantly different
from them. CAD systems are usually standalone systems that are designated by the
Food and Drug Administration for use in either the detection or diagnosis of
disease.

2.4 Rietveld, M. M. Rietbergen, B. Haibe-Kains, P. Lambin, and H. J. Aerts,


“Radiomic feature clusters and prognostic signatures specific for lung and
head & neck cancer,” Sci Rep, vol. 5, pp. 11044, 2015.

Radiomics provides a comprehensive quantification of tumor phenotypes by


extracting and mining a large number of quantitative image features. To reduce the
redundancy and compare the prognostic characteristics of radiomic features across
cancer types, we investigated cancer-specific radiomic feature clusters in four
independent Lung and Head & Neck (H&N) cancer cohorts (878 patients in total).
Radiomic features were extracted from the pre-treatment computed tomography
(CT) images. Consensus clustering resulted in eleven and thirteen stable radiomic
feature clusters for Lung and H&N cancer, respectively. These clusters were validated
in independent external validation cohorts using the rand statistic (Lung RS = 0.92, p <
0.001; H&N RS = 0.92, p < 0.001). Our analysis indicated both common and
cancer-specific clustering and clinical associations of radiomic features. The strongest
associations with clinical parameters were: prognosis Lung CI = 0.60 ± 0.01, prognosis
H&N CI = 0.68 ± 0.01; Lung histology AUC = 0.56 ± 0.03, Lung stage AUC = 0.61 ±
0.01, H&N HPV AUC = 0.58 ± 0.03, H&N stage AUC = 0.77 ± 0.02. Full utilization of
these cancer-specific characteristics of image features may further improve
radiomic biomarkers, providing a non-invasive way of quantifying and monitoring
tumor phenotypic characteristics in clinical practice.

2.5 P. Prasanna, P. Tiwari, and A. Madabhushi, “Co-occurrence of local


anisotropic gradient orientations (collage): a new radiomics descriptor,” Sci
Rep, vol. 6, pp. 37241, 2016.

In this paper, we introduce a new radiomic descriptor, Co-occurrence of


Local Anisotropic Gradient Orientations (CoLlAGe) for capturing subtle
differences between benign and pathologic phenotypes which may be visually
indistinguishable on routine anatomic imaging. CoLlAGe seeks to capture and
exploit local anisotropic differences in voxel-level gradient orientations to
distinguish similar appearing phenotypes. CoLlAGe involves assigning every
image voxel an entropy value associated with the co-occurrence matrix of gradient
orientations computed around every voxel. The hypothesis behind CoLlAGe is that
benign and pathologic phenotypes even though they may appear similar on
anatomic imaging, will differ in their local entropy patterns, in turn reflecting
subtle local differences in tissue microarchitecture. We demonstrate CoLlAGe’s
utility in three clinically challenging classification problems: distinguishing (1)
radiation necrosis, a benign yet confounding effect of radiation treatment, from
recurrent tumors on T1-w MRI in 42 brain tumor patients, (2) different molecular
sub-types of breast cancer on DCE-MRI in 65 studies and (3) non-small cell lung
cancer (adenocarcinomas) from benign fungal infection (granulomas) on 120 non-
contrast CT studies. For each of these classification problems, CoLlAGE in
conjunction with a random forest classifier outperformed state of the art radiomic
descriptors (Haralick, Gabor, Histogram of Gradient Orientations).

2.6 J. Yu, Z. Shi, Y. Lian, Z. Li, Y. Gao, Y. Wang, L. Chen, and Y. Mao,
“Noninvasive IDH1 mutation estimation based on a quantitative radiomics
approach for grade II glioma,” Eur Radiol, pp. 1-14, 2016.

The status of isocitrate dehydrogenase 1 (IDH1) is highly correlated with the


development, treatment and prognosis of glioma. We explored a noninvasive
method to reveal IDH1 status by using a quantitative radiomics approach for grade
II glioma. Noninvasive IDH1 status estimation can be obtained with a radiomics
approach. Automatic and quantitative processes were established for noninvasive
biomarker estimation. High-throughput MRI features are highly correlated to IDH1
states. Area under the ROC curve of the proposed estimation method reached 0.86.
Although radiomics can be applied to a large number of conditions, it is most well
developed in oncology because of support from the National Cancer Institute
(NCI) Quantitative Imaging Network (QIN) and other initiatives from
the NCI Cancer Imaging Program. As described in subsequent sections of this
article, the potential of radiomics to contribute to decision support in oncology has
grown as knowledge and analytic tools have evolved. Quantitative image features
based on intensity, shape, size or volume, and texture offer information on tumor
phenotype and microenvironment (or habitat) that is distinct from that provided by
clinical reports, laboratory test results, and genomic or proteomic assays.

2.7 H. J. Aerts, P. Grossmann, Y. Tan, G. R. Oxnard, N. Rizvi, L. H.


Schwartz, and B. Zhao, “Defining a radiomic response phenotype: a pilot
study using targeted therapy in NSCLC,” Sci Rep, vol. 6, pp. 33860, 2016.
Medical imaging plays a fundamental role in oncology and drug
development, by providing a non-invasive method to visualize tumor phenotype.
Radiomics can quantify this phenotype comprehensively by applying image-
characterization algorithms, and may provide important information beyond tumor
size or burden. In this study, we investigated if radiomics can identify a gefitinib
response-phenotype, studying high-resolution computed-tomography (CT) imaging
of forty-seven patients with early-stage non-small cell lung cancer before and after
three weeks of therapy. On the baseline-scan, radiomic-feature Laws-Energy was
significantly predictive for EGFR-mutation status (AUC = 0.67, p = 0.03), while
volume (AUC = 0.59, p = 0.27) and diameter (AUC = 0.56, p = 0.46) were not.
Although no features were predictive on the post-treatment scan (p > 0.08), the
change in features between the two scans was strongly predictive (significant
feature AUC-range = 0.74–0.91). A technical validation revealed that the
associated features were also highly stable for test-retest (mean ± std:
ICC = 0.96 ± 0.06). This pilot study shows that radiomic data before treatment is
able to predict mutation status and associated gefitinib response non-invasively,
demonstrating the potential of radiomics-based phenotyping to improve the
stratification and response assessment between tyrosine kinase inhibitors (TKIs)
sensitive and resistant patient populations.

2.8 J. R. Teruel, M. G. Heldahl, P. E. Goa, M. Pickles, S. Lundgren, T. F.


Bathen, and P. Gibbs, “Dynamic contrast-enhanced mri texture analysis for
pretreatment prediction of clinical and pathological response to neoadjuvant
chemotherapy in patients with locally advanced breast cancer,” Nmr in
Biomedicine, vol. 27, no. 8, pp. 887-896, 2014.

The aim of this study was to investigate the potential of texture analysis,
applied to dynamic contrast-enhanced MRI (DCE-MRI), to predict the clinical and
pathological response to neoadjuvant chemotherapy (NAC) in patients with locally
advanced breast cancer (LABC) before NAC is started. Fifty-eight patients with
LABC were classified on the basis of their clinical response according to the
Response Evaluation Criteria in Solid Tumors (RECIST) guidelines after four
cycles of NAC, and according to their pathological response after surgery. T1 -
weighted DCE-MRI with a temporal resolution of 1 min was acquired on a 3-T
Siemens Trio scanner using a dedicated four-channel breast coil before the onset of
treatment. Each lesion was segmented semi-automatically using the 2-min post-
contrast subtracted image. Sixteen texture features were obtained at each non-
subtracted post-contrast time point using a gray level co-occurrence matrix.
Appropriate statistical analyses were performed and false discovery rate-based q
values were reported to correct for multiple comparisons. Statistically significant
results were found at 1-3 min post-contrast for various texture features for the
prediction of both the clinical and pathological response. Our results suggest that
texture analysis could provide clinicians with additional information to increase the
accuracy of prediction of an individual response before NAC is started.

2.9 V. Kumar, Y. Gu, S. Basu, A. Berglund, S. A. Eschrich, M. B. Schabath,


K. Forster, H. J. Aerts, A. Dekker, D. Fenstermacher et al., “Radiomics: the
process and the challenges,” Magn Reson Imaging, vol. 30, no. 9, pp. 1234,
2012.

Radiomics refers to the extraction and analysis of large amounts of advanced


quantitative imaging features with high throughput from medical images obtained
with computed tomography, positron emission tomography or magnetic resonance
imaging. Importantly, these data are designed to be extracted from standard-of-care
images, leading to a very large potential subject pool. Radiomics data are in a
mineable form that can be used to build descriptive and predictive models relating
image features to phenotypes or gene-protein signatures. The core hypothesis of
radiomics is that these models, which can include biological or medical data, can
provide valuable diagnostic, prognostic or predictive information. The radiomics
enterprise can be divided into distinct processes, each with its own challenges that
need to be overcome: (a) image acquisition and reconstruction, (b) image
segmentation and rendering, (c) feature extraction and feature qualification and (d)
databases and data sharing for eventual (e) ad hoc informatics analyses. Each of
these individual processes poses unique challenges. For example, optimum
protocols for image acquisition and reconstruction have to be identified and
harmonized. Also, segmentations have to be robust and involve minimal operator
input. Features have to be generated that robustly reflect the complexity of the
individual volumes, but cannot be overly complex or redundant.

2.10 S. Das and U. R. Jena, “Texture classification using combination of LBP


and GLRLM features along with KNN and multiclass SVM classification,” in
Proc. IEEE Confer. CCIS, Mathura, 2016, pp. 115-119.

The paper presents a unique combination of texture feature extraction


techniques which can be used in image texture analysis. Setting the prime objective
of classifying different texture images, the Local Binary Pattern (LBP) and a
modified form of Gray Level Run Length Matrix (GLRLM) are implemented
initially. The next phase involves use of combination of the former two methods to
extract improved features. The feature vectors were obtained by defining the
features on the transformed images. These texture features are classified using two
classification algorithms, KNN and multiclass SVM. The results of above feature
extraction techniques with individual classifiers have been compared. The
comparison yields that the combination of LBP and GLRLM texture features
shows a better classification rate than the features obtained from the individual
feature extraction techniques. Among the classifiers, the Support Vector Machine
shows the better classification performance.
CHAPTER 3

SYSTEM ANALYSIS

3.1 Proposed System

In this section, we describe the procedures of the proposed sparse
representation-based radiomics system and illustrate the overall framework of the
proposed method, which comprises four important components: image
segmentation, feature extraction, feature selection and multi-feature collaborative
classification. In the training stage, first, for each imaging modality, a portion of the
training images is selected to train the feature extraction dictionary. Second, a
patch-based sparse representation method is used to extract the texture features
over the training dictionary. Third, iterative sparse representation-based feature
selection is performed to select some discriminative features. Last, multi-feature
collaborative classification is wrapped into the weight training framework to
evaluate the predictive accuracy of candidate subsets of weights using leave-one-out
cross-validation (LOOCV).
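The LOOCV evaluation used in the weight training stage can be illustrated as follows. The data and the linear-kernel classifier below are synthetic stand-ins, and scikit-learn is assumed here for illustration only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the selected radiomics features (30 patients, 8 features)
X, y = make_classification(n_samples=30, n_features=8, random_state=0)

# Leave-one-out: each sample is held out once while the rest train the model
loo = LeaveOneOut()
scores = cross_val_score(SVC(kernel='linear'), X, y, cv=loo)

print(len(scores))      # 30 folds, one per sample
print(scores.mean())    # LOOCV accuracy estimate
```

With n samples, LOOCV trains n models; each score is 0 or 1 (the held-out sample is either misclassified or not), and the mean is the accuracy estimate.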

In the testing stage, the trained feature extraction dictionary, the selected
feature index, the trained weights and the selected features of the training samples
themselves are used to identify testing images. More technical details on the analysis
components are given in the following sections. Segmentation is a critical step in
radiomics because the segmented volumes are directly converted into quantitative
high-throughput features. Various previously reported deep learning-based
segmentation methods have achieved state-of-the-art segmentation performance. In
this paper, a convolutional neural network (CNN)-based method that we proposed
in our previous work is utilized to segment MRI images. As tumor segmentation is
not a major concern in this work, we describe it only briefly and instead focus on
the implementation of our SVM-based classification.

3.2 Block Diagram

Input Image → Preprocessing → Feature Extraction → Feature Selection → SVM Classification → Output Image

Fig 3.1 Block Diagram

This block diagram consists of the input image, preprocessing, feature
extraction, feature selection, SVM classification and the output image. The input
images are used for training and testing. In preprocessing, a median filter is used. In
feature extraction, SIFT is used. Feature selection is an important step that retains
only the most discriminative of the extracted features before classification.
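As a sketch of the preprocessing step, the median filter can be applied with SciPy (an illustrative assumption, since the document does not name a library; the synthetic noisy array below is a stand-in for an MRI slice):

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(1)

# Synthetic grayscale "MRI slice" corrupted with salt-and-pepper style noise
image = np.full((64, 64), 100.0)
noisy = image.copy()
idx = rng.choice(64 * 64, size=200, replace=False)
noisy.ravel()[idx] = rng.choice([0.0, 255.0], size=200)

# 3x3 median filter: replaces each pixel with the median of its neighborhood,
# which removes isolated impulse noise while preserving edges
denoised = median_filter(noisy, size=3)

print(np.abs(noisy - image).mean())     # mean error before filtering
print(np.abs(denoised - image).mean())  # mean error after filtering
```

Unlike a mean filter, the median filter does not blur edges, which is why it is commonly preferred for impulse noise in medical images.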

3.3 Flow Chart

This flow chart consists of the input data, which is first explored; the
preprocessing step is then carried out, followed by the feature extraction and
feature selection processes.
Fig 3.2 Flow Chart

3.4 Pre-Processing

Data pre-processing is an important step for any data analysis problem. It is


often appropriate to prepare your data in such a way as to best expose the structure of
the problem to the machine learning algorithms that you intend to use.
Preprocessing involves a number of activities like Data cleaning to fill in missing
values, smooth noisy data, identify or remove outliers, and resolve inconsistencies;
Data integration for using multiple databases, data cubes, or files; Data
transformation which is used for normalization and aggregation; Data reduction for
reducing the volume but producing the same or similar analytical results; Data
discretization which includes part of data reduction where replacing numerical
attributes with nominal ones takes place. The simplest method to evaluate the
performance of a machine learning algorithm is to use different training and testing
datasets. In this paper, the available data is split into a training set and a testing set
(70% training, 30% testing). The first set is used to train the algorithm, and
predictions are made on the second. The predictions are then evaluated against
the expected results.
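The 70%/30% split described above can be written, for example, with scikit-learn (illustrative only; the feature matrix here is synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in feature matrix: 100 samples, 5 features, binary labels
X = np.arange(500).reshape(100, 5)
y = np.tile([0, 1], 50)

# 70% training / 30% testing split, as used in this work; stratify keeps
# the class proportions identical in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)  # (70, 5) (30, 5)
```

Fixing `random_state` makes the split reproducible, so that repeated runs evaluate the classifier on the same held-out samples.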
3.5 Support Vector Machine (SVM)

“Support Vector Machine” (SVM) is a supervised machine learning


algorithm which can be used for both classification and regression challenges.
However, it is mostly used in classification problems. In this algorithm, we plot
each data item as a point in n-dimensional space (where n is the number of features)
with the value of each feature being the value of a particular coordinate.
Then, we perform classification by finding the hyperplane that best differentiates the
two classes.

SVM, introduced by Cortes and Vapnik, is generally used for classification purposes.

SVMs are efficient learning approaches for training classifiers based on several
kernel functions, such as polynomial functions, radial basis functions and sigmoid
(neural network) kernels. SVM is a supervised learning approach that produces
input-output mapping functions from a labeled training dataset. It has significant
learning ability and hence is broadly applied in pattern recognition.

SVMs are universal approximators grounded in statistical and optimization
theory. The SVM is particularly attractive for biological analysis due to
its capability to handle noise, large datasets and large input spaces.

The fundamental idea of SVM can be described as follows:


o Initially, the inputs are formulated as feature vectors.
o Then, using a kernel function, these feature vectors are mapped into a
feature space in which the classes of training vectors can be separated.

A global hyperplane is sought by the SVM in order to separate both classes of
examples in the training set while avoiding overfitting. In this respect, SVM is
superior to other machine learning techniques based on artificial intelligence. The
mapping of input-output functions from a labeled training data set is generated by
the supervised learning method called SVM.
In a high-dimensional feature space, SVM uses a hypothesis space of linear
functions that are trained with a learning technique from optimization theory,
employing a learning bias derived from statistical learning theory. In support vector
machines, the classifier is created using a linear separating hyperplane. SVM
provides an effective solution for problems that are not linearly separable in the
input space: the original input space is non-linearly transformed into a high-dimensional
feature space, where an optimal separating hyperplane is found and the problem is
solved. A maximal-margin classifier with respect to the training data is obtained
when the separating hyperplane is optimal.

For binary classification, SVM determines an Optimal Separating Hyperplane
(OSH) that produces the maximum margin between the two categories of data. To
create an OSH, SVM maps the data into a higher-dimensional feature space and
carries out this nonlinear mapping with the help of a kernel function. SVM then
builds a linear OSH between the two classes of data in this feature space. Data
vectors that are closest to the OSH in the higher feature space are known as
Support Vectors (SVs), and they contain all the information necessary for
classification.
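A minimal sketch of the OSH and its support vectors, again in Python with scikit-learn on hypothetical linearly separable data (the points and the value of C are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable training data.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=10.0)  # C large enough for a hard margin here
clf.fit(X, y)

# The support vectors are the points closest to the separating hyperplane;
# they alone carry all the information needed for classification.
print(clf.support_vectors_)

# Width of the maximal margin between the two classes: 2 / ||w||.
w = clf.coef_[0]
print(round(2.0 / np.linalg.norm(w), 3))
```

Only three of the six training points end up as support vectors; removing any non-support vector would leave the hyperplane unchanged.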

3.5.1 Applications

SVMs can be used to solve various real-world problems:

SVMs are helpful in text and hypertext categorization, as their application
can significantly reduce the need for labeled training instances in both the
standard inductive and transductive settings.
Classification of images can also be performed using SVMs. Experimental
results show that SVMs achieve significantly higher search accuracy than
traditional query-refinement schemes after just three to four rounds of relevance
feedback. This is also true of image segmentation systems, including those
using a modified version of SVM based on the privileged approach suggested
by Vapnik.
Hand-written characters can be recognized using SVMs.
The SVM algorithm has been widely applied in the biological and other
sciences. SVMs have been used to classify proteins, with up to 90% of the
compounds classified correctly. Permutation tests based on SVM weights have
been suggested as a mechanism for the interpretation of SVM models.
Support vector machine weights have also been used to interpret SVM models
in the past. Post-hoc interpretation of SVM models, in order to identify the
features a model uses to make predictions, is a relatively new area of research
with special significance in the biological sciences.
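As an illustrative aside on the hand-written character application, digit recognition with an SVM can be sketched in a few lines of Python using scikit-learn's bundled 8x8 digit images (this is not part of this project's pipeline; the kernel parameters are assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 8x8 grayscale digit images, flattened to 64-dimensional feature vectors.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf", gamma=0.001, C=10.0)
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print(f"test accuracy: {acc:.3f}")
```

Even this small RBF-kernel SVM typically classifies well above 95% of the held-out digits correctly.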
CHAPTER 4

SOFTWARE DESCRIPTION

4.1 MATLAB Environment

The name MATLAB stands for Matrix Laboratory. MATLAB was originally written
to provide easy access to matrix software developed by the LINPACK and EISPACK
projects. MATLAB is a high-performance language for technical computing. It
integrates computation, visualization, and programming in an easy-to-use
environment where problems and solutions are expressed in familiar mathematical
notation.

MATLAB is an interactive system whose basic data element is an array that
does not require dimensioning. This allows you to solve many technical computing
problems, especially those with matrix and vector formulations, in a fraction of
the time it would take to write a program in a scalar non-interactive language
such as C or Fortran. Today, MATLAB engines incorporate the LAPACK and BLAS
libraries, embedding the state of the art in software for matrix computation.
MATLAB has evolved over a period of years with input from many users. In
university environments, it is the standard instructional tool for introductory
and advanced courses in mathematics, engineering, and science. In industry,
MATLAB is the tool of choice for high-productivity research, development, and
analysis.

Starting MATLAB

When you start MATLAB, a special window called the MATLAB desktop appears. The
desktop is a window that contains other windows. The major tools within or
accessible from the desktop are:

 Command window
 Command history
 The workspace
 The current directory
 The help browser
 The start button

Command Window

Use the command window to enter variables and run functions and M-files.

Command History

Lines you enter in the Command Window are logged in the Command
History window. In the Command History, you can view previously used
functions, and copy and execute selected lines. To save the input and output
from a MATLAB session to a file, use the diary function.

Help Browser

Use the Help browser to search and view documentation for all your MathWorks
products. The Help browser is a web browser integrated into the MATLAB desktop
that displays HTML documents.

Current Directory
MATLAB file operations use the current directory and the search path as
reference points. Any file you want to run must either be in the current directory or
on the search path.

Workspace

The MATLAB workspace consists of the set of variables (named arrays) built up
during a session. To view the workspace and information about each variable,
use the Workspace browser, or use the functions who and whos. To delete
variables from the workspace, select the variable and choose Delete from the
Edit menu. Alternatively, use the clear function.

Fig 5.3 The Graphical Representation of MATLAB

Quitting MATLAB

To end your MATLAB session, type quit in the Command Window, or select
File → Exit MATLAB from the desktop main menu.

All the commands above were executed in the Command Window. The problem is that
commands entered in the Command Window cannot be saved and executed again later.
Therefore, a different way of executing commands repeatedly with MATLAB is:

To create a file with a list of commands,

 save the file,


 run the file

If needed, corrections or changes can be made to the commands in the file. The
files that are used for this purpose are called script files, or scripts for
short. This section covers the following topics:

 M-File Scripts
 M-File Functions

M-file Scripts

A script file is an external file that contains a sequence of MATLAB


statements. Script files have a filename extension .m and are often called M-files.
M-files can be scripts that simply execute a series of MATLAB statements, or they
can be functions that can accept arguments and can produce one or more outputs.

M-file Functions

As mentioned earlier, functions are programs (or routines) that accept


input arguments and return output arguments. Each M-file function (or function or
M-file for short) has its own area of workspace, separated from the MATLAB base
workspace.
Image Representation

An image is stored as a matrix using standard Matlab matrix conventions.


There are five basic types of images supported by Matlab:

1. Indexed images

2. Intensity images

3. Binary images

4. RGB images

5. 8-bit images

MATLAB handles images as matrices. This involves breaking each pixel of


an image down into the elements of a matrix. MATLAB distinguishes between
color and grayscale images and therefore their resulting image matrices differ
slightly.

A color is a composite of some basic colors. MATLAB therefore breaks each
individual pixel of a color image (termed 'truecolor') down into Red, Green
and Blue values. The result, for the entire image, is three matrices, one
representing each color. The three matrices are stacked along a third
dimension, creating a three-dimensional m-by-n-by-3 matrix.
Fig 5.4 Color Image Representation and RGB matrix

A grayscale image is a mixture of black and white. These colors, or 'shades',
are not composed of Red, Green or Blue values; instead they contain various
increments of gray between white and black. To represent this single range,
only one color channel is needed, so a two-dimensional m-by-n matrix suffices.
MATLAB terms this type of matrix an intensity matrix, because its values
represent intensities of one color. For an image with a height of 5 pixels and
a width of 10 pixels, the resulting matrix for a grayscale image would be a
5-by-10 matrix.
Fig 5.5 Grayscale Image Representation
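The two layouts described above can be sketched in Python with NumPy, as an analogue of MATLAB's matrix representation (the 5-by-10 size matches the example in the text; the pixel values are illustrative):

```python
import numpy as np

# True-color image: three m-by-n planes (R, G, B) stacked into an
# m-by-n-by-3 array -- the same layout MATLAB uses.
h, w = 5, 10
rgb = np.zeros((h, w, 3), dtype=np.uint8)
rgb[:, :, 0] = 255            # a pure red image: R plane full, G and B zero

# Grayscale ("intensity") image: a single m-by-n plane of intensities.
gray = np.full((h, w), 128, dtype=np.uint8)

print(rgb.shape)    # (5, 10, 3)
print(gray.shape)   # (5, 10)
```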

Image Processing Toolbox

Color Transformation Functions

The first group of operations is responsible for changes and information
concerning the color transformation of images. A couple of these functions do
not change anything in the picture, but they are crucial for gaining
information about it without opening the actual object of interest. isbw
returns 1 if the image is black and white, and 0 otherwise. Some operations
make sense only when executed on binary graphics files; for example, adjusting
contrast, brightness or other changes usually made on color pictures would not
work with black-and-white images. The function isgray(A), similarly to the
previous one, checks the colormap of the image; as the name suggests, it
returns 1 if the picture is grayscale and 0 otherwise. It may also prove useful
when deciding whether certain operations can be performed on the file. isrgb
reports whether the examined file is an RGB image. These three functions are
essential when deciding whether to change the colormap or color system: knowing
whether the picture is black and white, grayscale or RGB determines what
transformations can be applied to the file. There would be no point in trying
to make changes to an image if they are inoperative for its color model or map.

Another, smaller group of functions are those responsible for picture
enhancement. imadjust adjusts image intensity values; as additional parameters
the user may specify two bracketed ranges, and pixels outside those ranges are
clipped, which is how this procedure increases the contrast of the input image.
Another function responsible for contrast changes is imcontrast, which opens a
ready-built contrast adjustment tool that takes the opened picture as the
object of contrast customization. Unfortunately, this tool works only with
grayscale images.

Spatial Transformation Functions

Spatial transformation functions are a separate group responsible for all
changes concerning resizing, rotating and cropping an image. A simple and
effective command is imresize. It takes two arguments in parentheses: the name
of the picture and, after the comma, a value that acts as a multiplier. If this
number is between zero and one, the resulting image is smaller; correspondingly,
if the constant is greater than one, the output picture is bigger than the
original.
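A Python analogue of this single-multiplier resizing, sketched with the Pillow library (the helper name rescale and the 10-by-6 image are illustrative assumptions, not part of the toolbox):

```python
from PIL import Image

img = Image.new("L", (10, 6))   # hypothetical 10x6 grayscale image

def rescale(im, factor):
    # factor < 1 shrinks the image; factor > 1 enlarges it,
    # mirroring imresize's single-multiplier form.
    return im.resize((int(im.width * factor), int(im.height * factor)))

print(rescale(img, 0.5).size)   # (5, 3)
print(rescale(img, 2).size)     # (20, 12)
```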

Open, Save, Display Functions

This group of Image Processing Toolbox functions handles basic operations such
as opening, closing, displaying and saving an image file. In addition, MATLAB's
library contains a couple of useful commands. imread reads an image from a
graphics file; as a parameter in the parentheses it takes the name of the file
with its extension. Among the supported formats are BMP, GIF, JPEG, PNG and
TIFF. imread returns a two-dimensional array if the image is grayscale and a
three-dimensional one if the picture is color. The function also allows reading
an indexed image together with its associated colormap; to do so, instead of
assigning the result to one variable, the user supplies a second output
variable for the map, after the comma in square brackets.

All optional parameters for other file formats can be found in the MATLAB
documentation. Another very useful function in the Image Processing Toolbox is
imshow, which is responsible for displaying an image. It works with
black-and-white, grayscale and color pictures. A matrix containing a graphic
file, or simply a filename, can be passed as the parameter to this procedure.
Alternatively, the image can be displayed with its colormap, given after the
comma along with the filename. The picture to be shown has to be in the current
directory or specified by a path to the file.

Other Functions

The last part of this chapter describes miscellaneous functions from the Image
Processing Toolbox and MATLAB's main library. The often-used imfinfo displays
various information about an image. Among all the data fields returned by this
procedure, nine are the same for every file format. They are:

• Filename – contains name of the image;

• FileModDate – last date of modification;

• FileSize – an integer indicating the size of the file, in bytes;

• Format – graphic file extension format;


• FormatVersion – number or string describing the file format version;

• Width – width of the image in pixels;

• Height – height of the image in pixels;

• BitDepth – number of bits per pixel;

• ColorType – indicates the type of the image: 'truecolor' for an RGB image,
'grayscale' for a grayscale image, or 'indexed' for an indexed image.

The Image Processing Toolbox function impixel is helpful when pixel color
values (red, green and blue) are required. For the most up-to-date information
about system requirements, see the system requirements page, available in the
products area of the MathWorks website (www.mathworks.com).
CHAPTER 5

RESULT AND DISCUSSION

The input image is trained and tested.

The tested image is given to preprocessing.

In preprocessing, a median filter is used.

In feature extraction, SIFT is used.

In classification, SVM classification is used.

5.1 Database Image

5.2 Input Image

 At first, select the tumor input image for further processing.


5.4 Preprocessing Image

 Pre-processing is a common name for operations on images at the lowest
level of abstraction; both input and output are intensity images.
 The aim of pre-processing is an improvement of the image data that
suppresses unwanted distortions or enhances some image features important for
further processing.
 Image pre-processing methods use the considerable redundancy in images.
 Neighboring pixels corresponding to one object in real images have
essentially the same or similar brightness value.
 Thus, a distorted pixel can often be restored as the average value of its
neighboring pixels.
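The restoration described above can be sketched in Python with SciPy's median filter, the same kind of filter the project applies in MATLAB during pre-processing (the 5-by-5 patch and noise value are illustrative assumptions):

```python
import numpy as np
from scipy.ndimage import median_filter

# A small grayscale patch with one impulse-noise ("salt") pixel.
img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255                      # the distorted pixel

# A 3x3 median filter replaces each pixel with the median of its
# neighbourhood, so the outlier is restored from its neighbours.
clean = median_filter(img, size=3)
print(clean[2, 2])   # 100
```

Unlike an averaging filter, the median discards the outlier entirely instead of smearing it into the neighbourhood, which is why it suits impulse noise.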

5.5 Feature Extraction


 In feature extraction, each sub-dictionary in the feature extraction dictionary
is trained from the corresponding class of images.
 Hence, the feature extraction dictionary contains some distinctive structural
features; thus, the sparse representation coefficients of the testing image are forced
to exhibit some differences in their distributions.
 The atoms in the dictionaries represent the fine textural details of the image,
and the linear combination of these small details makes up the entire tumor region.
 Hence, by using atoms to sparsely represent the testing tumor image and by
comparing the significant differences among the atoms used to construct it,
the classification of the testing tumor image is determined.
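A hedged Python sketch of this residual-based sparse representation idea, using scikit-learn's Orthogonal Matching Pursuit (the random sub-dictionaries D0 and D1 and the sparsity level are illustrative assumptions, not the trained dictionaries of this project):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)

# Hypothetical per-class sub-dictionaries: each column is an atom
# capturing fine textural detail of that class.
D0 = rng.normal(size=(20, 8))   # class-0 sub-dictionary
D1 = rng.normal(size=(20, 8))   # class-1 sub-dictionary

# A test vector built as a linear combination of class-0 atoms.
x = 0.7 * D0[:, 0] + 0.3 * D0[:, 3]

def residual(D, x, k=2):
    # Sparsely represent x with at most k atoms of D and return the
    # reconstruction error; the correct class yields the smaller error.
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k)
    omp.fit(D, x)
    return np.linalg.norm(x - omp.predict(D))

r0, r1 = residual(D0, x), residual(D1, x)
print("predicted class:", 0 if r0 < r1 else 1)
```

The sub-dictionary of the correct class reconstructs the test vector almost exactly, while the other class leaves a large residual, which is the decision rule the text describes.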

5.6 Feature Selection and Weight Calculation


 Among the extracted features, many are highly redundant. This high
redundancy is primarily due to two aspects.
 On the one hand, different classes of images, such as PCNSL images and
GBM images, commonly contain some of the same texture information; thus, some
features of different classes of images will be very similar and meaningless for
classification.
 On the other hand, due to the correlations between features, not all features
are crucial in classification.
 Redundant features both increase the computational complexity and have a
negative effect on classification.
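One simple way to discard such redundant, highly correlated features can be sketched in Python (the feature matrix and the 0.95 threshold are illustrative assumptions; the project's actual selection is iterative and sparse representation-based):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical feature matrix: 100 samples x 4 features, where feature 3
# is almost a copy of feature 0 and therefore redundant.
F = rng.normal(size=(100, 4))
F[:, 3] = F[:, 0] + 0.01 * rng.normal(size=100)

corr = np.corrcoef(F, rowvar=False)

# Keep a feature only if it is not highly correlated with one already kept.
keep, thresh = [], 0.95
for j in range(F.shape[1]):
    if all(abs(corr[j, k]) < thresh for k in keep):
        keep.append(j)

print(keep)
```

Dropping the near-duplicate feature reduces both the computational cost and the risk that correlated features distort the classifier.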
5.7 Classification
 An SVM classifies data by finding the best hyperplane that separates all data
points of one class from those of the other class.
 The best hyperplane for an SVM means the one with the largest margin
between the two classes.
 Margin means the maximal width of the slab parallel to the hyperplane that
has no interior data points.
 Tumor segmentation is a fundamental step in radiomics, and all the tumors
in our experiments are automatically segmented using SVM.
 To obtain accurate segmentation of the tumor region, all images were skull-
stripped in advance.
 In addition, for a few images with smaller tumor volumes, a region of
interest was determined by an experienced doctor in advance, and the segmentation
method was then used to segment the tumor.
 It can be seen that the segmented tumor region contains both enhancing
tumor and peritumoral edema.
5.8 Output

 After the classification process the type of tumor is detected.


CHAPTER 6

CONCLUSION AND FUTURE ENHANCEMENT

6.1 Conclusion

This paper proposes a novel radiomics framework for the differentiation of
PCNSL and GBM and for IDH1 mutation estimation based on sparse
representation. This framework includes sparse representation-based feature
extraction, iterative sparse representation-based feature selection and multi-feature
collaborative sparse representation classification. Compared with traditional
radiomics methods, sparse representation-based feature extraction can finely and
effectively quantify the high-throughput texture features of images, as it
investigates the texture characteristics of patches of the image. More importantly,
after effective feature selection, a multi-feature collaborative sparse representation
classification method is designed to further improve the differentiation accuracy by
combining multi-modal image features at the sparse representation coefficient
level. The results show that the proposed method yields improved sensitivity,
specificity, and accuracy over previously reported state-of-the-art methods. It is
worth emphasizing that the proposed method is highly robust, due to its automatic
diagnosis; manual intervention is not required at any point in the entire process.

6.2 Future Enhancement

In future work, we will improve the classification dictionary through
dictionary learning and further validate the proposed method on a larger dataset.
REFERENCES

1. R. J. Gillies, P. E. Kinahan, and H. Hricak, “Radiomics: images are more
than pictures, they are data,” Radiology, vol. 278, no. 2, pp. 565-580, 2015.
2. H. J. Aerts, E. R. Velazquez, R. T. Leijenaar, C. Parmar, P. Grossmann, S.
Carvalho, J. Bussink, R. Monshouwer, B. Haibe-Kains, D. Rietveld et al.,
“Decoding tumour phenotype by noninvasive imaging using a quantitative
radiomics approach,” Nat. Commun, vol. 5, pp. 4006, 2014.
3. T. P. Coroller, P. Grossmann, Y. Hou, E. R. Velazquez, R. T. Leijenaar, G.
Hermann, P. Lambin, B. Haibe-Kains, R. H. Mak, and H. J. Aerts, “CT-
based radiomic signature predicts distant metastasis in lung
adenocarcinoma,” Radiother Oncol, vol. 114, no. 3, pp. 345-350, 2015.
4. C. Parmar, R. T. Leijenaar, P. Grossmann, E. R. Velazquez, J. Bussink, D.
Rietveld, M. M. Rietbergen, B. Haibe-Kains, P. Lambin, and H. J. Aerts,
“Radiomic feature clusters and prognostic signatures specific for lung and
head & neck cancer,” Sci Rep, vol. 5, pp. 11044, 2015.
5. P. Prasanna, P. Tiwari, and A. Madabhushi, “Co-occurrence of local
anisotropic gradient orientations (collage): a new radiomics descriptor,” Sci
Rep, vol. 6, pp. 37241, 2016.
6. J. Yu, Z. Shi, Y. Lian, Z. Li, Y. Gao, Y. Wang, L. Chen, and Y. Mao,
“Noninvasive IDH1 mutation estimation based on a quantitative radiomics
approach for grade II glioma,” Eur Radiol, pp. 1-14, 2016.
7. H. J. Aerts, P. Grossmann, Y. Tan, G. R. Oxnard, N. Rizvi, L. H. Schwartz,
and B. Zhao, “Defining a radiomic response phenotype: a pilot study using
targeted therapy in NSCLC,” Sci Rep, vol. 6, pp. 33860, 2016.
8. J. R. Teruel, M. G. Heldahl, P. E. Goa, M. Pickles, S. Lundgren, T. F.
Bathen, and P. Gibbs, “Dynamic contrast-enhanced MRI texture analysis for
pretreatment prediction of clinical and pathological response to neoadjuvant
chemotherapy in patients with locally advanced breast cancer,” NMR in
Biomedicine, vol. 27, no. 8, pp. 887-896, 2014.
9. V. Kumar, Y. Gu, S. Basu, A. Berglund, S. A. Eschrich, M. B. Schabath, K.
Forster, H. J. Aerts, A. Dekker, D. Fenstermacher et al., “Radiomics: the
process and the challenges,” Magn Reson Imaging, vol. 30, no. 9, pp. 1234,
2012.
10. S. Das and U. R. Jena, “Texture classification using combination of LBP and
GLRLM features along with KNN and multiclass SVM classification,” in
Proc. IEEE Confer. CCIS, Mathura, 2016, pp. 115-119.
11. M. Aharon, M. Elad and A. Bruckstein, “K-SVD: An algorithm for
designing overcomplete dictionaries for sparse representation,” IEEE Trans.
Signal Process., vol. 54, no. 11, pp. 4311-4322, Nov 2006.
12. X. Zhu, X. Li, S. Zhang, C. Ju and X. Wu, “Robust joint graph sparse
coding for unsupervised spectral feature selection,” IEEE Trans. Neural
Netw. Learn. Syst, vol. 28, no. 6, pp. 1263-1275, June 2017.
13. S. Cai, L. Zhang, W. Zuo and X. Feng, “A probabilistic collaborative
representation based approach for pattern classification,” in Proc. IEEE
Confer. CVPR, 2016, pp. 2950-2959.
14. S. Pereira, A. Pinto, V. Alves, and C. A. Silva, “Brain Tumor Segmentation
Using Convolutional Neural Networks in MRI Images,” IEEE Trans. Med
Imaging, vol. 35, no. 5, pp. 1240-1251, May 2016.
15. M. Elad and M. Aharon, “Image denoising via sparse and redundant
representations over learned dictionaries,” IEEE Trans. Image Process, vol.
15, no. 12, pp. 3736-3745, 2006.
