Interdisciplinary Project Report
2 Background
Different forms of brain tumors require different medical procedures. In conventional
computer-aided diagnosis systems, the tumor mass must first be detected and segmented before it
can be assigned to a category. Following tumor mass segmentation, feature extraction and
classification are applied to the segmented area.
Brain and central nervous system tumors are the third most common cancer among
teenagers and young adults (ages 15-38) [Cheng et al. (2016)]. In order to prevent these
tumors as soon as possible, attempts have been made to create a tumor detection system that works
regardless of the tumor’s location, shape, and intensity, but so far without success [Siegel (2016)].
Identifying a tumor’s type is very important for cancer patients, because misdiagnosing the
type of a brain tumor can prevent an effective response to medical intervention and therefore
decrease the patient’s chances of survival. This process is currently performed manually by
doctors, which makes it tedious and error-prone. Furthermore, doctors require a lot of training
to be able to identify brain tumors correctly. Deep Learning solutions that automate this task
could be of great help to them and simplify their work.
Other researchers have already attempted this task, using both classical image processing and
neural networks [Cheng et al. (2015), He et al. (2015), Pereira et al. (2016)].
We will focus on the latter approach and use Convolutional Neural Networks, since the data is in
image format and this type of neural network performs well on image classification.
3 Experiments
The experimental part was structured into the following steps:
1. Data exploration
2. Data preprocessing
3. Training and testing classification networks
4. Data expansion and hyperparameter optimization
5. Demo web app
3.1 Data exploration
The data used consisted of two Kaggle datasets (1, 2).
The initial idea was to use only the first dataset for all of the experiments. However, when trying
to improve the model’s results, I wanted to check whether that could be achieved by training it
on more data. I was able to find a very similar dataset, containing the same kind of images and
classified into the same 4 categories. I then merged the two datasets by combining the images
from each of the four possible classes.
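The merge described above could be sketched as follows. This is only an illustration and not the code used in the project: the class folder names in CLASSES, the function merge_datasets, and the ds<index>_ file prefix are all assumptions introduced here, and the real datasets may be laid out differently.

```python
import shutil
from pathlib import Path

# Hypothetical folder names; the real datasets may label the classes differently.
CLASSES = ("glioma", "meningioma", "pituitary", "notumor")


def merge_datasets(sources, target, classes=CLASSES):
    """Copy every image from each source's class folder into target/<class>/.

    Files are prefixed with the index of their source dataset so that two
    images sharing a file name do not overwrite each other.
    Returns the number of images copied per class.
    """
    target = Path(target)
    counts = {}
    for cls in classes:
        out_dir = target / cls
        out_dir.mkdir(parents=True, exist_ok=True)
        counts[cls] = 0
        for idx, src in enumerate(sources):
            class_dir = Path(src) / cls
            if not class_dir.is_dir():
                continue  # this dataset has no folder for the class
            for img in sorted(class_dir.iterdir()):
                if img.is_file():
                    shutil.copy2(img, out_dir / f"ds{idx}_{img.name}")
                    counts[cls] += 1
    return counts
```

Prefixing the file names is a design choice for this sketch: two independent datasets can easily reuse names such as image(1).jpg, and a plain copy would silently drop one of them.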
The images belong to one of 4 possible classes (meningioma tumor, glioma tumor, pituitary tumor
or no tumor), which are the target values that we want to predict later.
The class distribution shows that the notumor class is slightly underrepresented (16% of the
images have this class), whereas each of the other three classes accounts for 28% of the data.
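A class distribution like the one above can be computed from the list of image labels. The sketch below uses toy labels chosen only to mirror the reported proportions; the function name class_distribution and the label strings are assumptions made for illustration.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset as a percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


# Toy labels that mirror the reported proportions (not the real data).
labels = (["glioma"] * 28 + ["meningioma"] * 28
          + ["pituitary"] * 28 + ["notumor"] * 16)
shares = class_distribution(labels)
print(shares)
```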
Both datasets consist of images containing slices of MRI brain scans of different patients.
Exploratory analysis reveals that the initial dataset contains 3267 images with varying resolutions.
A sample image can be seen below.
A further analysis of the image resolutions shows that the most common resolution is 512
by 512 pixels, which occurs in 31.17% of all the images. The minimum resolution is 167x167, while
the maximum resolution is 1375x1446. Only 35.64% of all the images have a "square"
resolution (width=height).
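Statistics of this kind can be derived from a list of (width, height) pairs, which an imaging library such as Pillow exposes through the size attribute of an opened image. The sketch below is an assumption-laden illustration: the function resolution_summary is hypothetical, the sample sizes are toy values, and "smallest" and "largest" are ranked by pixel count, which may not be the ordering used in the report.

```python
from collections import Counter


def resolution_summary(sizes):
    """Summarise a non-empty list of (width, height) pairs."""
    counts = Counter(sizes)
    common, n_common = counts.most_common(1)[0]
    n = len(sizes)
    return {
        "most_common": common,
        "most_common_pct": 100.0 * n_common / n,
        # Assumption: rank resolutions by total pixel count.
        "smallest": min(sizes, key=lambda s: s[0] * s[1]),
        "largest": max(sizes, key=lambda s: s[0] * s[1]),
        "square_pct": 100.0 * sum(w == h for w, h in sizes) / n,
    }


# Toy sizes for illustration only.
summary = resolution_summary([(512, 512), (512, 512), (167, 167), (1375, 1446)])
print(summary)
```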
For contrast analysis, the Michelson contrast was used (3). The analysis shows that all images have
a contrast of 1.
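The Michelson contrast is defined as (I_max - I_min) / (I_max + I_min), where I_max and I_min are the highest and lowest intensities in the image. A minimal sketch is given below; the function name michelson_contrast and the convention of returning 0 for an all-black image are choices made here, not taken from the report.

```python
def michelson_contrast(pixels):
    """Michelson contrast (I_max - I_min) / (I_max + I_min) of an image.

    `pixels` is any non-empty iterable of non-negative intensities.
    """
    values = list(pixels)
    lo, hi = min(values), max(values)
    if hi + lo == 0:
        return 0.0  # all-black image: define the contrast as 0 to avoid 0/0
    return (hi - lo) / (hi + lo)


print(michelson_contrast([0, 128, 255]))
print(michelson_contrast([50, 150]))
```

Note that the value is exactly 1 whenever the darkest pixel is 0 and the image is not entirely black, regardless of the other intensities. A contrast of 1 for every image is therefore what one would expect if each slice contains at least one pure black pixel; this is an inference from the formula, not something the report states.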
After analyzing all the resolutions, we chose to resize all the images to 167x167 pixels, which is
the smallest image resolution present in our dataset.
The other preprocessing steps were splitting the data into training, validation and test sets
(70/10/20) and rescaling the pixel values to the range between 0 and 1.
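The split and the rescaling can be sketched in plain Python as follows. The function names split_indices and rescale, the fixed seed, and the assumption of 8-bit pixel values (maximum 255) are introduced here for illustration; the report does not say how the split was drawn or whether it was stratified by class. Resizing is left out because it requires an imaging library.

```python
import random


def split_indices(n, percents=(70, 10, 20), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists."""
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test


def rescale(pixels, max_value=255):
    """Map intensities in [0, max_value] to floats in [0, 1]."""
    return [p / max_value for p in pixels]


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
```

Integer arithmetic is used for the split sizes so that rounding of floating-point fractions cannot shift a sample between subsets.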
The metric chosen for evaluating the performance of the models was accuracy, and below we can
see the accuracy scores for the test data. My goal while carrying out the experiments was to
obtain the highest possible test accuracy. Among the first experiments, we can see that the CNN
with 3 layers performed best, which is why I chose this network for hyperparameter
optimization and applied it to the expanded dataset, described in the next section.
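For reference, accuracy is the fraction of test images whose predicted class matches the true class. A minimal sketch follows; the function name accuracy and the toy labels are assumptions made for illustration and are unrelated to the project's actual predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy example: three of four predictions are correct.
score = accuracy(
    ["glioma", "notumor", "pituitary", "meningioma"],
    ["glioma", "notumor", "pituitary", "glioma"],
)
print(score)
```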
• optimizer
• number of epochs
• number of layers
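One simple way to explore combinations of the hyperparameters listed above is an exhaustive grid, sketched below. The report does not say which search strategy or which candidate values were used, so the values in search_space and the helper grid are purely hypothetical.

```python
from itertools import product

# Hypothetical candidate values; the report does not list the ones it tried.
search_space = {
    "optimizer": ["adam", "sgd"],
    "epochs": [10, 20, 30],
    "num_layers": [2, 3, 4],
}


def grid(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(grid(search_space))
print(len(configs), configs[0])
```

Each resulting dict would be passed to a training routine, and the configuration with the best validation accuracy would be kept.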
The best accuracy I obtained was 92.13%.
4 Conclusions
To conclude, while working on this project I combined the knowledge obtained from the
Brain Modeling course with the curriculum of my Data Science Master’s. I successfully developed
an AI-based web application that can classify 3 types of brain tumors (or no tumor) with an
accuracy of 92.13%.
References
Cheng, J., Huang, W., Cao, S., Yang, R., Yang, W., Yun, Z. and Feng, Q. (2015), ‘Enhanced
performance of brain tumor classification via tumor region augmentation and partition’, PLoS
ONE 10.
4 https://flask.palletsprojects.com/en/2.2.x/
Cheng, J., Yang, W., Huang, M., Huang, W., Jiang, J., Zhou, Y., Yang, R., Jie, Z., Feng, Y.,
Feng, Q. and Chen, W. (2016), ‘Retrieval of brain tumors by adaptive spatial pooling and Fisher
vector representation’, PLoS ONE 11.
He, K., Zhang, X., Ren, S. and Sun, J. (2015), ‘Deep residual learning for image recognition’.
URL: https://arxiv.org/abs/1512.03385
Pereira, S., Pinto, A., Alves, V. and Silva, C. A. (2016), ‘Brain tumor segmentation using
convolutional neural networks in MRI images’, IEEE Transactions on Medical Imaging 35(5), 1240–1251.
Siegel, R. (2016), ‘Cancer statistics 2016’, CA: A Cancer Journal for Clinicians 66.