1
ashwingshan98@gmail.com
2
anuragprabhu122@gmail.com
1. INTRODUCTION
Cancer is one of the most dangerous diseases in human life. The most important task of the lungs
is to take oxygen into the body and to remove CO2 from the body during vital activities. Lung cancer
occurs as a result of the uncontrolled proliferation of tissues and cells in the lungs. When these
masses grow uncontrolled in their environment, they can damage the surrounding tissues. Lung cancer is
the leading cause of cancer death among males and the second among females; about 1.3 million people
die each year worldwide due to lung cancer. In Turkey, 30-40 thousand people are diagnosed with lung
cancer each year [2]. When cancer develops, the orderly process of cell growth and replacement breaks
down. As cells become more and more abnormal, old or damaged cells survive when they should die, and
new cells form when they are not needed. These extra cells can divide without stopping and may form
growths referred to as tumors. A tumor can then start spreading to different parts of the body [6].
Tumors are of two types, benign and malignant. A benign (non-cancerous) tumor is a mass of cells that
cannot spread to other parts of the body, whereas a malignant (cancerous) tumor is a growth of cells
that can spread to other parts of the body; this spreading is called metastasis. There are various
types of cancer, such as lung cancer, leukemia, and colon cancer. The incidence of lung cancer has
significantly increased since the early 19th century. Lung cancer has various causes, such as smoking,
exposure to radon gas, secondhand smoke, and exposure to asbestos. Lung cancer is of two types: small
cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). Non-small cell lung cancer is more
common than SCLC and generally grows and spreads more slowly. SCLC is almost always related to smoking
and grows more quickly, forming large tumors that can spread widely through the body. With the rapid
increase in population, the incidence of diseases such as cancer, chikungunya, and cholera is
increasing [1]. Harmful nodules can be detected at an earlier stage by radiologists using computed
tomography (CT) and other scanning techniques [19]. They usually begin in the bronchi near the center
of the chest. Symptoms that may suggest lung cancer include shortness of breath with activity,
coughing up blood, chronic coughing or a change in the regular coughing pattern, wheezing, pain in the
abdomen, weight loss, fatigue, loss of appetite, speech difficulty, dysphagia (difficulty swallowing),
and pain in the shoulder, chest, or arm [20]. To diagnose lung cancer, various techniques are used,
such as chest X-ray, CT scan, and MRI, through which the doctor can determine the location of the
tumor; treatment is then given on that basis [7].
The challenges facing medical practitioners make this study of far greater significance. The main
challenge is detecting cancer in its early stages, since symptoms appear only in the advanced stages,
which causes the mortality rate of lung cancer to be the highest among all types of cancer. Correct
diagnosis of the various forms of cancer plays a crucial role in helping doctors determine and select
the right treatment. Undeniably, the decisions made by doctors are the most important factors in
diagnosis; recently, however, the application of various AI classification techniques has been shown
to help doctors in their decision-making process. Possible errors that may occur because of
inexperienced doctors can be reduced by using classification techniques. Such a system can also
examine medical data in a shorter time and more precisely [13].
Feature extraction reduces the amount of resources required to describe a large set of data. The
proposed work uses machine learning methods to detect tumor cells in the body more accurately. After
feature extraction, ML techniques are applied to the selected features to extract sensitive values
from the given data and to identify tumor cells [10]. Pre-diagnosis helps to identify or narrow down
the possibility of screening for lung cancer. Symptoms and risk factors (smoking, alcohol consumption,
obesity, and insulin resistance) had a statistically significant effect in the pre-diagnosis stage.
The lung cancer diagnostic and prognostic problems fall mainly within the scope of the widely
discussed classification problems. These problems have attracted many researchers in the fields of
machine intelligence, data mining, and statistics [11]. Hence the main goal is to build a framework
that helps clinical specialists cross-check their analyzed results for predicted lung cancer. Because
the existing diagnostic process is time-consuming, laborious, and costly, this deep learning-based
tool is intended to identify tumor growth and predict stages. Since it is an automated tool based on
image processing and AI, it minimizes human effort in predicting the presence of cancer cells from
the image [4].
In this paper, we discuss an approach to predicting cancer from CT scanned images by building an
ensemble classifier, and we analyze its results. The rest of the paper is organized as follows:
Section 2 briefly reviews previous work in the field of prediction, Section 3 discusses the process
of lung cancer prediction, Section 4 gives a detailed analysis of the results to support the proposed
methodology, Section 5 concludes the paper, and Section 6 lists the references used in this paper.
2. LITERATURE SURVEY
Many approaches to cancer prediction have already been proposed by various researchers. Among them,
Nikita Banerjee et al. [1] and Özge Günaydin et al. [2] proposed various methods for detecting cancer
in its early stages. In these works, machine learning models are used to detect lung cancer nodules.
They applied K-Nearest Neighbors, Support Vector Machines, Naïve Bayes, Decision Trees, and Artificial
Neural Networks to detect anomalies, and compared all methods both with and without preprocessing.
Syed Saba Raoof et al. [3] and Radhika P R et al. [18] note that the detection, prediction, and
diagnosis of lung cancer have become essential because they expedite and simplify the subsequent
clinical management. To assess the progression and treatment of cancerous conditions, machine
learning techniques are used because of their accurate
outcomes. Various machine learning algorithms, such as Naive Bayes, Support Vector Machine, and
logistic regression, have been applied in the healthcare sector for the analysis and prognosis of
lung cancer.
Swati Mukherjee et al. [4] observe that the analysis and study of lung diseases has long been one of
the most interesting research areas for doctors. To address this concern, a diagnostic system can
help reduce the risk to human life through early detection of cancer. The machine learning approach
offers a new opportunity to improve decision support in lung cancer treatment at low cost. Wasudeo
Rahane et al. [5], Kyamelia Roy et al. [6], Amrit Sreekumar et al. [7], and Junjie Zhang et al. [20]
describe lung cancer detection systems in which image processing and machine learning are used to
classify the presence of lung cancer in CT images and blood samples. CT scan reports are more
effective, so patient CT scan images are categorized as normal and abnormal. The abnormal images are
subjected to segmentation to focus on the tumor region. Classification is then performed on features
extracted from the images.
Sanjukta Rani Jena et al. [8] and Öztürk et al. [17] proposed models in which five feature extraction
techniques were each used with individual classification algorithms, to determine which machine
learning algorithm gives higher accuracy for which feature extraction technique. Dendi Gayathri Reddy
et al. [9] proposed a model that is efficient in predicting the stages of lung cancer by applying
machine learning algorithms. It combines K-Nearest Neighbors, Decision Tree, and Neural Network
models with a bagging ensemble method to improve the accuracy of the overall prediction. The reported
results of the proposed model show higher accuracy than the individual algorithms. M. Siddardha Kumar
et al. [10] also use preprocessing procedures to obtain accurate results. In preprocessing, a
morphological technique is used to remove unwanted information from the image. Feature extraction is
then used to reduce the original dataset by computing derived features, and different techniques are
used to extract the geometric and statistical properties of the images. V. Krishnaiah et al. [11] and
Muhammad Imran Faisal et al. [12] aim to propose a model for early detection and correct diagnosis of
the disease, which can help the doctor save the patient's life. The study in [12] evaluates the
discriminative power of several predictors to increase the efficiency of lung cancer detection
through symptoms. Several classifiers, including Decision Tree, Multi-Layer Perceptron, Neural
Network, and Naïve Bayes, are evaluated on a benchmark dataset obtained from the UCI repository.
Their performance is also compared with well-known ensembles such as Random Forest and Majority
Voting.
Fenwa et al. [13] proposed a model in which features such as contrast and brightness are extracted
from an image dataset using texture-based feature extraction. Two ML algorithms, ANN and SVM, are
then applied to these features, and their performance is evaluated to compare which algorithm gives
higher accuracy. Maisa Daoud et al. [14] analyze existing studies and find that neural network methods
are used either for filtering (data engineering) the gene expressions in a step prior to prediction;
for predicting the existence of cancer, the cancer type, or the survivability risk; or for clustering
unlabeled samples. Their paper also discusses some practical issues that should be considered when
building a neural network-based cancer prediction model. The results indicate that the functionality
of the neural network determines its general architecture. Palani et al. [15] proposed IoT-based
predictive modeling that uses fuzzy C-means clustering for segmentation and an incremental
classification algorithm, built on association rule mining and decision trees, to classify the tumor
sets. Based on the output of the incremental classification model, a convolutional neural network is
applied together with other features to predict whether a tumor is benign or malignant.
Lynch et al. [16] implemented various machine learning algorithms to predict the survivability of a
patient, with performance measured by root mean square error. Every model is trained using 10-fold
cross-validation; because the parameters are preset to default values, cross-validation is used to
avoid overfitting. Şaban Öztürk et al. [17] note that the classification of histopathological images
and the identification of cancerous areas is quite difficult because of image background complexity
and resolution. The difference between normal tissue and cancerous tissue is very small in some
cases. So, the features of the tissue patches in the image are of key importance for automatic
classification. Sumathipala et al. [19] proposed a model in which the image data is taken from
LIDC-IDRI. After the image data is collected, it is filtered based on the patients who underwent a
biopsy and on a module level equal to thirty; the images whose module level is equal to thirty are
then segmented, and logistic regression and random forest are applied for prediction. Building on
these ideas, we introduce a novel approach to predicting cancer using ensemble techniques, which is
discussed in detail in the next section.
3. METHODOLOGY
In this section, we discuss the detailed approach for predicting lung cancer from CT scanned images by
extracting region-based features and applying an ensemble classifier. The blueprint of the process is
shown in Figure 1.
Figure 2. Ensemble-Classifier
Random Forest [16]: The Random Forest technique generates a number of decision trees during training
that are allowed to split randomly from a seed point. This results in a “forest” of randomly
generated decision trees whose outcomes are combined by the Random Forest algorithm to predict more
accurately than a single tree can alone. Individual decision trees can be pictured as if-then-else
rules that can be generated directly from the dataset, making them one of the more
human-understandable techniques. One problem with a single decision tree is overfitting, which makes
the predictions appear excellent on the training data but unreliable on future data.
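The way a forest combines the outcomes of its trees can be sketched in a few lines of Python. This is only a minimal illustration, assuming that the trees are combined by simple majority vote. The three hand-written rules standing in for trained trees, the feature names `size_mm` and `density`, and the helper `forest_predict` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical stand-ins for trained decision trees: each is an if-then-else
# rule mapping a feature dict to a class label (0 = benign, 1 = malignant).
def tree_a(x):
    return 1 if x["size_mm"] > 8 else 0

def tree_b(x):
    return 1 if x["density"] > 0.6 else 0

def tree_c(x):
    return 1 if x["size_mm"] > 5 and x["density"] > 0.4 else 0

def forest_predict(trees, x):
    """Return the label predicted by the most trees (majority vote)."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

trees = [tree_a, tree_b, tree_c]
sample = {"size_mm": 7, "density": 0.7}

# tree_a votes 0, tree_b votes 1, tree_c votes 1, so the forest predicts 1.
prediction = forest_predict(trees, sample)
print(prediction)
```

Here one tree misclassifies the sample, but the vote of the other two outweighs it, which is the intuition behind a forest being more reliable than any single tree.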
The results of the ensemble classifier and RF are discussed in the next section. The steps involved
in the algorithm can be viewed as follows:
Testing the algorithm on random data may give different results on each run, which may give a
misleading estimate of the model's prediction rate. To reduce these shortcomings, we have used the
standard CT scanned images of lungs [21].
4. RESULTS AND ANALYSIS
In this section, we analyze the results of the ensemble classifier and compare them with Random
Forest on different parameters. For this analysis, we have used the data [21], which consists of CT
scanned images of lungs. It has 561 images belonging to class 1 and 416 images belonging to class 0,
where class 0 refers to benign and class 1 refers to malignant.
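As a point of reference for the accuracy figures, the class counts above imply a simple baseline. The short sketch below computes the accuracy of a trivial classifier that always predicts the larger class. This baseline is an added illustration, not part of the paper's evaluation, and the variable names are hypothetical.

```python
# Class counts as reported for the dataset [21].
n_malignant = 561  # class 1
n_benign = 416     # class 0

total = n_malignant + n_benign

# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(n_malignant, n_benign) / total

print(total)                        # 977
print(round(majority_baseline, 4))  # 0.5742
```

Any useful classifier on this data should therefore score clearly above roughly 57% accuracy.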
Table 1. contingency table for predicted vs real output (logical details of binary classification)
4.2 Accuracy
Accuracy is the proportion of correct predictions versus the total number of predictions made. Accuracy is
mainly used for measuring the performance of a classifier.
\[ \text{Accuracy} = \frac{\text{Correct predictions}}{\text{Total number of predictions made}} \]

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]
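Equation (1) can be checked with a small Python sketch. The label lists below are made-up illustrative values, not results from the paper, and the helper names `confusion_counts` and `accuracy` are hypothetical. Labels follow the paper's convention of 1 for malignant and 0 for benign.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = malignant, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Equation (1): correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative labels only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)            # 3 3 1 1
print(accuracy(tp, tn, fp, fn))  # 0.75
```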
5. CONCLUSION
In this paper, we propose a novel algorithm for detecting lung cancer in CT scans by building an
ensemble classifier, and the results are compared with the RF classifier. The ensemble classifier
includes five machine learning models: SVM, LR, MLP, Decision Tree, and KNN. The proposed model
gives an overview of the prediction of lung cancer at an early stage. After predicting whether the
tumor is malignant or benign, we generate a confusion matrix for each machine learning technique,
and from the confusion matrix we calculate accuracy, recall, precision, and F1 score.
In the future, deep learning techniques can be used for the prediction of lung cancer. A wider range
of images, such as X-ray, CT, MRI, and PET, can also be considered, which may yield higher accuracy,
thereby helping medical practitioners provide prompt preventive care at low cost.
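The remaining metrics named above can be derived from the same confusion-matrix counts. The sketch below uses the standard definitions of precision, recall, and F1 score; the paper's own formulas for these are not shown in this text, so this is an assumption about what was computed. The counts and the helper name `precision_recall_f1` are illustrative only.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion-matrix counts."""
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Illustrative counts only (not results from the paper).
precision, recall, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
print(precision)     # 0.75
print(recall)        # 0.6
print(round(f1, 4))  # 0.6667
```

Precision penalizes false alarms (benign cases flagged as malignant), while recall penalizes missed malignant cases, so reporting both alongside accuracy gives a fuller picture on a dataset with unequal class sizes.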
References
[1] Nikita Banerjee, Subhalaxmi Das. “Prediction Lung Cancer– In Machine Learning Perspective”
International Conference on Computer Science, Engineering and Applications (ICCSEA) (2020).
[2] Ozge Gunaydin, Melike Gunay, Oznur Sengel. “Comparison of Lung Cancer Detection Algorithms”
Scientific Meeting on Electrical-Electronics & Biomedical Engineering, Computer Science (EBBT)
IEEE 2019.
[3] Syed Saba Raoof, M A. Jabbar, Syed Aley Fathima. "Lung Cancer Prediction using Machine
Learning: A Comprehensive Approach" Second International Conference on Innovative Mechanisms
for Industry Applications (ICIMIA 2020).
[4] Swati Mukherjee, Prof. S. U. Bohra. "Lung Cancer Disease Diagnosis Using Machine Learning
Approach" Third International Conference on Intelligent Sustainable Systems [ICISS 2020].
[5] Wasudeo Rahane, Himali Dalvi. “Lung Cancer Detection Using Image Processing and Machine
Learning HealthCare” IEEE International Conference on Current Trends toward Converging
Technologies, Coimbatore, India, IEEE 2018.
[6] Kyamelia Roy, Sheli Sinha Chaudhury, Madhurima Burman, Ahana Ganguly, Chandrima Dutta,
Sayani Banik, Rayna Banik. “A Comparative Study of Lung Cancer detection using supervised
neural network” International Conference on Opto-Electronics and Applied Optics (Optronix 2019).
[7] Amrit Sreekumar, Karthika Rajan Nair, Sneha Sudheer, Ganesh Nayar H, and Jyothisha J Nair.
"Malignant Lung Nodule Detection using Deep Learning" International Conference on
Communication and Signal Processing, July 28 - 30, 2020, India.
[8] Sanjukta Rani Jena, Dr. Thomas George, Dr. Narain Ponraj. "Texture Analysis Based Feature
Extraction and Classification of Lung Cancer" International Conference on Electrical, Computer and
Communication Technologies (ICECCT) IEEE 2019.
[9] Dendi Gayathri Reddy, Emmidi Naga Hemanth Kumar, Desireddy Lohith Sai Charan Reddy, Monika
P "Integrated Machine Learning Model for Prediction of Lung Cancer Stages from Textual data
using Ensemble Method". 1st International Conference on Advances in Information Technology
IEEE 2019.
[10] M.Siddardha Kumar, Prof.Dr.K.Venkata Rao. "Prediction Of Lung Cancer Using Machine Learning
Technique: A Survey" International Conference on Computer Communication and Informatics
(ICCCI -2021), Jan. 27 – 29, 2021, Coimbatore, India.
[11] Krishnaiah, V., G. Narsimha, and Dr N. Subhash Chandra. "Diagnosis of lung cancer prediction
system using data mining classification techniques." International Journal of Computer Science and
Information Technologies 4.1 (2013): 39-45.
[12] Muhammad Imran Faisal, Saba Bashir, Zain Sikandar Khan, Farhan Hassan Khan. "An Evaluation
of Machine Learning Classifiers and Ensembles for Early-Stage Prediction of Lung Cancer" 3rd
International Conference on Emerging Trends in Engineering, Sciences, and Technology (ICEEST
2018).
[13] Fenwa, Olusayo D., Funmilola A. Ajala, and A. Adigun. "Classification of cancer of the lungs using
SVM and ANN." Int. J. Comput. Technol. 15.1 (2016): 6418-6426.
[14] Daoud, Maisa, and Michael Mayo. "A survey of neural network-based cancer prediction models
from microarray data." Artificial intelligence in medicine (2019).
[15] Palani D, K. Venkatalakshmi. "An IoT-based predictive modeling for predicting lung cancer using
fuzzy cluster-based segmentation and classification." Journal of medical systems 43.2 (2019): 21.
[16] Lynch, Chip M., et al. "Prediction of lung cancer patient survival via supervised machine learning
classification techniques." International journal of medical informatics 108 (2017): 1-8.
[17] Öztürk, Şaban, and Bayram Akdemir. "Application of feature extraction and classification methods
for the histopathological image using GLCM, LBP, LBGLCM, GLRLM, and SFTA." Procedia
computer science 132 (2018): 40-46.
[18] Radhika P R, Rakhi.A.S.Nair. "A Comparative Study of Lung Cancer Detection using Machine
Learning Algorithms". International Conference on Electrical, Computer and Communication
Technologies (ICECCT) IEEE 2019.
[19] Sumathipala, Yohan, et al. "Machine learning to predict lung nodule biopsy method using CT image
features: A pilot study." Computerized Medical Imaging and Graphics 71 (2019): 1-8.
[20] Zhang, Junjie, et al. "Pulmonary nodule detection in medical images: a survey." Biomedical Signal
Processing and Control 43 (2018): 138-147.
[21] Alyasriy, Hamdalla (2020), “The IQ-OTHNCCD lung cancer dataset”, Mendeley Data, V1, DOI:
10.17632/bhmdr45bh2.1