Presented by:
vikas
27291974213
M.Tech (CSE) 3rd Sem.
Introduction
Machine learning is a branch of Artificial Intelligence that gives a system
the ability to learn and improve from experience without being explicitly
programmed.
The objective of machine learning in business is not only effective data
collection, but also making use of the ever-increasing amounts of data being
gathered by processing and analyzing it without heavy human input.
The purpose of machine learning is to extract knowledge from the data
itself.
Machine learning examines patterns in data so that machines can learn from
them and handle data more efficiently.
What is Lung Cancer?
Lung cancer remains a serious danger to society and causes the death of
thousands of individuals around the world.
Lung cancer is a malignant lung tumor characterized by uncontrolled cell growth in
lung tissues.
This growth can spread beyond the lungs to nearby tissue or other parts of the body
by the process of metastasis.
The majority of lung cancer cases (85%) are caused by long-term tobacco smoking;
approximately 10% to 15% of cases occur in people who have never smoked.
Some cases are caused by inherited factors, radon gas, secondhand smoke or
other forms of air pollution.
Lung cancer can be seen on chest radiographs and computed tomography
(CT) scans.
Types of Lung Cancer
There are two main types of lung cancer:
Small Cell Lung Cancer (SCLC): It often starts in the bronchi, then
quickly grows and spreads to other parts of the body, including the lymph
nodes. This type represents fewer than 20 percent of lung cancers and is
typically caused by tobacco smoking. Small cell lung cancer can be very
aggressive and requires immediate treatment.
Non-Small Cell Lung Cancer (NSCLC): Non-small cell lung cancer is the
most common type of lung cancer. It accounts for nearly nine out of every
10 cases and usually grows at a slower rate than SCLC. Most often, it
develops slowly and causes few or no symptoms until it has advanced.
Machine Learning Algorithms
Decision Tree is a supervised learning method that can be used for both
classification and regression problems, but it is mostly used for
classification.
A decision tree is a graphical representation of all possible solutions to
a given problem, based on a sequence of conditions.
The decision tree has two types of nodes: decision nodes and leaf nodes.
A decision node tests a condition and has multiple branches, while a leaf
node is the outcome of those decisions and has no further branches.
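The node structure described above can be sketched in a few lines of Python. This is a minimal, hand-built illustration, not a trained model: the feature names (pack_years, age), the thresholds and the labels are hypothetical and carry no clinical meaning.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf node: holds an outcome and has no further branches."""
    label: str


@dataclass
class Decision:
    """Decision node: tests one feature and branches on the result."""
    feature: str
    threshold: float
    below: "Node"        # branch taken when value < threshold
    at_or_above: "Node"  # branch taken when value >= threshold


Node = Union[Leaf, Decision]


def predict(node: Node, sample: dict) -> str:
    """Walk from the root through decision nodes until a leaf is reached."""
    while isinstance(node, Decision):
        if sample[node.feature] < node.threshold:
            node = node.below
        else:
            node = node.at_or_above
    return node.label


# Hypothetical toy tree; features and thresholds are illustrative only.
toy_tree = Decision(
    feature="pack_years", threshold=20.0,
    below=Leaf("low risk"),
    at_or_above=Decision(
        feature="age", threshold=55.0,
        below=Leaf("low risk"),
        at_or_above=Leaf("high risk"),
    ),
)

print(predict(toy_tree, {"pack_years": 30.0, "age": 60.0}))  # high risk
```

In practice the tree is learned from labelled data (for example with C4.5) rather than written by hand; the sketch only shows how a prediction follows the branches to a leaf.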
Support Vector Machine (SVM)
The Support Vector Machine is a supervised machine learning algorithm
that can be used for both classification and regression.
However, it is mostly used for classification problems. The SVM
algorithm aims to find the best line or decision boundary that separates
n-dimensional space into classes, so that new data points can easily be
placed in the correct class in the future.
This best decision boundary is called the hyperplane. SVM picks the extreme
points/vectors that help to create the hyperplane. These extreme cases are
called support vectors, and hence the algorithm is called the support
vector machine.
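As a rough illustration of the hyperplane and its support vectors, the sketch below does not train an SVM. It assumes a hyperplane w.x + b = 0 chosen by hand for four made-up 2-D points, classifies a point by the sign of w.x + b, and reports the training points closest to the hyperplane, which play the role of the support vectors. All values and function names are assumptions for illustration.

```python
import math


def decision_value(w, b, x):
    """Score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm


# Assumed hyperplane x1 + x2 - 5 = 0 (picked by hand, not learned).
w, b = [1.0, 1.0], -5.0
training_points = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 5.0]]

distances = [distance_to_hyperplane(w, b, p) for p in training_points]
margin = min(distances)

# The points lying exactly at the margin are the support vectors.
support_vectors = [
    p for p, d in zip(training_points, distances) if math.isclose(d, margin)
]

print(support_vectors)              # [[2.0, 2.0], [3.0, 3.0]]
print(classify(w, b, [5.0, 5.0]))   # 1
```

A real SVM finds w and b by maximizing this margin during training; here they are fixed in advance so that the geometry is easy to follow.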
K-Nearest Neighbor (KNN)
The K-Nearest Neighbor (KNN) algorithm uses feature similarity to
predict the class of new data points.
In KNN, the training data (data whose labels are known) is provided to the
learner. When test data is presented, the learner compares it with the
training data.
In the KNN algorithm, K is the number of nearest neighbor points
that vote for the class of a new test point.
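The voting step can be sketched as follows. This is a minimal version assuming Euclidean distance as the similarity measure and a simple majority vote; the training points and the labels "negative" and "positive" are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs. Ties in the vote are resolved
    in favour of the label encountered first among the nearest neighbours.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled training data (two features per sample).
train = [
    ([1.0, 1.0], "negative"),
    ([1.5, 2.0], "negative"),
    ([2.0, 1.0], "negative"),
    ([6.0, 6.0], "positive"),
    ([7.0, 5.5], "positive"),
    ([6.5, 7.0], "positive"),
]

print(knn_predict(train, [6.0, 5.0], k=3))  # positive
print(knn_predict(train, [1.2, 1.1], k=3))  # negative
```

The choice of K matters: a very small K makes the prediction sensitive to noise, while a very large K lets distant points outvote the nearby ones.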
Md. Badrul Alam Miah and Mohammad Abu Yousuf. Detection of Lung
Cancer from CT Image Using Image Processing and Neural
Network (2015).
Emrana Kabir Hashi uses the Decision Tree (C4.5) and K-Nearest Neighbor
(KNN) algorithms to predict disease.
The system then calculates and compares the accuracy of KNN and C4.5.
In the proposed model, C4.5 obtained the highest accuracy, 90.43%,
for predicting the disease.
The PIMA Indians Dataset is used in this model.
Nikita Banerjee and Subhalaxmi Das. Prediction Lung cancer-In Machine
Learning Perspective (2020).
In this paper, several algorithms are used for lung cancer prediction, such as
KNN, SVM and MLP (multilayer perceptron).
The dataset consists of 400 CT scan images of lung disease.
Among the machine learning algorithms used, MLP gives the highest
accuracy, 98%.
Peeris T. M. P. and Brundha “Optimizing Classification Techniques for
lung cancer detection on CT images.” EPRA International Journal of
Multidisciplinary Research (IJMR) Volume: 6, Issue: 3 March 2020,
Title | Author | Year | Dataset | Techniques | Accuracy
Detection of Lung Cancer from CT Image Using Image Processing and Neural Network | Badrul Alam Miah | 2015 | CT image | Neural Network | 96.67%
An Expert Clinical Decision Support System to Predict Disease Using Classification Techniques | Emrana Kabir Hashi | 2017 | PIMA Dataset | KNN, Decision Tree | 90.43%
Prediction Lung cancer-In Machine Learning Perspective | Nikita Banerjee | 2020 | CT image | SVM, ANN, Random Forest | 96%
Lung Diseases Classification based on Machine Learning Algorithms and Performance Evaluation | Binila Mariyam Boban | 2020 | CT scan | KNN, SVM and MLP | 98%
Optimizing Classification Techniques for lung cancer detection on CT images | T. Maria Patrica Peeris | 2020 | CT image | Naïve Bayes, KNN | Not mentioned
A k-NN method for lung cancer prognosis with the use of a genetic algorithm for feature selection | Negar Maleki | 2020 | Data World site, contains 1000 samples | Decision tree, KNN and genetic algorithm | 100%
Gaps in Literature
The existing literature reveals that existing machine learning models
suffer from at least one of the following problems:
The genetic algorithm is very complex and therefore takes a lot of
processing time at high computational cost.
Most existing researchers have ignored feature selection techniques
based on statistical tests during training and testing. It is observed that
using a feature selection technique can increase the performance of
machine learning models.
Previous researchers state that future work may combine machine learning
classification algorithms with feature selection and compare their
performance with earlier results.
Problem Definition
Most researchers have neglected the use of statistical tests for
feature selection and have instead focused on the population-based
metaheuristic genetic algorithm. The genetic algorithm is very complex
and computationally costly, and therefore time consuming.
To overcome this drawback, a novel machine learning technique
will be designed for lung cancer prediction, using the Chi-Square
statistical test as the feature selection method.
Proposed Methodology
OBJECTIVES
The objectives of this research work are given below:
The chi-square test addresses the feature selection problem by testing
the relationship between two categorical features.
A chi-square test is used to check the independence of two features.
To identify the relevant attributes by using chi-square.
The chi-square statistic measures the difference between the observed values
and the expected values. The formula for chi-square is:
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
where O_i is the observed value and E_i is the expected value in cell i.
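As a sketch of how the statistic can drive feature selection, the code below sums (observed - expected)^2 / expected over every cell of the feature-versus-class contingency table, then ranks the features by the result. The data, the feature names (smoker, left_handed) and the helper names are hypothetical; a full test would also compare the statistic with a critical value or p-value, which is omitted here.

```python
from collections import Counter


def chi_square(feature, target):
    """Chi-square statistic for two categorical sequences of equal length."""
    n = len(feature)
    observed = Counter(zip(feature, target))   # cell counts O
    feature_totals = Counter(feature)          # row totals
    target_totals = Counter(target)            # column totals
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for t_value, t_count in target_totals.items():
            expected = f_count * t_count / n   # E under independence
            stat += (observed[(f_value, t_value)] - expected) ** 2 / expected
    return stat


def rank_features(features, target):
    """Return feature names ordered from most to least dependent on target."""
    scores = {name: chi_square(values, target) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy data: 8 samples, binary class label and two binary features.
target = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "left_handed": [1, 0, 1, 0, 1, 0, 1, 0],
    "smoker":      [1, 1, 1, 0, 0, 0, 0, 1],
}

ranking, scores = rank_features(features, target)
print(ranking)  # ['smoker', 'left_handed']
print(scores)   # {'left_handed': 0.0, 'smoker': 2.0}
```

A score of 0 means the observed counts match what independence would predict, so the feature carries no information about the class in this sample; larger scores indicate stronger dependence and make the feature a better candidate to keep.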
Data Pre-processing
Results
Parameter Measures
The effectiveness of the proposed technique, compared with existing lung cancer prediction
methods, is evaluated using the following parameters: