
Lung Diseases Prediction with the use of Chi-2 Test for feature selection

Presented by:
vikas
27291974213
M.Tech (CSE) 3rd Sem.
Introduction
 Machine learning is a branch of Artificial Intelligence that gives a system the
capability to learn and improve from experience without being explicitly
programmed.
 The objective of machine learning in business is not only effective data
collection, but also making use of the ever-increasing amounts of data being
gathered by manipulating and analyzing it without heavy human input.
 The purpose of machine learning is to extract knowledge from the data
itself.
 Machine learning is used to examine patterns in data so that machines can
learn from them and handle data more efficiently.
What is Lung Cancer?
 Lung cancer remains a danger to society and the cause of death of
thousands of individuals around the world.
 Lung cancer is a malignant lung tumor characterized by uncontrolled cell growth in
lung tissues.
 This growth can spread to the surrounding tissue or other parts of the body apart
from the lungs by the process of metastasis.
 The majority of lung cancer cases (85%) are caused by prolonged tobacco smoking
and approximately 10% to 15% of cases occur in people who have never smoked.
 Some cases are caused by inherited factors, radon gas, secondhand smoke, or
other forms of air pollution.
 Lung cancer can be seen on chest radiographs and computed tomography
(CT) scans.
Types of Lung Cancer
There are two main types of Lung Cancer:
 Small Cell Lung Cancer (SCLC): It often starts in the bronchi, then
quickly grows and spreads to other parts of the body, including the lymph
nodes. This type of lung cancer represents fewer than 20 percent of lung
cancers and is typically caused by tobacco smoking. Small cell lung cancer
may be very aggressive and requires immediate treatment.
 Non-Small Cell Lung Cancer (NSCLC): Non-small cell lung cancer is the
most common type of lung cancer. It accounts for nearly nine out of every
10 cases, and usually grows at a slower rate than SCLC. Most often, it
develops slowly and causes few or no symptoms until it has advanced.
Machine Learning Algorithms
 Decision Tree is a supervised learning method that can be used for
both classification and regression problems, but it is mostly preferred for
classification.
 A decision tree is a graphical representation of all possible solutions
to a problem based on the given conditions.
 The decision tree has two types of nodes: the decision node and the leaf
node.
 A decision node is used to make a decision and has multiple branches,
while a leaf node is the outcome of those decisions and does not have
any further branches.
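The difference between decision nodes and leaf nodes can be illustrated with a minimal sketch. The tree below is built by hand rather than learned from data, and the feature names and thresholds (`smoking_level`, `coughing_of_blood`) are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf node: holds an outcome and has no further branches."""
    outcome: str


@dataclass
class Decision:
    """Decision node: tests one feature and branches on the result."""
    feature: str
    threshold: float
    below: "Node"  # branch taken when the value is <= threshold
    above: "Node"  # branch taken when the value is > threshold


Node = Union[Leaf, Decision]


def predict(node: Node, sample: dict) -> str:
    """Walk from the root through decision nodes until a leaf is reached."""
    while isinstance(node, Decision):
        node = node.below if sample[node.feature] <= node.threshold else node.above
    return node.outcome


# Hypothetical hand-built tree (assumption: not learned from any dataset).
tree = Decision(
    feature="smoking_level", threshold=4,
    below=Leaf("low risk"),
    above=Decision(
        feature="coughing_of_blood", threshold=5,
        below=Leaf("medium risk"),
        above=Leaf("high risk"),
    ),
)

print(predict(tree, {"smoking_level": 7, "coughing_of_blood": 8}))  # high risk
```

A learning algorithm such as C4.5 would choose the features and thresholds automatically; only the prediction step is shown here.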
Support Vector Machine (SVM)
 The Support Vector Machine is a supervised machine learning algorithm
that can be used for both classification and regression.
 However, it is mostly used for classification problems. The SVM
algorithm aims to create the best line or decision boundary that
separates n-dimensional space into classes so that new data points can
easily be placed in the correct category in the future.
 This best decision boundary is called the hyperplane. SVM picks the extreme
points/vectors that help to create the hyperplane. These extreme cases are
called support vectors, and therefore the algorithm is called the support
vector machine.
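As a rough illustration of the hyperplane and support vectors described above, the sketch below classifies points by the sign of w·x + b and reports the points closest to the boundary. It does not train an SVM: the hyperplane (`w`, `b`) and the four toy points are assumptions picked by hand so that the boundary separates the two groups.

```python
import math

# Assumed hyperplane w.x + b = 0 in 2-D (the vertical line x1 = 3).
# It is chosen by hand for the toy points below, not learned.
w = (1.0, 0.0)
b = -3.0


def decision_value(x):
    """Signed value of w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x):
    """Return +1 on one side of the hyperplane and -1 on the other."""
    return 1 if decision_value(x) >= 0 else -1


def distance_to_hyperplane(x):
    """Perpendicular distance from point x to the hyperplane."""
    return abs(decision_value(x)) / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical training points: two per class.
points = [(1.0, 2.0), (2.0, 3.0), (4.0, 3.0), (5.0, 5.0)]

# The points closest to the boundary play the role of support vectors.
margin = min(distance_to_hyperplane(p) for p in points)
support_vectors = [p for p in points
                   if math.isclose(distance_to_hyperplane(p), margin)]

print([classify(p) for p in points])  # [-1, -1, 1, 1]
print(support_vectors)                # [(2.0, 3.0), (4.0, 3.0)]
```

A real SVM solver searches for the `w` and `b` that maximize this margin; here they are fixed in advance to keep the example short.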
K-Nearest Neighbor (KNN)
 The K-Nearest Neighbor (KNN) algorithm uses feature similarity to
estimate the class of new data points.
 In KNN, the training data (data whose labels are known) is provided to the
learner. When test data is introduced, the learner compares it with the
training data.
 In the KNN algorithm, K represents the number of nearest neighbor points
that vote for the class of a new test point.
Md. Badrul Alam Miah and Mohammad Abu Yousuf. Detection of Lung
Cancer from CT Image Using Image Processing and Neural
Network (2015).

 This paper deals with image processing and neural networks.
 First, feature extraction is used to select the features that are used to train
and test the neural network.
 CT scan images are used, and the method achieved an accuracy of 96.67%.
Emrana Kabir Hashi, MD. Shahid Uz Zaman and MD. Rokibul Hasan. An
Expert Clinical Decision Support System to Predict Disease Using
Classification Techniques (2017).

 Emrana Kabir Hashi et al. use the Decision Tree (C4.5) and K-Nearest Neighbor
(KNN) algorithms to predict disease.
 The system then calculates and compares the accuracy of KNN and C4.5.
 The proposed model obtained its highest accuracy, 90.43%, with C4.5
for predicting the disease.
 The PIMA Indians Dataset is used in this model.
Nikita Banerjee and Subhalaxmi Das. Prediction Lung cancer-In Machine
Learning Perspective (2020).

 Nikita Banerjee and Subhalaxmi Das diagnose lung cancer using various
machine learning algorithms.
 The proposed model consists of a preprocessing block, a feature extraction
block, and a classification block.
 CT scan images are used for the prediction of cancer.
 Among the algorithms compared, Random Forest achieved the highest
accuracy, 96%.
Binila Mariyam Boban and Rajesh Kannan Megalingam. Lung Diseases
Classification based on Machine Learning Algorithms and Performance
Evaluation (2020).

 In this paper, several algorithms are used for lung disease prediction, such as
KNN, SVM, and MLP (Multilayer Perceptron).
 The dataset consists of 400 CT scan images of lung diseases.
 Of the machine learning algorithms used, MLP gives the highest
accuracy, 98%.
Peeris T. M. P. and Brundha. Optimizing Classification Techniques for
lung cancer detection on CT images. EPRA International Journal of
Multidisciplinary Research (IJMR), Volume 6, Issue 3, March 2020.

 T. Maria Patrica Peeris proposes an optimized classification approach
for the prediction of lung cancer from CT images.
 This research combines the KNN and Naïve Bayes algorithms.
 The Naïve Bayes algorithm first uses two hidden layers to extract features
from the nearest neighbors. Classification methods are then used, with
attention to the learning model's accuracy and efficiency, to predict lung
cancer from the CT image.
 The aim of the optimization is to allow the model to adapt the feature
extraction process to the input image given to the network. For any
motion of the image, the model is trained for the purpose of prediction.
Base Paper

Negar Maleki, Yasser Zeinali and Seyed Taghi Akhavan Niaki. A k-NN
method for lung cancer prognosis with the use of a genetic algorithm for
feature selection (2020). Elsevier Ltd.

 Negar Maleki et al. present research to help detect lung cancer
by making use of a genetic algorithm (GA).
 The purpose of using the GA was to determine the best combination of
features that minimizes the overall error of the KNN method.
 The dataset, taken from the Data World site, contains 1000 samples.
 10-fold cross-validation is applied to the training set.
 As a result, the reported accuracy was 100%.
| Title | Author | Year | Dataset | Classification technique used | Accuracy achieved |
|---|---|---|---|---|---|
| Detection of Lung Cancer from CT Image Using Image Processing and Neural Network | Badrul Alam Miah | 2015 | CT image | Neural Network | 96.67% |
| An Expert Clinical Decision Support System to Predict Disease Using Classification Techniques | Emrana Kabir Hashi | 2017 | PIMA Dataset | KNN, Decision Tree | 90.43% |
| Prediction Lung cancer-In Machine Learning Perspective | Nikita Banerjee | 2020 | CT image | SVM, ANN, Random Forest | 96% |
| Lung Diseases Classification based on Machine Learning Algorithms and Performance Evaluation | Binila Mariyam Boban | 2020 | CT Scan | KNN, SVM and MLP | 98% |
| Optimizing Classification Techniques for lung cancer detection on CT images | T. Maria Patrica Peeris | 2020 | CT Image | Naïve Bayes, KNN | Not Mentioned |
| A k-NN method for lung cancer prognosis with the use of a genetic algorithm for feature selection | Negar Maleki | 2020 | Data world site (1000 samples) | Decision tree, KNN and genetic algorithm | 100% |
Gaps in Literature
The existing literature reveals that existing machine learning
models suffer from at least one of the following problems:
 The genetic algorithm is very complex and therefore takes a long time
to process, at high computational cost.
 Most existing studies have ignored feature selection techniques
based on statistical tests during training and testing. It is observed that
using a feature selection technique can increase the performance of
machine learning models.
 Previous researchers state that future work may apply other machine learning
classification algorithms and feature selection methods and compare their
performance with earlier results.
Problem Definition
 Most researchers have neglected the use of statistical tests for
feature selection and have focused instead on the population-based
metaheuristic genetic algorithm. The genetic algorithm is very complex,
computationally costly, and time-consuming.
 To overcome this drawback, a novel machine learning technique
will be designed that uses the Chi-Square statistical test as the feature
selection method for lung cancer prediction.
Proposed Methodology
OBJECTIVES
The objectives of this research work are listed below:
 The chi-square test addresses the feature selection problem by testing
the relationship between two categorical features.
 A chi-square test is used to check the independence of two features.
 To identify the relevant attributes by using chi-square.
 The chi-square statistic measures the difference between the observed
values and the expected values. The formula for chi-square is:

$$\chi^2 = \sum \frac{(\text{Observed value} - \text{Expected value})^2}{\text{Expected value}}$$

 To propose feature selection using chi-square in order to reduce
complexity and cost.
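The chi-square formula can be worked through on a small contingency table. The sketch below is an illustration only: the counts are hypothetical, and the expected value of each cell is computed in the usual way for a test of independence, as (row total × column total) / grand total.

```python
def chi_square(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = smoker / non-smoker, columns = cancer / no cancer.
table = [[30, 10],
         [20, 40]]

print(round(chi_square(table), 3))       # 16.667
print(chi_square([[10, 10], [20, 20]]))  # 0.0 (feature independent of outcome)
```

A larger statistic means the feature and the outcome are less likely to be independent, which is why high-scoring features are kept during selection.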
Workflow of the proposed methodology:

1). Dataset: Data world site, contains 1000 samples.
2). Data pre-processing.
3). Apply the Chi-Square feature selection method.
4). Implement the classification algorithms: SVM, K-NN, DT.
5). Evaluate the performance.
6). Comparative analysis based on accuracy.
7). Select the optimum k for feature selection from the obtained results.

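The workflow can be sketched end to end on toy data. Several things here are assumptions made for illustration: k is read as the number of top-ranked features to keep, a simple 1-nearest-neighbour classifier with Hamming distance stands in for SVM, K-NN and DT, and the tiny binary dataset is invented rather than taken from the 1000-sample dataset.

```python
from collections import Counter


def chi2_score(feature, labels):
    """Chi-square statistic between one categorical feature and the class label."""
    n = len(labels)
    f_counts, l_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    score = 0.0
    for f_val, f_cnt in f_counts.items():
        for l_val, l_cnt in l_counts.items():
            expected = f_cnt * l_cnt / n
            score += (joint[(f_val, l_val)] - expected) ** 2 / expected
    return score


def rank_features(X, y):
    """Return column indices sorted from highest to lowest chi-square score."""
    columns = list(zip(*X))
    scores = [chi2_score(col, y) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)


def one_nn_accuracy(X_train, y_train, X_test, y_test, cols):
    """Accuracy of a 1-nearest-neighbour classifier restricted to columns cols."""
    correct = 0
    for x, target in zip(X_test, y_test):
        dists = [sum(x[j] != t[j] for j in cols) for t in X_train]
        pred = y_train[dists.index(min(dists))]
        correct += pred == target
    return correct / len(y_test)


# Hypothetical binary data: feature 0 tracks the label, feature 2 is noise.
X_train = [(1, 1, 0), (1, 1, 1), (1, 0, 0), (1, 1, 1),
           (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 0, 1)]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
X_test = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 0)]
y_test = [1, 0, 1, 0]

ranking = rank_features(X_train, y_train)
# Evaluate every k (number of top-ranked features kept) and compare accuracy.
results = {k: one_nn_accuracy(X_train, y_train, X_test, y_test, ranking[:k])
           for k in range(1, len(ranking) + 1)}
best_k = max(results, key=results.get)  # smallest k among ties

print(ranking)  # [0, 1, 2]
print(results)  # {1: 1.0, 2: 1.0, 3: 0.75}
print(best_k)   # 1
```

On this toy data, adding the noise feature lowers accuracy, which is the effect the comparative analysis over k is meant to reveal.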

Results
Parameter Measures
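The effectiveness of the proposed technique is evaluated against existing lung cancer
prediction methods using the following parameters, where TP, TN, FP and FN are the numbers
of true positives, true negatives, false positives and false negatives:

1). Accuracy: Proportion of all data points that are classified correctly.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
2). Precision: Proportion of predicted positives that are truly positive.
$$\text{Precision} = \frac{TP}{TP + FP}$$
3). Recall: Proportion of actual positives that are correctly identified.
$$\text{Recall} = \frac{TP}{TP + FN}$$
4). F-Measure: Harmonic mean of precision and recall.
$$\text{F-Measure} = \frac{2\,TP}{2\,TP + FP + FN}$$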
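The four measures can be computed directly from the confusion-matrix counts. The function name `classification_metrics` and the counts passed to it are hypothetical, and the sketch assumes no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F-measure from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f_measure": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for 100 test samples.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the accuracy is 0.85 and the recall is 0.80; the F-measure lies between precision and recall because it is their harmonic mean.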
References
Nikita Banerjee and Subhalaxmi Das. Prediction Lung cancer-In Machine Learning
Perspective. (2020). Retrieved from IEEE Xplore on July 04, 2020.

Md. Badrul Alam Miah and Mohammad Abu Yousuf. Detection of Lung Cancer from
CT Image Using Image Processing and Neural Network. 2nd International Conference on
Electrical Engineering and Information & Communication Technology (ICEEICT) 2015,
Jahangirnagar University, Dhaka-1342, Bangladesh, 21-23 May 2015.

Binila Mariyam Boban and Rajesh Kannan Megalingam. Lung Diseases Classification
based on Machine Learning Algorithms and Performance Evaluation. International
Conference on Communication and Signal Processing, July 28-30, 2020, India
(IEEE).

Emrana Kabir Hashi, MD. Shahid Uz Zaman and MD. Rokibul Hasan. An Expert
Clinical Decision Support System to Predict Disease Using Classification Techniques.
International Conference on Electrical, Computer and Communication Engineering
(ECCE), February 16-18, 2017, Cox’s Bazar, Bangladesh (IEEE).

Negar Maleki, Yasser Zeinali and Seyed Taghi Akhavan Niaki. A k-NN method for lung
cancer prognosis with the use of a genetic algorithm for feature selection. (2020).
Elsevier Ltd.
Thank you..!
