Professional Documents
Culture Documents
Abstract--- Cancer has the highest growth rate among all In the recent times, a number of researchers have employed
diseases globally. Oral cancer is one of the most dangerous machine learning approaches in the Healthcare industry.
cancer which affects and originates from the oral cavity and One of the data mining technologies in use is clustering
neck. Overuse of tobacco and smoking cigarettes are the primary
which helps to detect and classify potential non-cancer and
risk factor for developing oral cancer. Another habit which has a
strong association with oral cancer is the consumption of oral cancer patients. The focus of the present paper is to
alcohol. A large number of patient deaths were recorded from design and execute an oral cancer identification and
oral cancer as a result of lack of its identification and late detection system using Deep Neural based Adaptive Fuzzy
treatment. Researchers in the medical community are making an System (DNAFS). DNAFS uses both fuzzy logic and deep
effort to provide a system for effective diagnosis and prevention neural network based algorithm. The processing is handled
of the serious disease. In the present research, oral cancer
by first initiating data collection by gathering online
patients can be identified through the use of data mining
technology which includes detection, classification and information for the medical data set. This is followed by
clustering. A Deep Neural Based Adaptive Fuzzy System preprocessing which helps in data cleaning, removal of
(DNAFS) is proposed in this paper which uses machine learning unwanted data and empty values. Next, data is clustered by
for the detection and identification of oral cancer. The two Fuzzy C-Means clustering for further processing. Data
techniques which are part of DNAFS are based on fuzzy logic patterns can be identified by feature selection. This process
and neural networks. DNAFS and methods for data mining are
helps highlight similarities and differences. Next, the
explored for the identification of suitable techniques which are
helpful in classifying data efficiently. The stages included in the methods in DNAFS and data mining are explored for
proposed mechanism include data collection, pre processing, identifying appropriate techniques and methods for
Fuzzy C-Means for clustering data, feature selection, classifying data efficiently. Finally, data is classified using
classification and identification. Meaningful relationships can DNAFS, a machine learning technique which is a very
be extracted effectively from data using data mining techniques. popular classifier in the field of machine learning related to
About 96.29 % accurate results are available from experiments.
detection of oral cancer.
There is less than 5 ms incidence of error in the result. The
datasets are required to be investigated further in daily clinical A popular classifier used in a number of fields pertaining
practice. to Medical Research is DNN which is simple as well as easy
Keywords--- Data mining, feature selection, Machine to implement. Neural networks are based on the functions
Learning, Fuzzy C-means, Oral cancer, medical data set. and structure of biological neural networks which consists
of neurons. Fuzzy logic pertains to the classification of huge
I. INTRODUCTION amounts of data for obtaining accurate results. In the current
Cancer leads to the most deaths globally. It is possible to research, a rule-based classification technique is proposed
easily reduce deaths caused by cancer, by detecting it early for detection of oral cancer. This is one of the fastest fuzzy
and enforcing preventive measures. Environmental and techniques based on setting rules for clinical processes and
genetic factors play a significant role in devising new attributes pertaining to differentiation between oral cancer
methods for the detection and prevention of cancer. The and normal cases. In the present study, about 94% accuracy
approach for detecting and preventing cancer which is in classification was obtained.
described here is based on an analysis which uses The organization of the paper is done in the following
association rule mapping. The data for analysis is related to manner: In Section 2, Works by previous authors are
history of addiction, clinical symptoms, survivability and explained in brief. Machine learning techniques which are
co-morbid conditions related to cancer patients. It is possible proposed are presented in section 3. This section presents an
to discover patterns in large sets of data through data mining overview of preprocessing phase of data clustering feature
which is a computational process. Data mining uses a selection and approaches in machine learning in the context
combination of techniques related to database systems, of classification processes. Section 4 covers a description of
statistics, Artificial Intelligence and machine learning. Data experimental results. In section 5, a conclusion is presented
mining helps to extract and transform data in a data set. and future research is proposed.
Large and complex data in medical data sets can be
analyzed using machine learning techniques and algorithms. II. RELATED WORK
Clustering pertains to the assignment of objects to its two
groups known as clusters.
Manuscript received February 01, 2012. (Fill up the Details)
K. Lalithamani, Research Scholar, Bharathiar University,
Coimbatore, Tamil Nadu, India. (e-mail: klalithamani1948@gmail.com)
Dr.A. Punitha, Research Guide, Bharathiar University, Coimbatore,
Tamil Nadu, India. (e-mail: apunithaganesh@yahoo.com)
Published By:
Blue Eyes Intelligence Engineering
Retrieval Number: E11720275S19/19©BEIESP 397 & Sciences Publication
Detection of Oral Cancer Using Deep Neural Based Adaptive Fuzzy System in Data Mining Techniques
There are two categories of clustering algorithms. These Experimental data as well as ANFIS (Adaptive Neuro
are hard clustering and soft or fuzzy clustering. In hard Fuzzy Inference System) values are compared using this
clustering, data is divided into clusters such that every data approach with respect to bell shaped and triangular
element belongs to only one cluster. These algorithms have membership functions [13]. The basis of the model
achieved good results for a number of real world developed here is the first order Takagi-Sugeno and Kang.
applications [1]. In this phase, the extracted feature is This model showed increased accuracy of prediction when
aligned in the form of input to the proposed Fuzzy C-means using the bell membership function. A novel problem
based Radial Basis Function Neural Network for the solving approach is devised for air conditioning systems
classification of brain tumors. K-Means, KNN and Fast through Sugeno and Mamdani type inference models which
Fuzzy C-Means clustering techniques were used to cluster use fuzzy logic. This was discussed by Kaur A et al. [12].
malignant as well as benign tumors. To better understand The primary differences between Sugeno type FIS (Fuzzy
the result, a comparison was made between the Inference System) and mamdani type FIS are mentioned
performances of the Fuzzy C Means, Fast Fuzzy C-Means using this approach. An enhanced form of membership is
and KNN algorithms. Improved classification result was chosen using the study with respect to FIS in the context of
obtained by the hybrid model proposed in the study in the air conditioning system. On the basis of the
comparison with the conventional algorithms in use in the implementation of the inference system, the author made a
past [2]. In the medical science field, the classification and conclusion stating that Sugeno type and Mamdani type
clustering of cancer data has shown good results. Accurate conditioning system performs in a similar manner. However,
results can be obtained through the use of the Fuzzy C- the use of Sugeno type FIS model enables the air
means and K-means algorithms in the proposed solution. conditioning system to operate at full capacity [12]. Tamer
The present paper addresses the problem related to introduced the applications of ANFIS with respect to
classification of cancer data using the methods and medical diagnosis. The ANFIS system of Sugeno type
information derived from the phases of testing and training assists in the prognostication of mycobacterium tuberculosis
[3]. K-means and DBSCAN (Density Based Spatial existence. About 503 different patient records were
Clustering of Applications with Noise) clustering algorithms collected which were part of a private health clinic. There
were compared further for their performances using are 30 different attributes and patient records covering
silhouette scores. The K-means algorithm was initially medical test and demographic data [14]. About 250 records
analyzed with the help of different clusters and distance were used to generate the ANFIS model. Classification of
metrics. Later on, the DBSCAN algorithm was analyzed instances of the proposed model is done with 97% accuracy.
using different minimum points that are required to form The rough set algorithm performs a similar calculation but
clusters and distance metrics. On the basis of this analysis, with only 92% accuracy. This information contributes
the comparison of the performance of the K-means forecasting of patients before medical tests [15].
algorithm was better than DB scan. Specifically, better Multi Ranked Feature Selection Algorithm (MRFSA) is a
values were achieved in terms of execution time and new algorithm for effectively selecting features for use in
clustering accuracy [4]. The way in which groundwater the new system for detecting cancer. A Neuro Fuzzy
affects human health was analyzed by Balasubbramanian Temporal Classification Algorithm as used to test how the
and Umarani [5] using the K-means clustering algorithm. In feature selection algorithm performs with respect to the
this case, the risk factors for content of fluoride in water accuracy of classification. The classification accuracy
were analyzed. Meaningful hidden patterns were revealed obtained in this manner has promising results in comparison
with the help of this analysis which can assist in reasonable with existing works. These insights are based on the results
decisions on a community level. Banu and Jamala [6] pertaining to the same data set [16].
devised an algorithm for predicting Heart Attack through The performance of different feature sets can be
fuzzy C-means. This algorithm uses unsupervised clustering compared and evaluated using a novel Intelligent Correlated
where one specific data object can belong to more than a Fuzzy Neural Network (ICFNN). The combination of
single cluster. In this case, the system can help physicians GLRLM (Gray Level Run Length Matrix), GLCM (Gray
with the efficient diagnosis of heart attack. The bioprofile Level Co-Occurrence Matrix) and intensity based first order
concept was used in the work of Escudero et al [7]. K-means features helps to improve the accuracy of classification. In
algorithm for clustering was used for detecting animal this context, 61 important features from 16 patients and their
disease early and its classification into non-pathologic and 192 images are taken into consideration. These type of
pathologic groups. The clustering algorithm K-means-Mode malignancies in the context of classification assist during the
was proposed by Paul et al. using medical data [8]. prognosis and treatment of oral form of cancer. Results of
Researchers indicated that the medical domain knowledge experimentation indicate the performance of the system of
with respect to clustering tends to improve the algorithmic diagnosis [17]. A number of data mining algorithms for oral
performance. Towards this end, the clustering algorithm cancer data have been applied by Jatasuji and Rajagopalan
performance is evaluated with either Neural Gas [11], Hard [18]. These algorithms have been used to classify the
C-Means[9] or Fuzzy C-Means [9] for the detection of NMDS (Non-Metric Multi Dimensional Scaling) dataset and
tumors. Three different steps are used in this case. In the
first step, the three algorithms are evaluated under the
conditions of noise. Later, tumor segmentation results are
compared with region growing algorithm. In the end, results
are manually compared with segmentation results.
Published By:
Blue Eyes Intelligence Engineering
Retrieval Number: XXXXXXX 398 & Sciences Publication
International Journal of Recent Technology and Engineering (IJRTE)
ISSN: 2277-3878, Volume-7, Issue-5S3, February 2019
researchers have accessed the performance of those risk factor. The system developed in this way leads to 83.33
algorithms. In this respect, researchers have been able to % specificity and 95.85% sensitivity with respect to risk
achieve about 100% accuracy with the C4.5 algorithm, computation for CAD.
98.7% with Random Forest Tree algorithm, and 99.5%
accuracy with MPNN (Memory Pulsed Neutron-Neutron). III. PROPOSED WORK
These algorithms are also capable of separating oral cancer
3.1 Overview
data set as per non-cancer or cancer patient records.
An approach for detecting and preventing oral cancer was Discovering patterns computationally in large and
introduced by Neha et al. [19]. The basis of this approach complex data sets is referred to as data mining. This method
was association rule mapping. The data analyzed was related is a combination of machine learning, database systems,
to comorbid conditions, clinical symptoms, history of artificial intelligence, and statistics. The objective of the
addiction and survivability of patients diagnosed with cancer process of data mining is the extraction of information from
through the use of association rules. Association rules datasets and applying transformations to obtain a structure
extracted in this way are useful for clinical decision making that is meaningful in future. Large medical data sets are the
for diseases. Fuzzy rule-based classification was used by best candidates for this classification method.
Mohammadpour et al. [20] for the prediction of CAD Several data mining techniques are implemented in
(Computer Aided Design). The authors are of the opinion unison to diagnose and prognose oral cancer for a specific
that fuzzy classification which is based on the if-then rules patient. The exploration of DNAFS and data mining
can achieve an accuracy of 92.8 %. However, only 91.9% methods is done for the identification of appropriate
classification accuracy was obtained using equation. The techniques and methods that efficiently classify data. In the
evaluation of Fuzzy rules took place to obtain a reduction in end, classification is done using Deep Neural Network
space dimension for classifier training. This also helped Based Adaptive Fuzzy System (DNAFS) which is a
improve the overall accuracy. Fuzzy rules which are based machine learning technique. The prognosis of the affected
on medical experts' approach for the detection of CAD were patient determines whether successful treatment for the
formulated by Pal et al. [21]. The method proposed here diseases is possible. In this context, information for
helps doctors during identification and risk prediction with prognosis is generated with the help of a statement of
respect to CAD patients. Ten rules were made and every prognosis.
block of rule depicts a module. Every module has a single
Published By:
Blue Eyes Intelligence Engineering
Retrieval Number: E11720275S19/19©BEIESP 399 & Sciences Publication
Detection of Oral Cancer Using Deep Neural Based Adaptive Fuzzy System in Data Mining Techniques
Published By:
Blue Eyes Intelligence Engineering
Retrieval Number: XXXXXXX 400 & Sciences Publication
International Journal of Recent Technology and Engineering (IJRTE)
ISSN: 2277-3878, Volume-7, Issue-5S3, February 2019
The algorithm assigns membership to every data point modified. Other tasks involve formulating output for
pertaining to a cluster centre. This assignment is based on DNAFS that matches with the training data. This also helps
the distance between the data find point and the cluster in achieving an improvement in the rate of convergence.
centre. If there is more data in the proximity of the cluster
3.9 Design the DNAFS
centre, then it means that its membership towards that
cluster centre is greater. Initialization
Define number and type of inputs
3.6 Feature selection Define number and type of outputs
In feature selection, only certain parameters belonging to Define number of rules and type of consequents
a set of features are chosen to improve the accuracy of Define objective function and stop conditions
classification. On analyzing, it is found that feature selection Collect data
is a requirement prior to studies connected to medical data Normalize inputs
for oral cancer. The technique of feature selection detects Determine initial rules
attributes which are relevant and result in accuracy of the Initialize network
classifier. Feature selection can be used to identify data Train
patterns and highlight the differences and similarities. This
3.10 Algorithm Deep Neural based Adaptive Fuzzy System
is in contrast with other techniques which use information
belonging to different classes. This is an inference system wherein every rule output is
obtained by applying a linear combination to two variables
3.7 Classification applied for input and adding a constant term to the result. A
One of the important modules of this project is weighted average of rule output is taken to compute the final
classification. The data set applied for input is trained and output.
then fed into the process of classification. In the process of The Deep Neural Based Adaptive Fuzzy system works
classification, the original data is processed from Medical with multiple layers. Every layer is a different function in
data analysis in Deep Neural Based Adaptive Fuzzy System. this classification technique.
In the field of machine learning, one of the simplest
classification algorithms is DNAFS.
3.8 Deep Neural based Adaptive Fuzzy System
A feature set which is extract and select from a given data
set can use techniques such as data mining and machine
learning techniques for the classification of oral cancer.
DNAFS has far surpassing classifier capabilities and is in
wide use in a number of fields connected with medical
research. This is because it is simple and its implementation
is easy. The basis of DNAFS is the function and structure of
biological neural networks whose building blocks are
neurons. A solution is computed using this algorithm similar
to the way in which human brain functions. It is observed Figure 2: Design of Deep Neural based Adaptive
that the implementation of DNAFS is successful in the Fuzzy System
context of abnormalities related to oral cancer. The basis of Step 1: Fuzzification (1st Layer): All nodes in this layer
this learning technique is a rule based system devised using are adapted with respect to unknown function. Every node is
fuzzy logic. This system is capable of predicting oral cancer depicted by i. Membership values are produced by the
in a non-invasive manner on the basis of risk index of oral nodes.
cancer and other clinical parameters. The fastest Deep
Neural Based Adaptive Fuzzy System was proposed in this
context to help predict oral cancer where rules are set Here the input is x which is supplied to the node i. The
through clinical attributes. These attributes are processed associated linguistic variable is Pi which is connected to the
for normal and oral cancer differentiation. As much as node function and the membership function is depicted by μi
96.55% accuracy was obtained with success through the which pertains to Pi . Input parameters, like some
design of the procedure of classification. attributes.
Learning techniques that are neuro adaptive provide a Step 2: we presented simple formulas to compute
method that enables a procedure for fuzzy modeling to help similarity between two training datasets and testing datasets.
learn the data set information. The parameters of the For each formula, we apply an appropriate strategy to
membership functions are computed so that the inference compute the overall score:
system associated with fuzzy helps track the data given for X1=P(A,B,C….). and X2=Q(A,B,C….). Symptoms of
input or output. The parameters which are related to the attributes values
membership functions are modified through the process of
learning. A number of approaches are useful in coping with
real world problems in an efficient manner. This means that
the task related to the learning algorithm pertaining to this
architecture assists in tuning with all parameters that can be
Published By:
Blue Eyes Intelligence Engineering
Retrieval Number: E11720275S19/19©BEIESP 401 & Sciences Publication
Detection of Oral Cancer Using Deep Neural Based Adaptive Fuzzy System in Data Mining Techniques
Perform the Karl Pearson coefficient of correlation patients who are found to have a disease are predicted to be
between X1 and X2. Determine: Pi to the term occurrence False Negative (FN). Patients with the disease and were also
for all the attributes in training datasets X1 and Qi to the found to have the disease are the True Positive (TP). Finally,
term occurrence for all the attributes in test datasets X2. the predicted patients with disease that were found to have
Step 3: Production (2nd Layer): Every node belonging to no disease are the False Positive (FP).
this layer is a fixed node. The node helps calculate firing
strength indicated by wi which pertains to a specific rule.
The output for every node as a product calculated from the
incoming data is given by,
Published By:
Blue Eyes Intelligence Engineering
Retrieval Number: XXXXXXX 402 & Sciences Publication
International Journal of Recent Technology and Engineering (IJRTE)
ISSN: 2277-3878, Volume-7, Issue-5S3, February 2019
Accuracy(%)
91
90 88 SVM
Bayes Net
Performance of Accuracy (%)
89 86
88 84 DecisionStumb
87 82 AdaBoostM1
Artificial immune system
86 80 KNN
Gaussian mixture models
85 78 DNAFS
Fuzzy C-Means
84 76
83
82
81
Artificial Gaussian Fuzzy C-
immune mixture Means Classification techniques
system models
Clustering techniques
Figure 4: Performance of accuracy result
Published By:
Blue Eyes Intelligence Engineering
Retrieval Number: E11720275S19/19©BEIESP 403 & Sciences Publication
Detection of Oral Cancer Using Deep Neural Based Adaptive Fuzzy System in Data Mining Techniques
4. Godwin Ogbuabor, and Ugwoke, F. N,” Clustering 20. Neha Sharma, Hari Om, “Extracting Significant Patterns
Algorithm For A Healthcare DATASET USING for OralCancer Detection Using Apriori Algorithm”,
SILHOUETTE SCORE VALUE”, International Journal Intelligent Information Management, Vol. 6, pp. 30-37,
of Computer Science & Information Technology 2014.
(IJCSIT) Vol 10, No 2, April 2018. 21. R. A. Mohammadpour, S. M. Abedi, S. Bagheri, and A.
5. Balasubramanian, T., & Umarani, R. (2012, March). An Ghaemian, “Fuzzy rule-based classification system for
analysis on the impact of fluoride in human health assessing coronary artery disease,” Computational and
(dental) using clustering data mining technique. In Mathematical Methods in Medicine, vol. 2015, Article
Pattern Recognition, Informatics and Medical ID 564867, 8 pages, 2015. View at Publisher · View at
Engineering (PRIME), 2012 International Conference on Google Scholar · View at Scopus
(pp. 370-375). IEEE. 22. D. Pal, K. Mandana, S. Pal, D. Sarkar, and C.
6. Banu G. Rasitha & Jamala J.H.Bousal (2015). Perdicting Chakraborty, “Fuzzy expert system approach for
Heart Attack using Fuzzy C Means Clustering coronary artery disease screening using clinical
Algorithm. International Journal of Latest Trends in parameters,” Knowledge-Based Systems, vol. 36, pp.
Engineering and Technology (IJLTET). 162–174, 2012. View at Publisher · View at Google
7. Escudero, J., Zajicek, J. P., & Ifeachor, E. (2011). Early Scholar · View at Scopus.
detection and characterization of Alzheimer's disease in
clinical scenarios using Bioprofile concepts and K-
means. In Engineering in Medicine and Biology Society,
EMBC, 2011 Annual International Conference of the
IEEE (pp. 6470-6473). IEEE.
8. Paul, R., & Hoque, A. S. M. L. (2010, July). Clustering
medical data to predict the likelihood of diseases. In
Digital Information Management (ICDIM), 2010 Fifth
International Conference on (pp. 44-49). IEEE.
9. Barrios JA, Villanueva C, Cavazos A, Colas R (2016)
“Fuzzy C-means Rule Generation for Fuzzy Entry
Temperature Prediction in a Hot Strip Mill”. J Iron Steel
Res Int 23: 116-123.
10. Krishna PG, Bhaskari DL (2016). “Fuzzy C-Means and
Fuzzy TLBO for Fuzzy Clustering.” Proc Second Int
Conf Comp Comm Technol: 479-486.
11. Ghesmoune M, Lebbah M, Azzag H (2016). “A new
growing neural gas for clustering data streams.” Neural
Networks 78: 36-50.
12. Kaur A., Kaur A.(2012), “Comparison of Mamdani-Type
and Sugeno-Type Fuzzy Inference Systems for Air
Conditioning System,” International Journal of Soft
Computing and Engineering, ISSN: 2231-2307, no. 2,
pp. 323-325.
13. Efosa C., Akwukwuma N.( 2013), “Knowledge based
Fuzzy Inference System for Sepsis Diagnosis,”
International Journal of Computational Science and
Information Technology, Vol.1, No. 3, pp. 1-7.
14. Choi H., Yoo H., Jung H., Lim T., Lee K., Ahn K.(
2015), “An ANFIS-based Energy Management Inference
Algorithm with Scheduling Technique for Legacy
Device,” International Conference on Artificial
Intelligence, Energy and Manufacturing Engineering, pp.
80-82.
15. Uc T., Karahoca A., Karahoca D. (2013), “Tuberculosis
disease diagnosis by using adaptive neuro-fuzzy
inference system and rough sets,” Neural Comput &
Applications, Springer, pp. 471-483.
16. R Jaya Suji, SP Rajagopalan,” Multi-ranked feature
selection algorithm for effective breast cancer detection”,
Biomed Res- India 2016 Special Issue Special Section:
Computational Life Science and Smarter Technological
Advancement.
17. R.Jaya Suji , Dr.S.P.Rajagopalan,” An Intelligent Oral
Cancer Diagnosis System using Texture Analysis based
Segmentation and
18. Correlated Fuzzy Neural Classifier Fuzzy Rough Set and
LS-SVM”, International Journal of Applied Engineering
Research ISSN 0973-4562 Volume 10, Number 6
(2015).
19. Jaya Suji. R, Dr.Rajagopalan S.P, “An automatic Oral
Cancer Classification using Data Mining Techniques”,
International Journal of Advanced Research in Computer
and Communication Engineering, Vol.2, No.10, 2013.
Published By:
Blue Eyes Intelligence Engineering
Retrieval Number: XXXXXXX 404 & Sciences Publication