
International Journal of Computer Engineering and Applications,

Volume XII, Special Issue, May 18, www.ijcea.com ISSN 2321-3469

DISEASE PREDICTION BY USING MACHINE LEARNING


Sayali Ambekar and Dr. Rashmi Phalnikar
Department of Information Technology, MIT-COE, Pune

ABSTRACT
Rapid growth in the field of data analysis plays an important role in healthcare research. As data in the biomedical and healthcare fields grows, accurate analysis of medical data supports early disease detection, patient care, and community services. Previous systems were designed to analyze, manage, and assimilate data produced by healthcare systems, and data analysis has been applied to support disease-related information and the treatment process. In this paper, a decision tree is used to predict outbreaks of diseases in society. The paper proposes to experiment with modified predictive models on medical data related to the symptoms of the disease. For disease prediction using unstructured data, we used the convolutional neural network based multimodal disease risk prediction (CNN-MDRP) algorithm. Users can also post queries to seek information about diseases, so that they get proper answers to any question or problem related to the disease.

Keywords: Healthcare, Decision Tree, Disease Prediction, CNN-MDRP algorithm, Data mining.

1] INTRODUCTION
People face many problems related to chronic disease. The main reasons behind the increase in chronic disease include improper living habits, insufficient physical exercise, unhealthy diet, and irregular sleep [3]. In the United States, 80% of people spend a large amount on the diagnosis of chronic disease [1], so accurate disease prediction has attracted increasing attention [1]. In many regions, different diseases are caused by environmental factors and people's lifestyles [1], and this variation can sometimes lead to wrong decisions in disease prediction. Early disease prediction can reduce the risk of disease and allows patients to be diagnosed as early as possible. Disease prediction has also been performed with IoT devices that collect sensor-generated data, but such sensors are uncomfortable for users, who may have to wear several of them [3].

In recent years, data mining has played an important role in healthcare systems. The knowledge discovery in databases (KDD) process extracts previously undiscovered knowledge from target data [7]. Data mining is divided into two parts, predictive and descriptive [7]. Predictive data mining includes classification and regression, whereas descriptive data mining includes clustering and association rules [7]. For disease prediction, data mining relies on the historical records of patients. Most previous work on disease prediction automatically extracts a large number of features from the data to improve the accuracy of the system [7]. Structured data is more widely used for disease prediction than unstructured data, but a convolutional neural network makes it easier to handle unstructured data as well [1]. A convolutional neural network is a deep learning algorithm that automatically extracts features from a large dataset to produce accurate results [1].

2] REVIEW OF LITERATURE
In [1], Chen et al. proposed a disease prediction system that uses structured and unstructured data from hospitals. Their new convolutional neural network based multimodal disease risk prediction (CNN-MDRP) algorithm was applied to unstructured data to predict the risk of cerebral infarction.
In [2], a patient risk prediction model for a given disease was built from electronic health record (EHR) data. By answering a small number of queries about patients, experts can assess the similarity between two patients' conditions, which is then used to determine the patient's disease risk [2].
The Health-CPS system [4] handles large volumes of varied medical data in the cloud and performs operations such as statistics, monitoring, and prediction, so that users gain various benefits from healthcare systems and services.
In [3], Chen et al. designed a healthcare system that collects the patient's physiological condition using sensors in smart washable clothing.
In another study [5], the authors proposed a telehealth system that uses an optimal big data sharing algorithm to find the optimal way of handling medical data in the cloud.
In [8], the authors proposed a clinical decision-making system for accurate prediction of disease based on the historical data of patients. They also determined the relation between unseen patterns and concepts related to multiple diseases, and used 2D/3D graphs and pie charts to visualize medical data.


3] PROPOSED SYSTEM
We predict a patient's disease on the basis of the symptoms recorded in the dataset. Previous studies reported that the decision tree performed better than the other machine learning algorithms they compared, so we use a decision tree to obtain better disease prediction accuracy.

Fig 3.1. System Architecture

We propose a machine learning approach, namely a decision tree, together with the new convolutional neural network based unimodal disease risk prediction (CNN-UDRP) algorithm for structured data and the multimodal disease risk prediction (CNN-MDRP) algorithm for structured and unstructured data. CNN-MDRP performs better than CNN-UDRP, so we use the convolutional neural network based multimodal disease risk prediction algorithm. In this algorithm, unstructured text data is transformed into a proper entity-based representation. Heart disease prediction is also carried out along with the prediction of other diseases. The heart disease dataset is obtained from the UCI repository and is well formatted. The decision tree algorithm is applied to the heart disease dataset to determine whether or not a patient has heart disease. Users can then post queries to the model so that patients get correct answers about diseases and their symptoms. This also increases awareness of disease in society and helps patients get a proper diagnosis at an earlier stage of the disease.
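The multimodal idea can be illustrated with a minimal pure-Python sketch. It assumes, as a simplification of the design in [1], that a patient note is represented by word vectors, that a 1D convolution with ReLU and max-over-time pooling turns the text into a fixed-length feature vector, and that this vector is concatenated with the structured features before a sigmoid output gives the risk. The functions `conv1d_max_pool` and `risk_score` and all numbers are hypothetical toy values, not trained parameters or the authors' code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def conv1d_max_pool(embeddings: Matrix, filters: List[Matrix]) -> Vector:
    """Slide each filter over consecutive word vectors (1D convolution),
    apply ReLU, and keep the largest response (max-over-time pooling)."""
    features = []
    for filt in filters:
        width = len(filt)  # number of consecutive words the filter covers
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            score = sum(w * x
                        for row_f, row_x in zip(filt, window)
                        for w, x in zip(row_f, row_x))
            best = max(best, max(0.0, score))
        features.append(best)
    return features


def risk_score(structured: Vector, text_features: Vector,
               weights: Vector, bias: float) -> float:
    """Concatenate both modalities and map them to a probability with a sigmoid."""
    fused = structured + text_features  # list concatenation = feature fusion
    z = bias + sum(w * x for w, x in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 2-dimensional word vectors for a 4-word patient note.
note = [[0.5, 0.1], [0.9, 0.4], [0.2, 0.8], [0.7, 0.3]]
# Two hypothetical filters, each spanning 2 consecutive words.
filters = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
text_feats = conv1d_max_pool(note, filters)
structured = [0.6, 1.0]  # toy structured features, e.g. a scaled age and a lab flag
print(text_feats)
print(round(risk_score(structured, text_feats, [0.5, 0.5, 1.0, 1.0], -2.0), 3))
```

Max-over-time pooling is what lets notes of different lengths produce feature vectors of the same size, so they can be concatenated with the structured features.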
3.1] Dataset
We use symptom and disease data obtained from GitHub [6]. This data is structured, that is, it is stored in a well-defined format. The symptom data may contain missing values. Most existing work uses only structured data. The heart disease dataset is obtained from the UCI repository [7]. Paper [1] showed how to work with unstructured data, and here we perform operations on unstructured data as well.
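Because the symptom data may contain missing values, each record has to be turned into a fixed-length feature vector before a classifier can use it. The following is a minimal sketch, assuming a hypothetical record format of (disease, list of symptom names) in which `None` or an empty string marks a missing entry; the actual layout of the GitHub dataset [6] may differ. Missing entries are treated as absent symptoms, which is one simple policy among several (imputation is another).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (disease label, symptom names); None or "" = missing entry.
Record = Tuple[str, List[Optional[str]]]


def build_vocabulary(records: List[Record]) -> List[str]:
    """Collect every distinct, non-missing symptom name in sorted order."""
    names = {s.strip().lower()
             for _, symptoms in records
             for s in symptoms if s and s.strip()}
    return sorted(names)


def encode(records: List[Record],
           vocab: List[str]) -> Tuple[List[List[int]], List[str]]:
    """Turn each record into a 0/1 vector over the symptom vocabulary.

    Missing entries are skipped, so an unrecorded symptom counts as absent (0)."""
    index: Dict[str, int] = {name: i for i, name in enumerate(vocab)}
    X: List[List[int]] = []
    y: List[str] = []
    for disease, symptoms in records:
        row = [0] * len(vocab)
        for s in symptoms:
            if s and s.strip():
                row[index[s.strip().lower()]] = 1
        X.append(row)
        y.append(disease)
    return X, y


# Hypothetical toy records, not taken from the real dataset.
records: List[Record] = [
    ("influenza", ["fever", "cough", None]),
    ("migraine", ["headache", "", "nausea"]),
    ("influenza", ["Fever", "headache"]),
]
vocab = build_vocabulary(records)
X, y = encode(records, vocab)
print(vocab)
print(X)
```

Normalizing case and whitespace before lookup keeps "Fever" and "fever" from becoming two separate features.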
3.2] Evaluation Method
To evaluate the experimental results, we use the following notation:
TP: true positives (positive instances correctly predicted as positive)
FP: false positives (negative instances incorrectly predicted as positive)
TN: true negatives (negative instances correctly predicted as negative)
FN: false negatives (positive instances incorrectly predicted as negative)
From these counts, we calculate four measures:
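$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$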
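The four measures follow directly from the confusion-matrix counts. Here is a minimal Python sketch; returning 0.0 when a denominator is zero is our assumption, since the paper does not say how such cases are handled, and the counts in the usage line are hypothetical, not results from this study.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Assumed convention: a measure with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
m = classification_metrics(tp=45, fp=5, tn=35, fn=15)
print({name: round(value, 3) for name, value in m.items()})
```

With these counts, precision is high but recall is lower, and F1 sits between them because it is their harmonic mean.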

4] ALGORITHM
4.1] Decision Tree Algorithm:

The decision tree algorithm is a supervised learning algorithm that is mainly used to classify data based on its attributes. The tree consists of nodes and edges: each internal node represents a test on an attribute, each edge represents an outcome of that test, and each leaf node holds a class label. The tree is applied as a set of IF-THEN rules, one for each path from the root to a leaf. A main advantage of the decision tree is that it performs well on large amounts of data, and it handles both numerical and categorical data. The following steps are performed to build a decision tree:

Step 1: Place the best attribute of the dataset (for example, the one with the highest information gain) at the root of the tree.

Step 2: Split the dataset into subsets in such a way that each subset contains data with the same value for that attribute.

Step 3: Repeat Steps 1 and 2 on each subset until every branch of the tree ends in a leaf node.
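The three steps above correspond to a top-down, ID3-style construction. The following is a minimal pure-Python sketch, assuming categorical attributes and information gain as the criterion for choosing the attribute in Step 1 (the paper does not state its split criterion). The toy symptom rows are hypothetical, and the helper `to_rules` reads each root-to-leaf path as an IF-THEN rule.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple, Union

Row = Dict[str, str]
Tree = Union[str, Dict[str, object]]  # a leaf label or {"attribute", "branches"}


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Row], attr: str, target: str) -> float:
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def build_tree(rows: List[Row], attributes: List[str], target: str) -> Tree:
    labels = [r[target] for r in rows]
    # Leaf: all rows agree, or no attributes remain (use the majority class).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 1: choose the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches: Dict[str, Tree] = {}
    # Step 2: one subset per value of the chosen attribute.
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        # Step 3: recurse until every branch ends in a leaf.
        branches[value] = build_tree(subset, remaining, target)
    return {"attribute": best, "branches": branches}


def to_rules(tree: Tree, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Read every root-to-leaf path as an IF-THEN rule."""
    if isinstance(tree, str):
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {tree}"]
    rules: List[str] = []
    for value, child in tree["branches"].items():
        rules += to_rules(child, conditions + (f"{tree['attribute']} = {value}",))
    return rules


# Hypothetical toy symptom data.
data: List[Row] = [
    {"fever": "yes", "cough": "yes", "rash": "no", "disease": "flu"},
    {"fever": "yes", "cough": "no", "rash": "no", "disease": "flu"},
    {"fever": "yes", "cough": "yes", "rash": "yes", "disease": "measles"},
    {"fever": "no", "cough": "yes", "rash": "no", "disease": "cold"},
    {"fever": "no", "cough": "no", "rash": "yes", "disease": "cold"},
]
tree = build_tree(data, ["fever", "cough", "rash"], "disease")
rules = to_rules(tree)
for rule in rules:
    print(rule)
```

On this toy data the attribute "cough" carries little information about the disease, so it never appears in the tree. Numerical attributes, such as those in the heart disease data, would additionally need a threshold search, which this sketch omits.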


4.2] CNN-MDRP Algorithm:

Step 1: Select the specific training parameters.

Step 2: Train the parameters with stochastic gradient descent, and finally obtain a risk assessment of whether the patient suffers from the disease.
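Step 2 can be sketched as follows. This is a minimal pure-Python example, assuming the trainable part is a sigmoid risk output over fused feature vectors and the loss is log-loss; a full CNN-MDRP would also propagate gradients back into the convolution filters, which is omitted here. The names `train_sgd` and `predict_risk`, the learning rate, the epoch count, and the data are hypothetical.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(X: List[List[float]], y: List[int], lr: float = 0.1,
              epochs: int = 200, seed: int = 0) -> Tuple[List[float], float]:
    """Fit logistic-regression weights with per-example stochastic gradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a new random order each epoch
        for i in order:
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, X[i])))
            err = p - y[i]  # gradient of the log-loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict_risk(w: List[float], b: float, x: List[float]) -> float:
    """Probability that the patient has the disease under the trained model."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical fused feature vectors and labels (1 = disease, 0 = no disease).
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_sgd(X, y)
print(round(predict_risk(w, b, [0.85, 0.8]), 3))
print(round(predict_risk(w, b, [0.15, 0.1]), 3))
```

Updating after every single example, in shuffled order, is what makes the method stochastic; the fixed seed keeps runs reproducible.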

5] EXPECTED RESULT

The experiment was conducted on training data consisting of the medical symptoms of patients. The system uses a decision tree algorithm to predict disease based on the patient's symptoms. Compared with the previous paper, the decision tree algorithm is 70% accurate in predicting disease from symptoms. The performance of disease prediction using unstructured data will also be compared with the previous paper. We will further improve heart disease prediction using the decision tree algorithm. Patients can post queries to the system so that they get proper answers to their questions. Furthermore, we will enhance the decision tree algorithm for symptom-based disease prediction.

6] CONCLUSION

This paper proposed a machine learning approach and a new multimodal disease risk prediction algorithm based on a convolutional neural network (CNN-MDRP) that uses structured and unstructured data. The accuracy of disease prediction on structured and unstructured data still needs to be improved. Given symptoms as input, the system outputs a disease prediction, which helps us understand the level of disease risk. The system leads to low time consumption and minimal cost for disease prediction. In the future, more diseases may be added so that society benefits more from this system.


REFERENCES

[1] M. Chen, Y. Hao, K. Hwang, L. Wang, and L. Wang, "Disease prediction by machine learning over big data from healthcare communities," IEEE Access, vol. 5, no. 1, pp. 8869–8879, 2017.
[2] B. Qian, X. Wang, N. Cao, H. Li, and Y.-G. Jiang, "A relative similarity based method for interactive patient risk prediction," Data Mining Knowl. Discovery, vol. 29, no. 4, pp. 1070–1093, 2015.
[3] M. Chen, Y. Ma, Y. Li, D. Wu, Y. Zhang, and C. Youn, "Wearable 2.0: Enabling human-cloud integration in next generation healthcare systems," IEEE Commun. Mag., vol. 55, no. 1, pp. 54–61, Jan. 2017.
[4] Y. Zhang, M. Qiu, C.-W. Tsai, M. M. Hassan, and A. Alamri, "HealthCPS: Healthcare cyber-physical system assisted by cloud and big data," IEEE Syst. J., vol. 11, no. 1, pp. 88–95, Mar. 2017.
[5] L. Qiu, K. Gai, and M. Qiu, "Optimal big data sharing approach for telehealth in cloud computing," in Proc. IEEE Int. Conf. Smart Cloud (SmartCloud), Nov. 2016, pp. 184–189.
[6] Disease and symptoms dataset, www.github.com.
[7] Heart disease dataset, UCI Machine Learning Repository.
[8] A. Kunjir, H. Sawant, and N. F. Shaikh, "Data mining and visualization for prediction of multiple diseases in healthcare," in Proc. IEEE Int. Conf. Big Data Analytics and Computational Intelligence, Oct. 2017, pp. 23–25.
