
BIG DATA ANALYSIS FOR HEART DISEASE DETECTION SYSTEM USING MAP REDUCE TECHNIQUE

G. Vaishali¹, V. Kalaivani²
¹PG Scholar, National Engineering College, Kovilpatti, Tamilnadu
²Professor, Department of Computer Science and Engineering, National Engineering College, Kovilpatti, Tamilnadu
¹vaishvaishu232@gmail.com
²kalai.nec@gmail.com

Abstract: In today's world, the enormous volume of information in health care must be processed in order to identify, diagnose, detect and prevent various diseases. Big data analysis is challenging because big data contains a large number of records. It is proposed to develop a centralized patient monitoring system using big data. In the proposed system, a large set of medical records is taken as input. From this medical dataset, it is aimed to extract the needed information from the records of heart patients using the map reduce technique. Heart disease is a major health problem and one of the leading causes of death throughout the world, so early detection of heart disease has become an important issue in medical research. For heart disease detection, features such as the RR interval, QRS interval and QT interval are analyzed. The classification process states whether a patient is normal or abnormal, and the detection step uses the map reduce technique to detect the disease and reduce the dataset. Thus, the proposed system helps to classify a large and complex medical dataset and detect heart disease.

Keywords: Big data, Hadoop and Map Reduce.

I. INTRODUCTION

Nowadays, a large volume of data is available in most real-time applications. This raw, unstructured data is of no use until it is pre-processed into useful information. It is necessary to analyze this huge amount of data and extract useful information from it. For that extraction process, data mining technology is needed. Big data is now an emerging technology; the term covers data to be mined ranging from small datasets to very large ones. It is proposed to develop a big data analysis system for a medical application using data mining, since data mining plays a vital role in big data.

In the health care field, different types of data are available, such as signals and images. In the medical field, big data is a booming factor, because a lot of research work is emerging on the classification of diseases. Big data analytics enables organizations to analyze a mix of structured, semi-structured and unstructured data in search of valuable business information; it is the process of examining large data sets containing a variety of data types. In the proposed work, UCI repository datasets are used for analysis, and some real datasets collected from various hospitals and medical care centres are also considered.

The term "Big Data" was introduced in 2005. It denotes a massive volume of both structured and unstructured data that is so large that it is difficult to store, analyze, process, share, visualize and manage with normal database and software techniques, because they do not have the capacity to handle such data. Nowadays, many tools are available for processing big data, such as Hadoop, MongoDB, Talend, Tableau, Pentaho, Google Charts and SAP In-Memory. The proposed work uses the Hadoop tool, which is a user-friendly tool.

1.1 Characteristics of Big Data

There are 4 V's in big data:

Volume
Volume indicates the amount of data generated every second; it mainly refers to the quantity of data. The data available through social websites and sensor networks is going to grow from petabytes to zettabytes.

Variety
Variety refers to the different types of data. In the past, researchers focused only on structured data arranged in tabular form, such as financial data. Big data covers structured, semi-structured and unstructured data: text, images, audio, video and records.

Velocity
Velocity refers to the speed at which data is processed. It indicates the speed at which data is generated and becomes historical. Big data is able to handle all types of data.

Variability
Variability describes the amount of variance in the data kept within the data bank, and refers to how the data are spread out or closely clustered within the data set.

II. HADOOP

Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programs. For the proposed work, the Hadoop tool is installed using the Cygwin terminal for a user-friendly environment. Hadoop is mainly used for processing large amounts of data, and it provides a distributed storage system for storing the data. Hadoop is designed to scale up from a single server to thousands of systems, each providing local computation and storage.

Map Reduce

Map Reduce is a programming model for writing parallel and distributed applications devised at the distributed storage level. MapReduce programs run on the Hadoop tool in an Apache open-source framework. A Map Reduce program is composed of a Map procedure that performs the mapping process and produces intermediate key-value pairs, as (k1, v1) -> list(k2, v2), and a Reduce method, as (k2, list(v2)) -> (k3, v3), that merges the values matching each intermediate key and emits the resulting counts as output.
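To make this key-value contract concrete, the following minimal Java sketch shows the shape of a Hadoop MapReduce job that counts the records belonging to each class label. It uses the standard org.apache.hadoop.mapreduce API; the class names and the assumed input layout (comma-separated records with the class label in the last field) are illustrative and are not taken from the proposed system.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LabelCount {

  // Map: (k1, v1) = (byte offset, record line) -> list(k2, v2) = (label, 1)
  public static class LabelMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text label = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Assumed record layout: comma-separated fields, class label last.
      String[] fields = value.toString().split(",");
      label.set(fields[fields.length - 1].trim());
      context.write(label, ONE);
    }
  }

  // Reduce: (k2, list(v2)) = (label, [1, 1, ...]) -> (k3, v3) = (label, count)
  public static class LabelReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "label count");
    job.setJarByClass(LabelCount.class);
    job.setMapperClass(LabelMapper.class);
    job.setReducerClass(LabelReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Run in the usual way with hadoop jar, passing the input and output paths as arguments; the map phase emits (label, 1) pairs and the reduce phase sums them, matching the (k1, v1) -> list(k2, v2) and (k2, list(v2)) -> (k3, v3) forms above.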
III. LITERATURE SURVEY

Xindong Wu et al. [10] proposed the HACE theorem, which characterizes the features of the big data revolution, and proposed a big data processing model from the data mining perspective. The data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security. One of the main characteristics of big data applications is autonomous data sources with distributed and decentralized control. The authors also analyzed issues in the tier model such as data sharing and privacy, and domain and application knowledge.

Big data is highly applicable to healthcare systems. Kiyana Zolfaghar et al. [3] discussed big-data-driven solutions to predict the 30-day risk of readmission for congestive heart failure (CHF) incidents. They mainly used the MHS (Multicare Health System) data set. First they extracted useful factors from the National Inpatient Dataset (NIS) and augmented them with the patient data set from MHS. Then they developed scalable data mining models to predict the risk of readmission using the integrated data set. They used the random forest algorithm because it can work with all types of predictor variables. That paper discussed the 30-day risk solution; when the dataset is taken as a whole, for example for a rural area, the data processing may vary.

Muni Kumar N et al. [4] identified the massive shortage of proper healthcare facilities and addressed how to provide greater access to primary health care services in rural areas of India. Big data processing in real-time situations can help turn the dream of Svasth Bharath (Healthy India) into reality. They analyzed key factors for making health centres perform better and people live healthier. The proposed concept enables doctors, patients and staff to have role-based access to information in electronic health records. They proposed seven big ideas to fix rural health care in India and bridge the gap between quality and affordability in government hospitals, using the Hadoop tool to process these large volumes of data.

K. Sharmila et al. [9] examined and revealed the benefits of Hadoop in the healthcare sector using data mining. Apache Hadoop has achieved worldwide adoption and has brought parallel processing into the hands of the average programmer for big data. They presented an overview of various data mining techniques applied to a diabetic dataset on this platform, which helps in understanding how Hadoop can be used to predict diabetes and related diseases. Within the Hadoop tool, they used the map reduce concept: map reduce divides the dataset into multiple chunks, each of which is processed in parallel among multiple nodes.

Sathiyavathi R [8] explained that the first step is to collect the data from various sources; the prediction attribute is then identified, and the respective algorithm is applied to the case. A map function processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function merges all intermediate values associated with the same intermediate key.

Improving on the above model, Saravana N et al. [7] used a predictive analysis algorithm in a Hadoop/map reduce environment to predict diabetes types and the type of treatment to be provided. The algorithm included phases such as data warehousing, data collection and analysis, and produced an analyzed report. For diabetic treatment, it is necessary to test patterns such as plasma glucose concentration, serum insulin, diastolic blood pressure, diabetes pedigree and Body Mass Index (BMI). This system is used to predict and classify the types of diabetes mellitus, leading to improved focus on each individual patient's health.

In another research paper, A. Pradeepa et al. [5] presented algorithms corresponding to map reduce based on rough set theory, put forward to deal with massive data. Rough set theory proposes a mathematical approach to imperfect knowledge, explaining the topological operations of interior and closure, called approximations. Rough sets have resolved complex problems and are a powerful mathematical tool for describing dependencies among attributes, evaluating the significance of attributes, and deriving decision rules.

The Hadoop tool is also used for enterprise data. Praveen Kumar et al. [6] discussed the challenges of processing such huge chunks of data and found that none of the existing centralized architectures could efficiently handle this volume. Map reduce is a tool for managing and processing vast amounts of unstructured data in parallel; map reduce programs are written to manage these vast amounts of data, enabling parallel processing of the problem and efficient computation.

For big data security, Bhawna Gupta et al. [1] discussed techniques for analyzing big data with Hadoop and why big data security analytics is important for mitigating security threats and securing enterprise data more efficiently. There are a number of opportunities for big data security analytics to enter enterprise security; the results can be used to secure enterprise data and to implement preventive measures against threats. Some researchers use network monitoring tools such as Packetpig and Mahout to enhance security levels.

For improving the efficiency of map reduce functionality, Devi. L et al. [2] suggested the concept of a cache manager. Before executing the actual computing work, a task queries the cache manager. In a data-aware cache, cache request and cache reply mechanisms are designed. Implementing the cache by extending Hadoop improves the completion time of map reduce jobs: it detects the occurrence of repeated jobs in the incremental data process, stops the repeated work, and minimizes the processing time, so that the Map Reduce nodes are used optimally. The data-aware cache in the map reduce framework thus provides high efficiency in incremental processing.

IV. PROPOSED WORK

The main objective of the proposed work is to build a big data analysis system that helps to classify a large and complex medical dataset and detect the disease. In the proposed system, a large set of medical records is considered. From this medical dataset, it is aimed to extract the needed information from the records of heart disease patients. For this extraction, features in the data set are analyzed. The analyzed features are classified to detect the condition of the heart as normal or abnormal. It is also aimed to identify the type of heart disease and to reduce the dataset using the map reduce concept. The goal is to extract useful information from large volumes of data collected from various sources. In the proposed system, a heart disease dataset is taken to classify and detect the various types of heart disease.

Figure 4.1 Flow diagram for the proposed system

4.1 Feature Analysis

From the preprocessed data, some features for disease detection are analyzed. The features to be analyzed are the RR interval, QRS interval and QT interval. In the feature analysis phase, the mean value of each interval is calculated for analyzing the type of heart disease.
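Section IV states that the analyzed features are classified as normal or abnormal, and the conclusion attributes this to rule-based classification, but the paper does not list its rules. As an illustration only, the sketch below checks mean interval values against commonly cited normal ranges (roughly 600-1000 ms for RR, under 120 ms for QRS, under 450 ms for QT); both the thresholds and the class and method names are assumptions, not the system's actual rules.

// Illustrative rule-based check on mean ECG interval features.
// Thresholds are commonly cited normal ranges, used here only as
// placeholders; the paper does not specify its actual rules.
public class IntervalRules {

  public static boolean isAbnormal(double meanRrMs, double meanQrsMs, double meanQtMs) {
    boolean rrOk  = meanRrMs >= 600 && meanRrMs <= 1000; // ~60-100 beats per minute
    boolean qrsOk = meanQrsMs < 120;                     // narrow QRS complex
    boolean qtOk  = meanQtMs < 450;                      // uncorrected QT bound
    return !(rrOk && qrsOk && qtOk);
  }

  public static void main(String[] args) {
    // Example: a patient with mean RR 820 ms, QRS 95 ms, QT 410 ms.
    System.out.println(isAbnormal(820, 95, 410) ? "abnormal" : "normal");
  }
}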
4.2 Reducing Phase
In the reducing phase, the reduce function is used to merge the values from the map function into a single result. It reduces a set of intermediate values that share a key to a smaller set of values.

4.3 Mapping Phase
In the mapping phase, the mapper first tokenizes the document and emits an intermediate key-value pair for every record. After this process, each of these elements is sorted by its key.
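Putting Sections 4.1-4.3 together, a mapper and reducer along the following lines could compute the per-patient mean of one interval. This is a sketch under assumed conventions (a patientId,rr,qrs,qt record layout and the patient ID as the intermediate key), reusing the standard Hadoop API shown earlier; a driver of the same shape as the previous listing would wire these classes into a job.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch: mean RR interval per patient. Assumed record layout:
// patientId,rrMs,qrsMs,qtMs (one record per line).
public class MeanRrInterval {

  public static class IntervalMapper
      extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] f = value.toString().split(",");
      // Emit (patientId, rr) for every record, as described in 4.3.
      context.write(new Text(f[0]), new DoubleWritable(Double.parseDouble(f[1])));
    }
  }

  public static class MeanReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
      // Merge all RR values sharing a patient key into one mean, as in 4.2.
      double sum = 0;
      long n = 0;
      for (DoubleWritable v : values) {
        sum += v.get();
        n++;
      }
      context.write(key, new DoubleWritable(sum / n));
    }
  }
}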

V. IMPLEMENTATION RESULTS

Fig. 5.1 shows the GUI of the proposed system, the Centralized Patient Monitoring System using Big Data.

Figure 5.1 Front page of the proposed system

5.2 Feature Analysis

Figure 5.2 Displaying the feature values for every patient

Figure 5.2 shows the feature values, namely the RR interval, QRS interval and QT interval, for every patient.

Figure 5.3 Feature values per day

Figure 5.3 shows that, by clicking the per-day button, the mean value of the selected interval for the patient is shown in the next frame.

Figure 5.4 Feature values per week

Figure 5.4 shows that, by clicking the per-week button, the mean value of the selected interval for the patient is shown in the next frame.
5.5 Patient entry form

Figure 5.5 Patient entry form in the system

Figure 5.5 shows that, by clicking the insert button, the patient details are entered and inserted successfully into the database.

5.6 Identification of Disease

Figure 5.6 Checking of patient condition

Figure 5.6 shows that, by clicking the Get status button in the frame, the patient's condition and the type of class are displayed.

5.7 Classification Results

Figure 5.7 Classification result

Figure 5.7 shows that, by clicking the 'Get status' button, the patient's condition and class are displayed.

Thus, the implementation results show the details of the heart patient: the patient's condition and the classification result for the patient's heart disease.

VI. CONCLUSION

Thus, the data set is preprocessed and its features are analyzed. The features of the heart disease dataset, namely the RR interval, QRS interval and QT interval, are analyzed. Using rule-based classification, the features are classified to determine the patient's condition, and the type of heart disease is displayed. In future work, the data set will be reduced using the map reduce technique. This system is expected to be useful in the medical field, helping physicians to analyze heart disease easily and aiding them in taking decisions.

VII. FUTURE WORK

In future work, we will detect many types of diseases using the big data set; by analyzing the data set, the required data will be predicted from it easily using the map reduce concept.

VIII. REFERENCES

[1] Bhawna Gupta and Kiran Jyoti (2014), "Big Data Analytics with Hadoop to Analyze Targeted Attacks on Enterprise Data", (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5, No. 3, pp. 3867-3870.

[2] Devi. L and S. Gowri (2015), "Optimizing MapReduce Functionality in Bigdata Using Cache Manager", ARPN Journal of Engineering and Applied Sciences, Vol. 10, No. 12, pp. 1819-6608.

[3] Kiyana Zolfaghar, Naren Meadem, Ankur Teredesai, Senjuti Basu Roy, Si-Chi Chin and Brian Muckian (2013), "Big Data Solutions for Predicting Risk-of-Readmission for Congestive Heart Failure", IEEE International Conference on Big Data, Vol. 3, No. 6, pp. 64-71.

[4] Muni Kumar N and Manjula R (2014), "Role of Big Data Analytics in Rural Health Care - A Step Towards Svasth Bharath", (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5, No. 6, pp. 7172-7178.

[5] Pradeepa A and Antony Selvadoss Thanamani (2013), "Hadoop File System and Fundamental Concept of MapReduce Interior and Closure Rough Set Approximations", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, No. 10, pp. 5865-5868.

[6] Praveen Kumar and Vijay Singh Rathore (2014), "Efficient Capabilities of Processing of Big Data using Hadoop Map Reduce", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, No. 6, pp. 4421-4425.

[7] Saravana N, M. Ramachandran and S. Lavanya Kumar (2015), "Predictive Methodology for Diabetic Data Analysis in Big Data", Procedia Computer Science, Vol. 50, pp. 203-208.

[8] Sathiyavathi R (2015), "A Survey: Big Data Analytics on Healthcare System", Contemporary Engineering Sciences, Vol. 8, No. 3, pp. 121-125.

[9] Sharmila R and S. A. Vethamanickam (2015), "Survey on Data Mining Algorithm and Its Application in Healthcare Sector Using Hadoop Platform", International Journal of

[10] Xindong Wu, Xingquan Zhu, Gong-Qing Wu and Wei Ding (2014), "Data Mining with Big Data", IEEE Transactions on Knowledge and Data Engineering, Vol. 26, No. 1, pp. 97-107.
