
ANALYSIS OF RESEARCH IN HEALTHCARE DATA ANALYTICS

INTRODUCTION:

Big Data is a buzzword that has ruled the innovation market for quite some time. An
enormous amount of data, often referred to as Big Data, is generated every day by diverse
industry segments such as business, finance, manufacturing, healthcare, education, and
research and development. The traditional DBMSs and RDBMSs on the market are
incapable of storing such vast amounts of data, and traditional data mining algorithms do
not work effectively on it, so we cannot take full advantage of the hidden knowledge and
information in this data. There is therefore a need to develop and use the effective,
innovative tools and technologies offered by Big Data. "If you want to find out how Big Data
is helping to make the world a better place, there's no better example than the uses being
found for it in healthcare." [1]

The healthcare industry has generated an enormous amount of data to date, measured on
the petabyte/exabyte scale. At such a rate of growth, U.S. healthcare data alone will soon
reach the zettabyte (10^21 bytes) scale. The goal of the healthcare industry is to analyse
this large volume of data for unknown and useful facts, patterns, associations and trends
with the help of machine learning algorithms, which can give birth to new lines of
treatment for diseases. The aim is to provide high-quality healthcare at lower cost to all,
benefiting the community and the nation as a whole.

BENEFITS:

The tremendous amount of varied data gives researchers and health informatics
professionals an opportunity to use tools and techniques to unlock hidden answers.
Big Data Analytics (BDA) tools and techniques, when applied effectively to this volume of
data, can be beneficial in the following ways:

1. For individuals/patients: Generally, while deciding on any line of treatment for a patient,
historical data (from a set of similar patients) about symptoms, the drugs used and the
outcomes/responses of different patients is taken into account. With the help of BDA, the
move is towards formulating a personalised line of treatment for a patient based on his or
her genomic data, location, weather, lifestyle, medical history, response to certain
medicines, allergies, family history, etc. Once genome data is fully explored, relationships
can be established between DNA and particular diseases, and a specific line of treatment
can then be formulated for that subset of individuals. Patients will benefit in the following
ways:

• A correct and effective line of treatment.

• Better-informed health-related decisions.

• Preventive steps taken in time.

• Continuous health monitoring at the patient's place using wearable wireless devices.

• A personalised line of treatment designed for the patient.

• Increased life expectancy and quality of life.

2. For hospitals: By applying effective BDA techniques to the available data, hospitals can
reap the following benefits:

• Predict which patients are likely to stay longer or to be readmitted after treatment.

• Identify patients who are at risk of hospitalization, so that healthcare providers can
develop new care plans to prevent it.

• Answer various questions by analysing the data with BDA tools and techniques: Will a
patient respond positively to a particular treatment? Is surgery required, and will the
patient respond to it? Is the patient prone to contracting a disease after treatment? What is
the likelihood of the patient being affected by the same disease in the near future?

• Hospital authorities can make better-informed administrative and management
decisions. For example, if patients are not being cured early and the number of
readmissions is increasing because patients fall ill again after treatment, the hospital can
diagnose the root cause of the problem, hire more competent and experienced staff, invest
in better drugs/instruments that aid effective treatment, improve the cleanliness of the
hospital, make treatment more timely, engage more staff on the floor, plan more frequent
post-treatment follow-ups, etc.

3. For insurance companies: Governments spend a large amount on medical claims for
patients. Using BDA, possible frauds related to medical claims can be analysed, identified,
predicted and minimized.

4. For pharmaceutical companies: By using BDA techniques effectively, R&D can produce,
in a shorter time, drugs/instruments/tools that are most effective for treating a specific
disease.

5. For governments: The government can use demographic data, historical data on disease
outbreaks, weather data, and social media data on disease keywords such as cholera and
flu. Analysing this massive data can predict epidemics, for example by finding correlations
between the weather and the likely occurrence of a disease, so that preventive measures
can be taken in time. BDA can thus improve public health surveillance and speed up the
response to disease outbreaks.

ABSTRACT:

The main aim of this project is to provide a deep analysis of the research field of healthcare
data analytics, as well as to highlight some of the guidelines and gaps in previous studies.
This study has focused on searching for relevant papers about healthcare analytics. The
paper lists data analytics tools and techniques that have been used to improve healthcare
performance in many areas, such as medical operations, reports, decision making, and
prediction and prevention systems. The health data are obtained from patients using web
forms, and the collected data are converted into an Excel file. Singular value decomposition
(SVD), one category of statistical dimension reduction techniques, is used for data
compression: for a 40 × 10 data matrix, a rank-1 approximation reduces the 400 numbers
of the original matrix to 40 + 10 = 50 numbers, a nearly 90% reduction in information.
From the Excel file, the input is given to RStudio for data analysis, where clustering and
classification are performed to identify normal and abnormal conditions. For clustering we
use the k-means algorithm, and for classification we use a support vector machine (SVM).
Finally, the output is shown using shinyapps.io web services.

EXISTING SYSTEM:

In the existing system, only wearable sensors with Internet of Things (IoT) based
monitoring are implemented. Healthcare data analytics has large scope and is still under
research. Healthcare data has become more complex because large amounts of data have
become available lately, technologies and mobile applications change rapidly, and new
diseases are being discovered. Healthcare sectors have therefore recognised that
healthcare data analytics tools are really important for managing large amounts of complex
data, which can improve healthcare industries and help medical practice reach a high level
of efficiency and workflow accuracy.

Clinical decisions are often made based on doctors' intuition and experience rather than on
the knowledge-rich data hidden in the database. This practice leads to unwanted biases,
errors and excessive medical costs, which affects the quality of service provided to patients.

Most hospitals today employ some sort of hospital information system to manage their
healthcare or patient data. These systems typically generate huge amounts of data in the
form of numbers, text, charts and images. Unfortunately, these data are rarely used to
support clinical decision making. Poor clinical decisions can lead to disastrous
consequences and are therefore unacceptable.

DRAWBACKS:
1. In the existing system, doctors make decisions manually, which leads to unwanted
biases, errors and excessive medical costs and affects the quality of service provided
to patients.
2. Machine learning techniques are not used effectively in healthcare.

PROPOSED SYSTEM:

Our proposed system improves healthcare data analytics by taking advantage of the rapid
growth and evolution of technology. It also aims to give professionals better-quality
medical results and to reduce the time needed to analyse healthcare data, by keeping
systems up to date, sorting medical data into a logical structure, and making patients' data
fast and smooth to access and retrieve. In addition, machine-learning-based health
prediction is performed for each patient, and a suitable solution is prompted to the patient
within a short period of time.

The proposed system uses data analytics tools and techniques that have been shown to
improve healthcare performance in many areas, such as medical operations, reports,
decision making, and prediction and prevention systems. The health data are obtained from
the patients using web forms, and the collected data are converted into an Excel file.
Singular value decomposition (SVD) is used for data compression; as described in the
abstract, a rank-1 approximation reduces the 400 numbers of a 40 × 10 matrix to
40 + 10 = 50 numbers, a nearly 90% reduction in information. From the Excel file, the
input is given to RStudio for data analysis, where clustering and classification are
performed to identify normal and abnormal conditions. For clustering we use the k-means
algorithm, and for classification we use a support vector machine (SVM). Finally, the output
is shown using shinyapps.io web services.

SYSTEM REQUIREMENT:

SOFTWARE REQUIREMENTS:

■ Operating system: Windows 7 Professional
■ R Programming

MODULES:

1. Patient health data
2. SVD for data compression
3. R programming based data analytics
4. Clustering
5. Classification

1. PATIENT HEALTH DATA

E-healthcare systems significantly facilitate health condition monitoring, disease modeling
and early intervention, and evidence-based medical treatment. A set of body sensors is
deployed on, in or around the patient to collect real-time personal health information (PHI)
in the form of both text (e.g. body temperature and blood pressure) and signals or images
(e.g. electrocardiogram (ECG), electroencephalogram (EEG) and endoscopy). Patients are
likely to suffer not only from the disease itself but also from inequitable treatment and the
lack of proper solutions. Hence, in this module we obtain the patient's real-time data using
Java web forms, and data analytics is performed in R to produce a predictive solution.
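As a minimal sketch, the collected Excel file could be read into R as follows; the file name
and the use of the readxl package are assumptions for illustration, not project files:

library(readxl)

human_details <- read_excel("patient_health_data.xlsx")  # hypothetical path

str(human_details)      # inspect the collected fields
summary(human_details)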

2. SVD FOR DATA COMPRESSION

If we believed that the first left and right singular vectors, call them u1 and v1, captured all
of the variation in the data, then we could approximate the original data matrix with

X ≈ u1 v1′

Thus, for a 40 × 10 data matrix, we would reduce the 400 numbers in the original matrix to
40 + 10 = 50 numbers in the compressed matrix, a nearly 90% reduction in information.

Comparing the original data with this approximation, the two matrices are obviously not
identical, but the approximation seems reasonable. This is not surprising given that there
was only one real feature in the original data.
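The following is a minimal R sketch of this rank-1 compression; the 40 × 10 matrix here is
hypothetical data with a single underlying feature plus noise, mirroring the example above:

set.seed(1)
X <- outer(rnorm(40), rnorm(10)) + matrix(rnorm(400, sd = 0.1), 40, 10)

s  <- svd(X)       # X = U %*% diag(d) %*% t(V)
u1 <- s$u[, 1]     # first left singular vector  (40 numbers)
v1 <- s$v[, 1]     # first right singular vector (10 numbers)

# Rank-1 approximation: 40 + 10 = 50 numbers instead of 400
X1 <- s$d[1] * outer(u1, v1)

# Proportion of the total variation captured by the first component
s$d[1]^2 / sum(s$d^2)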

3. R PROGRAMMING BASED DATA ANALYTICS

R is a programming language and software environment for statistical analysis, graphics
representation and reporting. Huge amounts of multidimensional data have been collected
in various fields such as marketing, biomedicine and geospatial applications, and mining
knowledge from these big data has become a highly demanding field, since it far exceeds a
human's ability to analyse such data manually. Unsupervised machine learning, or
clustering, is one of the important data mining methods for discovering knowledge in
multidimensional data.

4. CLUSTERING:

K-Means Clustering:

K-means clustering is an unsupervised learning algorithm that tries to cluster data based
on their similarity. Unsupervised learning means that there is no outcome to be predicted;
the algorithm just tries to find patterns in the data. In k-means clustering, we have to
specify the number of clusters we want the data to be grouped into. The algorithm
randomly assigns each observation to a cluster and finds the centroid of each cluster. Then
the algorithm iterates through two steps:

• Reassign data points to the cluster whose centroid is closest.

• Calculate the new centroid of each cluster.

These two steps are repeated until the within-cluster variation cannot be reduced any
further. The within-cluster variation is calculated as the sum of the squared Euclidean
distances between the data points and their respective cluster centroids.
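As a minimal sketch, the loop described above can be run with R's built-in kmeans() on
hypothetical 2-D data; three centres mirror the three conditions (normal / abnormal /
critical) used later in this project:

set.seed(2)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

km <- kmeans(x, centers = 3)

# Total within-cluster variation minimised by the two-step loop:
# the sum over all clusters of the squared distances from each point
# to its cluster centroid
km$tot.withinss

plot(x, col = km$cluster)             # points coloured by cluster
points(km$centers, pch = 8, cex = 2)  # cluster centroids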

5. CLASSIFICATION (SUPPORT VECTOR MACHINE):

A support vector machine (SVM) is a supervised machine learning algorithm that can be
used for both classification and regression challenges; however, it is mostly used in
classification problems. In this algorithm, we plot each data item as a point in
n-dimensional space (where n is the number of features) with the value of each feature
being the value of a particular coordinate. Then we perform classification by finding the
hyperplane that differentiates the two classes best.

The SVM is an extension of the support vector classifier (SVC) that accommodates
non-linear boundaries. Though there are clear distinctions between the various definitions,
people prefer to call all of them SVMs to avoid complications. Support vectors are simply
the coordinates of individual observations, and the support vector machine finds the
frontier (hyperplane/line) that best segregates the two classes.
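As a minimal sketch, an SVM classifier can be fitted with the e1071 package (the one used
in the coding section below); the example uses R's built-in iris data rather than the
project's patient data:

library(e1071)

model <- svm(Species ~ ., data = iris)  # fits the separating hyperplanes
pred  <- predict(model, iris[, -5])

table(pred, iris$Species)               # confusion matrix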

ARCHITECTURE DIAGRAM:

Patient → R programming → Clustering / Classification → Result

CONCLUSION

BDA is a process with many steps, such as data acquisition and cleaning, and each step has
its own challenges. We need to address these challenges and formulate new technological
standards and protocols so that, at least technologically, we become competent enough to
manage and analyse such volumes of complex data. Using machine learning techniques, our
proposed system predicts the early symptoms/outcome of a specific disease.

In the future, with further advancements in BDA processes, we expect that healthcare costs
will come down drastically, life expectancy will increase, and the population will be much
healthier than it is now, with people taking more accountability for and charge of their
health using technological advancements. The future of healthcare is promising.

REFERENCES

[1] Vivek Wadhwa, "The rise of big data brings tremendous possibilities and frightening
perils", The Washington Post, April 2014. Available:
http://www.washingtonpost.com/blogs/innovations/wp/2014/04/18/therise-of-big-data-bringstremendous-possibilities-and-frightening-perils/

[2] D. Agrawal et al., "Challenges and Opportunities with Big Data", Big Data White Paper,
Computing Research Association, Feb 2012. Available:
http://cra.org/ccc/docs/init/bigdatawhitepaper.pdf

[3] Wullianallur Raghupathi and Viju Raghupathi, "Big data analytics in healthcare: promise
and potential", Health Information Science and Systems, 2:3, 2014. Available:
http://www.hissjournal.com/content/2/1/3

[4] R. Nambiar, R. Bhardwaj, A. Sethi and R. Vargheese, "A look at challenges and
opportunities of Big Data analytics in healthcare", IEEE Conference, 2013. Available:
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6691753&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6691753

[5] Ahmed E. Youssef, "A Framework for Secure Healthcare Systems Based on Big Data
Analytics in Mobile Cloud Computing Environments", The International Journal of Ambient
Systems and Applications, June 2014. Available:
http://airccse.org/journal/ijasa/papers/2214asa01.pdf

[6] J. Archenaa and E. A. Mary Anita, "A Survey of Big Data Analytics in Healthcare and
Government", Procedia Computer Science, Elsevier, Vol. 50, 2015, pp. 408-413 (Big Data,
Cloud and Computing Challenges). Available:
http://www.sciencedirect.com/science/article/pii/S1877050915005220

[7] Matthew Herland, Taghi M. Khoshgoftaar and Randall Wald, "A review of data mining
using big data in health informatics", Journal of Big Data, Springer, 1:2, 2014. Available:
http://www.journalofbigdata.com/content/1/1/2

[8] M. H. Kuo, T. Sahama, A. W. Kushniruk, E. M. Borycki and D. K. Grunwell, "Health big
data analytics: current perspectives, challenges and potential solutions", International
Journal of Big Data Intelligence, Vol. 1, Issue 1, pp. 114-126.

FEASIBILITY STUDY

A feasibility study is the analysis of a problem to determine whether it can be solved
effectively. The results determine whether the solution should be implemented. This
activity takes place during the project initiation phase, before significant expenses are
committed.

Definition of Feasibility Study

A feasibility study is an evaluation of a proposal designed to determine the difficulty of
carrying out a designated task. Generally, a feasibility study precedes technical
development and project implementation. It looks at the viability of an idea, with an
emphasis on identifying potential problems, and attempts to answer one main question:
will the idea work, and should you proceed with it?
Objective

The feasibility study answers the basic question: is it realistic to address the problem or
opportunity under consideration? It also produces a final proposal for management, which
might include:

• Project name
• Problem or opportunity definition
• Project description
• Expected benefit
• Consequence of rejection
• Resource requirements
• Alternatives
• Other considerations
• Theorization

Five Common Factors (TELOS)

1. Technology and system feasibility
2. Economic feasibility
3. Legal feasibility
4. Operational feasibility
5. Schedule feasibility

Technology and System Feasibility


The assessment is based on an outline design of the system requirements in terms of
inputs, processes, outputs, fields, programs and procedures. This can be quantified in terms
of volumes of data, trends, frequency of updating, etc., in order to estimate whether the
new system will perform adequately; in other words, the feasibility is studied on the basis
of this outline design.

Economic Feasibility
Economic analysis is the most frequently used method for evaluating the effectiveness of a
new system. More commonly known as cost/benefit analysis, the procedure is to determine
the benefits and savings expected from a candidate system and compare them with its
costs. If the benefits outweigh the costs, the decision is made to design and implement the
system. An entrepreneur must accurately weigh the costs against the benefits before taking
action.

Legal Feasibility
This determines whether the proposed system conflicts with legal requirements; for
example, a data processing system must comply with the local Data Protection Acts.

Operational feasibility
This is a measure of how well a proposed system solves the problems, takes advantage of
the opportunities identified during scope definition, and satisfies the requirements
identified in the requirements analysis phase of system development.

Schedule feasibility
A project will fail if it takes too long to complete before it becomes useful. Typically this
means estimating how long the system will take to develop and whether it can be
completed in a given time period, using methods such as the payback period. Schedule
feasibility is a measure of how reasonable the project timetable is: given our technical
expertise, are the project deadlines reasonable? Some projects are initiated with specific
deadlines, and you need to determine whether the deadlines are mandatory or desirable.

RSTUDIO:

RStudio is a free and open-source integrated development environment (IDE) for R, a
programming language for statistical computing and graphics. It includes a console and a
syntax-highlighting editor that supports direct code execution, as well as tools for plotting,
history, debugging and workspace management.

RStudio is available in open source and commercial editions and runs on the desktop
(Windows, Mac, and Linux) or in a browser connected to RStudio Server or RStudio Server
Pro (Debian/Ubuntu, RedHat/CentOS, and SUSE Linux). RStudio is written in the C++
programming language and uses the Qt framework for its graphical user interface.

R is a powerful language and environment for statistical computing and graphics. It is a
public domain (so-called "GNU") project, similar to the commercial S language and
environment developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by
John Chambers and colleagues. R can be considered a different implementation of S and is
widely used as an educational language and research tool. The main advantages of R are
that it is free and that a lot of help is available online. It is quite similar to other packages
such as MATLAB (not free), but more user-friendly than programming languages such as
C++ or Fortran. You can use R as it is, but for educational purposes we prefer to use R in
combination with the RStudio interface (also free), which has an organized layout and
several extra options. This document contains explanations, examples and exercises that
can (hopefully) also be understood by people without any programming experience. Going
through all the text and exercises takes about one to two hours. Examples of frequently
used commands and error messages are listed at the end of this document and can be used
as a reference while programming.
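A small, illustrative sample of such commands:

x <- c(2, 4, 6, 8)    # create a numeric vector
mean(x)               # arithmetic mean
sd(x)                 # standard deviation
plot(x, type = "b")   # simple line-and-point plot
help(mean)            # open the documentation for a function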

Install RStudio

After finishing the R setup, you should see an "R" icon on your desktop. Clicking on it starts
the standard interface. We recommend, however, using the RStudio interface. To install
RStudio, go to:

http://www.rstudio.org/

and do the following (assuming you work on a Windows computer):

• click Download RStudio
• click Download RStudio Desktop
• click Recommended For Your System
• download the .exe file and run it (choose the default answers for all questions)
SYSTEM TESTING

Testing is an important step in the software development life cycle. Testing takes place at
various stages of development, and it is a vital step because it helps to identify mistakes
and send the program for correction.

This process is repeated at various stages until the final unit or program is found to be
complete, thus giving total quality to the development process. The various levels and
types of testing found in a software development life cycle are as follows:

1. White Box Testing

2. Black Box Testing

3. Unit Testing

4. Regression Testing

5. Integration Testing

6. Smoke Testing

7. Alpha Testing

8. Beta Testing

After getting an idea of what has to be tested, by communicating with developers and
others during the design phase of the software development life cycle, the testing stage
proceeds in parallel. A test plan is prepared during the planning stage of testing. It contains
details of the environment (software, hardware, operating system used), the scope and
limitations of testing, the test types, and so on. In the next phase the test cases are
prepared, with details of each step for the module to be checked; the inputs that can be
used for each action are described and recorded for testing, together with the expected
outcome or result of each action.

The next phase is the actual testing phase, in which the testers test according to the test
plan and test cases and record the output of each module. A report is then made comparing
the expected outcome with the actual output of each module at each step. This is sent to
the developers for rework, and the testing cycle continues as above.

White Box Testing

To perform this testing process, the tester has to have access to the source code of the
product under test, so it is essential that the person doing white box testing has some
knowledge of the program being tested. Though not strictly necessary, it is often best if the
programmers themselves do the white box testing, since this process requires handling the
source code.

Black Box Testing (Functional Testing)

This is otherwise called functional testing. In contrast to white box testing, the person
doing black box testing need not have programming knowledge, because he or she
accesses the outputs as an end user would and performs thorough functionality testing to
check whether the developed module or product behaves functionally the way it should.

Unit Testing

This testing is done for each module of the program to ensure its validity. It is usually done
by developers, who write test cases for each scenario of the module and record the results
of each step.

Regression Testing

The development life cycle is subject to continuous changes as user requirements evolve.
If there is a change to an existing system that has already been tested, it is essential to
make sure that the new changes do not affect the existing functionality. Regression testing
is done to ensure this.
Integration Testing

Unit testing each module, as explained above, makes the process of integration testing as a
whole simpler, because correcting mistakes or bugs in each module makes it easier to
integrate all the units into a system and test them together. One might ask why integration
testing is needed at all. The answer is simple: unit testing, as explained, tests and assures
the correctness of each module individually, but it does not cover how the system behaves,
or what errors are reported, when the modules are integrated. This is covered at the
integration testing level.

Smoke Testing

This is also called sanity testing. It is mainly used to identify environment-related problems
and is performed mostly by the test manager. For any application it is necessary to have
the environment checked first for smooth running of the application. In this testing process
the application is run in the environment (technically called a dry run) and checked to
confirm that it runs without any problem or abnormal termination.

Alpha Testing

The testing processes described above take place at different stages of development
according to requirements and needs. But a final round of testing is always made on the
fully finished product, before it is released to end users, and this is called alpha testing.
Alpha testing involves both white box testing and black box testing, and is therefore carried
out in two phases.

Beta Testing

This testing process is carried out to further validate the developed software. It takes place
after alpha testing. Even after the alpha phase, the release is generally not made to all end
users: the product is released to a set of people, and feedback is obtained from them to
ensure the validity of the product. Here the testing is normally done by a group of end
users, so the beta testing phase covers black box (functionality) testing only.

Having seen the testing levels and types, it does not follow that the released system is one
hundred percent bug-free or error-free, because no real system has a zero error rate. An
important point to bear in mind is that a developed system is a quality system only if it can
run for a period of time after its release without error, and if after this period only minimal
errors are reported. Achieving this is why the testing phase plays an essential role in the
software development life cycle.

Testing Methodologies

All tests should be traceable to customer requirements. The focus of testing shifts
progressively from programs to individual modules and finally to the entire project. To be
most effective, testing should be the kind that has the highest probability of finding errors.
The testing procedure that has been used is as follows:

Source Code Testing

This examines the logic of the system. If we get the output required by the user, we can say
that the logic is correct.

Specification Testing

We specify what the program should do and how it should perform under various
conditions. This testing is a comparative study of the evolution of system performance
against the system requirements.

Module Level Testing

Here errors are found in each individual module. This encourages the programmer to find
and rectify the errors without affecting the other modules.

Unit Testing

Unit testing focuses on verification of the smallest unit of a software module. The local data
structure is examined to ensure that data stored temporarily maintains its integrity during
all steps of the algorithm's execution. Boundary conditions are tested to ensure that the
module operates properly at the established limits.

Output Testing

After validation testing, the next step is output testing of the proposed system, since no
system is useful until it produces the required output in the specified format. The output
format is considered in two ways: the screen format and the printer format.

User Acceptance Testing

User acceptance testing is a key factor in the success of any system. The system under
consideration is tested for user acceptance by staying in constant touch with prospective
system users during development and making changes whenever required. Acceptance
testing involves the planning and execution of functional tests, performance tests and
stress tests in order to demonstrate that the implemented system satisfies its
requirements. When custom software is built for one customer, a series of acceptance tests
is conducted to enable the customer to validate all requirements. Acceptance testing
incorporates test cases developed during integration testing, and additional test cases are
added to achieve the desired level of functional, performance and stress testing of the
entire system, so that cumulative errors that might degrade the system over time are
caught.

Integration Testing

Integration testing is the phase of software testing in which individual software modules
are combined and tested as a group. It follows unit testing and precedes system testing.

Integration testing takes as its input modules that have been checked out by unit testing,
groups them into larger aggregates, applies tests, and delivers as its output the integrated
system ready for system testing. To ensure that links across screens within a subsystem or
module are established properly, integration/link testing is done. Link testing does not
cover functionality across different subsystems but ensures the navigation between
screens.
TEST CASE

Test ID | Test description          | Expected result                  | Status | Remarks
HA_001  | Data inputs are entered   | Entered inputs are uploaded      | Pass   | Initialisation of data successful
HA_002  | Data inputs are analysed  | Data analysis is done            | Pass   | Analysis successful
HA_003  | Data sets are clustered   | Clustering is done               | Pass   | Clustering successful
HA_004  | Classification among data | Data are classified successfully | Pass   | Classification successful
HA_005  | View the classified data  | Only the user is allowed to view | Pass   | Viewed successfully
CODING:

# human_details is assumed to be loaded from the collected Excel file
# beforehand, e.g. via readxl::read_excel().
# Label each record by heartbeat range; later rules overwrite earlier
# ones where the ranges overlap.
human_details$PARAMETERS[human_details$HEARTBEAT <= 71] <- "normal"
human_details$PARAMETERS[human_details$HEARTBEAT >= 72 & human_details$HEARTBEAT < 75] <- "abnormal"
human_details$PARAMETERS[human_details$HEARTBEAT > 74] <- "critical"

human_details_samp <- human_details

# Inspect the data set
summary(human_details_samp)
str(human_details_samp)
length(human_details_samp)

# Remove the label column before unsupervised clustering
# (this also leaves Y as NULL; Y is not used afterwards)
Y <- human_details_samp$PARAMETERS <- NULL

# Keep only positive heartbeat values for clustering
human_details1 <-
  as.data.frame(subset(human_details_samp$HEARTBEAT, human_details_samp$HEARTBEAT > 0))
human_details1

library(ggplot2)

# k-means with 3 clusters, one per condition (normal / abnormal / critical)
(kmeans_human_details <- kmeans(human_details1, 3))
kmeans_human_details$cluster

# Cross-tabulate the assigned labels against the cluster numbers
human_details2 <- table(human_details$PARAMETERS, kmeans_human_details$cluster)
human_details2

# Synthetic 2-D data for visualising the clustering
# ("BOODY_TEMP" is kept as spelled in the original data set)
test <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
              matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(test) <- c("HEARTBEAT", "BOODY_TEMP")

(kmeans_human_details1 <- kmeans(test, 3))
kmeans_human_details1$cluster
summary(kmeans_human_details1$cluster)
min(kmeans_human_details1$cluster)
max(kmeans_human_details1$cluster)

# Fit statistics for the clustering
kmeans1 <- kmeans_human_details1$totss      # total sum of squares
kmeans1
kmeans2 <- kmeans_human_details1$withinss   # within-cluster sum of squares
kmeans2
kmeans3 <- kmeans_human_details1$betweenss  # between-cluster sum of squares
kmeans3
kmeans4 <- kmeans_human_details1$size       # cluster sizes
kmeans4
kmeans5 <- kmeans_human_details1$iter       # iterations used
kmeans5

# Plot the points coloured by cluster, with the centroids marked
plot(test, col = kmeans_human_details1$cluster)
points(kmeans_human_details1$centers, col = 1:5, pch = 8, cex = 2)


# Classification: start from a copy that still contains the labels
human_details_new <- human_details
head(human_details_new)
summary(human_details_new)
str(human_details_new)
length(human_details_new)

# A shuffled copy (unused below; train/test are taken in row order)
human_details_new1 <- human_details[sample(nrow(human_details)), ]

# Split into training and test sets
train <- human_details_new[1:800, ]
train
test <- human_details_new[801:1692, ]
test

require(e1071)
require(caret)
library(caTools)

# Alternative split kept from development:
# split <- sample.split(human_details_new$PARAMETERS, SplitRatio = 0.55)
# train <- subset(human_details_new, split == F)
# test  <- subset(human_details_new, split == T)

train$PARAMETERS <- as.factor(unlist(train$PARAMETERS))
train$PARAMETERS
test$PARAMETERS <- as.factor(unlist(test$PARAMETERS))
test$PARAMETERS

# Predictors only: column 7 holds the PARAMETERS label
train_class <- train[, -7]
train_class

# Encode the categorical columns numerically
human_details_new$GENDER <- as.factor(human_details_new$GENDER)
human_details_new$GENDER
human_details_new$GENDER <- as.numeric(human_details_new$GENDER)
human_details_new$GENDER

human_details_new$`PATIENT ID` <- as.factor(human_details_new$`PATIENT ID`)
human_details_new$`PATIENT ID`
human_details_new$`PATIENT ID` <- as.numeric(human_details_new$`PATIENT ID`)
human_details_new$`PATIENT ID`

# Train the classifier and evaluate on the held-out test set.
# (Note: the report describes SVM, but this script trains a naive Bayes
# classifier; e1071 provides both with the same x/y interface.)
model <- naiveBayes(train_class, train$PARAMETERS)
summary(model)

pred <- predict(model, test[, -7])
summary(pred)
plot(pred)

# Confusion matrix of predictions vs. true labels (from caret)
confusionMatrix(pred, test$PARAMETERS)

# Split the test set by condition label (used by the Shiny app below)
normal <- subset(test, PARAMETERS == "normal")
# normal <- subset(normal, normal$HEARTBEAT > 72 & normal$HEARTBEAT < 74)
# normal
abnormal <- subset(test, PARAMETERS == "abnormal")
critical <- subset(test, PARAMETERS == "critical")

# Per-patient subsets explored during development, e.g.:
# PATIENT1114 <- subset(human_details, `PATIENT ID` == "PATIENT1114")
# (repeated likewise for PATIENT11141 through PATIENT11162)

UI:

library(shiny)
library(ggplot2)
library(e1071)

pageWithSidebar(

headerPanel('HEARTA K MEANS clustering and Classification DECISION TREE'),

sidebarPanel(

selectInput('HEARTBEAT', 'kmeans Variable', choices = "HEARTBEAT"),

# selectInput('HEARTBEAT', 'Longitude_GPS Variable', names(human_details),
#             selected = names(human_details)[[2]]),
# numericInput("clusters", "Cluster count", 3, min = 1, max = 9),

textInput("caption", "Caption:", "Data Summary"),

# unique() avoids listing each patient's label repeatedly
selectInput("confusionMat", "Choose a dataset:",
            choices = unique(human_details$PARAMETERS)),

textInput("caption1", "K MEANS PLOT:", "K MEANS PLOT"),

textInput("caption2", "DECISION TREE PLOT:", "DECISION TREE PLOT"),

numericInput("obs", "Number of observations to view:", 10)

),

mainPanel(

h3(textOutput("caption")),

verbatimTextOutput("summary"),

tableOutput("view"),

tableOutput("confusionMat"),

h3(textOutput("caption1")),

plotOutput('plot1'),

h3(textOutput("caption2")),

plotOutput('plot2')

)
)

SERVER:

library(shiny)
library(datasets)

function(input, output, session) {

# Return the subset that matches the selected condition
datasetInput <- reactive({
  switch(input$confusionMat,
         "normal"   = normal,
         "abnormal" = abnormal,
         "critical" = critical)
})

output$caption <- renderText({
  input$caption
})

output$caption1 <- renderText({
  input$caption1
})

output$caption2 <- renderText({
  input$caption2
})

output$summary <- renderPrint({
  confusionMat <- datasetInput()
  summary(pred)  # note: summarises the global predictions, not the selected subset
})

output$view <- renderTable({
  head(datasetInput(), n = input$obs)
})

output$confusionMat <- renderTable({
  confusionMat <- table(pred, test$PARAMETERS)
  # Map the selected condition to the corresponding advice label
  if (input$confusionMat == "normal")   DATA <- "Healthy"
  if (input$confusionMat == "abnormal") DATA <- "Consultdoctor"
  if (input$confusionMat == "critical") DATA <- "Surgery"
  DATA
})

# A second table listing per-patient subsets (PATIENT1114 through
# PATIENT11162) was sketched here and left commented out during
# development.

# Combine the selected variables into a new data frame.
# Note: input$BOODY_TEMP and input$clusters have no matching controls in
# the UI above, so these two reactives are currently unused.
selectedData <- reactive({
  human_details[, c(input$BOODY_TEMP)]
})

clusters <- reactive({
  kmeans(selectedData(), input$clusters)
})

output$plot1 <- renderPlot({
  # Demo k-means plot on synthetic 2-D data, as in the main script
  test <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
  colnames(test) <- c("HEARTBEAT", "BOODY_TEMP")

  kmeans_human_details1 <- kmeans(test, 3)

  plot(test, col = kmeans_human_details1$cluster)
  points(kmeans_human_details1$centers, col = 1:5, pch = 8, cex = 2)
})

output$plot2 <- renderPlot({
  plot(pred)  # distribution of predicted conditions
})

}
