Data Driven Health Informatics
Digital Lecture Companion

Dimitrios Zikos, Ph.D Health Informatics, M.Sc., B.S.N


Table of Contents

Preface
1. Introduction to the Discipline
2. The Nature of Data in Healthcare
3. Data Workflow and users of health data
4. Data Collection Methods in Healthcare
5. Hospital Information Systems
6. Electronic Health Records
7. Standards in Healthcare
8. Databases in Healthcare
9. Interoperability in Healthcare
10. Security and Privacy of Data in Healthcare
11. Big Data in Healthcare and Emerging Challenges
12. Data Mining for Decision Making in Healthcare

Dimitrios Zikos: Data Driven Health Informatics
Preface
Data collection, maintenance, retrieval and analysis are especially important in healthcare,
since providers and facilities need immediate access to patient and administrative
information for clinical and administrative decision making. Health informatics provides the
appropriate technologies, tools and methods to efficiently support healthcare and optimally
utilize the wealth of available health care data, in favor of decisions in primary, hospital and
tertiary care. This book begins by introducing the reader to the discipline of health
informatics, and further discusses the fundamental concepts of data and information,
focusing on the nature of health care data and the data flow and sharing across health care
users. The principles of data collection in healthcare are covered, with reference to the
sensitive and complex nature of health data. Hospital Information Systems and
Electronic Health Records are also discussed, as they are very important tools that facilitate
information management and support clinical decisions. There is also extensive coverage
of the dimensions of interoperability in healthcare and of the main principles of database
design. We want to give the reader a high level of understanding of how healthcare
data is organized. The book furthermore explores the challenges and opportunities arising
from the use of big data and covers the fundamental principles of data mining. This textbook
aims to give healthcare providers and administrators a thorough understanding of the
significance of new technologies and of the potential of healthcare data to
help healthcare professionals and decision makers make data driven, evidence based
decisions.

The author
Dimitrios Zikos is an assistant professor at Central Michigan University. He holds a
Bachelor of Science degree in Nursing and a Master’s and a Ph.D. degree in Health Informatics
from the University of Athens, Greece. His research involves clinical and administrative
informatics, clinical statistics and data mining, healthcare delivery and clinical care policies
for decision making. In 2013, he arrived in the United States as a visiting professor to help
the University of Texas at Arlington develop new curricula in Health Informatics, and
supervised undergraduate and graduate students in health analytics projects. Dr. Zikos has
been an investigator on nationally and regionally funded projects and a co-organizer of the NSF-
sponsored annual conference PETRA (Pervasive Technologies Related to Assistive
Environments). While overseas, he was a researcher and project manager in large scale European
Union multi-country e-health projects and partnered with an accreditation and research center
in Greece for the European Union Network for Patient Safety. This research resulted in
translational guidelines for health professionals and the public on patient safety. He is the
author of numerous peer-reviewed journal and conference papers and a reviewer for numerous
journal and conference contributions.

Chapter 1. Introduction to the Discipline
The Information in Healthcare
In health sciences there are procedures of high complexity that concern biological
organisms and their functions. A disease is very often the result of a patient's interactions with
their environmental and psychosocial context. Therefore, when evaluating health, many
considerations should be made, and a large number of attributes need to be collected, stored
and retrieved during decision making.
Information is produced and communicated during the above mentioned procedures:
there are multiple users that share this information inside and outside of the hospital.
Very often, the same data are accessed concurrently by different health providers, who need
the data for different purposes. Information changes dynamically during the provision of care,
for example when a new exam is prescribed or when the treatment plan of a patient is altered.
It is also important to understand that health care information derives from the
combination of many atomic health data, which have to be assessed together to produce
useful information for clinical decision making.
The majority of health related information comes from the analysis of numerical and Boolean
data, but also from image, video and sound data.

Why Health Informatics?


Health Informatics provides information to make decisions
Better information leads to better decisions
Health care, management, planning and policy all need good information
Health care, health management, health policy and health planning all depend on having
good information to make decisions.

Categories of Information in Healthcare


In terms of the scope of the use of information, there are four categories of health care
information: clinical, administrative, financial and population health information, all used in
various ways to facilitate decision making. There are some important considerations related
to health care data:
Information is data that have meaning: It can be presented in any medium (text, lists or
graphics) in the manner that the end user prefers. A physiological measurement of 150 is
simply a number; we do not know what this figure refers to or how it was measured. When
we place this number within a context, in other words, when we label it, it becomes
information that can be used by the clinician. If this measurement refers to the diastolic blood
pressure of a patient, measured in mmHg, then the health professional will have acquired
information about a patient’s blood pressure. This is a fact that we have learned about this patient;
we can assess this information, compare it with clinical expectations for this specific
patient, and make evidence based decisions about the treatment plan.
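The step from a bare number to labelled information can be sketched in a few lines of Python. This is only an illustration of the idea, not part of any real clinical system: the class name Observation and its fields are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """A raw value plus the context that turns it into information (hypothetical class)."""
    value: float
    label: Optional[str] = None  # what was measured, e.g. "diastolic blood pressure"
    unit: Optional[str] = None   # unit of measurement, e.g. "mmHg"

    def is_information(self) -> bool:
        # Without a label and a unit, the value is only a number.
        return self.label is not None and self.unit is not None

    def describe(self) -> str:
        if not self.is_information():
            return f"{self.value:g} (no context: raw data)"
        return f"{self.label}: {self.value:g} {self.unit}"


raw = Observation(150)
labelled = Observation(150, label="diastolic blood pressure", unit="mmHg")
print(raw.describe())
print(labelled.describe())
```

The same value 150 appears in both objects; only the second one carries enough context for a clinician to compare it with expectations for the patient.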
Access & delivery methods of information: Since the 1990s there has been a slow but steady
transition from traditional to electronic methods to store, access and communicate health
care information. Different architectures have been used to implement hospital
information systems over the past four to five decades, and the transition to new
architectures has generally followed the evolution of information systems at large. The recent
approach is based on dedicated private cloud services that can be accessed by health
professionals. This is an important evolutionary step, from the traditional ways of viewing data to
access to personalized services.
The information in health care is dynamic: patient information constantly changes during a
hospitalization and needs to be up to date. Health professionals need to constantly reassess
any new input for informed decisions.

Methods of utilizing data to inform clinical decisions


A health professional utilizes four critical methods involving data, to make decisions: (i)
medical knowledge base (ii) information coming from patient (iii) experience and judgment
(iv) application of data mining methods on historical data for knowledge acquisition.

The medical knowledge base is acquired via studying, revisiting, reviewing the medical
knowledge of a health professional’s area of expertise. It is not limited to the university
acquired knowledge, but also involves continuous education, reviewing recent literature and
publications, attending conferences and scientific meetings.
Information coming from the patient: includes, but is not limited to, the patient history,
medical exams, vital signs and radiology tests.

Experience, medical judgment: here we refer to the clinician putting together all the different
pieces of a puzzle, including the components above, to decide on a diagnosis for a patient
and, consequently, on the appropriate treatment options. As part of medical
judgment, doctors learn how to perform a differential diagnosis, which is the process of
differentiating between two or more conditions that share similar signs or symptoms. For
nurses, the nursing assessment is part of the nursing process and is a systematic procedure
that nurses follow. It involves the collection and analysis of the available patient data, and
is not limited to physiological factors but also covers psychological, sociocultural, and lifestyle
factors.

Data mining for knowledge acquisition: algorithms are applied to large datasets to
predict health related events for a patient, such as the diagnosis, the prognosis, and the optimal
treatment plan. This component is the most recent addition to the clinical decision making
mechanism, and still remains unexplored or, at best, underutilized in most hospitals. These
systems should serve as an additional input for the health professional, who, in turn,
is expected to use this extra input for more accurate and less error prone medical
decisions. It is generally agreed in the literature that these systems can neither act
autonomously nor dictate the appropriate practices to the health professional. Decision
making systems, on the contrary, should be designed as an extension of the human cognitive
process that takes place during decision making.

Healthcare professionals combine their knowledge, experience and patient information, and eventually
will also consult model predictions which use historical data, to make clinical decisions
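To make the fourth method more concrete, the sketch below estimates a risk for a new patient from the outcomes of the most similar historical cases. It is a minimal nearest-neighbour illustration, chosen here for brevity and not taken from the text; the records, the two attributes and the function name estimate_risk are all invented for this example and carry no clinical meaning.

```python
from math import dist

# Hypothetical historical records: (age, systolic_bp) -> readmitted within 30 days?
# The numbers are invented for illustration only; they are not clinical data.
HISTORY = [
    ((45, 120), False),
    ((50, 130), False),
    ((62, 150), True),
    ((70, 160), True),
    ((38, 118), False),
    ((66, 155), True),
]


def estimate_risk(patient, history=HISTORY, k=3):
    """Share of the k most similar historical cases that had the outcome."""
    nearest = sorted(history, key=lambda rec: dist(rec[0], patient))[:k]
    return sum(outcome for _, outcome in nearest) / k


risk = estimate_risk((64, 152))
print(f"Estimated risk from similar past cases: {risk:.2f}")
```

In line with the paragraph above, the returned number is meant as one more input for the health professional, alongside knowledge, patient information and judgment, and not as a decision in itself.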

Definition of Informatics and ICT


In order to understand the significance of Health Informatics science, we first need to define
Informatics. Informatics (from the French ‘informatique’) includes the:

Science of information: defines what is considered as information, and how information
is acquired systematically by utilizing raw data. The science of information studies the
analysis, collection, classification, manipulation, storage, retrieval, movement,
dissemination, and protection of information.
Information processing: describes ways that the information can be used, and methods
applied to health care information to transform it into useful and usable knowledge.
The engineering of information systems: it involves the engineering of systems which
involve input of data and transformation to information, with the use of appropriate
methods. The term also refers to the complementary networks of hardware and software
that people and organizations use to collect, filter, process, create and also distribute data.
Informatics studies the structure, behavior, and interactions of natural and artificial systems
that store, process and communicate information. The term informatics is therefore not
synonymous with computational methods or computer science, although many of the methods
that informatics employs are computer science methods.
Information and Communication Technology (ICT): the communication technologies
and services that are used in various applications. The term describes those computer,
communication and multimedia technologies that can be used to receive, process, store,
display and disseminate information. ICT is an umbrella term and is often used to describe
the communication within applications in a specific domain, for example ‘ICT in Health
Care’.

Key Elements of Informatics


Acquisition: capture data produced during health care provision. There are different ways
that this is achieved in a clinical environment:
1. Observation and clinical examination: doctors and nurses collect crucial information about
patients using the medical and nursing evaluation procedures, which include systematic
steps to assess a patient's condition from a medical and nursing perspective. Doctors also
perform the clinical examination, which includes palpation, inspection, percussion and
auscultation. In this way, doctors use their natural senses, and tools which amplify those
senses, while examining a patient.

2. Talking with the patient and with other health professionals. The
interactions between nurses and doctors are of particular importance, considering
that nurses spend a considerable amount of time with the patient and are able to make nursing
observations which can provide invaluable input to the doctor. A nurse can typically notice
small changes or events in a patient’s condition, for example loss of appetite, skin colour
changes or a change of consciousness. Such changes can be very important, and the physician
may not be in a position to notice them in time. There is evidence that a hospital environment that values
interprofessional collaboration between health professionals is a
significant contributor to quality health care services and patient satisfaction.

3. Physiological measurements: may be very simple, such as the measurement of body
temperature with a clinical thermometer or the measurement of blood pressure, or they
may be more complicated, for example measuring how well the heart is functioning by taking
an ECG (electrocardiogram). Typically, the majority of physiological measurements are
performed by nurses.

4. Laboratory tests: involve the analysis of
samples taken from patients (e.g. blood, urine, tissue). A laboratory test is typically part
of a regular check-up, but during a hospitalization such tests are usually performed to
help shape a diagnosis. The analysis of enzyme concentrations, blood elements, antibody
tests and urine ketone concentration are some typical examples.

5. Radiology exams: include a variety of imaging techniques such as X-ray radiography,
ultrasound, computed tomography (CT), nuclear medicine including positron emission
tomography (PET), and magnetic resonance imaging (MRI), which are used to diagnose and/or
treat diseases.

Storage (and retrieval): involves the storage of data on physical media, so that it can be
retrieved. The storage of healthcare related data is nowadays achieved (i) via the direct entry
of information into Electronic Medical Records, and consequently the storage of data in
(in most cases relational) databases on the system backend, (ii) using sensors, which
perform measurements and then send the data via a communication module to interoperable
systems, and (iii) via scanning handwritten documents and using optical character recognition
(OCR) technologies.
Storing patient information in portable devices is not a recommended practice, for data
privacy reasons. These devices (e.g. bed monitors, tablets) should send the measurements to
the main system via wireless technologies, without storing the data locally.
Information retrieval during the clinical practice involves healthcare professionals
navigating and using search tools found in graphical user interfaces of Electronic Medical
Records. Healthcare professionals are end users and have no direct access to the data, and
no access to database querying engines. A typical functionality of modern systems is the
support of advanced reports which visualize the clinical information and transform the data into
useful representations with longitudinal data insights.
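The storage and retrieval steps described above can be sketched with Python's built-in sqlite3 module. This is a toy illustration under stated assumptions: the table vital_sign, its columns, the sample rows and the helper temperature_history are all hypothetical, and a real Electronic Medical Record would expose such a query to end users only through its user interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
conn.execute(
    """CREATE TABLE vital_sign (
           patient_id  INTEGER NOT NULL,
           measured_at TEXT    NOT NULL,   -- ISO 8601 timestamp
           name        TEXT    NOT NULL,
           value       REAL    NOT NULL,
           unit        TEXT    NOT NULL
       )"""
)

# Invented sample measurements, as if entered directly or sent by a bedside sensor.
rows = [
    (1, "2024-01-01T08:00", "body temperature", 37.2, "C"),
    (1, "2024-01-01T16:00", "body temperature", 38.1, "C"),
    (2, "2024-01-01T09:30", "body temperature", 36.8, "C"),
]
conn.executemany("INSERT INTO vital_sign VALUES (?, ?, ?, ?, ?)", rows)


def temperature_history(patient_id):
    """Longitudinal view: one patient's temperatures in time order."""
    cur = conn.execute(
        "SELECT measured_at, value FROM vital_sign "
        "WHERE patient_id = ? AND name = 'body temperature' "
        "ORDER BY measured_at",
        (patient_id,),
    )
    return cur.fetchall()


print(temperature_history(1))
```

The function plays the role of a report behind the graphical interface: the health professional asks for a patient's history and never writes the query.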
Communication: data moves from the point of data collection to storage, for analysis, and
finally, back to the point of data use. Communication of health care data within and across
subsections of a hospital information system, involves the use of communication protocols
and interoperability standards. Interoperable systems should not only be technically
compatible but should also achieve seamless data exchange.
Manipulation: data usually needs to be manipulated, combined with other data and
aggregated for statistical and healthcare analytics purposes. Data manipulation can be as
simple as the calculation of a patient's age from the date of birth, or it can refer to more
advanced applications where data science methods are applied to data to generate prediction
models. Data manipulation may also refer to different representations of data for the end
user: the healthcare professional may view information about a patient
on a sequential time-line, which represents each clinical intervention and event
according to the time it occurred. At any given point, though, the healthcare professional may
opt to view exactly the same information for that patient, organized by the location of the
clinical interventions and events. In that case the output provides details about events
classified by location, e.g. events that occurred in a hospital ward, in the radiology
department, in the surgery, etc.
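Both kinds of manipulation mentioned above, deriving an age from a date of birth and presenting the same events either by time or by location, can be sketched as follows. The function names and the sample events are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import date


def age_on(date_of_birth, on_date):
    """Age in completed years on a given date."""
    had_birthday = (on_date.month, on_date.day) >= (date_of_birth.month, date_of_birth.day)
    return on_date.year - date_of_birth.year - (0 if had_birthday else 1)


# Hypothetical events for one patient: (date, location, description)
EVENTS = [
    (date(2024, 3, 2), "radiology", "chest X-ray"),
    (date(2024, 3, 1), "ward", "admission"),
    (date(2024, 3, 3), "surgery", "appendectomy"),
    (date(2024, 3, 4), "ward", "post-operative review"),
]


def timeline(events):
    """The events ordered by when they occurred."""
    return sorted(events, key=lambda e: e[0])


def by_location(events):
    """The same events grouped by where they occurred."""
    groups = defaultdict(list)
    for _when, where, what in timeline(events):
        groups[where].append(what)
    return dict(groups)


print(age_on(date(1980, 6, 15), date(2024, 6, 14)))
print([what for _, _, what in timeline(EVENTS)])
print(by_location(EVENTS))
```

The stored data is the same in both views; only the representation offered to the end user changes.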
Display: refers to the way that data may be displayed so that it can be easily understood and
used. Displaying information does not only refer to the physical output devices (monitors,
printers etc.), but primarily addresses the presentation of the information to the user via user
friendly interfaces, successful human computer interactions and functional dialogue systems.

From data to information and new knowledge: the knowledge circle

The discipline has evolved during the last decades


The area of health informatics is evolving and is nowadays considered to be a well-defined
scientific area with a multidisciplinary nature. When mainframe computers were introduced
in hospitals halfway through the 20th century, the obvious benefits were not solely
information science related, and could rather be summarized as (i) moving away from paper
records, which are prone to physical damage and take up a lot of storage space, (ii) being able
to share data, since the mainframe could be accessed through dumb terminals by
more than one user at a time, and (iii) easier calculations for cost estimation and budgeting
purposes, since they would be performed by processing units rather than
manually. The latter, for many, was one of the most significant motivators for
decisions to invest in mainframe computers back in the day.
Later on, with advancements in computer science and information technology, the
storage and retrieval of the majority of clinical information started to rely on computer
systems and computer networks; this is when the first hospital
information systems appeared. These information systems soon covered a wide spectrum of
functions within a hospital. Specialized medical devices appeared in the market, often
offering networking capabilities that were advanced for their time. Many such examples can be found
in medical imaging. These systems were accompanied by standards defining their
specifications in detail, and protocols that each manufacturer should follow. Within this
context, one could see a new, evolving sub-domain, or application domain, of computer science,
and medical computer science therefore became a well-defined domain of its own.
Still, one of the biggest challenges for hospitals lay ahead, and was primarily related to the
management of the enormous amounts of data that were stored and retrieved during
clinical care. Efforts then started to focus on how it would be possible to develop user friendly
systems which could make the entry, retrieval and presentation of healthcare data seamless,
and how this data could be transformed into clinically meaningful information for healthcare
professionals. During the decades of the 1980s and 1990s, the direction of the scientific
community was to study and develop novel electronic medical record frameworks which
would facilitate the above priorities.

From “Computers in Medicine” to specializations of health informatics

The term ‘Medical Informatics’, while it precedes health informatics, is still broadly used
nowadays and refers to the medical applications of health informatics, but is regarded as
a subdomain of the latter. Today, there are many well defined sub-areas of health
informatics, which independently develop methods and new knowledge for their own areas
of specialization: dental informatics, nursing informatics etc.
The recent direction of health informatics is the focus on the integration of smart data
analytics and data mining algorithms for clinical and administrative decision making. This
direction is driven by recent technological and computer science advancements which make
it possible to analyze huge volumes of data with novel machine learning methods, to provide
accurate predictions and estimations which can be extremely useful during decision making.
Eventually, in the coming years, we will be witnessing the integration of predictive
algorithms into Electronic Health Records, which will be providing recommendations for
patients at the point of care. These recommendations will be a result of the analysis of
enormous amounts of historical patient data, in order to identify useful patterns, for the
diagnosis, therapeutic plan and prognosis of a patient, with the ultimate goal to improve the
quality of care in a patient centered health-care system.

Definitions of Health Informatics


World Health Organization Definition: ‘an umbrella term referring to the application of
the methodologies and techniques of information science, computing, networking and
communications to support health and health related disciplines such as medicine, nursing,
pharmacy and dentistry’
Edward H. Shortliffe Definition: ‘the field that concerns itself with the cognitive,
information processing, and communication tools of medical practice, education and research
including the information science and the technology to support these tasks’
Health Informatics is therefore an intersection of information science, computer science, and
health care. It studies the resources, devices and methods required in order to optimize the
acquisition, storage, retrieval and use of information in health.
Health Informatics is an interdisciplinary field combining health, computer science,
statistics and engineering. It is recognized as a field where information, ICT and cognitive
knowledge come together. The image below presents the scope of health informatics
(information processing, cognitive processing and methodologies), the context of health
informatics (practice, research, education) and finally the technologies facilitating its scope.

Health Informatics uses information to improve health care. One needs to understand how
we define the concept of ‘improving healthcare’, and whether this concept is a quantifiable
one. This is a requirement for introducing health informatics applications
meaningfully. In the healthcare domain, though, we can understand that the ‘improvement’
concept is related to making more informed decisions which drive better patient outcomes
and encourage the safe and error-free provision of healthcare services, but which also lead
to improved hospital efficiency and an increase of revenue. As an interdisciplinary field,
health informatics applies technology & information to enhance healthcare delivery and
biomedical research.
It is closely bound up with fostering the education of health professionals and the public. Health
informatics can provide tools and methods for e-health literacy, that is, for reaching out to large
populations for disease prevention and health promotion through targeted and personalized
interventions. Health informatics methods and systems are also becoming important for the
education of health professionals and healthcare administrators.
Health informatics studies the process where health data, information, and knowledge are
collected, stored, processed, communicated, and used to support health care delivery to clients
and providers, administrators, and health organizations. Each one of those groups may be
utilizing the same data which have been transformed and processed in order to facilitate the
strategic goals of different professional groups in health care, and of patients as well.

Related terms
Consumer Health Informatics: both healthy individuals & patients want to be informed on
medical topics. MediQoC is one example of such systems, and will be discussed in the appendix.
This system provides to the user the opportunity to navigate through Medicare healthcare
providers, by typing in their symptoms, in order to find appropriate and safe health care
services. The appendix describes the platform and the methods that have been used for the
development of MediQoC.
Health knowledge management: can prove to be extremely useful in an overview of latest
medical journals, best practice guidelines or epidemiological tracking. Nowadays there is a
wealth of new medical knowledge coming out every day. Hundreds of journal research papers
appear and it is virtually impossible for a health care professional to catch up with this
knowledge flow, in its raw form. Knowledge management tools provide categorized content,
classified on the basis of areas of interest, by the nature of findings, or by the impact of the
published results, making it easier for the researcher and the healthcare professional to
navigate through medical knowledge.
Bioinformatics: a branch of biological science which deals with the study of methods for
storing, retrieving and analyzing biological data, such as nucleic acid (DNA/RNA) and
protein sequence, structure, function and genetic interactions.
Biomedical Engineering: the application of engineering principles and design concepts to
medicine and biology.
Nursing Informatics: the "science and practice that integrates nursing, its information and
knowledge, with management of information and communication technologies to promote the
health of people, families, and communities worldwide." (IMIA Special Interest Group on
Nursing Informatics, 2009). The application of nursing informatics knowledge is empowering
for all healthcare practitioners in achieving patient centered care.
Public Health Informatics: Chapter 18 discusses the principles of Public Health Informatics
in detail.
Bioinformatics: an interdisciplinary field that develops methods and software tools for
understanding biological data. As an interdisciplinary field of science, bioinformatics
combines computer science, statistics, mathematics, and engineering to analyze and
interpret biological data.
Example of Bioinformatics: applications in basic research
Human Genome Project – Scientists used fundamental research methods and techniques to map the
complete human genome
Provides an enormous opportunity to understand the human body in ways not previously possible
Relied heavily on IT to sort and manage the data needed to map the human genome
Offers the ability to identify and treat human diseases

Mobile Health (m-health)


M-Health is the practice of medicine and public health, supported by mobile devices. It uses mobile
technologies such as mobile phones to collect and access health information. It has emerged
as a sub-segment of eHealth. There are three main categories of modalities which are used
for data collection in m-Health applications: (i) integrated mobile sensors (accelerometers,
gyroscopes, and GPS), (ii) specialized biometric sensors (blood glucose, heart rate) and (iii)
manual or semi-automated data entry.
Mobile devices using modern communication technologies help nurses and doctors in their everyday practice

E-Health
E-Health is a broad term for healthcare practice which is supported by electronic processes
and communication. It is a relatively recent term and can encompass a range of services in
healthcare and information technology. It is not clearly defined (for example some use it
instead of healthcare informatics, others use the term describing healthcare practice using
the Internet). A study published in the Journal of Medical Internet Research found 51
definitions for e-health. E-health offers a broader coverage of electronic/digital processes in
health, often including m-health applications too.
Health informatics tools and methods
Health Informatics is not just “Computers in Healthcare”. It develops advanced methods for
building components and integrating them seamlessly into clinical practice, such as clinical
guidelines, medical terminologies, clinical dictionaries and nomenclatures, information and
communication systems, and decision support and recommendation systems.

Health Informatics ≠ IT: Information Technology in hospitals is not Health Informatics.


Information technology is focused on the development of hardware & software, which are
undeniably invaluable vehicles for the health informatics science, to implement the methods
of health informatics.
Read the message to the left and discuss:
(i) Whether and how health informatics would still
exist as a discipline without the use of computers
(ii) How the development of information technologies
and the progress in computer science have helped
the domain of health informatics develop better
methods and become an established scientific area

The IT sector is a very important enabling field for advanced health informatics applications and
methods. The introduction of ICT has greatly accelerated the development of the discipline of health informatics.

Health informatics is not as new as you might think. Since healthcare services started to
become more systematic, the need for information management was eventually recognized
as an important priority. Here is some evidence:
The first version of the International Classification of Diseases (ICD) was initiated in the
year 1893. Ten updates followed, and the same classification system is still in use; its
10th revision (ICD-10) remains widely used
The first structured clinical guidelines appeared several decades before the widespread
use of computers
Hospital information management methods were evident before the 70s in hospitals,
whereas there are still hospitals in many developing countries around the globe, which
do not use computerized Electronic Health Record systems. It needs to be understood,
though, that, before the introduction of computers, these information management
methods, were typically limited to file maintenance & life cycle management of paper-
based files, other media & medical records.

Health Informatics health care levels of interest

Single hospital department
The hospital needs
A large area or a district
The healthcare system at a national level

Health informatics also has applications in community health. The central objective of
community health is to improve the health characteristics of biological communities, in
geographical areas or within groups of people with common characteristics. Health
informatics therefore contributes specialized programs, methods and tools that are used
in population based healthcare services, for health promotion, disease prevention and
syndromic surveillance.

Who is involved in Health Informatics


Clinical personnel – they need suitable information in caring for patients. They want the
data that is being generated during the health provision, to be transformed into information
that will facilitate more informed and personalized clinical decisions. At the same time, they
want to have direct, non-delayed access to patient information during the everyday practice,
therefore the data retrieval interval should be minimal.
Nonclinical Staff: educators, administrators, research scientists – they need relevant data
and information to perform their tasks. Hospital databases store information that can be
invaluable if utilized properly. A large hospital records thousands of patient admissions each
year, with detailed data for each admission. Over a span of 10 years, for instance, a very
large dataset of historical admissions will have been collected. This dataset can
be used in research, specifically in retrospective epidemiologic studies, or for the development
of predictive algorithms that will be applied to new cases.
Health care administrators utilize historical data to review the evolution of the cost of
care and the most common case-mix profiles and their changes over time, identify new challenges,
measure the efficacy and cost effectiveness of practices over time, evaluate the quality of
health care provision, and identify gaps and alterable factors which contribute to the
improvement of health outcomes.
Health educators often use de-identified subsets of historical hospital data to provide
their students with real examples, and also to investigate patient cases during medical
specialty practice.
Information science and IT professionals use computer technologies to manage
information so as to fulfill the needs and requirements of other end users. It is easy to
generate and print-out a clean looking report with information for some patient, which can
be reviewed by a health practitioner and/or archived.
External parties (I): policy makers. The aggregation and analysis of hospital and patient
data from multiple health care providers provides feedback to health care policy makers
for planning of health resources and for assessing the direction of the healthcare system at
the level of a province, a state, or even the whole country.
External parties (II): insurance companies. Payers have access to de-identified data and
have recently begun to apply advanced data mining methods to these data. Insurance companies will
not readily agree to pay for services which were evidently not required for the treatment of
a condition, or for treatment complications due to malpractice and medical errors.
It should be noted, though, that medical doctors should always independently do their best
to achieve optimal outcomes for their patients.
In a broader sense, all the groups below are served by Health Informatics:
Patients
The Community
Health care providers (MDs, nurses)
Primary Care & General Practitioners
Management in Hospitals
Government Bodies and Policy makers
Facility and operational management
Healthcare researchers
Healthcare educators and students

As we mentioned earlier, health informatics is a multidisciplinary field. Various knowledge areas
are directly or indirectly related to health informatics. In addition, information and
communication technologies are used to facilitate the scope of health informatics. These technologies
make it possible to develop advanced applications (e.g. EMR, Hospital Information Systems-HIS)
and integrated methods to support healthcare. These applications are governed by specific standards
and protocols.

Top Left: Knowledge Areas of health informatics Top Right: Technologies in health informatics Bottom
Left: Applications Architecture Bottom Right: Standards used in health informatics

The improvement of the Quality of Care is the ultimate goal of Health Informatics: this requires the
establishment of interprofessional collaboration between different specializations and proper
education and training of users, so that they are prepared to use new technologies effectively.

Figure: Health Informatics presupposes Interprofessional Collaboration AND Education of Healthcare Professionals, and reinforces Patient Safety and Quality of Care.

Areas of interest for Health Informatics


Health informatics is interested in a wealth of areas. The most important ones are:
Communication systems and networks in health care
Modeling, classification and coding systems and integration in electronic medical records
Healthcare Information systems
Electronic Health Record systems
Decision Support Systems (DSS), for clinical and administrative decision making
Knowledge based Systems and expert systems
Bio signal and image processing
Tele-care and telemedicine and use of remote patient monitoring and patient education
Medical education and clinical consultation
Healthcare management, public health systems and health promotion
Patient education

Healthcare decisions are based on information
The figure below presents a typical example of a decision making process in a hospital. Every step
of this process produces data and, then, this data is communicated and transformed to useful
information.

Services of Health Informatics


Data processing: healthcare is a data-intensive industry. This service includes the collection, processing, transformation,
presentation and use of data
Communication: main emphasis should be on supporting communication between professionals
Knowledge based services, such as on-line practice guidelines, drug lists, decision-support and
reminder systems

Examples of Health Informatics applications

Medication ordering      Quality assurance           Risk management
Purchasing equipment     Medical devices             Patient assessment
Clinical pathways        Monitors                    Monitoring patients
Labour management        Imaging equipment           Stock management
Patient scheduling       Clinical decision support   Mobile health care
Research                 Resource allocation

Imaging systems in Health


Imaging systems in health are impossible without the use of computers. Computers are used to:
– Develop an image from specific measurements
– Reconstruct the image for optimal extraction of a particular feature
– Improve image quality by image processing
– Store, retrieve and present images
Examples: X-rays, ultrasound, computed tomography, MRI etc.

Telemedicine
Telemedicine (‘tele’ = at a distance) is the delivery of health-related services and information
at a distance. Telemedicine could be as simple as two health professionals discussing a patient
over the telephone, as sophisticated as videoconferencing between
providers at facilities in two countries, or even as complex as robotic technology.
Tele health is an expansion of telemedicine. It encompasses preventive, promotive and
curative aspects. Today tele health addresses an array of technology solutions, simple or more
complex. For example, physicians use email to communicate with patients, order drug
prescriptions and provide other health services.
Some important clinical uses of tele health include (i) the transmission of medical images for
diagnosis (ii) groups or individuals exchanging real time health services or education live via
videoconference (iii) transmission of medical data for diagnosis or disease management
(remote monitoring) (iv) advice on disease prevention and promotion of good health by patient
monitoring and follow-up.

More applications of tele-health

Distance education (including continuing medical education and patient education)
Administrative meetings through telehealth networks, supervision, and presentations
Healthcare system integration
Patient movement and remote admission
Research

Questions for discussion


1. Briefly explain in your own words why health informatics is not just “computer
applications in hospitals”.
2. “Health Informatics is an interdisciplinary field”. Please refer to three disciplines, with a
brief explanation.
3. Refer to three reasons why health informatics is essential for quality healthcare services.
4. Out of the many different roles involved in health informatics, which one do you find most
intriguing and why?

Chapter 2. The Nature of Data in Healthcare
Healthcare is Data-Intensive
As we already discussed in Chapter 1, healthcare is a data-intensive process. Many processes
run at the same time, producing new data literally every single second. Some of these data
are high-resolution, uncompressed images (x-rays, CT scans) which take up a lot of storage
space and need to be further processed to become clinically useful. Multiple records of the same
data are often created for each patient, and these records are stored and maintained together with
older measurements and observations. These longitudinal considerations are extremely
important for patient evaluation, since they help clinicians reassess their treatment plan
and make more accurate patient prognoses.
Reference data is data that defines the set of permissible values to be used by other data
fields. For example, the attribute ICD-10 diagnosis uses as reference data a table of approximately
68,000 diagnosis codes to pick from. Unfortunately, not all data are drawn from reference
data. In many cases, data produced during clinical care are free text, as in the case of a
nursing assessment.
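
As a minimal sketch of how reference data constrains a field, the snippet below validates a diagnosis entry against a lookup table. The three ICD-10 codes are only a tiny illustrative subset of the full table, and the names `ICD10_REFERENCE` and `validate_diagnosis` are hypothetical, chosen for this example.

```python
# Tiny illustrative subset of an ICD-10 reference table (assumption: a real
# system would load the complete table from a maintained source).
ICD10_REFERENCE = {
    "I10": "Essential (primary) hypertension",
    "E11.9": "Type 2 diabetes mellitus without complications",
    "J18.9": "Pneumonia, unspecified organism",
}


def validate_diagnosis(code: str) -> str:
    """Return the description of a permissible code, or raise ValueError."""
    normalized = code.strip().upper()
    if normalized not in ICD10_REFERENCE:
        raise ValueError(f"{code!r} is not a permissible diagnosis code")
    return ICD10_REFERENCE[normalized]


print(validate_diagnosis("i10"))  # Essential (primary) hypertension
```

A free-text field, such as a nursing assessment note, has no such table behind it, which is why it cannot be validated in this way.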
Very frequently, data comes from processing of other data: for example, clinicians would want
to know the in-hospital mortality ratio for a given disease when they need to provide
treatment to a patient with that same diagnosis. In our example the in-hospital mortality
ratio will be calculated by the formula NX’ / NX
where NX’ = Count of in-hospital deaths for patients with a medical diagnosis x and NX =
count of admissions of patients with a diagnosis x.
To make our example more interesting, let us now assume that the clinician wants to know
if this diagnosis x is a high risk one. In other words we now want to investigate if disease x
should lead the clinician to the conclusion of this patient being a high-risk case for in-hospital
mortality. For now, we will define high risk disease for in-hospital mortality, as a disease
which causes deaths at a higher ratio compared to the death ratio of the whole patient
population.
The Standardized mortality ratio1 would be calculated by the formula
(NX’ / NX) / (NA’ / NA), where
NX’ = Count of in-hospital deaths for patients with diagnosis x
NX = Count of admissions of patients with diagnosis x
NA’ = Count of in-hospital deaths for all hospital admissions
NA = Count of all hospital admissions

1
Standardized Mortality Ratio (SMR) is a ratio between the observed number of deaths in a study population and
the number of deaths that would be expected, based on the age- and sex-specific rates in a standard population and
the age and sex distribution of the study population.

This information is easily calculated from data which has already been collected and stored
in the Electronic Medical Record. The data was collected in the past, for the needs
of the clinical care of past patients. The health care professional will therefore be using
historical data in order to understand and evaluate the health of a current patient at the
present time. When data is used for purposes other than the ones it was collected
for, we call this secondary use of data.
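
The two calculations described above can be sketched as follows. This is a simplified illustration: it compares the mortality ratio of diagnosis x with that of all admissions, without the age and sex adjustment that a formal SMR requires. The function names and all counts are hypothetical.

```python
def in_hospital_mortality_ratio(deaths: int, admissions: int) -> float:
    """Deaths divided by admissions for the same group of patients."""
    if admissions <= 0:
        raise ValueError("admissions must be positive")
    return deaths / admissions


def relative_mortality(deaths_x: int, admissions_x: int,
                       deaths_all: int, admissions_all: int) -> float:
    """Mortality ratio for diagnosis x relative to all hospital admissions."""
    overall = in_hospital_mortality_ratio(deaths_all, admissions_all)
    if overall == 0:
        raise ValueError("overall mortality ratio is zero")
    return in_hospital_mortality_ratio(deaths_x, admissions_x) / overall


# Hypothetical counts taken from historical admission records.
ratio_x = in_hospital_mortality_ratio(50, 400)      # 0.125
relative = relative_mortality(50, 400, 500, 16000)  # 0.125 / 0.03125 = 4.0
is_high_risk = relative > 1.0                       # higher than the whole population
print(ratio_x, relative, is_high_risk)
```

With these made-up counts, patients with diagnosis x die in hospital at four times the rate of the overall patient population, so x would be labeled high risk under the definition used in this chapter.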

Discuss in class

Which of the following uses of health data are secondary data uses?

“Reviewing the five most recent measurements of the blood pressure of a patient, to see if the new
medication schema works well”
“An afternoon-shift nurse reading through the nursing assessment notes of the nurse who just finished
her morning shift, for a given patient”
“The estimation of bed availability in a specific hospital department, via the count of patient admissions
and dates, count of patient discharges and dates, and bed capacity, for that given department”
“A warning indicating that a given patient is at risk of developing in-hospital infections, through the
analysis of similar case-mixes of past patients”

The Nature of Data in Healthcare


We will now discuss the nature of health data that is generated in a hospital. This
section explains the most important categories of data that are produced during the clinical
care. Different users, with different roles and responsibilities in a hospital, often have access
to these data and utilize them to make decisions. For example, medical doctors would be
reviewing the laboratory test and radiology test results, the physical examination, the routine
ward measurements, the feedback from the nursing staff as well as the patient history, in
order to combine this information with their cognitive skills, knowledge and experience, to
assess the health condition of a patient and conclude a diagnosis and an appropriate
treatment plan.
Prior to the hospital admission: data collected during the triage phase. During the triage
phase, health professionals determine the priority of patients' treatments based on the
severity of their condition. This results in determining the order and priority of emergency
treatment, transport and destination for the patient.
Before the hospital admission: demographics, initial evaluation of a patient, source of
admission, health insurance information

Discussing with the patient and their caregivers: patient demographics, family history,
occupational history, allergies, pathology by system, past diseases and surgical operations
and information about the social health, all provide to the physician extremely important
input for the patient assessment
Routine ward measurements: e.g. vital signs (blood pressure, respirations per minute,
temperature, pulses per minute), fluid balance (from fluid intake-output)
Physical examination: including percussion, observation, auscultation, palpation
Laboratory tests: blood tests, urine analysis etc. These tests are ordered by the
medical doctor in charge, and the Laboratory Information System (LIS) receives the order.
Once the samples arrive at the hospital laboratories, each test requires a variable amount of time
to be processed. As soon as this is done, the results are uploaded via the LIS and
the Electronic Medical Record is updated with the laboratory test result. The
physician is then notified and can review the results in a timely manner, in order to make informed
decisions.
Radiology department: medical images, segmentation and handling of images using
DICOM systems, assessments from radiologists.
Pharmacy: including Rx, (re)-stocking and ordering
Patient assessment: medical diagnosis, ordering of laboratory examinations, decisions of
the appropriate medication. During a patient hospitalization, there is typically only one
primary patient diagnosis. This is the diagnosis which, in the majority of the cases, is
considered to be the main reason that led to the decision for a patient admission to the
hospital. Secondary diagnoses are either pre-existing diseases (usually chronic conditions)
or conditions diagnosed during the hospital stay.
Tracking medication: dosage, method of administration (e.g. intravascular,
intramuscular) and time intervals. Nurses are responsible for managing the
administration of medication to the patient, according to the physician guidelines.
Discharge data: discharge destination, discharge method, discharge outcome(s).
Data produced by the patient: patient experience surveys have recently become the norm
and it is widely recognized that patient feedback does matter. The “Hospital Consumer
Assessment of Healthcare Providers and Systems Survey” (HCAHPS) is the most commonly
used patient experience survey and is used by many healthcare providers. The author
of this book was a member of the working group to adapt the survey to other languages2.
Staff records: including but not limited to information about personnel shifts, department
capacity, distribution of human resources, hours of leave and other.

2
Squires A, Bruyneel L, Aiken L, Van den Heede K, Brzostek T, Busse R, Ensio A, Schubert M, Zikos D, Sermeus
W. Cross-cultural evaluation of the relevance of the HCAHPS survey in five European countries. Int J Qual
Health Care. 2012; 24(5):470-5
Hospital Budgeting: resources allocation, revenue projections, planning ahead for the fiscal
year. Hospital budgeting is not a trivial process by any means and requires multidisciplinary
work of people who know the healthcare market, understand health economics and
reimbursement challenges, and health professionals who can foresee the healthcare delivery
challenges.
Payments: including patient Diagnosis Related Groups (DRGs), insurance information,
Medicare or Medicaid information, and any other payer data. A Diagnosis-Related Group
(DRG) is a statistical system of classifying any inpatient stay, into groups for the purposes of
payment. The DRG classification system divides possible diagnoses into more than 20 major
body systems and subdivides them into almost 500 groups for the purpose of Medicare
reimbursement3.
Hospital Quality of Care Evaluation and Quality improvement data: we will devote
a separate chapter to this very important topic, to discuss the strategic goals of the
healthcare system, the dimensions of the quality of care and patient safety, and
strategies for the healthcare system to assess the quality of health care delivery.

Hospital Consumer Assessment of Healthcare Providers and Systems Survey (HCAHPS)


HCAHPS is a standardized survey instrument and data collection methodology that has been in use
since 2006 to measure patients' perspectives of hospital care. A partnership of public and private
organizations led by the Federal government, specifically the Centers for Medicare & Medicaid Services
(CMS) and the Agency for Healthcare Research and Quality (AHRQ),
created HCAHPS (pronounced "H-caps") to publicly report the patient’s perspective of hospital
care. The HCAHPS results posted on Hospital Compare allow consumers to make fair and objective
comparisons between hospitals and with state and national averages on important measures of patients'
perspectives of care. The survey asks a random sample of recently discharged adult patients to give
feedback about topics like how well nurses and doctors communicated, how responsive hospital staff
were to patient needs, how well the hospital managed patients' pain, and the cleanliness and quietness
of the hospital environment. Patients are the best sources of information on these topics.
Source: Medicare.gov

Data Types in Healthcare


It is essential to understand that most of the data collected and stored in Electronic Medical
Records does not come from typing free text into computer text boxes. In the majority of
cases, data comes from reference data, or in other words data dictionaries, which, as
mentioned earlier in this chapter, are predefined lists specifying the acceptable input.

3
Gillian I. Russell, Terminology, in FUNDAMENTALS OF HEALTH LAW 1, 12 (American Health Lawyers
Association 5th ed., 2011).

The most obvious example of data drawn from reference data is that of medical diagnoses. Since
October 2015, healthcare providers in the United States have used the 10th revision of the International
Classification of Diseases (ICD-10). This is an enormous list of approximately 68,000 different
codes, following a hierarchical organization. Each code represents a medical condition. The
doctor needs to decide which code accurately describes the patient's condition and
select the appropriate ICD-10 code for that condition. There are several levels of depth in
ICD-10, and the diagnosis often does not reach the deepest level for a condition.
ICD is not the only classification system in use. For the vast majority of in-hospital data,
there exist one or more classification systems, which standardize the data entry and
retrieval process. Chapter 7 will cover some of the most important classification systems and
standards which are used by the major health care providers in the United States.
Numeric data, such as physiological measurements and laboratory test results are captured
and entered into the Electronic Medical Records without any modifications and without any
use of reference data. In this section, we will discuss health care data of various data types.

Numeric Data
Most numeric data are clinical data “produced” by those directly involved in healthcare.
Numeric data allow much more efficient data manipulation and therefore a more effective
use of data to produce aggregated information and make other simple or more advanced
calculations. Numbers often come from clinical measurements in hospitals, like, for instance,
vital sign measurements. These data have to be entered into the system indicating the exact
value of the measurement. Acceptable decimal precision varies, according to the nature of
the data and the precision of the measurement device, if used.
Many numeric data in everyday practice are derived data. For example, the 24h fluid intake
and fluid output are used to calculate the fluid balance of a patient. Laboratory examination
results are very often in numeric format. These are often accompanied by the reference
normal values. An Electronic Medical Record, nowadays, is expected to have those normal
values integrated. Therefore, during data entry, when a laboratory test result for a patient
is out of bounds, this should be indicated with the use of a different color and, potentially, a
notification should be generated. It should be mentioned, though, that there are very often
different normal value bounds for different age groups and for each gender. Such
implementations should take this into account: they should automatically recognize the
patient demographics and present personalized notifications that are in
accordance with patient attributes such as gender and age group.
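
A minimal sketch of such demographics-aware flagging is shown below. The hemoglobin bounds are illustrative placeholders only (real reference ranges are laboratory-specific), and the names `ReferenceRange`, `HEMOGLOBIN_RANGES` and `flag_result` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReferenceRange:
    sex: str       # "F" or "M"
    min_age: int   # inclusive, in years
    max_age: int   # inclusive, in years
    low: float     # lower normal bound
    high: float    # upper normal bound


# Illustrative placeholder bounds for hemoglobin in g/dL (assumption:
# not clinical guidance; each laboratory publishes its own ranges).
HEMOGLOBIN_RANGES = [
    ReferenceRange("F", 18, 120, 12.0, 15.5),
    ReferenceRange("M", 18, 120, 13.5, 17.5),
]


def flag_result(value: float, sex: str, age: int,
                ranges: List[ReferenceRange]) -> Optional[str]:
    """Return 'LOW', 'HIGH' or 'NORMAL', or None if no range fits the patient."""
    for r in ranges:
        if r.sex == sex and r.min_age <= age <= r.max_age:
            if value < r.low:
                return "LOW"
            if value > r.high:
                return "HIGH"
            return "NORMAL"
    return None


# The same value is flagged differently depending on patient attributes.
print(flag_result(12.8, "F", 77, HEMOGLOBIN_RANGES))  # NORMAL
print(flag_result(12.8, "M", 77, HEMOGLOBIN_RANGES))  # LOW
```

Returning None when no range matches lets the calling system distinguish "normal" from "cannot be assessed", for example for a pediatric patient in this simplified table.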
Comparison of new examination results with previous results from the current and recent
hospital stays is of utmost importance for health care professionals, so that they can better
assess the disease progression, re-evaluate the therapeutic plan and make a more informed
assessment of the patient prognosis. It is therefore important for these comparisons to be made
and for their results to be communicated to the physician. We will use hypertension as our
example, which is a very common condition and the major risk factor for
strokes. A 77-year-old female patient was recently admitted to the hospital for uncontrolled
hypertension and was found to have a blood pressure of 210/120 mm Hg. She was therefore
admitted as an emergency hypertension case. As soon as the patient was admitted, she
received hydralazine IV and was monitored with the use of a bedside monitor. Nurses
checked the monitor every hour and updated the patient record with the blood
pressure value. A few days later, the patient's blood pressure was measured at 140/85 mm Hg,
which, while still higher than normal, indicates a significant improvement when evaluated
with a temporal insight.
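
The temporal comparison in this example can be sketched as below. The first and last readings are the values from the text; the dates, the intermediate reading and the function name `change_since_first` are hypothetical.

```python
from datetime import datetime

# (timestamp, systolic, diastolic) readings in mm Hg. Dates and the middle
# reading are made up for illustration.
readings = [
    (datetime(2024, 3, 4, 9, 0), 140, 85),
    (datetime(2024, 3, 1, 9, 0), 210, 120),
    (datetime(2024, 3, 2, 9, 0), 180, 105),
]


def change_since_first(bp_readings):
    """Return the (systolic, diastolic) change between the earliest and latest reading."""
    ordered = sorted(bp_readings, key=lambda r: r[0])  # order by timestamp
    first, last = ordered[0], ordered[-1]
    return last[1] - first[1], last[2] - first[2]


systolic_change, diastolic_change = change_since_first(readings)
print(systolic_change, diastolic_change)  # -70 -35
```

Seen in isolation, 140/85 mm Hg is an elevated reading; the negative changes relative to admission are what reveal the improvement.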
Examples of laboratory tests that produce numeric data are numerous. Numeric data are
also often accessed and utilized by professionals indirectly involved in patient care. An
example is the number of vacant beds in a hospital department. Again, this may be a
number which has been calculated from other primary data: ‘Bed capacity’ MINUS ‘Number
of patients currently hospitalized’.
Some numeric data is used by the hospital quality department or healthcare policy makers.
Usually these data are in the form of indicators: for example, the mortality and morbidity
during the month June 2012, and the number of cases of infectious diseases divided by the
number of patients admitted during a set time period.

Boolean Data: Boolean data types in health care can only have two values (usually denoted
true and false), which represent the truth values of logic and Boolean algebra. Boolean data
should not be confused with categorical data with two categories (for example gender-
male/female).

Examples of Boolean Data in Healthcare

Patient History Related


Were there any cases of a specific disease in a family member of the patient?
Did the patient undergo a surgical operation in the past?
Does the patient receive any drugs?
Admission of Patients: Did the patient arrive at the hospital by ambulance? Was the
admission an emergency one?
Laboratory Test Results: Existence of SMA Antigens, Glucose found in a Urinalysis
Medical Evaluation: Does a patient have a specific symptom?
Not directly related to patient care: Does the patient have an insurance plan?
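
The distinction between Boolean data and two-category categorical data can be made explicit in a record definition. The sketch below is only an illustration; the class and field names are hypothetical and do not come from any particular Electronic Medical Record system.

```python
from dataclasses import dataclass
from enum import Enum


class Gender(Enum):
    """Categorical data with two categories: not a truth value."""
    FEMALE = "female"
    MALE = "male"


@dataclass
class AdmissionRecord:
    gender: Gender               # categorical, stored as a category label
    arrived_by_ambulance: bool   # Boolean: true or false
    emergency_admission: bool    # Boolean: true or false
    previous_surgery: bool       # Boolean: true or false
    has_insurance_plan: bool     # Boolean: true or false


record = AdmissionRecord(
    gender=Gender.FEMALE,
    arrived_by_ambulance=True,
    emergency_admission=True,
    previous_surgery=False,
    has_insurance_plan=True,
)

# Boolean fields can be combined with logical operators; categories cannot.
urgent_arrival = record.arrived_by_ambulance and record.emergency_admission
print(urgent_arrival)  # True
```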

Alphanumeric Data: also frequently generated during the healthcare process. In most
cases, these are data produced after the “mediation” of the human brain (healthcare
professional) and also in most cases during the interaction between the healthcare
professional and the patient.

Data in healthcare can be images: the medical imaging methods below produce data in
the form of images. Medical images are generated by medical imaging devices. Medical
images are nowadays digitally produced and stored on the storage device of the computer.
Images are usually compressed with simple lossless and near-lossless methods, and they usually
still require large storage space. These are six of the most common radiology tests:
(i) Radiography
(ii) Computed Tomography (CT) and High Resolution Computed Tomography (HRCT)
(iii) Magnetic Resonance Imaging (MRI)
(iv) Ultrasound-Mammography
(v) Nuclear Medicine Imaging
(vi) Photoacoustic imaging

The most popular technologies and standards used to store and transmit medical images are:
PACS (Picture archiving and communication system): it is a medical imaging technology
which provides economical storage and convenient access to images from multiple
modalities (source machine types)4.
DICOM (Digital Imaging and Communications in Medicine): standard for handling,
storing, printing, and transmitting information in medical imaging. It includes a file
format definition and a network communications protocol.

There are other obvious data in the form of images in some healthcare organizations, like a
patient photo which is uploaded into the Electronic Medical Record.

The trend is to reduce the use of free text data as much as possible: Information in
the form of codes is assigned to each separate concept, with significant benefits for
data quality. The advantages of using classification systems are also significant for
researchers, since they save a significant amount of data preparation time that would otherwise be spent
manually merging descriptions of conditions with different wording (different syntax) but
the same meaning (same semantics). Chapter 7 outlines the importance of classification systems
and discusses a selection of some critical standards.
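
The data preparation problem described above can be illustrated with a small sketch, in which differently worded free-text descriptions are merged into a single code. The synonym table and the function name `to_code` are hypothetical; a real project would rely on a maintained terminology rather than a hand-written dictionary.

```python
from typing import Optional

# Hypothetical synonym table mapping free-text wording to one ICD-10 code.
SYNONYM_TO_CODE = {
    "hypertension": "I10",
    "high blood pressure": "I10",
    "essential hypertension": "I10",
}


def to_code(free_text: str) -> Optional[str]:
    """Normalize case and spacing, then look up the code (None if unknown)."""
    normalized = " ".join(free_text.lower().split())
    return SYNONYM_TO_CODE.get(normalized)


notes = ["Hypertension", "high  blood pressure", "Essential hypertension"]
codes = {to_code(note) for note in notes}
print(codes)  # {'I10'}
```

Three different wordings collapse into one code. When the code is selected at data entry, this merging step is not needed at all.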

Five important hospital data procedures and the method of information acquisition
Clinical Procedure    Method of information acquisition
Nursing Evaluation    Written nursing assessment, plan and follow-up
Diagnosis             Combination of a series of practices: clinical measurements, laboratory tests, clinical observation
Treatment plan        Reviewing the patient condition and comorbidities, past and current medications that the patient takes, and patient allergies
Patient History       Discussion with the patient and/or their family
Various Reports       Written analysis of events, which is sometimes required by the existing legislation

4
Choplin R (1992). "Picture archiving and communication systems: an overview". Radiographics. 12: 127–129.
Classification of Hospital Data based on their Source
The table below outlines examples of data types, organized by their source.
In other words, the table gives you an idea of the locations where new data is
being generated in a hospital. Discuss the importance of each of those hospital locations
to the quality of healthcare services and how each location contributes to clinical and
administrative decision making.

Hospital Department
-Measurements of Vital Signs
-Fluid Intake-Output
-Clinical Judgments
-Clinical Examination

Medical Imaging Labs
-Radiography
-Computed Tomography (CT)
-Magnetic Resonance Imaging (MRI)
-Ultrasound-Mammography

Administrative Department
-Number of beds available
-Routine and Urgent Admissions
-Weekly Shift Plans

Financial Department
-Salaries to be paid
-Diagnosis Related Group Costing
-Reporting to the insurance company

Laboratories
-Blood Glucose Test results
-Enzyme & Protein Blood test results
-Urine test results

Supplies Department
-Issuing of order notice
-Stock and supplies

Primary methods for clinical data collection


1. The patient or their caregiver provides information to health professionals verbally
(e.g. medical history). The clinician records this information, either by keeping handwritten notes or by
completing a structured patient history form, which is often digital (via handheld devices and
tablets).
2. Physical examination by the medical doctor and nursing evaluation based on nursing
observation. The physical examination is the process by which a medical professional observes a
patient thoroughly for signs indicating a health condition. It follows the taking of the medical
history. Together with the medical history, the physical examination contributes to determining
a diagnosis and treatment plan.
3. Manual, direct measurements taken by health professionals on patients. Examples include the
measurement of blood glucose levels using a stick, blood pressure, respirations per minute,
and fluid output (with a catheter).
4. Laboratory examinations prescribed by clinicians
5. Radiology tests

The fundamental data acquisition methods (talking with patients, physical
examination, clinical measurements, laboratory exams and radiology tests)
constitute the main source of data to populate Electronic Medical Records.
Typically the patient history precedes the physical examination, which precedes the
physiological measurements.

The physical examination is not a static process; health professionals are
challenged to constantly assess, review and reevaluate the patient condition.

Derived Data
Derived data are data elements derived from other data elements using a mathematical, logical, or
other type of transformation, e.g. arithmetic formula, composition, aggregation. Modern Healthcare
Information Systems should calculate these data automatically. Derived data are useful for:

More efficient patient monitoring
Assessing the quality of care and patient safety (e.g. morbidity indicators)
Assessing the health status of populations at the regional and national level
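
Three derived data elements mentioned in this chapter (fluid balance, vacant beds and a rate indicator) can be sketched as simple functions over primary data. The function names and all input values are hypothetical.

```python
def fluid_balance_ml(intake_ml: float, output_ml: float) -> float:
    """24h fluid balance: fluid intake minus fluid output."""
    return intake_ml - output_ml


def vacant_beds(bed_capacity: int, current_inpatients: int) -> int:
    """'Bed capacity' MINUS 'Number of patients currently hospitalized'."""
    return bed_capacity - current_inpatients


def cases_per_100_admissions(cases: int, admissions: int) -> float:
    """Indicator: number of cases per 100 admissions in a set time period."""
    if admissions <= 0:
        raise ValueError("admissions must be positive")
    return 100 * cases / admissions


# Hypothetical primary data for one patient, one department and one month.
print(fluid_balance_ml(2500, 1800))       # 700
print(vacant_beds(30, 26))                # 4
print(cases_per_100_admissions(12, 400))  # 3.0
```

None of these values is entered by hand. Each one is recomputed whenever the underlying primary data changes, which is why the text expects the information system to calculate them automatically.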

Four fundamental unique health care data properties


In modern health care systems, where proper use of data is especially important for the provision of
quality health services, it is important to understand the existence of four fundamental properties
of health care data. These properties are of utmost importance, in that their existence is a
requirement for the clinical data to become meaningful (information) and useful (knowledge).
1. Non-atomicity 2. Cognition 3. Shareability 4. Longitudinality
The above four properties have to coexist in harmony so as to form the basis for a successful
clinical decision making environment in health care. There are specific organizational (e.g.
interprofessional collaboration) and technical considerations (e.g. decision support systems,
interoperability standards) that the health care system has to ensure in order to satisfy these four
fundamental health data properties.
Non-Atomicity: Each piece of health care data should not be assessed independently. Most of the
clinically useful information comes from combining multiple data resources and evaluating this
combined information with the clinical knowledge of a health professional. A typical scenario
involves a medical doctor who combines the physical examination, laboratory test
results and patient history data to make a diagnosis.
Glucose levels of 125 mg/dL are assessed differently when combined
with different patient demographics information: for a 23-year-old
patient with Type I diabetes, this is considered a normal value, but this
would not be the case for a non-diabetic person.

Health care professionals should, therefore, have at their disposal, tools which provide easy access
to patient data and generate reports summarizing all the clinical information that is available for a
patient, at any given point of the health care provision (e.g. patient history, clinical observations,
laboratory test results, medical imaging).
Cognition: Health care data should be assessed with human cognitive skills. Differential diagnosis
and other cognitive procedures based on knowledge and skill-sets of health professionals are always
crucial when new data about a patient becomes known and needs to be assessed. Clinicians acquire
medical skills and knowledge and have a dynamic understanding of how the information they
have in their hands can direct them towards specific clinical decisions.
This cognitive process is systematic and varies across different categories of health care
professionals. Physicians perform differential diagnosis, that is, the process of differentiating between
two or more conditions that share similar signs or symptoms, while, for nursing practitioners, the
nursing diagnosis is a clinical judgment about individual, family, or community responses to the
health problem. Medical education and continuing professional development are important success
factors for this dimension.
Shareability: Health care data should be shared across the healthcare system and between
different health care professionals, to become more meaningful. No health professional should ever
act in an introverted manner within the healthcare system in that respect.

An MRI test cannot be solely assessed by the radiologist, but should be
shared with the physician who is going to review the MRI to make
informed decisions about the patient.
One of the most important requirements for sharing data seamlessly is a highly
interoperable environment. Business, technical and information interoperability are all essential
requirements for the fundamental shareability property. Interoperability is the ability of a system
to work with other systems without special effort on the part of the health professionals: data should
be exchanged across the health care system seamlessly. Health Level 7 (HL7) is one of the most widely
used families of interoperability standards, and addresses the business, technical and information
dimensions of interoperability. Interprofessional collaboration is also crucial, since it is not always
sufficient for the information to be inserted into the records. Often, health professionals need to
discuss the observations with each other to understand their qualities and exchange insights on the
condition of a patient.
Longitudinality: Health care data should be assessed with a longitudinal insight. The progression
of a disease is not linear, and neither are the therapy outcomes. In addition, many of the health care
procedures are repeated during the course of a patient's hospitalization (e.g. measurement of vital
signs, blood tests). When these data are reviewed, health care professionals need to recognize any
longitudinal changes and patterns over time and assess the disease progression and treatment
effectiveness. There are many tools available that can be used to visualize data. With such tools,
nurses do not need to complete manual charts of the vital signs, since these are generated
automatically from the data.
Longitudinal data can form the basis for predictive modelling of the patient outcomes and the
effectiveness of medical treatments.
A morning blood glucose level of 135 mg/dl would seem to be elevated
for a given patient, but the clinician would not worry if, for that patient,
the five preceding daily measurements were higher and showed steadily decreasing
blood glucose levels day by day.
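
The scenario above can be sketched in a few lines of code. The readings below are hypothetical values chosen to match the example (five higher measurements followed by 135 mg/dl), and the function name is an assumption; the point is only that the same final value reads differently once the preceding series is taken into account.

```python
def is_steadily_decreasing(values):
    """Return True when every measurement is lower than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


# Hypothetical morning glucose readings (mg/dl) on six consecutive days
readings = [190, 176, 163, 151, 142, 135]

latest = readings[-1]
trend_is_favorable = is_steadily_decreasing(readings)
average_daily_change = (readings[-1] - readings[0]) / (len(readings) - 1)

print(latest, trend_is_favorable, average_daily_change)
```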

The image on the left shows the blood pressure, pulse rate and pulse pressure
record plots of a patient for one month. The physician can see that there are quite a few days
during the month where the blood pressure was elevated, and that there is a pattern of high
blood pressure waves with peaks and lows.

Source: raywinstead.com/bp

The four health data properties, namely non-atomicity, cognition, shareability and longitudinality,
are not unique to healthcare, but their existence is undeniable and, indeed, all four co-exist
in virtually any clinical health care environment. Hospital administrators and policy
makers can prioritize the organizational and technical requirements that need to be built around,
and support, these fundamental properties, which are crucial to the clinical decision making process
and ultimately to the quality of health care services.

The information exchange is non-stop across the healthcare system; patients, being the main source of
health data, are placed in the center of the care.

The temporal and spatial nature of health data


Healthcare data is of temporal and spatial nature, and this has a huge impact on the analysis of a
patient case and the treatment plan. The temporal nature of health data is determined by the scale of
various repeated measurements. The continuous monitoring and reassessment of patients plays a
major role, and the evaluation of disease progress over time is part of the clinician’s everyday schedule.
Medication and self-management of a chronic condition, as well as rehabilitation
sessions, require similar considerations.
Geography also plays an important role in healthcare with regard to understanding the various
causes of health dynamics. Geographical Information Systems (GIS) are mainly used in public
health. The trinity of public health, which is constituted by the individual living inside an interactive
environment, underlines the importance of geography as far as health and illness are concerned. GIS
can support public health policy through:
• The development of map-based applications which visualize health related parameters
• Providing location-based information about diseases
• Identifying spatial correlations that exist in the data and that can help inform a public policy
  decision
• Identifying possible relationships that influence the health status within the population

Look at the plot on the left and discuss the importance of a longitudinal
insight into healthcare observations, for successful clinical decisions.

Provide three examples, to indicate the importance of time for the:
(i) population surveillance
(ii) hospital health care
(iii) tertiary care/patient rehabilitation

Questions for discussion


1. What types of data does a nurse produce during the healthcare practice? What types of data does
a nurse need to retrieve from the Electronic Medical Records to make informed decisions?
2. An important attribute of healthcare data is ‘longitudinality’. Why is this property important for
deciding what would be the optimal treatment plan for a patient?
3. Explain why the entry of a patient diagnosis into an Electronic Medical Record, is considered
reference data.
4. Why is it so important to keep in mind that the data in healthcare are of temporal and spatial
interest? How can these two elements facilitate decision making (one example for each).

Chapter 3. Data Workflow and Users of Health Data
The patient should be the main focus
and is placed in the center of the care.
The majority of health care data is
directly or indirectly acquired from the
patient using many different data
collection methods and is used in favor
of the patient in a constant effort to
achieve optimal outcomes, and improve
the quality of care.

Types of health care data users


The health care data users fall into five broad groups. Each of the groups below utilizes data to
achieve different objectives.
1. Directly involved in health care provision (primary): nurses, physicians, physiotherapists
2. Directly involved in health care provision (secondary): medical laboratory
personnel, pharmacists
3. Indirectly involved in health care: Hospital management and health care administrations
4. External data users: policy makers (regional and national level), payers, epidemiologists
5. Patients and their caregivers

The patient should be placed in the center of the care and is one of the most important
users of health care data. Patients are key components in the
‘Data Information Knowledge’ circle.
There is an effort towards providing patients with more information to self-manage
their condition. Patients collect their own data, share this data with
their clinician and use the data and the
medical feedback to tailor the therapeutic
plan to their own needs.

In-class discussion

Below is an extended list of various health care data users. Try to identify the group (out of the five
above) each health care data user belongs to and then discuss in the classroom:
(i) What is the data of primary interest for each type of user
(ii) How each type of user would access these data and
(iii) What is the main use of this data for each type of user

Patients and their Caregivers          Government Bodies and Policy makers
The Community                          Pharmacists
Medical Doctors                        Operational management
Physiotherapists                       Financial Department
Laboratory Personnel                   Healthcare researchers and epidemiologists
Primary Care, General Practitioners    Healthcare educators and their students
Nurses                                 Insurance companies
Hospital Management                    Health care administrators
Hospital IT Staff                      Medical device representatives
Hospital Database Administrator        Healthy Citizens

Dynamic vs. Static data


Most data in healthcare are dynamic, since they are constantly updated during the provision of
healthcare services. Each measurement yields a value that is stored in a database, and in most
cases these measurements are repeated for a patient. The acquisition of repeated data can
provide information of added value, since it gives health care professionals an insight
into the longitudinal progression of a disease and the effectiveness of a therapy. Some health care
related data, though, are never going to be changed or updated during a hospital stay, like specific
patient attributes (date of birth and sex). In addition, upon discharge, the data collected and stored
in the Electronic Health Record for a given hospital admission will be maintained in the hospital
database without any further changes. A future admission of the same
patient should, however, provide the clinician with seamless access to the previous patient data,
to facilitate the continuity of care.
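
The distinction can be illustrated with a small data structure. This is only a sketch under assumed names (`StaticAttributes`, `AdmissionRecord`) and a made-up date of birth: static attributes are modelled as read-only, while repeated measurements are appended with their timestamp so that the longitudinal series is preserved.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Tuple


@dataclass(frozen=True)
class StaticAttributes:
    """Attributes that do not change during a hospital stay."""
    date_of_birth: date
    sex: str


@dataclass
class AdmissionRecord:
    """One hospital admission: static attributes plus repeated measurements."""
    static: StaticAttributes
    glucose_mg_dl: List[Tuple[datetime, float]] = field(default_factory=list)

    def add_glucose(self, when: datetime, value: float) -> None:
        # Dynamic data: each repeated measurement is appended, never overwritten
        self.glucose_mg_dl.append((when, value))


# Hypothetical patient; the two readings reuse values from the chapter's glucose table
record = AdmissionRecord(StaticAttributes(date(1980, 5, 17), "F"))
record.add_glucose(datetime(2016, 1, 20, 9), 111)
record.add_glucose(datetime(2016, 1, 20, 18), 123)
print(len(record.glucose_mg_dl))
```

Attempting to assign a new value to `record.static.sex` raises an error, which mirrors the idea that static data are not updated during the stay.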

Data warehouses
Data warehouses and knowledge base systems are used to facilitate decision making. A common
strategy to effectively govern hospital data is a data warehouse. This allows the hospital to merge
individual databases to a central location for robust reporting and analysis. Data sets from
Electronic Medical Records, disease registries etc. are stored in a data warehouse and data sets can
be pulled from there and analyzed. One common use of these data is to help with decisions for current
patients. Looking back at the historical data, many records of past patients
with similar clinical profiles can be identified. These past data can be used, for example, to decide
which would be an optimal treatment strategy for the present patient case. Hospital data warehouses offer an
extremely useful data source for researchers, who plan to conduct epidemiologic studies, and for
health care administrators, who want to study health services utilization and payment patterns
in order to implement a more effective planning and budgeting strategy.
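
To make the idea of looking up past patients with similar clinical profiles concrete, here is a minimal sketch using an in-memory SQLite database as a stand-in for a data warehouse. The table layout, the diagnosis and treatment codes (`DX_A`, `T1`, `T2`) and all rows are invented for illustration and do not describe real patients or a real warehouse schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE past_cases (
           patient_id INTEGER, age INTEGER, diagnosis TEXT,
           treatment TEXT, recovered INTEGER)"""
)
# Invented rows standing in for data merged from several hospital databases
conn.executemany(
    "INSERT INTO past_cases VALUES (?, ?, ?, ?, ?)",
    [
        (1, 64, "DX_A", "T1", 1),
        (2, 67, "DX_A", "T1", 1),
        (3, 61, "DX_A", "T2", 0),
        (4, 66, "DX_A", "T2", 1),
        (5, 35, "DX_A", "T1", 0),
        (6, 63, "DX_B", "T2", 1),
    ],
)


def recovery_rate_by_treatment(conn, diagnosis, age, age_window=5):
    """Summarize outcomes of past patients with the same diagnosis and a similar age."""
    rows = conn.execute(
        """SELECT treatment, COUNT(*), AVG(recovered)
           FROM past_cases
           WHERE diagnosis = ? AND age BETWEEN ? AND ?
           GROUP BY treatment
           ORDER BY treatment""",
        (diagnosis, age - age_window, age + age_window),
    ).fetchall()
    return {treatment: (n, rate) for treatment, n, rate in rows}


# Current (hypothetical) patient: diagnosis DX_A, 65 years old
summary = recovery_rate_by_treatment(conn, "DX_A", 65)
print(summary)
```

A real warehouse query would match on many more attributes of the clinical profile; the age window here is only a simple placeholder for "similar".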

How different entities and data relate in Occupational Health⁵


Health data are not only hospital data. There are data that can directly or indirectly be used for the
health evaluation and health monitoring of populations. This research example demonstrates how
different entities and their data are indirectly related to the occupational health of the farming
population and should be taken into consideration during the health assessment.

Object Oriented Modeling of the Health and Safety Process in the Case of the Agricultural Work
The agricultural work is associated with a series of adverse health effects. The causation of the work related
problems is being discussed in many research papers. These factors fall into five major categories, namely the
exposure to hazardous agents, the type of agricultural work, the level of use of Personal Protective Equipment
(PPE), specific demographic characteristics and finally cognitive factors. The above are of great importance for
Primary Healthcare and specifically in Occupational Health and Safety. The five parameters are interrelated
through a specific schema, which describes the occupational health and safety considerations. This is achieved
through an object oriented modeling procedure that may be used for the overall understanding of farmers’
health dynamic by primary healthcare professionals and as a tool to support the development process of Primary
Healthcare and Occupational Health and Safety Information Systems in rural areas.

⁵ Diomidous M, Zikos D. Object Oriented Modeling of the Health and Safety Process in the Case of the
Agricultural Work. AIM. 2009; 17(4):205-208.
Object Name                                                               Object Code
Prevalence of work related diseases and symptoms among farmers           OBJ_01
Frequency of use of PPE in the farming population                         OBJ_02
Farmers’ knowledge and perceptions on occupational health and safety     OBJ_03
Demographic characteristics of farmers                                    OBJ_04
Duration of exposure to the agricultural work                             OBJ_05
Type of agricultural production                                           OBJ_06

Derived data for decision making


Derived data can be found anywhere in healthcare. Data can be aggregated to support functions
such as:
• calculating average clinical measurement values per patient
• making absolute and relative comparisons of data with reference data, historical data of past
  patients or data of the same patient collected at a previous time
• grouping clinical data into clusters of similar data
• performing medical image analysis, in order to find microcalcifications and other pathologies
• associating patient data with geographical and temporal information
• measuring the prevalence of hospital acquired conditions in a hospital department
• calculating the availability of beds by hospital department

In class practice
Try to provide examples of primary data which are being used to calculate the above derived data
examples. For instance, to calculate the absolute difference of the most recent blood glucose
measurement with the previous one, for a patient, we need to consider two temporally consecutive
values of the same attribute. In other cases, we need to consider two or more different variables,
which will be used in simple or more complex calculations.

We will now explore this simple example: what we see in the table below are the daily measurements
of blood glucose for a specific patient. Have a look at the table for 30 seconds. Then please write
down your observations about any patterns of blood glucose level changes and the linearity of any
variations over the seven-day period.

Measurement date    Measurement time    Blood glucose (mg/dl)
1/20/2016           9 AM                111
1/20/2016           6 PM                123
1/21/2016           9 AM                108
1/21/2016           6 PM                135
1/22/2016           9 AM                113
1/22/2016           6 PM                137
1/23/2016           9 AM                111
1/23/2016           6 PM                142
1/24/2016           9 AM                113
1/24/2016           6 PM                145
1/25/2016           9 AM                108
1/25/2016           6 PM                162
1/26/2016           9 AM                109
1/26/2016           6 PM                177

Let’s get some simple descriptive statistics from our data:


Average Blood Glucose = 128 mg/dl
Just by looking at this single value, the clinician might simply have suggested
a healthier nutritional scheme to the patient, and would not have prescribed any medication. Now, we
will look into the average blood glucose levels of our patient separately, in the morning (9 AM) and
in the afternoon (6 PM).
Average Morning Blood Glucose: 110 mg/dl
Average Afternoon Blood Glucose: 146 mg/dl
What we can observe, at this point, is that the patient has substantially higher blood glucose levels
in the afternoon, but again this is typically the case in pre-diabetes, especially if you consider that
blood glucose levels are higher after lunch or after an afternoon snack. The clinician might still
simply recommend a healthier nutrition scheme.
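
The three averages can be recomputed directly from the table. The sketch below copies the fourteen readings from the table and uses Python's standard `statistics.mean`; the variable names are illustrative, and the results are rounded to the nearest mg/dl.

```python
from statistics import mean

# (date, 9 AM reading, 6 PM reading) in mg/dl, copied from the table above
readings = [
    ("1/20/2016", 111, 123),
    ("1/21/2016", 108, 135),
    ("1/22/2016", 113, 137),
    ("1/23/2016", 111, 142),
    ("1/24/2016", 113, 145),
    ("1/25/2016", 108, 162),
    ("1/26/2016", 109, 177),
]

morning = [m for _, m, _ in readings]
afternoon = [a for _, _, a in readings]

overall_mean = mean(morning + afternoon)
morning_mean = mean(morning)
afternoon_mean = mean(afternoon)

print(round(overall_mean), round(morning_mean), round(afternoon_mean))
```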
Now we will calculate the average daily change of the blood glucose levels. We will do this by
averaging the two blood glucose values of each day (D1 to D7), and finding the difference of this
value between each pair of consecutive days. The sum of these differences will be divided by the
number of day-to-day intervals (six).
Average Daily Change of Blood Glucose = [(D2-D1) + (D3-D2) + (D4-D3) + (D5-D4) + (D6-D5) + (D7-D6)] / 6
= (4.5 + 3.5 + 1.5 + 2.5 + 6 + 8) / 6 = 26 / 6 ≈ 4.3 mg/dl
This is an average daily increase of about 4 mg/dl in the course of less than one week. With this
information, it seems that a clinical decision needs to be made. Although this result is a strong
indication that blood glucose levels are steadily increasing, the doctor might decide that this
is not a substantial increase and ask the patient to arrange another appointment at the
doctor’s office a month later.

We will now try something else, which will be most revealing. We will calculate the average daily
change of the blood glucose levels, in the morning and the afternoon separately.
Average Morning Change of Blood Glucose Value = [(M2-M1) + (M3-M2) + (M4-M3) + (M5-M4) + (M6-M5) + (M7-M6)] / 6
= (-3 + 5 - 2 + 2 - 5 + 1) / 6 = -2/6 ≈ -0.3 mg/dl (no change)
What about the afternoon?
Average Afternoon Change of Blood Glucose Value = [(A2-A1) + (A3-A2) + (A4-A3) + (A5-A4) + (A6-A5) + (A7-A6)] / 6
= (12 + 2 + 5 + 3 + 17 + 15) / 6 = 54/6 = 9 mg/dl
This is an average increase of 9 mg/dl every afternoon. If we plot the values in the table above, this
pattern is obvious. This patient has been found to have a substantial and progressing increase in the
afternoon blood glucose levels, which would require immediate attention.
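
The same derived values can be obtained programmatically. The following sketch, with an assumed helper name `average_change`, computes the mean difference between consecutive values for the daily averages, the morning readings and the afternoon readings taken from the table; results are rounded to one decimal place.

```python
def average_change(values):
    """Mean of the differences between consecutive values."""
    diffs = [later - earlier for earlier, later in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


# Readings in mg/dl for 1/20/2016 to 1/26/2016, copied from the table
morning = [111, 108, 113, 111, 113, 108, 109]
afternoon = [123, 135, 137, 142, 145, 162, 177]
daily_mean = [(m + a) / 2 for m, a in zip(morning, afternoon)]

overall_change = round(average_change(daily_mean), 1)
morning_change = round(average_change(morning), 1)
afternoon_change = round(average_change(afternoon), 1)

print(overall_change, morning_change, afternoon_change)
```

Because the consecutive differences telescope, each result equals the last value minus the first, divided by the number of intervals; splitting the series by time of day is what exposes the afternoon trend.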
This rudimentary example indicates that, in order to successfully understand and communicate
knowledge on a disease progression, we need to (i) be provided with a longitudinal insight on the
fluctuation of health parameters (ii) identify interesting patterns and often seasonal trends. It is
interesting that many measurable levels of symptoms and conditions (such as pain level, mood and
depression, and various physiological parameters) frequently fluctuate during the course of the day,
often in clinically interesting patterns.

Informed clinical decisions need to be made based on data


If we want to describe the function of a clinical decision we can represent it as a system, where the
medical knowledge, the information from laboratory and radiology exams, the patient history and
the physical examination constitute the input data, while the internal system components
(processing of input) are the:
(i) cognitive process of doctors (differential diagnosis) and nurses (nursing judgment, evaluation)
(ii) interaction of health professionals (interprofessional collaboration, clinical consultation)
(iii) analysis of the input using decision support methodologies
The output of the system is the knowledge which will help towards important clinical decisions like
the patient diagnosis, the appropriate personalized therapeutic plan, the anticipated patient
outcomes and the patient prognosis.
In class practice
By observing the image on the left,
which data are transferred across the
spectrum of a typical hospital day?
Can you think of at least two other
interactions, not shown here, which
will also be useful in an effort to make
evidence based clinical decisions?

Questions for Discussion
1. Discuss four user groups of health data and the primary use of data for each group.
2. Describe with a diagram, how clinical decisions are made in a hospital. Your diagram
should present the users, procedures, interactions and data communication during these
hospital procedures and interactions.
3. Explain what primary data is and what derived data is. Provide examples.
4. To successfully assess the health status of the population and to identify risk factors, will
the disease history, and the patient demographics suffice?

Chapter 4. Data Collection Methods in Healthcare
How data are collected
In this chapter we will discuss the data collection methods in healthcare. In general, these methods
can be classified into three broad categories:

Direct intrusive (e.g. vital sign measurements with traditional means, asking the patient to
provide information, patient history)
Direct non-intrusive (e.g. use of sensors to measure physical properties, instances of clinical
observation)
Indirect intrusive (e.g. samples collected and analyzed in hospital laboratories)

As far as the means of data collection are concerned, we will start by discussing the interview, which is
a common, fundamental technique for collecting data in healthcare. Interviewers ask respondents
questions in person and write down the responses. The interviewing process requires many
considerations: the interviewer needs to be aware of the context, the responder and a
series of human behavior and psychological factors during the discussion. Building an environment
of trust is very important for a successful interview. In healthcare, interviews are usually conducted
one-to-one, between a health care professional (interviewer) and a patient or their
caregiver (responder).
In some cases, interviewers call respondents by phone and write down/record the replies on semi-
structured or well-structured questionnaires, depending on the scope of the data collection. In other
cases, this is done via mail; once the responders fill in the questionnaires, they mail them back.
Very often, questionnaires are self-administered: the responder is provided with the questionnaire
and the interviewer later returns to collect the completed questionnaire. Often,
paper questionnaires are mailed to respondents. Web questionnaires are becoming more and more
popular recently; the interviewer can set up a questionnaire and disseminate it via online invitations
to responders, who then complete the online forms. In some cases, OCR technologies, CATI
(computer assisted telephone interviewing), TDE (Touchtone data entry), IVR (Interactive voice
response), may be utilized.

Advantages of interviews for data collection


• They permit the interviewer to ask the respondent direct questions.
• Further clarifications can be possible as the interview proceeds.
• Great flexibility for collecting private views-feelings & exploring new issues.
Examples: Patient History, Patient Satisfaction Surveys, Nursing Evaluation, Medical Diagnosis

Approaching the patient to collect the medical history. Clinicians or therapists (interviewers)
will consider introducing themselves in a friendly manner if the patient is a new case. The first
impression is important, and any psychological defense mechanisms will be more easily handled
when an environment of trust is built. The interviewer would then kindly ask whether it is fine
to ask some questions and comment on the patient's willingness to help. It is important that the
interviewer approaches the patient in a respectful manner, while at the same time keeping the roles
of the interviewer (clinician) and the patient (responder) very well defined.
The latter requires a very careful balance, and there is a learning
curve as experience is gained during clinical practice. Identifying the core problem of the
patient will then be the starting point. Very often, patients overemphasize minor aspects
of their health problem while underestimating more important issues (not always
deliberately).

Common problems during the medical history


• Misinterpretation of questions
• Memory problems and non-deliberate misreporting and omissions
• The patient's condition does not allow them to effectively communicate information to the health
  professionals
• There might be problems in formulating an answer in an understandable way, either due to the
  complex way that the patient experiences the problem, or due to the nature of the disease and
  patient weakness and physical disability
• Sometimes there is deliberate misreporting, which can be related to a series of socioeconomic,
  cultural and psychological factors

Review some of the tools and methods that can be
employed to overcome the above mentioned common
communication problems during the medical history.
Discuss how health care professionals can formulate
the question in such a way that the patient will find
it easier to respond.
Do you know what “gamification” of data collection
is? How can such tools help physicians and nurses
collect the patient history?

Medical history
The medical history is the most common data collection process conducted via the interview method. The
medical history of a patient is typically collected by a physician who asks a series of questions either
to the patient directly or to someone else who can provide accurate feedback (e.g. family members).
The aim of the medical history is to obtain information useful to make a diagnosis and decide on a
treatment plan. The responder usually starts by reporting symptoms, and then responds to more
questions. The physician will ask the patient about past medical problems, hospitalizations and
surgeries, past injuries, medications, allergies, family history, social history (e.g. alcohol
consumption, smoking, use of drugs, sexual life) and occupational history.

Understanding the nature of patient symptoms


Your mnemonic rule is O.P.Q.R.S.T
Onset: "how long has it been going on?"
Palliation/Provocation: "what makes it better or worse?"
Quality: "what does it feel like?"
Region/Radiation: "where is the pain? Does the pain move around?"
Symptoms/Severity: "other feelings? How bad is the pain?"
Timing: timing of the disease and sporadicity of symptoms

Review of Systems: Screen for symptoms in each body system that have not already been
discussed. Skin, eyes, ears, nose, mouth, sinuses and throat, lungs, heart, digestive system,
genitourinary, hematologic, endocrine, musculoskeletal, neurological system and psychiatric
history.

For the majority of the interactions with a patient, the health care professional will capture the patient
response either using numeric scales, closed questions with ordered response scales or scales with
categorical items.
Questions with numerical data-type responses
How many fever waves do you have every day on average?
What is the maximum fever you had during the course of the disease?
How many days has it been since the symptoms appeared?
Closed questions with categorical response options
What is your ethnicity?
Closed questions with ordered response scales
How would you self-evaluate your health status? (Excellent-Very Good-Good-Fair-Poor)
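
As a rough sketch of how the three response formats above could be captured in software, the example below validates a raw answer against the type of question. The `Question` class, its `kind` labels and the choice to store ordered answers as a rank are assumptions made for illustration, not a description of any particular system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Question:
    text: str
    kind: str                      # "numeric", "categorical" or "ordered"
    options: Tuple[str, ...] = ()  # allowed answers for closed questions

    def encode(self, answer: str):
        """Validate a raw answer and convert it to a storable value."""
        if self.kind == "numeric":
            return float(answer)
        if answer not in self.options:
            raise ValueError(f"{answer!r} is not one of {self.options}")
        if self.kind == "ordered":
            # Ordered scales keep their rank (0 = first option listed)
            return self.options.index(answer)
        return answer


days = Question("How many days has it been since the symptoms appeared?", "numeric")
health = Question(
    "How would you self-evaluate your health status?",
    "ordered",
    ("Excellent", "Very Good", "Good", "Fair", "Poor"),
)

days_value = days.encode("3")        # numeric answer stored as a number
health_rank = health.encode("Fair")  # ordered answer stored as its rank
print(days_value, health_rank)
```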

Computer-aided history taking


A computer-assisted history-taking system (CAHTS) is a tool that aids clinicians in gathering data
from patients to inform a diagnosis or treatment plan. Despite the many possible applications and
even though CAHTS have been available for nearly three decades, these remain underused in
routine clinical practice⁶. There is no clear evidence for or against computer-assisted history taking
and barely any randomized controlled trials comparing computer-assisted versus traditional history
taking. There are, though, some noted advantages and potential limitations of using CAHTS to
gather the medical history.

⁶ Pappas Y, Anandan C, Liu J, Car J, Sheikh A, Majeed A. Computer-assisted history-taking systems (CAHTS) in health
care: benefits, risks and potential for further development. Inform Prim Care. 2011; 19(3):155-60.

Gamification of data collection

Switching from clipboards and paper forms to an interactive, game-like iPad interface
can have many advantages:
• More enjoyable for patients
• Higher response rates to surveys
• Enables customization
• Better patient screening
• More accurate data
• Cheap and quick

Figure: The “Tonic Health” System (tonicforhealth.com)

Advantages
• Decreased social desirability bias: patients are unlikely to shape their answers so that they
  will be viewed favorably when the interviewer is an impersonal computer
• Patients may be more likely to report unhealthy lifestyle behaviors, since there are no feelings
  of shame when reporting to a machine
• Easy high-fidelity portability to a patient's electronic medical record, using plug and play
  computer-aided history taking devices
Limitations
• Computer-aided history taking systems still cannot successfully detect non-verbal
  communication, which may be useful for elucidating anxieties and treatment plans.
• Patients may feel less comfortable communicating with a computer as opposed to a human,
  although this becomes less of a problem with the widespread use of computers in the everyday
  life of most citizens.

In a history-taking setting in Australia using a computer-assisted self-interview, 51% of people were
very comfortable with it, 35% were comfortable with it, and only 14% were either uncomfortable or
very uncomfortable with it.

Use of the Likert scale in data collection


A Likert scale defines the format in which responses are scored along a range. When using a Likert
questionnaire item, respondents specify their level of agreement or disagreement on a symmetric
agree-disagree scale for a series of statements. This characteristic makes it meaningful for a Likert
scale to have an odd number of items, so that there is a middle option, which is the neutral
response on the agreement-disagreement scale. The Likert scale is not uncommon in health care
practice. Patient opinion and patient satisfaction surveys, and the self-evaluation of a health
condition by a patient, are some examples where the use of Likert questions is frequent. In general,
the above use cases involve questions which require responses of a subjective nature.
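
A small sketch can show why an odd number of items gives a neutral midpoint. The five labels below are a commonly used wording and are assumed here for illustration; the function maps each label to a symmetric score centered on zero.

```python
# Assumed five-point wording; other Likert items may use different labels
LIKERT_5 = ("Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree")


def likert_score(response, scale=LIKERT_5):
    """Map a response label to a symmetric score where 0 is the neutral midpoint."""
    if len(scale) % 2 == 0:
        raise ValueError("an odd number of items is needed for a neutral midpoint")
    return scale.index(response) - len(scale) // 2


scores = [likert_score(label) for label in LIKERT_5]
print(scores)
```

With an even number of items there is no middle position, so the sketch rejects such scales rather than silently picking a midpoint.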

Discussion
Which of the questions below could use the Likert scale for the patient response?
Did the nurses respond timely to your request?
Did you also experience fever during the last three days?
Can you specify the level of pain that you experience using the 0-10 pain scale?
Did you take the evening pills?

Clinical Observation
The clinical observation is another important method for collecting data. Observations are
susceptible to observer biases, but during the clinical process, there is a typical and standardized
procedure during an observation, called physical examination. A clinical observation is used by
skilled clinicians, to obtain information, usually about their patients. These are observations of
behavioral and psychological characteristics that will be useful to make a diagnosis and decide upon
an optimal treatment plan. Very often, the clinician takes notes during (or shortly after) the
interaction with a patient. Clinical observations are widely recognized to be the basis of therapy and
treatment and an extremely useful data collection technique in healthcare. The information
obtained during the medical history, when combined with the physical examination, formulates the
basis for a diagnosis and treatment plan.
It is important to stress, at this point, that the two methods of data collection we discussed so far,
namely the interview (medical history) and the observation (physical examination) are the
cornerstone of data collection for clinical decisions.

Analysis of samples in laboratory examinations


(i) Laboratory examinations produce data which are stored in specialized Laboratory
Information Systems (LIS), which are subsystems of an integrated Hospital Information
System
(ii) Unlike other processes, where data are usually produced near the source (patient), these data
require fluid or other samples to be collected from the patient and then analyzed with
a series of laboratory methods

Data collection and storage in the Electronic Medical Record


Data Storage and Retrieval. Data are stored in multiple subsystems in an Electronic Medical
Record (EMR). Some Electronic Medical Record systems provide more functionality than others; not
all of the systems are standardized. Some providers have the EMR capabilities, or other software
programs to assist with the data collection.

Data Standards. A lack of data standards that contain definitions and a taxonomy may make data
acquisition from electronic systems very difficult. Multiple disparate systems that co-exist in a
hospital without communicating with each other require labor-intensive data mapping by health
providers to link the systems. Similar challenges exist between different health care
organizations.
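
To illustrate what data mapping between disparate systems involves, here is a minimal Python sketch. The local codes, the two mapping tables and the function name harmonize are hypothetical and chosen only for illustration; real mappings usually target a standard vocabulary and are far larger.

```python
# Hypothetical local codes used by two hospital systems for the same lab tests.
LAB_SYSTEM_TO_STANDARD = {"GLU": "glucose", "HGB": "hemoglobin"}
WARD_SYSTEM_TO_STANDARD = {"blood_sugar": "glucose", "hb": "hemoglobin"}


def harmonize(records, mapping):
    """Translate local test codes to a shared vocabulary.

    Returns the translated records and the list of codes that had no mapping,
    which would need manual review.
    """
    mapped, unmapped = [], []
    for code, value in records:
        if code in mapping:
            mapped.append((mapping[code], value))
        else:
            unmapped.append(code)
    return mapped, unmapped


lab_records = [("GLU", 5.4), ("HGB", 13.9), ("XYZ", 1.0)]
mapped, unmapped = harmonize(lab_records, LAB_SYSTEM_TO_STANDARD)
print(mapped)
print("needs manual mapping:", unmapped)
```

Every code that ends up in the unmapped list is work for a person, which is why the lack of shared definitions makes linking systems labor intensive.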

Data collection challenges


Enhancing legacy health IT systems, implementing staff training, and educating patients and
communities about the importance of collecting these data can help improve data collection. There
is evidence that healthcare organizations with advanced, real-time, computable data capture
show higher performance. There are significant challenges, though, that have to be addressed.
While a range of health care entities collect data, data do not flow in a standardized manner. For
example, a single hospital may use different patient registration systems, which may not have the
capacity to communicate with one another.
At the same time, there are noted limitations of classification systems (e.g., ICD) which lead to
inaccurate or incomplete data and analysis results. Many reporting requirements use varied
taxonomies and data definitions, which affects the quality of the data collected and causes confusion.
Another issue is that not all data systems capture the method of data collection, and some systems
do not allow for data overrides. Therefore, data harmonization becomes a difficult task. Finally, some
of the technical challenges include incompatibilities between software products used for data collection
and variations in data submission formats.
Data collected at the hospital level are useful both for assessing the quality of services and, if shared
with other entities, for facilitating analyses of quality across multiple settings. Some entities face
Health IT constraints and internal resistance. Some physicians are highly competitive and feel they
are already providing high quality patient care. Changes in data collection metrics require
healthcare providers to make changes to accurately capture data in the EMR. This is in the majority
of the cases labor intensive and costly.

Collecting and sharing data across the health care system


Health care involves a diverse set of public and private data collection systems:
Health surveys
Administrative enrollment
Billing records
Medical records
The above datasets are used by various entities, including hospitals, physicians, health plans,
researchers, health data analysts and epidemiologists. Often hospitals have information systems
which standardize data collection to a large extent, and extract the data that will eventually be
combined with similar data from other hospitals to form (often publicly available) regional and
national health care datasets. Such datasets, specific to Medicare and Medicaid patients, are
available via the Centers for Medicare and Medicaid Services (CMS), as well as from other sources.

Population health surveys


Population health surveys are a type of general survey research method that overlaps with
epidemiology; they study the health of a population rather than a disease process in an
individual7. Federal and state health agencies administer surveys that are primary sources for
estimating the health of a population and current and future needs for health care services. For
instance, a series of studies employed surveys such as the National Health Interview Survey (NHIS),
the National Latino and Asian American Study (NLAAS) and the California Health Interview Survey
(CHIS). There are surveys that aim to capture data on the uninsured and to report on financial and
nonfinancial barriers to seeking care. Other surveys, such as the Consumer Assessment of Healthcare
Providers and Systems (CAHPS®), are designed to assess plans, hospitals, and medical groups.

Discussion

Study the diagram carefully and discuss why it is not possible to survey the entire population in a
population survey, and how this limitation will be overcome.

How can we draw conclusions about a population from a sample of data?

Discuss the considerations that need to be made when selecting an appropriate sample for a
population survey.
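
As a starting point for this discussion, the sketch below shows in Python how a simple random sample can be used to estimate a population value. The population is simulated (hypothetical systolic blood pressure values), the sample size of 1,000 is an arbitrary choice, and the 95% interval uses the normal approximation; real surveys typically use more elaborate sampling designs and weighting.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: systolic blood pressure (mmHg) of 100,000 residents.
population = [random.gauss(125, 15) for _ in range(100_000)]


def estimate_mean(sample):
    """Return the sample mean and an approximate 95% confidence interval."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5
    margin = 1.96 * std_err  # normal approximation, reasonable for large n
    return mean, (mean - margin, mean + margin)


sample = random.sample(population, 1_000)  # simple random sample, no replacement
mean, (low, high) = estimate_mean(sample)
print(f"sample mean = {mean:.1f}, 95% CI = ({low:.1f}, {high:.1f})")
print(f"population mean = {statistics.mean(population):.1f}")
```

Only 1% of the simulated population is measured, yet the interval is narrow. This holds only because every resident had the same chance of being selected, which is one of the considerations to discuss when choosing a sample.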

Smart Sensors for Data Collection and Monitoring

What is a sensor?
A sensor is a converter that measures a physical quantity and converts it into a signal which can be
read by an observer or by an instrument (today mostly electronic). A common example of a sensor
that we use in everyday life is a mercury-in-glass thermometer, which converts the measured
temperature into expansion and contraction of a liquid which can be read on a calibrated glass tube.
Sensors are also used in everyday objects such as touch-sensitive elevator buttons, where the sensor
receives and responds to a signal when touched. Applications include cars, machines, aerospace,
medicine, manufacturing and robotics. Sensors ‘sense’ and measure the property using different
methods, based on which sensors can be classified as electrochemical, electromagnetic,
electromechanical, photoelectric, thermoelectric and electroacoustic.

7
Waghorn G, Lloyd C. Population Health Surveys: an introduction to basic concepts. International journal of therapy
and rehabilitation. 2009; 14(4): 191-8.

Sensors measure a physical property and transform an analogue (continuous) to digital (discrete) signal.
A sensor has three layers, to sense, measure and communicate the measured property.
The multiplexing step (Mux) involves the selection of one of the several analog input signals, which is forwarded into a single line.
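
The two ideas in these captions, multiplexing and analogue-to-digital conversion, can be sketched in a few lines of Python. This is an idealized illustration only: the 0-5 V input range, the 8-bit resolution, the channel values and the function names are assumptions, not properties of any particular device.

```python
def multiplex(channels, select):
    """Forward one of several analog input channels onto a single line."""
    return channels[select]


def quantize(voltage, v_min=0.0, v_max=5.0, bits=8):
    """Map a continuous voltage to one of 2**bits discrete codes (ideal converter)."""
    levels = 2 ** bits
    clamped = min(max(voltage, v_min), v_max)  # inputs outside the range saturate
    fraction = (clamped - v_min) / (v_max - v_min)
    return int(fraction * (levels - 1) + 0.5)  # round to the nearest code


# Hypothetical analog inputs (volts) from three sensors at one instant.
channels = [1.20, 3.37, 4.99]
for index in range(len(channels)):
    analog = multiplex(channels, index)
    print(f"channel {index}: {analog:.2f} V -> digital code {quantize(analog)}")
```

With 8 bits, the continuous input is reduced to 256 possible values; more bits give a finer digital representation of the same analogue signal.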

Main principles of a good sensor

A sensor's sensitivity indicates how much the sensor's output changes when the measured quantity changes. For
instance, if the mercury in a thermometer moves 1 cm when the temperature changes by 1 °F, the sensitivity
is 1 cm per degree.
Sensors that measure very small changes must have very high sensitivities.
We should minimize the impact sensors have on what they measure. Sensors need to be designed to have a
small effect on what is measured; making the sensor smaller often improves this.
A good sensor:
Is sensitive to the measured property only
Is insensitive to any other property likely to be encountered in its application
Does not influence the measured property
Produces an output signal that is linearly proportional to the value of the measured property
Has a high sensitivity (a small input of the physical parameter is enough to create a detectable change)
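
The thermometer example can be checked numerically. In the Python sketch below, sensitivity is estimated as the slope of a straight line fitted to calibration points, which assumes the linear behavior listed above; the calibration readings themselves are made-up values for illustration.

```python
def sensitivity(inputs, outputs):
    """Least-squares slope of output versus input (output units per input unit)."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    denominator = sum((x - mean_x) ** 2 for x in inputs)
    return numerator / denominator


# Hypothetical calibration: temperature (°F) versus mercury column height (cm).
temps_f = [96.0, 97.0, 98.0, 99.0, 100.0]
column_cm = [10.0, 11.0, 12.0, 13.0, 14.0]

print(f"sensitivity = {sensitivity(temps_f, column_cm):.2f} cm per °F")
```

For a sensor that is not linear, a single slope no longer describes the device and the sensitivity has to be stated for a given operating range.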

Potential and Challenges


To be able to utilize sensors in healthcare effectively, there is a need for interoperability between
biomedical devices: sensors should be compatible with existing medical devices and EMR systems.
It is also easy to understand why medical sensors should be characterized by reliability and
robustness, for accurate diagnoses and for functioning in uncontrolled environments. Many patients are
monitored for their health conditions on a 24/7 basis, so there is no excuse for missed sensor readings
and sensor measurement errors. Energy conservation technologies would help extend the
autonomy of a sensor before a battery charge cycle or even a battery replacement is required.
There are cases where operation in hospital buildings results in further interference due to walls,
etc., and this may decrease reliability. It is therefore important, prior to the installation of a
hospital sensor network, to verify that the signal can be transferred seamlessly, without any
interruptions.
In addition, since patient data is sensitive information, data packets transferred via sensors should
remain confidential and unreachable by third parties. For this reason, sensor devices should
integrate protocols and data encryption technologies, to be compliant with the HIPAA guidelines
during the data transmission.
Some additional considerations include event ordering, timestamps, synchronization and quick
response in emergencies. All these requirements are especially important in the health care domain,
since at a given time point there might be a large number of sensor measurements from multiple
patients and the transmission needs to be prioritized, while the transferred data need to be labelled
with the exact time that the measurement occurred. Consider that sensor data will be utilized for a
longitudinal insight into the patient's condition: sensor data contribute significantly to the health
professional's understanding of a patient's temporal patterns.
The integration of many types of sensors demands new node architecture approaches: multi-sensor
networks have generated the need to develop novel data collection and integration technologies to
respond to the bandwidth requirements and the dynamic synchronization of sensor data.
Furthermore, the trend, for the coming years is to use sensor data for predictive modeling, for an
insight on how specific sensor data patterns can be related to optimal treatment decisions. In
addition, there is a widely recognized need to integrate available specialized medical technology with
wireless networks (for example, wearable accelerometers with integrated wireless cards for patient
monitoring). Many commercial applications, which have already become accessible and affordable,
recently appeared on the market, and target (i) healthy citizens who want to monitor health and
lifestyle parameters (ii) chronic disease patients who would monitor their disease related
parameters more efficiently and (iii) health care organizations, which utilize sensor networks for
patient monitoring.
Some of the most obvious benefits from the use of sensors in healthcare include savings on medical
expenses and time (fewer face-to-face appointments are required), while the automated data collection
allows more participants to take part in clinical trials. Measurement bias8 is reduced and
no measurements are skipped due to possible negligence by a health care professional.

8
A systematic error that occurs when, because of the lack of blinding or related reasons such as diagnostic suspicion, the
measurement methods (instrument, or observer of instrument) are consistently different between the groups in a study.
Jawbone UP: a flexible wristband packed with vibration and motion sensors to track and analyze
exercise, diet, and sleep data
Basis: wrist-worn device that measures the wearer’s heart rate, caloric burn, sleep patterns
AgaMatrix: sensor for tracking blood glucose levels. It also tracks carbs intake and insulin dose for
users with diabetes.
Withings Wi-Fi Body Scale: sends body measurements wirelessly to computer or iPhone, to track
weight gains or losses

Examples of commercial applications for lifestyle management and health monitoring

Medical sensor applications include patient monitoring, environmental tests and diagnostics. Some
of the most popular medical sensors are used for the measurement of:
Patient vital signs: temperature, blood pressure, heart rate, respiratory rate
Pulse oximetry: pulse oximeters are non-invasive devices used to measure a patient's blood-oxygen
saturation level and pulse rate
Blood glucose sensors
Physical pressure: many applications for orthopaedic patients and in neurology (e.g. monitoring of
patients after stroke)
fMRI sensors: functional neuroimaging procedure using MRI technology that measures brain
activity by detecting changes associated with blood flow
Mobile phone sensors, such as accelerometers and gyroscopes

The box below discusses challenges and potential considerations that need to be made when a new
medical sensor technology is considered for purchase by the hospital management. Discuss the
significance and validity of the arguments below and suggest meaningful and feasible strategies to
address these points.

The “cost” challenge: discuss the following opinions about the considerations that drive decisions to
invest in sensor technologies by healthcare organizations.
“Too expensive sensors will never be accepted by the healthcare management”
“Health professionals do not have the knowledge to use this new technology”
“Staff making the measurements would be a cheaper manual alternative”
“Sensors can have a real impact only if they are of low cost”
“Devices using sensors have to be portable”

Classification of healthcare sensor systems based on the number of sensors
Single sensor data: one single sensor measures a specific physical property (e.g. temperature or
acceleration). Single sensors are often used to automate basic clinical requirements, such as
measuring the vital signs.
Multi-sensor data: multiple sensors are combined to measure a property or an event of interest
which cannot be measured with a single sensor. For instance, in homecare, a device combining an
accelerometer and an ECG sensor can be used to detect cardiac arrest: the ECG registers the
cardiac arrest and, seconds later, the accelerometer registers the fall of the patient who lost
consciousness. Multi-sensor data have recently been the focus of research, since the combined use
of different sensors and meta-data can help recognize health related events.
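
The accelerometer and ECG example can be expressed as a simple rule that combines two event streams. The Python sketch below is only an illustration of the idea: the event labels, the 30-second window and the function name are assumptions, and real detection relies on clinically validated algorithms rather than a single rule.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time_s: float  # seconds since monitoring started
    kind: str      # "fall" (accelerometer) or "arrhythmia" (ECG); hypothetical labels


def cardiac_fall_alerts(events, window_s=30.0):
    """Return the times of falls that follow an ECG arrhythmia within window_s seconds."""
    arrhythmia_times = [e.time_s for e in events if e.kind == "arrhythmia"]
    alerts = []
    for event in events:
        if event.kind != "fall":
            continue
        if any(0 <= event.time_s - t <= window_s for t in arrhythmia_times):
            alerts.append(event.time_s)
    return alerts


events = [
    Event(100.0, "arrhythmia"),
    Event(112.0, "fall"),  # 12 s after the ECG event: flagged
    Event(500.0, "fall"),  # no ECG event nearby: not flagged by this rule
]
print(cardiac_fall_alerts(events))
```

Neither sensor alone can separate the two falls; the combination is what makes the first one recognizable as a possible cardiac event.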
An example of a wireless vital sign sensing device is the
wireless pulse oximeter and wireless two-lead
electrocardiogram (ECG). These devices collect heart rate
(HR), oxygen saturation and ECG data and relay it over a
short-range (100m) wireless network to any number of
receiving devices, including computer tablets, laptops,
ambulance terminals. Data can be displayed in real time &
integrated in the developing pre-hospital patient care record.
The sensor devices themselves can be programmed to process
the vital sign data, for example, to raise an alert condition
when vital signs fall outside of normal parameters. Any adverse change in patient status can then
be signaled to a nearby Emergency Medical Technician (EMT) or paramedic.
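
A minimal sketch of such on-device alert logic is given below in Python. The normal ranges are illustrative values for a resting adult and are assumptions made for this example; in practice alert limits are set clinically and may differ per patient.

```python
# Illustrative limits only; real alert limits are defined by clinicians.
NORMAL_RANGES = {
    "heart_rate": (60, 100),  # beats per minute
    "spo2": (95, 100),        # oxygen saturation, percent
}


def check_vitals(reading):
    """Return a list of alert messages for values outside their normal range."""
    alerts = []
    for name, value in reading.items():
        low, high = NORMAL_RANGES[name]
        if not low <= value <= high:
            alerts.append(f"{name}={value} outside {low}-{high}")
    return alerts


print(check_vitals({"heart_rate": 72, "spo2": 98}))
print(check_vitals({"heart_rate": 130, "spo2": 88}))
```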

Transmission technologies for sensor networks


                     WLAN (802.11)   Bluetooth-based WPAN (802.15.1)   ZIGBEE (802.15.4)
Range                100m            ~10-100m                          ~10m
Cost/complexity      >6              1                                 0.2
Power consumption    Medium          Low                               Ultralow
Size                 Larger          Smaller                           Smallest

Biosensors
A biosensor is an analytical device used for the detection of an analyte, which combines a biological
component with a physicochemical detector. The sensitive biological element (e.g. tissue,
microorganisms, organelles, cell receptors, enzymes, antibodies, nucleic acids, etc.) is a biologically
derived material or biomimetic component that interacts with (binds or recognizes) the analyte under
study. The biologically sensitive elements can also be created by biological engineering. Some
biosensor applications include glucose monitoring in diabetes patients, the detection of pathogens,
routine analytical measurement of folic acid, vitamin B12 and others as an alternative
to microbiological assay, drug discovery and the evaluation of the biological activity of new compounds.

Biosensors-an example
The blood glucose biosensor uses the enzyme glucose oxidase to break blood glucose down. In doing so it first
oxidizes glucose and uses two electrons to reduce the FAD (a component of the enzyme) to FADH2. This in
turn is oxidized by the electrode (the electrode accepts two electrons from the FADH2) in a number of steps. The
resulting current is a measure of the concentration of glucose. In this case, the electrode is the transducer
and the enzyme is the biologically active component.

Monitoring and Data Transmission


Monitoring and transmission of sensor data can occur continuously, periodically or be alert-driven.
Sensors (and other wireless devices in the area) form an ad hoc network. If the cell phone fails to
transmit data, the data can be relayed over multiple hops in the ad hoc network so that they reach a
device that is within range. Transmission of differential data9 to decrease energy consumption and
network traffic is a common practice. Often, critical signals need to be transmitted with priority. In
priority-based transmission, the path of transmission is determined by the nature of the data, where
critical emergency signals receive the highest priority.
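
One simple way to picture priority-based transmission is an outgoing queue in which emergency packets always leave first. The Python sketch below shows only this ordering aspect and does not model routing paths; the three priority levels, the class name and the packet contents are assumptions for illustration.

```python
import heapq
import itertools

# Assumed priority levels; a lower number is transmitted first.
PRIORITY = {"emergency": 0, "warning": 1, "routine": 2}


class TransmissionQueue:
    """Outgoing packet queue ordered by priority, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, level, timestamp, payload):
        entry = (PRIORITY[level], next(self._arrival), timestamp, payload)
        heapq.heappush(self._heap, entry)

    def next_packet(self):
        _, _, timestamp, payload = heapq.heappop(self._heap)
        return timestamp, payload


queue = TransmissionQueue()
queue.submit("routine", "10:00:01", "patient A: temperature 36.8")
queue.submit("emergency", "10:00:02", "patient B: SpO2 82")
queue.submit("routine", "10:00:03", "patient C: heart rate 75")

for _ in range(3):
    print(queue.next_packet())
```

Each packet keeps the timestamp of the measurement, so the receiving side can restore the true order of events even though the emergency packet was sent ahead of an earlier routine one.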
There are various types of wearable biomedical sensors with integrated radio transceivers (e.g. an
accelerometer in a bracelet to detect hand tremors). Radio signals are received by the cell phone and
transmitted to a server, where the raw data are analyzed via wavelet analysis. The data are
aggregated and synchronized, and then machine learning methods such as decision trees or artificial
neural networks are applied to decide on appropriate actions (data are within normal range, or outside
normal range and either do or do not require emergency action, etc.). The data and the analysis
results are stored in a server-side database and a report is generated and sent to healthcare professionals.

9
Differential signaling is a method for electrically transmitting information using two complementary signals
Wearable Smart Clothes
Using the rapidly improving wireless communication technologies and advanced sensors available
today, many companies and universities are proposing solutions for healthcare applications. The
smart clothes use e-textiles, which are fabrics that enable digital components such as small
electronics and sensors to be embedded in them.

Left: an example of wearable jacket with health monitoring sensors (Source: IFE Wearable Computing),
Right: Lifeshirt system by ‘Vivometrics’

Wearable sensors: Lifeshirt (vivometrics)


The Lifeshirt is a noninvasive system, based on plethysmography10. Lifeshirt provides constant monitoring of
ambulatory patients by measuring and storing respiratory and cardiac parameters, and creates a health profile
during normal daily activities. Users wear a light washable garment with embedded sensors that collect data
on cardiopulmonary function. The system may also include an electronic digital diary where patients
can save their data. This device can improve the quality and performance of sleep studies, since most patients
can be monitored from home. In particular, it can be used for the detection of Obstructive Sleep Apnea. Lifeshirt
can be combined with optional peripherals to monitor functions such as ECG, EMG, leg movement, body temperature,
blood oxygen saturation and blood pressure. The person being monitored can record symptoms, activities
and medication taken in a portable device, which is directly connected to the vest.

Patient Compliance
Considerations should be made, especially for elderly patients, to improve patient acceptance and
compliance. Since seniors often have a tendency to distrust, or even reject technology, the systems
should be intuitive and easy to operate. A study in which elderly residents of Sydney, Australia,
participated in an open-ended discussion found an overall positive view of wearable sensor networks
due to implications for independence. Many users reported that they would feel uncomfortable
wearing visible sensors; the design should therefore be as unobtrusive as possible. Compliance
issues due to forgetfulness can be overcome by integrating alarms and reminders as part of the
systems. Finally, data privacy concerns may arise; many users do not feel comfortable with the
idea that their data are collected by some automated process and transmitted in a matter of
milliseconds to remote locations. Patient education should become an indispensable component of
the strategy to successfully establish and use sensor networks for patient monitoring.

10
Plethysmography measures changes in volume within an organ or whole body usually resulting from fluctuations in
the amount of blood or air it contains

Questions for Discussion


1. What different data collection methods are employed for a patient from the point of
hospital admission to discharge?
2. How is the hospital EMR useful towards more efficient data collection?
3. Discuss some of the technologies and methods that can improve the quality of the data during
data collection.
4. Why is it especially important for a healthcare sensor network to satisfy the properties of a
good sensor?
5. Compare data collection in hospital patient care with that in population health
research.

Chapter 5. Hospital Information Systems
Information management in healthcare today
One can identify many problems with information management in hospitals nowadays.
Healthcare is at least one decade behind in terms of e-health technology adoption when compared
to other, non-health related industries. A series of problems are evident and most are related to how
the complicated and dynamic flow of health information is managed today. The problems
vary. Incorrect reporting, such as a wrong laboratory report, may lead to erroneous and even
harmful treatment decisions. Repeated examinations, or results that are missing when they need to
be available, have serious implications for the cost of care, since many procedures may be repeated
unnecessarily. Information should be documented adequately, enabling health care
professionals to access the information as needed in order to make informed decisions. In general,
any type of patient related information should be available on time, and it should be up-to-date and
correct. Dynamic information processing is the key factor for improving quality and reducing
costs; therefore information processing in a health care organization should become a strategic
priority.
Nowadays, in the United States, medical errors account for more deaths than breast cancer, HIV
and motorcycle accidents, with the majority of medical errors being preventable with the use of
advanced information management technologies. In terms of health outcomes, the United States
lags behind other industrialized nations in many areas, such as infant mortality
and life expectancy, while its healthcare system has among the highest healthcare costs per capita
in the world. The management of health information and the wide integration of systems is only one
consideration related to the above problems, but an important one.

Discuss how healthcare information systems can be proven to be useful for:


a. The improvement of quality of care and patient safety
b. A more cost-effective healthcare system

What is a System
A system is a set of interacting or interdependent components forming an integrated whole with
relationships to other elements or sets. Every process that involves entities functioning in a specific
way and interacting with each other, may be described as a system.
Systems may consist of subsystems, which may have their own subsystems, and so on. All
subsystems work together to exchange data for a specific purpose. Typically, the elements of a
system can be (i) humans, (ii) machines and (iii) procedures, and determine the system's internal
environment. What is outside is called the external environment. These two environments are in
constant communication, exchanging data (input-output).
Components of a system and their interaction: humans, data, software, hardware and processes.

The above diagram presents the components of a system and their interaction.

Discuss in class a healthcare scenario and fill the gaps on the empty diagram on your left
hand for your selected scenario: data, hardware, software, processes and humans.

_________ _______
_________ __________
____________

Information System
An information system (IS) is any combination of information technology and people's activities
that support operations, management and decision making. The term is frequently used to refer to
the interaction between people, processes, data and technology. It refers not only to the information
and communication technology of an organization, but also to the way in which people interact with
the technology to support business processes. The most important concept of information systems is
that they transform data (input) to information (output). Although the legacy information systems
were not necessarily computer based, the modern high complexity information systems cannot easily
be implemented without computer and telecommunications support.

Information Systems in Healthcare


Information systems can be found everywhere in healthcare: in hospitals, clinics, long term care
facilities, and public health settings. Their features vary, and need to be compatible with the specific
scope of each setting. Healthcare Information Systems are designed to manage the medical,
administrative, financial and legal aspects of hospitals and the processing of their services.
An ideal Healthcare Information System should be integrated, which means that it should
interconnect all the different functions of a healthcare organization without any gaps or interruptions
to the information flow across the hospital. Information processing is an important quality factor,
and at the same time an enormous cost factor as well. It is also becoming a productivity factor.

Hospital Information Systems


In hospitals, information processing should offer a holistic view of the patient and of the hospital
functions. A hospital information system can be regarded as the memory and nervous system of a
hospital. It is evident that the amount of information processing in hospitals is considerable.
Therefore, the integrated processing of information is extremely important for hospitals. All groups
of health professionals and every function at any hospital department depend on the quality of data.
Health care professionals frequently work with the same data and need to have access to this data
from different locations at any given time. Integration of information processing should ideally
consider not only one health care organization, but also the information processing needs across the
healthcare system. These umbrella health information systems can be known as integrated
health care delivery systems.

A Hospital Information System (HIS) is an integrated computer system which stores, manages and
recalls information related to the clinical and administrative healthcare expectations in a hospital.

This collection of systems and equipment manages all hospital information with the goal to:
1. Support health professionals to be efficient during their everyday practice, by providing them
easy access to information and by connecting different departments seamlessly.
2. Improve the quality of health services provided to the patients, with the integration of standards
which will be an asset towards an evidence based practice. At the same time, the patient
information is easier to access, which makes decisions for a patient easier: all the considerations
that need to be taken into account will be in place. Another dimension of improving quality is the
continuous uninterrupted data flow for a patient during a patient visit or at a follow-up appointment.
3. Reduce the cost of care, with a more efficient management of the patient information, and by
encouraging good practices and eliminating overuse.

Related terms
There are several other information systems for the management of the flow and storage of
information in hospital routine services.
Healthcare Information System: this is a broader term that can be applied to any health context.
The term may be used to describe any system that manages and transmits information related
to the health of individuals or the activities of health organizations.
Clinical Information System (CIS): describes information systems which focus on the application
of technology at the point of care and support the acquisition and processing of clinical
information.
Patient Data Management System (PDMS): information systems which integrate all patient
data. Such systems are often used in complex intensive care units and high care patient cases.

Evolution of Hospital Information Systems


Traditional approaches primarily encompassed paper-based information processing, as
well as resident work positions and mobile data acquisition and presentation. In the 1960s,
computers started to be used in hospitals for financial management, clinical laboratories and
the analysis of electrocardiograms. It was during the 1970s that the term ‘integrated hospital
information system’ was first introduced. The first information systems were geared towards the
automation of the administrative and financial functions of hospitals and slowly focused on the
improved productivity of hospital departments. The services were eventually built to support
the health information management needs of health professionals and more recently are designed
to provide patient centered, personalized services which can be accessed via the hospital network.
The services are, for an increasing number of modern implementations, based on cloud technologies.
We can summarize the evolution of Hospital Information Systems as follows: while newer systems have
an increased backend complexity, they are becoming more and more user friendly and patient
centered. The systems have moved away from the need for automation, which current
technology has made a trivial challenge, towards an effort for seamless integration and a
continuum of patient-centered healthcare services.

Diagram: evolution of Hospital Information Systems from 1960 to 2020, with complexity increasing over time.
Technologies: central computers, personal computers, computer networks (client/server), Internet/cloud services.
Focus areas: automation (administrative areas), productivity (hospital departments), healthcare professionals, patient and the community.

Priorities change as health systems and technologies change: moving from “automation” to the “integration”

Scope and Requirements of a Hospital Information System
Hospital Information Systems (HIS) provide a common registry for all patient information, and the
different hospital departments have access to the patient information from any place at any time.
The availability of the right information without waiting times, as well as a guarantee that the right
information reaches the right person at the right time, is indeed very important in healthcare. HIS
implementations must recognize the health professional as the main user, while, at the same time,
placing the patient at the center of care. An HIS integrates new information using multiple diverse
sources of information, and helps improve healthcare services to become cost efficient by facilitating
a better management of resources in healthcare settings. There are specialized tools which are
components of HIS implementations and have been specifically designed for cost and quality of care
assessment. The infrastructure of an HIS implementation should support the integration of
enhanced applications with add-ons and modules which facilitate the decision making process. A
really important consideration is that HIS implementations should be backwards compatible so that
old data can easily be migrated into the new system.

Primary goals of Hospital Information Systems: improve patient care and efficiently manage resources
Primary goals                                        Related secondary goals
Better care services and improved quality of care    Improvement of communications
                                                     Smaller waiting times
                                                     Better decision making
Cost management and improved cost-effectiveness      Shorter length of stay
                                                     Lighter administrative workload
                                                     Better use of resources
                                                     Reduction of staff costs

Main elements of Hospital Information Systems


As we discussed earlier in this chapter, an information system includes humans, procedures,
equipment and data. A hospital information system recognizes these components as follows:

Humans (users): people who produce the information and use it for decision making during their
everyday practice. They can be healthcare professionals, hospital administrators, policy makers, or
even external parties, such as payers.
Data: raw data to be processed based on the needs of the above mentioned user categories.
Healthcare data have been extensively discussed in Chapter 2.
Procedures: series of guidelines that describe how humans will act under specific circumstances. In
healthcare, examples of procedures include clinical interventions, patient transfers and admissions,
orders for examinations and various administrative processes.
Equipment: in healthcare a combination of hardware and software is used for the collection, storage
and communication of health related information.

Network technologies are required for a Hospital Information System:
High speed wired networks
Wireless networks
Internet services
Cloud based services
Voice over IP (VoIP)
Web servers (client-server)
Intranets
Synchronous video conferencing

Architecture of a Hospital Information System


A typical HIS is composed of one or several software components with extensions, as well as a
large variety of sub-systems that serve the needs of the various hospital functions and the different
medical specialties. Apart from the Clinical Information System subsection, some of the most
important sub-systems are the following:
Laboratory Information System (LIS): records, manages, and stores data for clinical laboratories.
One important function of an LIS is to send laboratory test orders to lab instruments, track
those orders, and then record the results in a database.
Policy and Procedure Management System: these systems help hospitals comply with HIPAA
regulations, CMS norms and accreditation requirements. They facilitate the development and
implementation of policies, and often include training modules for staff on hospital policies and
procedures.
Radiology Information System (RIS): is used for the management of imaging departments. An RIS
handles patient scheduling, resource management, examination tracking and interpretation,
results distribution, and billing. Closely related is the picture archiving and communication
system (PACS), a medical imaging technology which provides economical storage and convenient
access to images from multiple modalities.

Structure (Architecture) of a Hospital Information System. Notice that the clinical subsection is the core element
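
The LIS function described above (send an order to an instrument, track it, record the result) can be sketched as a small state machine. This is only a minimal illustration in Python, not an actual LIS interface: the class `ToyLIS`, the `LabOrder` fields and the status names are hypothetical, and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class OrderStatus(Enum):
    ORDERED = "ordered"
    SENT_TO_INSTRUMENT = "sent to instrument"
    RESULTED = "resulted"


@dataclass
class LabOrder:
    order_id: str
    patient_id: str
    test_name: str
    status: OrderStatus = OrderStatus.ORDERED
    result: Optional[str] = None


class ToyLIS:
    """Hypothetical in-memory stand-in for an LIS database."""

    def __init__(self) -> None:
        self.orders: Dict[str, LabOrder] = {}

    def place_order(self, order_id: str, patient_id: str, test_name: str) -> LabOrder:
        # Register a new laboratory test order.
        order = LabOrder(order_id, patient_id, test_name)
        self.orders[order_id] = order
        return order

    def send_to_instrument(self, order_id: str) -> None:
        # Mark the order as dispatched to a lab instrument.
        self.orders[order_id].status = OrderStatus.SENT_TO_INSTRUMENT

    def record_result(self, order_id: str, result: str) -> None:
        # A result is only accepted for an order that was sent to an instrument.
        order = self.orders[order_id]
        if order.status is not OrderStatus.SENT_TO_INSTRUMENT:
            raise ValueError("order has not been sent to an instrument yet")
        order.result = result
        order.status = OrderStatus.RESULTED

    def pending(self) -> List[LabOrder]:
        # Orders that are still being tracked (no result recorded yet).
        return [o for o in self.orders.values() if o.status is not OrderStatus.RESULTED]
```

A caller would place an order, send it to an instrument and record the result in that sequence; `pending()` then gives the list of orders that still need follow-up.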
Functions of Hospital Information Systems
The specific operations of an HIS and its dedicated components primarily focus on the management
of all the clinical, patient admission, administrative, knowledge management and financial
functions of a hospital.
Patient registration and the management of a patient's identification information, from admission
to discharge, are supported by specialized subsystems.
Billing systems integrate algorithms that map the ICD-coded diagnoses of each patient to a
Diagnosis Related Group (DRG) and a billing amount.
Appointment and scheduling systems help manage the workload and the availability of services, and
provide patients with an informed estimate of their future care plan.
Computerized Physician Order Entry (CPOE) systems help health professionals enter medication
orders or other instructions electronically instead of on paper charts. A primary benefit of CPOE is
that it can help reduce errors related to poor handwriting or transcription of medication orders.
Electronic Health Records (EHR) refer to the systematized collection of patient and population
health information, electronically stored in a digital format [11]. Chapter 6 is dedicated to
Electronic Health Records.
Pharmacy systems manage prescriptions, the organization of pharmaceutical inventories, the
prescription and ordering workflow, and billing functions.
Telemedicine systems support the remote diagnosis and treatment of patients with the use of
telecommunications technology.
Decision Support Systems are computer-based information systems that support clinical and
administrative decision-making activities.
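
To make the billing function more concrete, the sketch below maps a principal ICD-10 diagnosis code to a DRG and a billing amount with two lookup tables. It is a deliberately simplified illustration: the DRG labels and the amounts are invented placeholders, the function name `bill_for` is hypothetical, and real groupers typically also consider secondary diagnoses, procedures and patient characteristics.

```python
from typing import Dict, Tuple

# Hypothetical lookup tables. The ICD-10 codes are real codes, but the
# DRG labels and the amounts are invented placeholders for illustration.
ICD_TO_DRG: Dict[str, str] = {
    "J20.9": "DRG-RESP-01",  # J20.9: acute bronchitis, unspecified
    "D64.9": "DRG-HEM-01",   # D64.9: anemia, unspecified
}
DRG_BASE_AMOUNT: Dict[str, float] = {
    "DRG-RESP-01": 3000.0,
    "DRG-HEM-01": 4000.0,
}


def bill_for(principal_diagnosis: str) -> Tuple[str, float]:
    """Return the (DRG label, billing amount) for a principal ICD-10 code."""
    drg = ICD_TO_DRG.get(principal_diagnosis)
    if drg is None:
        # An unmapped code should be reviewed by a coder, not billed silently.
        raise KeyError(f"no DRG mapping for ICD code {principal_diagnosis!r}")
    return drg, DRG_BASE_AMOUNT[drg]
```

Raising an error for an unmapped code is a design choice of this sketch: it keeps an incomplete mapping visible instead of producing a default amount.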

Use of standards in Hospital Information Systems


Up to date standards and guidelines should be integrated into Hospital Information Systems;
otherwise existing standards will be of limited usefulness. Universal medical vocabularies are used
to store and transfer information in a standardized and unified way, without inconsistencies. The
use of standards, therefore, improves the quality of data. There are standards for data exchange and
interoperability (e.g. Health Level 7) to communicate data between different information systems.
There exist standard formats for medical records (e.g. openEHR), laboratory data, medical images
etc. Even the indexing of the medical literature has been standardized (e.g. Medical Subject
Headings, MeSH). Finally, it is worth mentioning the existence of health care standards and the
integration of good practices and treatment guidelines in HIS implementations. Structured
guidelines are important for evidence based practice and quality of care, since they describe the
series of steps needed to complete a clinical intervention successfully. Compliance with standards
for a series of interventions can be tracked by an HIS, especially when the clinical procedures are
performed with the use of tracking and monitoring sensors which record the clinical interventions
and verify the successful execution of the clinical practice.

[11] Gunter, Tracy D; Terry, Nicolas P (2005). "The Emergence of National Electronic Health Record Architectures in the United States and Australia: Models, Costs, and Questions". Journal of Medical Internet Research. 7 (1): e3.

Take home messages

The integrated processing of information is important because:


Health care professionals frequently work with data
The amount of information processing in hospitals is considerable
All groups of people and all areas of a hospital depend on its quality

The systematic processing of information:


Contributes to high-quality patient care
Reduces costs

Information processing in hospitals is complex and therefore we need:


Systematic management and operation of hospital information systems
Medical informatics specialists to design and manage the operation of hospital information systems

Successful Implementation of e-health: a case report


The case report that we will discuss in this section indicates that different healthcare employee
groups value the importance of hospital information system implementations differently, based on
their unique role and responsibility. The authors’ motivation to conduct this study was to assess the
implementation of three Hospital Information Systems. Changes to the hospital workflow, (lack of)
participation in decision making, user support and professionalism were considerable areas of
concern during the introduction of new information systems, and are experienced in different ways
by administrative, clinical and technical staff and by external providers.

Design, implementation and level of integration were investigated using a questionnaire, based on
literature evidence that success and failure factors are not only technical, but also related to the
existing organizational models, education, managerial and evaluation issues.

The hospital workflow process was examined in depth to identify factors impeding the successful
introduction of IS. The case study was performed in two mid-sized general hospitals and one
oncology hospital.

Hospital A had an integrated administrative, financial and clinical information system


Hospital B was in the process of purchasing an integrated information system
Hospital C had non-integrated subsystems, without a clinical subsystem.

The qualitative assessment involved the analysis of employee perceptions in two phases.
During phase 1, the researchers started with a literature review to find the critical items
regarding the design and implementation of hospital information systems. As a result, an
assessment questionnaire with closed-ended questions was created and distributed to 9 IT
department employees of the three hospitals. In phase 2, the researchers discussed the
implementation process in open interview sessions and met four different groups of hospital
employees in one of the three hospitals of this study (hospital C).

Items of the Assessment questionnaire (phase 1)


1. Human resources and IT services in the hospital
2. Existence of administrative, clinical & pharmacy and LIS subsystems
3. System specifications
4. Use of coding systems and standards to achieve integration
5. Role of hospital management in terms of planning and financing
6. Education and training of users, motivators and external consultants
7. Contracted agreements between the supplier and the hospital management
8. Was the new system designed to be compatible with the hospital workflow?
9. Data migration consideration for the transition to the new system
10. User willingness to use the system
11. Support by specialized information and telecommunications professionals
12. Needs analysis, feasibility and risk analysis prior to implementation
13. Patient and employee satisfaction considerations

Phase 1: Assessment results


Active involvement of the hospital management was only reported in hospital A, via the utilization
of key employees and the adoption of a long term plan. With the exception of hospital A, the other
two hospitals did not sign a full task agreement that committed the developer to providing user
support. Hospital B recruited an external consultant and maintained active support & maintenance
contracts.
Regarding workflow adjustments & data migration, in hospital A the typical workflow was reported
to have been altered due to the introduction of the new information system. In hospitals B and C
the reverse procedure was followed: the system was adjusted to fit the existing workflow.
Changes from the introduction of the new system were handled by the existing IT department only
in the case of hospital A. In hospital C, data was successfully transferred to the new system while
maintenance of the old one remained possible. In hospital B the old system was reported to be
interoperable with the new one in terms of data exchange. User training, education and support by
IT staff were considered a high priority only in hospital A and, to some degree, in hospital B.

The IT employees of the three hospitals believe that the most important problems during the
implementation have been the lack of central planning, difficulties in user acceptance and in the
integration of the new system into everyday practice and, finally, the lack of use of standards.
In two out of the three hospitals there was a consensus that specific “financial or personal
career interests” were involved in the purchase decisions and that there were insufficient IT
professionals and health informatics experts to guarantee the success of the system.

Phase 2: Open Interviews


During interviews with the IT employees, three clusters of problems were identified: resource
related, human factor related & organizational/planning issues.
Human factors: users ‘did not care’ and there were no motivators to support the transition to the
new system and its actual use in daily practice.
Resource related: inadequate number of professionals in the IT department, small working space
and minimal financial resources to effectively support the new system.
Organizational issues: lack of a well defined objective and of a consistent system integration plan.

Discussion with groups of employees in Hospital C


The research team further held discussions with four different groups of employees in hospital C.
Group 1 - representatives of the supplier company: they reported that they contacted the IT
department to understand the hospital structure & facilities and that they also worked with hospital
representatives regarding the coding of hospital material. They invited the HIS users to a
presentation and prepared a computer training room with real scenarios. During the discussion the
researchers were given the impression that the mindset of the supplier was that of an external body
whose sole purpose was to deliver what was agreed, nothing more, nothing less.
Group 2 - members of the hospital administration: the hospital administration representatives
expressed concern that the decision to introduce the specific HIS was taken solely by the hospital
manager. The hospital management activities and the financial department strategy were also
reported to be a cumbersome process with unnecessary bureaucracy.
Group 3 - employees working in clinical departments (nurses and physicians): they
reported that they were informed about the new system only at a late stage. Despite having
been invited to presentations, most were distrustful about the success of the system.
This skepticism may be related to the expected workflow changes. Most of the negative reactions
came from nurses.
Group 4 - hospital IT employees: they were positive towards the introduction of an integrated
HIS and supported the process. They also helped develop a training program, with the hospital’s
education office and the nursing management. Their complaint was that, while they were willing to
provide assistance and support, they were not given the opportunity to be more actively involved.

Questions for Discussion


1. The objectives of Hospital Information Systems are to improve the quality of care and reduce
costs. How will these two goals be reached with a successful implementation of an HIS?
2. Modern HIS implementations are distributed and service oriented. Can you discuss this
statement?
3. Why is it extremely important for Health Information Systems to be backwards compatible
and upgradeable?
4. Discuss the role of the hospital administration during the introduction of a new Hospital
Information System.
5. Why is the use of standards in an HIS important and how does an HIS act as an enabling
mechanism for the successful integration and use of standards?

Chapter 6. Electronic Health Records
The need for Electronic Health Records: continuum of care
The top priorities for healthcare systems today include continuity of care, quality of healthcare
services, equity of access to services, and increased system efficiency, in all settings across the
whole spectrum (primary, secondary, tertiary care). Healthcare services are distributed across
clinical environments where physicians, nurses and other healthcare professionals work together,
but many functions are primarily of interest to administrative and public health service providers
(such as hospital administrators, health authorities, and epidemiologists). Patient information
needs to be accessible to different providers, who have different roles within the system and need
access to different views of the clinical information. This information should follow patients for
their whole life, covering disease prevention, treatment and rehabilitation, and should give the
healthcare system a dynamic, complete and longitudinal insight into the patient's health. To satisfy
these requirements, healthcare organizations need to maintain high quality healthcare records.

Reasons that we keep records in healthcare


Direct healthcare provision to the patient: the available information that is stored in the records
can be retrieved and used in clinical decisions
Management of patient care and resources: the healthcare professionals know the clinical steps
that have been/need to be completed and the available resources needed to complete a clinical task.
Assessment of the quality of care: by utilizing the available data, the healthcare system can
assess the quality of the healthcare services it provides, by aggregating data to create useful quality
of care indicators (e.g. prevalence of in-hospital acquired conditions).

Research and education: when the healthcare data stored in records are transcribed into
spreadsheets, they can prove to be a very useful data resource for clinical and epidemiologic
studies.
Public health policies: when data from different healthcare providers are combined, they form
a large dataset with significant public health relevance, for the assessment of the
population morbidity profile.
Financial management: the available data kept in the record may be used to estimate the
resources that the healthcare system has spent on each patient.
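
As an illustration of the quality of care and public health uses listed above, the sketch below aggregates record-level data into an indicator: the share of discharges with a hospital-acquired condition, per hospital and pooled across hospitals. The records, the tuple layout and the function names are made up for this example and do not come from any real dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical discharge records:
# (hospital, patient id, had a hospital-acquired condition)
Record = Tuple[str, str, bool]

records = [
    ("A", "p1", False), ("A", "p2", True), ("A", "p3", False), ("A", "p4", False),
    ("B", "p5", True), ("B", "p6", True), ("B", "p7", False), ("B", "p8", False),
]


def hac_rate_per_hospital(rows: Iterable[Record]) -> Dict[str, float]:
    """Share of discharges with a hospital-acquired condition, per hospital."""
    totals: Dict[str, int] = defaultdict(int)
    cases: Dict[str, int] = defaultdict(int)
    for hospital, _patient_id, hac in rows:
        totals[hospital] += 1
        cases[hospital] += int(hac)
    return {hospital: cases[hospital] / totals[hospital] for hospital in totals}


def pooled_hac_rate(rows: Iterable[Record]) -> float:
    """Same indicator after merging the records of all hospitals."""
    rows = list(rows)
    return sum(int(hac) for _, _, hac in rows) / len(rows)


print(hac_rate_per_hospital(records))  # {'A': 0.25, 'B': 0.5}
print(pooled_hac_rate(records))        # 0.375
```

The per-hospital figures support quality assessment within one provider, while the pooled figure corresponds to the combined dataset that a public health authority would analyze.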
Records were kept in paper format during the 20th century, and it was not until the last decades of
that century that the first Electronic Health Record systems appeared. Some of the disadvantages
of traditional paper records are fairly obvious, such as the fact that they make patient data
available to only one user at a time and in only one place. In addition, handwritten entry
increases the possibility of transcription mistakes when copying from the data source, while
legibility issues and misunderstanding of the information can happen too. With an increasing
number of patients, hospitals had to deal with enormous volumes of paper, which were very difficult
to use efficiently for adequate follow-up of the patient health status. The physical security risk
was also considerable, since a natural disaster could destroy the records forever. Finally, it is
easy to see why paper based records made it difficult, if not almost impossible, to gather data for
research purposes, mainly because it would take countless hours to manually transform the paper
records into a digital file.
To support the management of patient data and other patient care related information, we will
introduce the Electronic Health Record (EHR) systems, which aim to make the management of
records not only easier but also far more effective. These systems are now widely used across
healthcare systems and integrate information about the management of patient care to support
clinical, administrative and financial requirements.

Electronic Health Record (EHR)


An EHR is a digitally stored healthcare record (or part of it) for the whole life of a patient. The aim
of an EHR is to support the continuity of patient care (quality, access, and efficiency), education and
research.

Goals of Electronic Health Records


The overarching goal of an EHR is to improve the quality of healthcare services. The main
principles applied to achieve this are described in the section below on the merits of EHR systems,
which outlines what an EHR needs in order to be successful. By using EHR systems, hospitals are
looking to improve patient safety, support the delivery of effective patient care, facilitate a more
effective management of chronic conditions and, finally, improve the efficiency of healthcare
systems, towards a more cost-effective model of care delivery.

Table. Benefits coming from the use of EHR systems
Time                     Timely access to health data
                         Quick data retrieval for research purposes
Money                    Better management of health resources
                         Reimbursement is faster and more efficient
Quality of care          Decision making support
                         Tools for distributed care
Research and Education   Clinical & epidemiologic research, patient education, training of health professionals

Merits of Electronic Health Record systems


It is important to understand that an EHR system is not simply a digital version of the traditional
patient record. For an EHR implementation to become successful there are five principles which
should be taken into consideration during the development of such systems:
Completeness, in order to support efficiently all the healthcare services procedures. An EHR should
not partially support the clinical requirements but should be designed with a system integration
mindset.
Longitudinal representation of patient information, where health information is defined as
information pertaining to the health of an individual or health care provided to an individual.
Healthcare professionals want to see timelines and temporal graphical representations of the
patient health. This makes the understanding of the patient progress easier.
Flexibility of information representation for different clinical scenarios: an EHR shows the patient
information in a different way based on how the information will be used during a specific
intervention or decision point.
Clinical Views: similar to the above, but mainly focusing on the need to have EHR systems to
facilitate diverse data management needs of different healthcare professionals, according to the
priorities of each profession and their different roles.
Decision making tools: EHR systems provide knowledge and decision-support tools that enhance
the quality, safety, and efficiency of patient care. The decision making tools are often rule based or,
in some cases more advanced, using data science methods on historical data.
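
A rule based decision making tool of the kind mentioned in the last principle can be as simple as a lookup that is evaluated when a medication is ordered. The sketch below checks an order against the allergies recorded for the patient. It is only an illustration: the `Patient` class, the rule table `DRUG_ALLERGY_CLASS` and the alert wording are hypothetical, and the rule table is intentionally tiny.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Patient:
    patient_id: str
    allergies: Set[str] = field(default_factory=set)


# Hypothetical rule table: drug name -> allergy class that triggers an alert.
DRUG_ALLERGY_CLASS: Dict[str, str] = {
    "amoxicillin": "penicillin",
    "ampicillin": "penicillin",
}


def check_order(patient: Patient, drug: str) -> List[str]:
    """Return the alerts raised by ordering `drug` for `patient`."""
    alerts: List[str] = []
    allergy_class = DRUG_ALLERGY_CLASS.get(drug.lower())
    if allergy_class is not None and allergy_class in patient.allergies:
        alerts.append(
            f"ALERT: {drug} ordered for a patient with a recorded {allergy_class} allergy"
        )
    return alerts
```

An empty list means that no rule fired, which is different from a guarantee that the order is safe. The more advanced tools mentioned in the text would replace the fixed table with models built from historical data.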

In-class discussion

Provide examples to show understanding of the EHR properties of ‘flexibility’ and ‘clinical views support’

1. How exactly would two different healthcare professionals (for example a physician and a nurse)
want to see the information of the same patient in a different way?
2. How would the output of an EHR system be different between two healthcare procedures for the
same patient during the hospital stay?

Electronic        Medical       Record
Computerized      Patient       File
Computer-based    Health care   Folder
Digital           Health
Distributed
Multimedia
Automated
Virtual

Other similar terms have been used, with different meanings in terms of their scope and direction. Try to pick one word
from each column to create “new” terms!
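
The exercise above can also be done programmatically. The short sketch below enumerates every combination of one word per column; the grouping of the words into three columns is an assumption made from the layout of the original figure.

```python
from itertools import product

# Assumed column grouping, reconstructed from the figure layout.
first = ["Electronic", "Computerized", "Computer-based", "Digital",
         "Distributed", "Multimedia", "Automated", "Virtual"]
second = ["Medical", "Patient", "Health care", "Health"]
third = ["Record", "File", "Folder"]

# One word from each column, joined into a candidate term.
terms = [" ".join(words) for words in product(first, second, third)]

print(len(terms))  # 96
print(terms[0])    # Electronic Medical Record
```

Only a few of the 96 combinations, such as "Electronic Health Record" or "Electronic Medical Record", are terms in common use.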

Important functions of Electronic Health Records

Electronic Health Record (EHR) systems need to support both non episodic data (e.g. patient history,
patient allergies) and episodic data of care, which involve repeated measurements and data entry.
Examples of such data include the patient problem list, patient history, physical examination
information, allergies, vital signs, immunizations, medications, physician orders, diagnostic results
and medical images. The majority of healthcare related data and patient functional status are in
coded form.
An EHR facilitates the efficient data entry of all orders and documentation by authorized clinicians
and provides access to tools and displays that can be customized to end user preferences. Ideally,
this documentation includes the clinical reasoning and rationale for each decision, in an easy to
follow way. An EHR enables the automation of the typical clinician’s workflow, by tracking
the clinical process pathway in an effective way. During its use, all decisions and interventions
for a patient must be attributable to an accountable user, and therefore EHR systems should support
electronic signatures to ensure non-repudiation. In addition, an EHR often provides tools to
facilitate teamwork and coordination.
A modern EHR system provides additional support of data collection for non-clinical uses, such as
billing, quality management, reporting and public health disease monitoring. This is possible by
providing user friendly back ends to enable data extraction, as well as data analytics functionality.
Many EHR systems provide easy access to knowledge sources at any point within the clinical
workflow. The healthcare provider can have easy access to clinical knowledge such as guidelines
and clinical recommendations.
Access to the patient information is provided with the use of a variety of integrated views, specialty
specific forms, diagrams, and flags for any patient information which lies outside of normal limits.
Modern EHR systems also provide tools for the management, communication and monitoring of the
completion of a physician order process. For ambulatory (out of hospital) care, with the use of EHR
systems, healthcare professionals store data to support regulatory requirements.
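
Flagging information that lies outside of normal limits can be sketched as a comparison of each observation against a reference range. The ranges below are illustrative assumptions chosen for this example and are not clinical guidance; the observation names and the function `flag_observations` are hypothetical as well.

```python
from typing import Dict, Tuple

# Illustrative reference ranges only (assumptions, not clinical guidance).
REFERENCE_RANGES: Dict[str, Tuple[float, float]] = {
    "systolic_bp_mmHg": (90, 140),
    "diastolic_bp_mmHg": (60, 90),
    "pulse_per_min": (60, 100),
    "temperature_F": (97.0, 99.5),
    "hemoglobin_g_dL": (12.0, 17.5),
}


def flag_observations(observations: Dict[str, float]) -> Dict[str, str]:
    """Return a LOW/HIGH flag for each observation outside its reference range."""
    flags: Dict[str, str] = {}
    for name, value in observations.items():
        low, high = REFERENCE_RANGES[name]
        if value < low:
            flags[name] = "LOW"
        elif value > high:
            flags[name] = "HIGH"
    return flags


# Hypothetical set of measurements for one patient.
observations = {
    "systolic_bp_mmHg": 150,
    "diastolic_bp_mmHg": 90,
    "pulse_per_min": 95,
    "temperature_F": 102.7,
    "hemoglobin_g_dL": 7.8,
}
print(flag_observations(observations))
```

In a real system the reference ranges would depend on factors such as age, sex and the laboratory method, so a single fixed table like this one is only a starting point.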
There are many other functionalities of newer EHR systems, such as decision support tools to guide
and critique medication administration, recommendations tailored to the condition of an individual
patient, and real-time patient surveillance and alerts. Some systems provide evidence based, data
driven information about the expected patient outcomes based on the patient condition, treatment
plan and care delivery information.
A good EHR system can accept information from external systems and data capture devices (e.g. bar
code scanners). It also supports reporting for the evaluation of healthcare services, compliance
and process standards. EHR systems connect financial information and other external data, such as
patient satisfaction, for the purpose of analyzing process and practice performance, and support
data modeling for the evaluation of potential organizational changes and predictions of resource
allocation.
The diagram below outlines all the important functions of an EHR for the patient care and the
hospital procedures. A modern EHR system would incorporate the majority of these functions.

Levels of application for Electronic Health Records


Various providers: physicians, nurses, the hospital administration, and to some degree, patients
themselves. Healthcare professionals use EHR systems for their clinical practice, for decision
support, to improve the quality of care and provide evidence based services. The hospital
management benefits from the EHR since they are able to more efficiently manage the financial and
other hospital resources.
Various settings: EHR systems are not only used at in-patient facilities, but can also support
ambulatory services, primary care facilities, rehabilitation centers, long term care facilities and
nursing homes.
Geographic areas: The area of coverage for an EHR can be as small as the hospital area, but it
can expand to a healthcare system, a small location or a county, a large metropolitan area, a region,
a state, the whole country. Public health authorities collect EHR data from many hospitals which
are merged into common data repositories and utilized in order to calculate health indicators and
support epidemiologic decisions.

Specific uses of Electronic Health Records

Patient care
Tracking information about provided services (such as medication and treatments) becomes easy
with the use of an EHR system. Physicians and nurses have decision support tools available for
patient diagnosis and treatment decisions. Risk factors for patients can be tracked, so the risk
assessment for an individual patient can include indicators such as the risk of a hospital-acquired
condition, the readmission risk and others. EHR systems are often used to facilitate high quality
healthcare in line with clinical guidelines, since the guidelines can be incorporated into the EHR.
Setting up guidelines for prevention is also important. Patient satisfaction can be tracked by
storing all patient experience survey responses in the EHR system.
With the use of EHR systems, the development and management of clinical care plans is easier,
since health professionals have predefined options and care plan templates available. Since nurses
spend most of their shift in clinical departments, in close proximity to their patients and their
personalized needs, EHR systems should support all aspects of nursing care: they should offer
evidence based nursing assessment tools, make the entry of vital signs an easy process, and
facilitate the management of other clinical measurements and nursing observations. EHR
systems specifically help nurses create customized nursing plans.

Management of care and quality assessment


Case-mix cohorts and examples of optimal practices can be identified, while the available tools can
facilitate the analysis of disease severity and calculate the risk for unwanted outcomes when the
EHR incorporates risk assessment algorithms.
Electronic Health Records provide a solid basis for system usability studies which can help future
versions integrate characteristics and functions tailored to the specific needs of healthcare
professionals. Another very significant functionality is the automated secondary analysis of the
available data for quality assessment and quality assurance. The data are transformed and
aggregated to provide indicators which, when compared to a baseline, can prove invaluable
to the hospital administration for assessing the quality of hospital services and planning improvements.
In addition, the hospital administration utilizes EHR tools for resource management and workload
assessment and planning.

In-class discussion

Explain how the use of EHR systems helps achieve the following three milestones:
Patients will be admitted to hospital when this is required
The unneeded laboratory tests and radiology examinations will be reduced significantly
The hospital length of stay will be reduced

Secondary uses of Electronic Health Records


Other uses of EHR systems include the research and education functionalities that we discussed
earlier in this chapter, as well as planning for public health policies and the incorporation of
tools for the financial management of healthcare providers.

Electronic Health Record systems provide significant benefits not only to those who are directly involved in
healthcare, but also to other professionals, such as pharmacists. With the use of EHR systems, medical
prescriptions (Rx) can be based on a specific predefined plan, and the right Rx is prepared the first time,
with no delays or returns. The on site assessment of possible drug interactions becomes possible, reducing
adverse drug effects, with implications for patient safety. In terms of prescription patterns, since everything
is documented electronically, Rx extensions to EHR systems allow for an improved drug utilization review.

Electronic Health Records provide flexible representations


As we discussed earlier, one of the most important attributes of an Electronic Health Record system
is the flexibility of the information representation. The patient information may be presented via a
time oriented, a source oriented or a problem oriented view, while many other approaches and
models can be found in the literature. To understand these three views, we will transform the
patient information in the box below, first to a time oriented, then to a source oriented and
finally to a problem oriented representation. You can study the patient scenario carefully and take
notes in order to simulate these three different EHR representations on your own.

A patient was admitted to the hospital on February 21st 2013 with shortness of breath, cough, fever and
very dark feces. The blood pressure was measured at 150/90 mmHg, the pulse was 95/min and the temperature
was 102.7 F. The blood test showed an ESR of 25 mm/hr, an Hb of 7.8 and positive occult blood feces. The
patient was transferred to the radiology department for an x-ray. The exam showed no atelectasis and a slight
sign of cardiac decompensation. The diagnosis made was acute bronchitis and the patient was prescribed
Amoxicillin 500 mg, twice a day. A week later the patient was clear of cough, with only slight shortness of breath
and normal feces. The vitals were found to be 160/95 mmHg for the blood pressure and 82 pulses/min, and the
physical examination only showed slight rhonchi. Based on this assessment the patient was prescribed
aspirin at 32 mg per day. The blood test showed an Hb of 8.2 grams per deciliter and occult blood feces.

Time oriented
The patient information is presented in a temporal order. For each date/time there is a list of all the
clinical interventions and decisions that have been completed/need to be made. The information of
our clinical scenario will therefore be transformed as follows:

Feb 21, 2013
Shortness of breath, cough, and fever. Very dark feces
Exam: RR 150/90, pulse 95/min, Temp: 102.7 F, Rhonchi, ESR 25 mm, Hb 7.8, occult blood feces +
Chest X-ray: no atelectasis, slight sign of cardiac decompensation
Medication: Amoxicillin caps 500 mg twice daily

Feb 28, 2013


No more cough, slight shortness of breath, normal feces
Exam: slight rhonchi, RR 160/95, pulse 82/min
Keep Aspirin at 32 mg per day. Hb 8.2, occult blood feces

Source oriented
Now the patient information will be transformed to a source oriented representation. The
information will be presented on the basis of the department or location of the activity.
Clinical Department
Feb 21, 2013
Shortness of breath, cough, and fever. Very dark feces
Exam: RR 150/90, pulse 95/min, Temp: 102.7 F. Rhonchi, abdomen not tender
Feb 28, 2013
No more cough, slight shortness of breath, normal feces
Exam: slight rhonchi, RR 160/95, pulse 82/min
Medication: keep Aspirin at 32 mg /day

Laboratory
Feb 21, 2013
ESR 25 mm, Hb 7.8, occult blood feces +
Feb 28, 2013
Hb 8.2, occult blood feces.
Radiology Department
Feb 21, 2013
Chest X-ray: no atelectasis, slight sign of cardiac decompensation

Problem oriented (SOAP model)


SOAP is a method of documentation employed by health care providers and often incorporated into
Electronic Health Record systems. The four components of a SOAP note are Subjective, Objective,
Assessment, and Plan and are described below.
Subjective: the problem as experienced by the patient; it is based on the patient description
Objective: this information is the translation of the problem that the patient experiences to a clinical
judgement, based on findings by health professionals
Assessment: includes the clinical diagnosis and all the procedures that were arranged or completed
for the clinical diagnosis to be concluded (such as laboratory examinations)

Plan: includes the treatment plan and the medication information
We will now transform our patient scenario to a SOAP note.

Problem 1: Acute bronchitis


Feb 21, 2013
S: Shortness of breath, cough, and fever.
O: Pulse 95/min, T: 102.7F, Rhonchi, ESR 25 mm, x-ray: no atelectasis, sign of cardiac decompensation
A: Acute bronchitis
P: Amoxicillin caps, 500 mg twice daily
Feb 28, 2013
S: No more cough, slight shortness of breath
O: Pulse 82/min. Slight rhonchi
A: Sign of bronchitis minimal
Problem 2: Shortness of breath
Feb 21, 2013
S: Shortness of breath
O: Rhonchi, RR 150/90, Chest X-ray: no atelectasis, slight sign of cardiac decompensation
A: Minor sign of decompensation
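
The three representations are different groupings of the same documented facts. The short Python sketch below illustrates this idea under simplifying assumptions: each entry is stored once as a (date, source, problem, text) tuple, and the tuple layout and the names ENTRIES and group_by are hypothetical, not part of any EHR product. The entries themselves are taken from the patient scenario.

```python
from collections import defaultdict

# One documented fact per entry: (date, source, problem, text).
# Content comes from the patient scenario; the layout is an assumption.
ENTRIES = [
    ("2013-02-21", "Clinical Department", "Acute bronchitis",
     "Pulse 95/min, T 102.7 F, rhonchi"),
    ("2013-02-21", "Laboratory", "Acute bronchitis", "ESR 25 mm"),
    ("2013-02-21", "Radiology Department", "Shortness of breath",
     "Chest X-ray: no atelectasis, slight sign of cardiac decompensation"),
    ("2013-02-28", "Clinical Department", "Acute bronchitis",
     "Pulse 82/min, slight rhonchi"),
]


def group_by(entries, field_index):
    """Group entry texts by one field (0 = date, 1 = source, 2 = problem)."""
    view = defaultdict(list)
    for entry in entries:
        view[entry[field_index]].append(entry[3])
    return dict(view)


time_view = group_by(ENTRIES, 0)      # time oriented
source_view = group_by(ENTRIES, 1)    # source oriented
problem_view = group_by(ENTRIES, 2)   # problem oriented
```

A full SOAP note would additionally split the entries of each problem into the S, O, A and P components; this sketch only shows the grouping by problem.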

Ethical Issues and Electronic Health Records


Ethical issues related to Electronic Health Records confront health personnel. EHR systems create
conflict among several ethical principles. They may represent beneficence because they are alleged
to increase access to health care, improve the quality of care and decrease costs. Autonomy is
jeopardized when patient data are shared or linked without the patients' knowledge. Fidelity is
breached by the exposure of thousands of patients' health data through mistakes or theft, while lack of
confidence in the security of health data may induce patients to conceal sensitive information.
Justice is breached when persons, because of their socioeconomic class or age, do not have equal
access to health information and public health services. Health personnel, leaders, and policy
makers should discuss the ethical implications of EHRs to avoid conflicts among ethical principles.

Questions for Discussion


1. Can you refer to those attributes of an Electronic Health Record, which make it an important
asset towards high quality and safe care?
2. Explain the advantages of the use of Electronic Health Records over the traditional paper records.
3. What are the merits that characterize a successful EHR implementation?
4. Electronic Health Records advance research. Explain how this function becomes a possibility.
5. Can you describe the time-related benefits for a healthcare organization which recently put into
practice a modern Electronic Health Record system?
Chapter 7. Standards in healthcare
The need for standards in healthcare
Health professionals need to use a common language to describe the various elements of healthcare
services. In addition, they want to have at their disposal evidence based, universal methods to follow,
so that they can rest assured that their practice is error free and safe for the patient. For these reasons
we need to use standards in healthcare. The development of standards is feasible for all those procedures
that can be well defined, and for all those interactions that can be described in a finite space. For example,
the number of clinical procedures that a health professional can perform is finite, and therefore
it is feasible to create a comprehensive list of all the available procedures, each one assigned a
unique code. Most of the entities in healthcare contain a finite number of data and therefore most of
the data categories that we discussed in the previous chapters have already been efficiently codified
under formal classification systems.
As an example, medical diagnoses are well-defined and finite (to our knowledge, of course, up until
this point in time) and they have actually been classified successfully. As medical science
progresses and new diseases, symptoms, interventions and practices emerge, standards
need to be updated to integrate the new knowledge. With the use of the International Classification
of Diseases (ICD), each disease has been assigned its own code, in a hierarchical structure.
Sometimes the terms medical classification and medical coding are used to describe the same thing.
Our second example comes from nursing practice. The International Classification for Nursing
Practice (ICNP) uses specific codes for each of the different levels of nursing care to describe
nursing interventions; this way, nursing practice has been standardized. Other nursing
classifications that are widely used are NANDA International (NANDA-I), the Nursing
Interventions Classification (NIC) and the Nursing Outcomes Classification (NOC). These are
comprehensive, research-based, standardized classifications of nursing diagnoses, nursing
interventions and nursing-sensitive patient outcomes.

The importance of standards in healthcare


Medical classification (or medical coding) is the process of transforming descriptions of
medical diagnoses and procedures into universal codes. Diagnosis codes are used to track diseases and
other health conditions, from chronic diseases such as diabetes mellitus and heart
disease to contagious diseases such as the flu and athlete's foot. These diagnosis and procedure codes
are used by all those directly or indirectly involved with care (e.g. regional health programs,
insurance companies etc.).
Medical classification is widely used in hospitals, primarily to support standardized and evidence
based care during everyday practice. Healthcare systems have moved from traditional records to
Electronic Health Record systems not simply for record keeping, but for the flexible and evidence
based support of clinical care and to facilitate decision making, research, communication and data
sharing, management and statistical surveys. To achieve this, healthcare systems need to
use common classifications, terminologies and codifications.
Using common classification systems across healthcare systems, an organization can produce
analytics and performance statistics that can be compared to a baseline performance. Standards can
therefore provide the basis for quality improvement in healthcare services. Classification systems
also have indirect uses: the statistical analysis of diseases and therapies
becomes possible, and healthcare data can be utilized in knowledge-based and decision
support systems, as well as for the direct surveillance of epidemic or pandemic outbreaks.

Some well-known medical classifications


o Diagnostic and Statistical Manual of Mental Disorders (DSM)
o International Classification of Headache Disorders second Edition (ICHD-II)
o International Classification of Sleep Disorders (ICSD)
o Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT)
o Australian Classification of Health Interventions (ACHI)
o Healthcare Common Procedure Coding System (HCPCS)
o ICD-10 Procedure Coding System (ICD-10-PCS)
o Classification of Pharmaco-Therapeutic Referrals (CPR)
o Logical Observation Identifiers Names & Codes (LOINC)
o Medical Subject Headings (MeSH)
o Unified Medical Language System (UMLS)

Classification, terminology, codification


There are three very common terms, each of which describes a different aspect of standardized
care: 'classification', 'terminology' and 'codification'. Classification systems
describe different concepts and their relationships. They provide lists of items included for a specific
healthcare entity or a specific objective in healthcare. By placing objects in groups (or classes) based on
their relationships, we achieve classification. Classification is based on a priori knowledge of the
field (e.g. diagnoses, medical procedures etc.) and is the key to new knowledge.
Terminologies describe the appropriate medical terms that are used to describe a specific
concept. The term thesaurus is often used, and describes the list of medical terms and their
synonyms. Finally, the term codification refers to assigning a unique code to each unique concept,
usually of a classification system. Codifications are useful since they provide a shortened
identification of a concept which would otherwise require a lot of effort or long text to be described.
A codification is therefore the process of replacing a concept with a combination of numbers and/or
letters. Codification systems are usually numeric, mnemonic or juxtaposition based.
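
The relationship between a terminology and a codification can be sketched in a few lines of Python. This is only an illustration: the ICD-10 code J18.9 for pneumonia is the one used later in this chapter, while the synonym list and the names THESAURUS, CODES and encode are assumptions made for the example and do not come from any official vocabulary.

```python
# Terminology: a preferred term and the synonyms that point to it.
# The synonyms are illustrative assumptions, not an official thesaurus.
THESAURUS = {
    "pneumonia": ["pneumonia, unspecified", "pneumonia nos"],
}

# Codification: one unique code per unique concept.
CODES = {"pneumonia": "J18.9"}


def encode(term):
    """Return the code for a preferred term or one of its synonyms, else None."""
    needle = term.strip().lower()
    for preferred, synonyms in THESAURUS.items():
        if needle == preferred or needle in synonyms:
            return CODES[preferred]
    return None
```

Whichever synonym the health professional types, the same short code is stored, which is what makes later counting and comparison possible.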

Classifications can be single or multi-level


A single level classification requires only one code/descriptor to describe a concept. The most common
example of a single level classification is the ICD classification system. In the 10th version of this
system, describing one disease, for example pneumonia, only requires one
code, which for this example is J18.9. A multi-level classification system, on the other hand, requires
many different descriptors for each concept and, in some cases, their relationships. The Systematized
Nomenclature of Human and Veterinary Medicine (SNOMED) is such a system. In SNOMED,
Tuberculosis (D-14800) can be classified as: Lung (T-28000) + Granuloma (M-44000) + Mycobacterium
tuberculosis (L-21801) + Fever (F-03003). The table below shows the different levels that need to
be described (some or all of these) for any new SNOMED entry.
Table. The eleven levels of the SNOMED classification
Level   Dimension                 Level   Dimension
1       Topography (T)            7       Living organisms (L)
2       Morphology (M)            8       Chemicals, Drugs and Biological Products (C)
3       Function (F)              9       Physical agents, forces and activities (A)
4       Diseases/Diagnosis (D)    10      Social Context (S)
5       Procedures (P)            11      General Linkage-Modifiers (G)
6       Occupations (J)
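
The difference between the two kinds of classification can also be seen in how an entry would be stored. The Python sketch below is a minimal illustration: the codes are the ones quoted in the text and the axis letters follow the table above, while the class MultiLevelEntry and its method are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Axis letters as listed in the table above (only the ones needed here).
AXES = {
    "T": "Topography",
    "M": "Morphology",
    "F": "Function",
    "D": "Diseases/Diagnosis",
    "L": "Living organisms",
}


@dataclass(frozen=True)
class MultiLevelEntry:
    concept: str
    code: str
    components: tuple  # codes from other axes that together describe the concept

    def axes_used(self):
        # The letter before the dash identifies the axis of each component.
        return [AXES[code.split("-")[0]] for code in self.components]


# Single level: one code is enough.
icd10_pneumonia = "J18.9"

# Multi-level: the concept is described by several coded components.
tuberculosis = MultiLevelEntry(
    "Tuberculosis", "D-14800", ("T-28000", "M-44000", "L-21801", "F-03003")
)
```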

Merits of a good classification system


A classification system should, first of all, be suitable for its scope and useful for healthcare
professionals. The hospital administration and the IT people in the hospital need to have a plan on
how the system will become part of the hospital policy; this can be achieved with the dissemination
of specific clinical guidelines which will guide health professionals on how to use the classification
system successfully during their everyday practice.
Classification systems should also offer complete coverage of the field of interest. In other words,
there should be no concepts that could possibly be used but are not included in the system.
At the same time, they should be characterized by a desirable level of detail (a good balance between
completeness and never-to-be-used concepts). Next, they should contain no overlapping classes. This
means that if the classification system is category based, the placement of concepts under categories
is straightforward and unambiguous. The categories should be homogeneous and only one ordering
principle should be applied at each level. The different categories should be well defined and the
classification system should include clear criteria for the class limits.
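
Two of these merits, complete coverage and non-overlapping classes, can be checked mechanically. The following Python sketch shows one possible check on a toy, category based classification; the function name check_classification and the toy data are assumptions made for illustration only.

```python
def check_classification(classes, concepts):
    """Return (missing, overlapping) concepts for a category based classification.

    classes:  dict mapping a class name to the set of concepts placed in it
    concepts: set of all concepts the field of interest requires
    """
    seen = {}
    overlapping = set()
    for name, members in classes.items():
        for concept in members:
            if concept in seen:          # already placed in another class
                overlapping.add(concept)
            seen[concept] = name
    missing = set(concepts) - set(seen)  # required but not covered
    return missing, overlapping


# Toy example: "pneumonia" sits in two classes and "asthma" is not covered.
toy_classes = {
    "respiratory": {"bronchitis", "pneumonia"},
    "infectious": {"pneumonia", "influenza"},
}
toy_concepts = {"bronchitis", "pneumonia", "influenza", "asthma"}
missing, overlapping = check_classification(toy_classes, toy_concepts)
```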

Metathesauri
A metathesaurus makes available biomedical concepts and concept names from many different
incorporated controlled vocabularies and classification systems. A well-known metathesaurus in
healthcare is that of the Unified Medical Language System (UMLS). The UMLS is a set of files and
software that brings together many health and biomedical vocabularies and standards to enable
interoperability between computer systems. The UMLS enables the intelligent retrieval of
biomedical information from various sources. The Metathesaurus is one of the three UMLS
components: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon. The
National Library of Medicine (NLM) updates the UMLS twice a year, in May and November.

Meta, the UMLS metathesaurus and some of the medical vocabularies and classifications included

Standards do not solely refer to the standardization of healthcare information. There are numerous
standards addressing areas such as communications in healthcare, the characteristics of systems
design (for example Electronic Health Records), information exchange, interoperability, and others.
There are standards which refer to the continuity of care and the management of clinical
transactions. Chapter 9 describes one of the most well-known healthcare interoperability standards,
the Health Level 7 (HL7).
The table below provides examples for some of the more popular standards that are being used in
the United States.

Examples of healthcare standards which are popular in US healthcare systems

ASTM Continuity of Care Record (CCR): a patient health summary standard. A record can be created, read and
interpreted by various EHR systems, allowing easy interoperability between otherwise disparate entities.
ANSI X12 (EDI): Transaction protocols used for transmitting any aspect of patient data. Has become popular
in the United States for transmitting billing information, because several of the transactions became required
by the Health Insurance Portability and Accountability Act (HIPAA) for transmitting data to Medicare.
CEN - CONTSYS (EN 13940): a system of concepts to support continuity of care.
CEN - HISA (EN 12967): a services standard for inter-system communication in a clinical environment.
DICOM: a heavily used standard for representing and communicating radiology images and reporting
HL7: HL7 messages are used for interchange between hospital and physician record systems and between
EMR systems and practice management systems; HL7 Clinical Document Architecture (CDA) documents are
used to communicate documents such as physician notes and other material.
ISO: ISO TC215 defined the EHR, and the technical specification of the EHR requirements architecture.
OpenEHR: next generation public specifications and implementations for EHR systems and communication,
based on a complete separation of software and clinical models.

Questions for Discussion
1. Explain why the ICD-10 is a single-level classification system while SNOMED is a multi-level
classification, by providing examples of entries for each one of these two systems.
2. How will standards in healthcare help transform the healthcare system to a more cost-efficient
model of practice?
3. Discuss how classification systems can act as enablers for hospitals to provide evidence based
services to patients.
4. Explain why it would not be possible to conduct large scale secondary data analysis research
without the use of classification systems.
5. Which are the main characteristics of a good classification system?

Chapter 8. Databases in Healthcare
Databases and Types of Data structure
Databases are collections of data with a specific, well-defined structure and purpose. In hospitals,
databases are the "spinal cord" of the hospital information system. Databases in healthcare are
collections of health data. The programs used to develop and manipulate these data are called Database
Management Systems (DBMS).
Which of these data collections are databases?
An excel file with names and medication of patients within a hospital
A nurse’s agenda with to do’s
A schedule of the shifts for next week
A list of the medicines available
The medical record of a patient

There are many different types of data models for database organization: Flat Data, Hierarchical
Data, Relational Data, Object-oriented data and, more recently, NoSQL databases. The most
commonly used database model is the relational model, which can be found in the vast majority of
health care databases. Generally, the data submitted into an Electronic Medical Record are, most
of the time, stored in relational or object-oriented databases or, in fewer and more recent cases, in
NoSQL databases.

What is a flat file?


A flat file can be a plain text file, usually containing one record per line. Flat files use no
sophisticated methods to store data. You have probably used flat files already: an Excel document with
a list of your movie collection (e.g. the movie title in column 'a', and movie genre in column 'b') or a
comma separated text document with the responses of students in a study about smoking, where each line
holds a new student response. Most everyday computer software allows easy access to flat data files.
There are, however, many disadvantages of using flat files.
Flat files may prove sufficient for simple data, but they waste computer storage by
requiring the machine to reserve space for items that are not logically applicable. Flat databases are
also not "advanced-query friendly": this means that when the user needs to retrieve a specific sub-set of
the data, based on many criteria, a flat file will not make it easy.
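
To see what "not advanced-query friendly" means in practice, consider the smoking study mentioned above stored as a comma separated file. The Python sketch below is an illustration with made-up column names and values: to retrieve a sub-set based on several criteria, the program has to read every line and test every condition by hand.

```python
import csv
import io

# A comma separated flat file: one student response per line.
# Column names and values are made up for illustration.
FLAT_FILE = """student_id,age,smokes,cigarettes_per_day
1,19,yes,5
2,22,no,0
3,24,yes,12
4,21,yes,20
"""


def heavy_smokers_over(text, min_age, min_cigarettes):
    """Scan every line and keep the ids of the rows that satisfy all criteria."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        int(row["student_id"])
        for row in rows
        if row["smokes"] == "yes"
        and int(row["age"]) >= min_age
        and int(row["cigarettes_per_day"]) >= min_cigarettes
    ]


selected = heavy_smokers_over(FLAT_FILE, min_age=20, min_cigarettes=10)
```

Every new question requires a new hand-written filter of this kind, whereas a database management system answers it with a declarative query.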
Here is an example which will help you understand the limitations of flat data files. Consider a
scenario where a patient can be diagnosed with up to five different ICD-10 diagnosis codes. You
want to keep track of the patient id, patient name, date of birth, address, gender and the diagnosis
codes. In this scenario, your health care organization uses no database and you are required to keep
track of your patient data using flat documents. You plan to use the Patient_id information to be
able to locate patients and their information. Your empty "shell", which you will use to start adding
patient data, would look like this:

Pat_id Pat_name Pat_DOB Pat_Address Pat_Gender ICD_n1 ICD_n2 ICD_n3 ICD_n4 ICD_n5
… ... ... … … … … … … …

At first glance, the above shell seems to be a functional approach: we created the attributes and we
are ready to start adding lines with the information for the hospital patients. Actually, the above
approach has some very serious limitations. The six questions below will help you identify and
understand each one of these limitations.
1. What would happen if a patient has a sixth ICD-10 diagnosis?
2. What would happen if a patient is readmitted to the hospital for a second time?
3. What would happen if a patient with two different admissions changes address?
4. How many empty cells will you waste for a patient with fewer than five diagnoses?
5. What would you do if the hospital decides not to keep track of the patient address anymore?
6. What would you do to delete an ICD-10 code which was not supposed to be used for any patient?

Continue to discuss in class some of the disadvantages of the use of flat data files. Database books
always mention that using flat files to organize complicated data can cause problems with data insertion,
data deletion and data update. Can you discuss some of these problems in a data collection scenario involving
new patients and health care professionals assigned to one or more of these patients?
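
Questions 3 and 4 can be made concrete with a small Python sketch. It uses a reduced version of the shell above (only some of the columns), fictional values, and hypothetical function names; it is meant as an illustration of an update problem, not as a recommended way to store patient data.

```python
# Flat rows following the shell above, one row per admission, so the
# patient details are repeated on every row. All values are fictional.
rows = [
    {"Pat_id": 1, "Pat_name": "A. Smith", "Pat_Address": "12 Oak St",
     "ICD_n1": "J18.9", "ICD_n2": None},
    {"Pat_id": 1, "Pat_name": "A. Smith", "Pat_Address": "12 Oak St",
     "ICD_n1": "J20.9", "ICD_n2": None},
]


def update_address_first_match(rows, pat_id, new_address):
    """A naive update that stops at the first matching row."""
    for row in rows:
        if row["Pat_id"] == pat_id:
            row["Pat_Address"] = new_address
            return


def addresses_for(rows, pat_id):
    """Collect every distinct address stored for one patient."""
    return {row["Pat_Address"] for row in rows if row["Pat_id"] == pat_id}


update_address_first_match(rows, 1, "7 Elm Ave")

# Update problem: the same patient now has two different addresses on file.
inconsistent = len(addresses_for(rows, 1)) > 1

# Wasted storage: diagnosis cells that exist but hold nothing.
empty_cells = sum(
    1 for row in rows for key, value in row.items()
    if key.startswith("ICD_") and value is None
)
```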

A brief history lesson


An example of a well-known legacy database model is the hierarchical model. The
database systems that were built around this data model were very popular from the 1960s to the 1980s.
The hierarchical model worked reasonably well in cases where the mini-world follows a
top down, one way inheritance approach. An example of a hierarchical structure is that of the folders
and subfolders on our computers.
Advantage: Actions on "parents" save time since they affect all "children"
Disadvantage: In the real healthcare world most relationships are not hierarchical.
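
The advantage and the disadvantage listed above can both be seen in a very small tree. The Python sketch below models a fictional hospital as nested dictionaries; the structure and the function names are assumptions made for this illustration.

```python
# A hospital as a strict tree: every child has exactly one parent.
# Department, ward and nurse names are fictional.
hospital = {
    "Surgical": {"Ward A": ["Nurse Lee"], "Ward B": ["Nurse Diaz"]},
    "Medical": {"Ward C": ["Nurse Lee"]},  # the same nurse has to be stored again
}


def close_department(tree, department):
    """Acting on a parent affects all of its children in one step."""
    return tree.pop(department)


def count_copies(tree, nurse):
    """Count how many times a nurse is stored across the whole tree."""
    return sum(
        staff.count(nurse)
        for wards in tree.values()
        for staff in wards.values()
    )
```

A nurse who works in two departments does not fit the one-parent rule, so she is stored twice, and every change to her details has to be repeated in both places.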

Does this structure realistically describe the healthcare process?

Conceptual design of healthcare using Entity-Relationship Diagrams (ERD)
Before the actual development of a database, we first need to design a conceptual schema which
presents a high level model of how our healthcare mini-world functions. Designing an ERD is easy
and does not require any prior computer science knowledge. These conceptual schemas
are, nevertheless, extremely important since they act as a means of communication between the healthcare
organization and the database developer. The ERD can then easily be transformed into a computer
database by applying a transformation algorithm.
An ER diagram has three components: Entities, Attributes and Relationships.
Entities: Specific objects or things in the mini-world that are represented in the database. For
example, the PHYSICIAN Dr. Willy, the Surgical DEPARTMENT etc.
Attributes: Properties used to describe an entity. For example, a PHYSICIAN entity may have the
attributes Name, SSN, Clinical Specialty, Sex, and Birthdate.
An attribute can be simple (sometimes called atomic) when each entity has a single value for the
attribute, for example the SSN or Sex of an employee. In some cases an attribute may be composite,
that is, composed of several components. For example: Address (Apt#, House#, Street, City, State,
ZipCode, and Country).
Each entity may have one or more key attributes. A key attribute is a unique attribute, which cannot
accept the same value more than once. The SSN is a typical example.
Relationship: a relationship relates two or more distinct entities, with a specific meaning. For
example, the PHYSICIAN Dr. Willy works at the Surgical DEPARTMENT. For each relationship
we need to define two things:
A. The relationship cardinality which can be:
• One-to-one (1:1)
• One-to-many (1:N) or Many-to-one (N:1)
• Many-to-many (M:N)
B. The relationship participation which can be:
• zero (optional participation)
• one or more (mandatory participation)

For the relationship PHYSICIANS work at DEPARTMENTS, the cardinality is Many-to-one, since
one PHYSICIAN can work at a maximum of one DEPARTMENT, while a DEPARTMENT can have
many PHYSICIANS.

For this same relationship, the participation is mandatory from both sides of the relationship, since
every single PHYSICIAN has to work at a DEPARTMENT and every single DEPARTMENT has to
have PHYSICIANS.
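
The cardinality and participation rules of this relationship can be expressed as simple checks. The Python sketch below is one possible illustration: the class names follow the PHYSICIAN and DEPARTMENT entities of the text, while the functions assign and participation_errors are hypothetical helpers and not part of any ERD notation.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Department:
    name: str                                   # key attribute
    physicians: list = field(default_factory=list)


@dataclass(eq=False)
class Physician:
    ssn: str                                    # key attribute, must be unique
    name: str
    department: Department = None               # N:1, so at most one department


def assign(physician, department):
    """Record the 'works at' relationship, enforcing the Many-to-one cardinality."""
    if physician.department is not None:
        raise ValueError("a physician can work at a maximum of one department")
    physician.department = department
    department.physicians.append(physician)


def participation_errors(physicians, departments):
    """List the entities that violate mandatory participation on either side."""
    errors = [p.name for p in physicians if p.department is None]
    errors += [d.name for d in departments if not d.physicians]
    return errors
```

For example, a physician who has not been assigned yet and a department without physicians would both be reported by participation_errors, and assigning the same physician to a second department would raise an error.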

The table below shows the most important components of an ER diagram

Entity: In ER diagrams, the entity type name is displayed in a rectangular box.
Attribute: Attributes are displayed in ovals. Each attribute is connected to its entity type
with a line. Each key attribute is underlined.
Relationship: A diamond-shaped box is used to display a relationship. Two or more entities can
be connected to one relationship via straight lines.
Cardinality: You cannot connect two entities directly. They should be connected via a
relationship. Cardinality is specified by labeling the relationship lines with 1, M, or N.

The Relational Database Model


The relational database model is based on the fundamental mathematical theories of sets and
relations. The major elements of the relational database model are summarized below:
• The database is a collection of tables, which represent entities and the relationships between them
• Columns represent the characteristics of the entity
• Each table includes records, which are the table rows
• In relational databases, there exist one-to-one, one-to-many or many-to-many relationships
• It is possible to access a table record by using keys. This key, known as the 'primary key', is an
attribute (or a combination of attributes) whose value is never repeated in a table
• A relationship is a reference of a record to another record of another (or the same) table. To
achieve this, the referencing table has to include an additional attribute which will facilitate this
link. This extra attribute is called a foreign key.
• An index is the physical mechanism which improves the database efficiency. It is part of the
physical database structure
• In a relational database, a 'View' is a virtual table composed of a sub-set of the actual tables.
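
The elements listed above can be tried out with SQLite, a small relational DBMS that is bundled with Python. The sketch below is only an illustration under stated assumptions: the table and column names (Patients, Admissions, Pat_ID and so on) are chosen for this example, and the data are fictional.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE Patients (
    Pat_ID   INTEGER PRIMARY KEY,     -- primary key: never repeated
    Pat_Name TEXT NOT NULL
);
CREATE TABLE Admissions (
    Adm_ID         INTEGER PRIMARY KEY,
    Pat_ID         INTEGER NOT NULL REFERENCES Patients(Pat_ID),  -- foreign key
    Admission_Date TEXT NOT NULL
);
-- Index: a physical structure that speeds up lookups by patient.
CREATE INDEX idx_adm_patient ON Admissions(Pat_ID);
-- View: a virtual table built from the real tables.
CREATE VIEW PatientAdmissions AS
    SELECT p.Pat_Name, a.Admission_Date
    FROM Patients p JOIN Admissions a ON a.Pat_ID = p.Pat_ID;
INSERT INTO Patients VALUES (1, 'A. Smith');
INSERT INTO Admissions VALUES (10, 1, '2013-02-21');
""")

view_rows = conn.execute("SELECT * FROM PatientAdmissions").fetchall()


def insert_orphan_admission():
    """Try to reference a patient that does not exist; the foreign key rejects it."""
    try:
        conn.execute("INSERT INTO Admissions VALUES (11, 99, '2013-03-01')")
        return True
    except sqlite3.IntegrityError:
        return False
```

The Admissions table is the referencing table and Patients is the referenced table, so an admission for a non-existent patient is refused by the database itself.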

An example of a relational database. In this schema there are three different entities: Patients,
Admissions and Diagnoses. There are four tables, though. In relational databases, database designers
are often required to add extra tables to express specific relationship types. The most common
scenario is that of the many-to-many relationship type, where the extra (intermediate) table (see red
arrow) stores the combination of the unique identifiers of the two related tables. This is the only
way to match a specific admission id with a specific diagnosis id, since both the diagnoses and the
admissions values can be repeated in that intermediate table!

In the above schema, can you identify which are the referencing and which are the referenced tables for
each one of the existing relationships?
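
The role of the intermediate table can be illustrated with a short SQLite sketch in Python. The table and column names (Admission_Diagnoses, Adm_ID, Diag_ID, ICD10) are assumptions made for this example and may differ from those in the schema figure; the data are fictional.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Admissions (Adm_ID INTEGER PRIMARY KEY, Admission_Date TEXT);
CREATE TABLE Diagnoses  (Diag_ID INTEGER PRIMARY KEY, ICD10 TEXT);
-- Intermediate table: one row per (admission, diagnosis) pair.
CREATE TABLE Admission_Diagnoses (
    Adm_ID  INTEGER REFERENCES Admissions(Adm_ID),
    Diag_ID INTEGER REFERENCES Diagnoses(Diag_ID),
    PRIMARY KEY (Adm_ID, Diag_ID)
);
INSERT INTO Admissions VALUES (1, '2013-02-21'), (2, '2013-02-28');
INSERT INTO Diagnoses  VALUES (1, 'J18.9'), (2, 'J20.9');
INSERT INTO Admission_Diagnoses VALUES (1, 1), (1, 2), (2, 2);
""")


def diagnoses_for(adm_id):
    """Return the ICD-10 codes linked to one admission through the intermediate table."""
    rows = conn.execute(
        "SELECT d.ICD10 FROM Admission_Diagnoses ad "
        "JOIN Diagnoses d ON d.Diag_ID = ad.Diag_ID "
        "WHERE ad.Adm_ID = ? ORDER BY d.ICD10",
        (adm_id,),
    )
    return [row[0] for row in rows]
```

Admission 1 appears twice and diagnosis 2 appears twice in the intermediate table, yet each pair is unique, which is exactly why the pair serves as the primary key.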

Benefits of using a Relational Schema


1. Databases can be examined from many different perspectives: it is easy to generate different views
to facilitate many health care delivery requirements
2. There is no need to enter missing information for variables that are not logically possible
3. Easy to modify because adding new entities involves adding new tables and not altering old ones

Normalization in Relational Databases


Normalization is the process where bigger tables are split into smaller tables with more desirable
characteristics. With normalization, we minimize anomalies during data entry, update
and deletion. 'Normal forms' provide the methodological framework to analyze the database
schema based on the database keys and the so called functional dependencies. The goal should be
to have a database where, in each table, the attributes are fully and directly (non-transitively)
dependent on the table identifier (primary key). This statement may seem difficult to understand,
but if you pay attention to the scenario below, it will make sense. You can take a pencil and paper
to follow the description below by drawing tables, attributes and example data.
Let's start: Imagine a hospital database table with the name "DIAGNOSES". The primary key of
this table is the combination of two attributes: Patient_ID and Disease_ID. The primary key is a
combination, because neither the Patient_ID nor the Disease_ID is unique individually. This
composite primary key, if you think about it, expresses a diagnosis, or, in other words, "a patient
with a disease". We are going to call the other attributes of this table non-key attributes. These
non-key attributes should be fully dependent on the primary key. What does "fully dependent" mean? It
means that a value for any non-key attribute in that table should only be determined by the
combination of the primary key components. If we include the attribute Patient_Name in this table,
this would not be the case, though, since the Patient_Name is partially dependent (on the Patient_ID
only) and not fully dependent on the composite primary key. Therefore, all the non-key attributes
in this table should refer neither to the patient alone nor to the disease alone, but to the process of the
diagnosis. Examples of such attributes can be the Diagnosis_Date or the Doctor_Diagnosing.
When we achieve this, we also want to verify a second thing: that there are no functional
dependencies between the non-key attributes. In other words, there should not exist any non-key attributes
which further describe details of an already existing non-key attribute. For this reason, we do not
want to add a Doctor_Speciality attribute to this table, when the Doctor_Diagnosing attribute is
already on the table. Although the Doctor_Speciality attribute would be fully dependent on the
primary key, this would not happen directly, but indirectly (transitively) through the Doctor_Diagnosing
attribute. Normalization often helps reveal new entities that we did not
initially consider as separate entities in the database schema. A more detailed reference to the
normalization principles is beyond the scope of this book. The box below lists some easy to
remember principles on what a sufficiently normalized database should look like. Try to revisit the
flat file examples earlier in this chapter and discuss whether those files satisfy the principles below.

1. Every characteristic only exists once in a database.
2. Keys fully define the records, and each characteristic belongs to the entity it characterizes. We
cannot mix and match! A table with the field patient_id as a primary key cannot include information
about the doctor's specialty or the department capacity!
3. Each value of the same characteristic is stored into the database only once. This is mainly
achieved by creating new relations and references.
Normalization: three easy to remember rules
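
The DIAGNOSES scenario can be followed step by step in a few lines of Python. The sketch below starts from a table that contains both the partial dependency (Patient_Name) and the transitive dependency (Doctor_Speciality) discussed above and splits it into three tables. The attribute names come from the text; the function normalize and all the values are assumptions made for illustration.

```python
# An unnormalized DIAGNOSES table. Patient_Name depends only on Patient_ID
# (partial dependency) and Doctor_Speciality depends on Doctor_Diagnosing
# (transitive dependency). All values are fictional.
unnormalized = [
    {"Patient_ID": 1, "Disease_ID": "J18.9", "Patient_Name": "A. Smith",
     "Diagnosis_Date": "2013-02-21", "Doctor_Diagnosing": "Dr. Willy",
     "Doctor_Speciality": "Pulmonology"},
    {"Patient_ID": 1, "Disease_ID": "J20.9", "Patient_Name": "A. Smith",
     "Diagnosis_Date": "2013-02-28", "Doctor_Diagnosing": "Dr. Willy",
     "Doctor_Speciality": "Pulmonology"},
]


def normalize(rows):
    """Split the rows into PATIENTS, DOCTORS and DIAGNOSES tables."""
    patients, doctors, diagnoses = {}, {}, []
    for row in rows:
        patients[row["Patient_ID"]] = row["Patient_Name"]             # stored once per patient
        doctors[row["Doctor_Diagnosing"]] = row["Doctor_Speciality"]  # stored once per doctor
        diagnoses.append({
            key: row[key]
            for key in ("Patient_ID", "Disease_ID", "Diagnosis_Date", "Doctor_Diagnosing")
        })
    return patients, doctors, diagnoses


patients, doctors, diagnoses = normalize(unnormalized)
```

After the split, the patient name and the doctor specialty are each stored once, and the DIAGNOSES table only keeps attributes that describe the diagnosis itself.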

Discovering Knowledge in Medical data using Structured Query Language (SQL)


Querying a database to return the data of interest is an exciting process that is easy to learn but not
so easy to master. You need to define the attributes you are interested in and the tables they belong
to. You also need to define your conditions, which are applied to the data before the output is
produced. It is always required to explicitly define the path that connects the tables that you query,
that is, the references between the linked tables.
The structure of an SQL query is as follows:
SELECT [list of attributes]
FROM [list of tables that need to be used for the result to be calculated and definition of joins]
WHERE [conditions to be applied to filter out non qualifying tuples]

If we want the query to return the admission dates of patients born after 1970, we need to
use the Patients and Admissions tables. Our condition will be that Pat_Birth is later than
January 1, 1970. The date has to be written as a date literal; the exact syntax depends on the DBMS.
The query would look like this:
SELECT Pat_Name, Admission_Date
FROM Patients INNER JOIN Admissions ON Patients.Pat_ID = Admissions.Pat_ID
WHERE Pat_Birth > '1970-01-01';
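
To experiment with this query, you can run it against a small in-memory SQLite database from Python. The sketch below is an illustration under stated assumptions: the Adm_ID column and all the data are made up, dates are stored as ISO text (year-month-day) so that they compare correctly as strings in SQLite, and an ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (
    Pat_ID INTEGER PRIMARY KEY, Pat_Name TEXT, Pat_Birth TEXT
);
CREATE TABLE Admissions (
    Adm_ID INTEGER PRIMARY KEY,
    Pat_ID INTEGER REFERENCES Patients(Pat_ID),
    Admission_Date TEXT
);
INSERT INTO Patients VALUES
    (1, 'A. Smith', '1965-04-12'),
    (2, 'B. Jones', '1982-09-30');
INSERT INTO Admissions VALUES
    (10, 1, '2013-02-21'),
    (11, 2, '2013-02-22'),
    (12, 2, '2013-03-05');
""")

# SELECT, FROM (with the join path) and WHERE, as in the query structure above.
QUERY = """
SELECT Pat_Name, Admission_Date
FROM Patients INNER JOIN Admissions ON Patients.Pat_ID = Admissions.Pat_ID
WHERE Pat_Birth > ?
ORDER BY Admission_Date
"""

result = conn.execute(QUERY, ("1970-01-01",)).fetchall()
```

Only the two admissions of the patient born in 1982 are returned; the patient born in 1965 is filtered out by the WHERE condition.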

Database categories found in healthcare

Distributed Databases in healthcare


Data are kept in different settings and on different computers. Since the volume of data produced is
huge, the replication and distribution of databases improves database performance in healthcare settings.
Distributed databases need to keep track of the location of the data and to maintain an audit log, which
is a chronological record of the source and destination of the data and provides documentary evidence
of the sequence of activities that have affected a specific procedure at any time.
The main advantage over traditional database models is that a possible data loss is limited to the
nodes affected, and this is critical for healthcare. Since these data are decentralized, they are more
flexible and allow different units to update and maintain their own data.

Large Healthcare Utilization Databases


These databases are used to study the use and outcome of treatments. Since they represent
routine clinical care, they can address real world effectiveness and utilization patterns. Today,
with the progress of advanced data analytics methods, these databases can be an asset to
healthcare organizations, since they can be used to develop clinical and administrative models to
predict patient outcomes, suggest optimal therapies and identify possible diagnoses. Their huge size
(often including millions of records) allows the study of rare events.

BLOBs (Binary Large Objects): very frequent in healthcare settings, for example images (CT, MRI),
audio (heartbeat sequences) and video (ultrasounds).

Data-less databases are distributed databases which have been set up without any data and remain
empty until such a need arises. They may be useful in healthcare, for the following reasons:
They are less expensive than centralized registries (they require no equipment and little personnel)
The use of the system does not require vague and time-independent patient consents
The system does not require duplication of data in different databases
Object Oriented Data Models
These models use real-life objects (entities) for a more efficient data organization. They use SQL but
also give the user much higher programming flexibility, since the database can be integrated
with object oriented programming languages (e.g. Java, C#). Object oriented data
models have not yet been fully standardized.

Example of object oriented model

Hands on Practice: try, with the help of your professor, to complete the tasks in class
There is a small clinic where patients can be admitted multiple times. Patients, when hospitalized,
are given medication during their stay. In addition, the hospital has various departments (e.g.
surgical, medical, orthopedic). In each department there are a number of nurses who work there.
These nurses take blood pressure measurements from the patients. Each measurement can be
performed by a different nurse each time. There is a doctor in charge assigned to each patient upon
hospitalization. One or more diagnoses are assigned to each patient during every hospitalization.

Task 1. Define the entities for the process and, for each entity, the required attributes
For the patient, we only need to capture the patient name, gender and date of birth
For each hospitalization, we need to know the admission, discharge date and discharge status
We only need to capture the diastolic blood pressure and the exact time of each measurement
The nurses and doctors have clinical specializations
Each department has a title and a bed capacity
We need to know the diagnosis code, the diagnosis description, and the diagnosis date

The description implies the existence of patients (Entity 1) who can be admitted multiple times. Therefore,
admissions (Entity 2) have to exist in the database as a separate entity, too. Not all patients are admitted.
There are departments (Entity 3) and nurses (Entity 4) working in departments. Nurses take blood
pressure measurements (Entity 5) from patients. Medical doctors (Entity 6) are assigned to patients.

Task 2. Design an ER Diagram to conceptualize the clinical mini-world requirements. Use the
appropriate notations that we explained earlier in this chapter.

Task 3. Design the database schema using arrows to define the appropriate references

Task 4. Prepare SQL code to define the above mentioned database schema. Each relation
should have the appropriate fields. Think of the appropriate fields based on the information provided
in Task 1. Try not to over-do it with many attributes. We are only building a sample database, so
just include those attributes that have to be there.
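As a starting point for Task 4, the sketch below defines four of the relations (patients, doctors, admissions and diagnoses) with Python's sqlite3 module. It is one possible design, not the expected solution: all table and column names are assumptions, the remaining entities (departments, nurses, measurements, medication) are left for you to add, and dates are stored as ISO text.

```python
import sqlite3

# Partial schema for the clinic mini-world; names are illustrative assumptions.
SCHEMA = """
CREATE TABLE Patients (
    Pat_ID     INTEGER PRIMARY KEY,
    Pat_Name   TEXT NOT NULL,
    Pat_Gender TEXT,
    Pat_Birth  TEXT
);
CREATE TABLE Doctors (
    Doc_ID         INTEGER PRIMARY KEY,
    Doc_Name       TEXT NOT NULL,
    Specialization TEXT
);
CREATE TABLE Admissions (
    Adm_ID           INTEGER PRIMARY KEY,
    Pat_ID           INTEGER NOT NULL REFERENCES Patients(Pat_ID),
    Doc_ID           INTEGER REFERENCES Doctors(Doc_ID),   -- doctor in charge
    Admission_Date   TEXT NOT NULL,
    Discharge_Date   TEXT,
    Discharge_Status TEXT
);
CREATE TABLE Diagnoses (
    Diag_ID    INTEGER PRIMARY KEY,
    Adm_ID     INTEGER NOT NULL REFERENCES Admissions(Adm_ID),
    Diag_Code  TEXT NOT NULL,
    Diag_Descr TEXT,
    Diag_Date  TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces references only when enabled
conn.executescript(SCHEMA)

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Each REFERENCES clause corresponds to one arrow in the schema you drew for Task 3: the arrow starts at the foreign key and points to the primary key it refers to.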

Task 5. Populate with data


Once your schema is ready, populate the database with sample data. You can add: three patients,
two hospitalizations per patient, two doctors, two nurses, two hospital departments, two blood
pressure measurements per patient, and two diagnoses for each hospitalization.

Task 6. Query your database to retrieve knowledge


Based on the above database go on and create appropriate SQL queries for the following
a. Show all patients of the hospital
b. Show all patients whose name starts with a letter of your choice.
c. Show the doctor names with the names of the departments they work in
d. Show the blood pressure of each patient
e. Show for each of the diagnoses, the ratio of in-hospital mortality (deaths)
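Query (e) is the least obvious of the five, so here is a hedged sketch of one way to approach it. It assumes a simplified pair of tables in which the discharge status 'Died' marks an in-hospital death; the table names, column names, status values and sample codes are all assumptions. Averaging a 0/1 indicator within each diagnosis group gives the share of admissions that ended in death.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Admissions (Adm_ID INTEGER PRIMARY KEY, Discharge_Status TEXT);
CREATE TABLE Diagnoses  (Adm_ID INTEGER REFERENCES Admissions(Adm_ID), Diag_Code TEXT);
INSERT INTO Admissions VALUES (1, 'Home'), (2, 'Died'), (3, 'Home'), (4, 'Died');
INSERT INTO Diagnoses  VALUES (1, '486'), (2, '486'), (3, '250'), (4, '486');
""")

# For every diagnosis code: deaths divided by the number of admissions with that code.
mortality = conn.execute("""
    SELECT d.Diag_Code,
           AVG(CASE WHEN a.Discharge_Status = 'Died' THEN 1.0 ELSE 0.0 END) AS Mortality_Ratio
    FROM Diagnoses d INNER JOIN Admissions a ON d.Adm_ID = a.Adm_ID
    GROUP BY d.Diag_Code
    ORDER BY d.Diag_Code
""").fetchall()

for code, ratio in mortality:
    print(code, round(ratio, 2))
```

With this sample data, two of the three admissions coded '486' ended in death, while the single admission coded '250' did not.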

Task 7. Create views


Create three different views which will accommodate the data needs of three user categories:
a. Nurses who want to know their patient demographics and their blood pressure.
b. Hospital administrators who need to know the department capacity and bed availability
c. Admissions department who needs to know the length of stay and patient discharge status.
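The following sketch illustrates view (c) under stated assumptions: a simplified Admissions table with ISO text dates, a hypothetical view name Admissions_Office, and SQLite's julianday() function to compute the length of stay in days. Other database systems offer different date functions, so treat the date arithmetic as SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Admissions (
    Adm_ID           INTEGER PRIMARY KEY,
    Pat_ID           INTEGER,
    Admission_Date   TEXT,
    Discharge_Date   TEXT,
    Discharge_Status TEXT
);
INSERT INTO Admissions VALUES
    (1, 1, '2015-01-05', '2015-01-09', 'Home'),
    (2, 2, '2015-03-20', '2015-03-21', 'Transferred');

-- The view exposes only what the admissions office needs.
CREATE VIEW Admissions_Office AS
    SELECT Adm_ID,
           Pat_ID,
           CAST(julianday(Discharge_Date) - julianday(Admission_Date) AS INTEGER)
               AS Length_Of_Stay,
           Discharge_Status
    FROM Admissions;
""")

los = conn.execute("""
    SELECT Adm_ID, Length_Of_Stay, Discharge_Status
    FROM Admissions_Office
    ORDER BY Adm_ID
""").fetchall()
print(los)
```

A view stores no data of its own; it is recomputed from the base tables every time it is queried, so each user category always sees current values.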

Chapter 9. Interoperability in Healthcare

Interoperability is the ability of diverse systems and organizations to work together (inter-operate).
The term is often used in a technical systems engineering sense, or alternatively in a broad sense,
taking into account sociopolitical and organizational factors that impact performance between two
systems.
The IEEE Glossary defines interoperability as “the ability of two or more systems or components
to exchange information and to use the information that has been exchanged”.
Interoperability is achieved when we are “able to accomplish end-user applications using different
types of computer systems, operating systems, and application software, interconnected by different
types of local and wide area networks” (O'Brien et al.).
The diagram below presents some of the recent forces that make reforms toward an interoperable
healthcare system a priority. The population is ageing and more citizens suffer from chronic
conditions, so they need to access the healthcare system on a frequent basis. The need for
continuity of care is inevitable, and the healthcare system should be connected to support the
health needs of these populations. At the same time, people today know more about symptoms and
healthcare resources, and have access to online information portals. There are more
services for lifestyle management and rehabilitation available and a growing demand
for such services, which need to be interconnected with the electronic patient record.

Discuss in class how the increased life expectancy today is a force for interoperable healthcare.
Then, discuss how an interconnected healthcare system will reduce the cost of care in response to
the financial constraints of health services today.

Dimensions of Interoperability

Interoperability refers to three different levels:


Business Interoperability: Describes the organizational context including policies, agreed
organization communication practices, and bylaws. It is independent of existing technologies.
Technical Interoperability: Refers to an environment where, despite the heterogeneity across
technical implementations, the systems can still operate together harmoniously. Hardware and software
coexist without incompatibilities, while at the same time the system supports the
seamless “plug-and-play” of new devices.
Information Interoperability: Refers to the ability to interchange data in a meaningful manner
by establishing common semantics (meaning). Systems might be technically compatible, but this is
not enough: they also need to share and understand messages that are coded in different ways.

The importance of standards in an interoperable health IT environment


It is crucial to know how applications interact with users (e.g. e-prescribing), how systems
communicate (e.g. messaging standards), how information is managed and processed (e.g. health
information exchange) and how add-on devices work together with other systems and applications
(e.g. handhelds, tablets). For this reason there are specifications that systems, devices
and software in healthcare are required to comply with. These specifications are called standards.
Without standards, it will be impossible to achieve interoperable healthcare services.
Syntax vs. semantics: Syntax describes the structure of a message while semantics is its meaning.
Syntax alone is not always a reliable determinant of semantics. Take a look at the table below to
understand the difference between meaning and syntax (structure).

Same structure-different meaning
“The patient was given pain medication”
“The patient was driven chocolate education”
Their semantics or meanings are different; the first makes sense, the second one is nonsense.

Same meaning-different structure
“The patient was given pain medication”
“The patient was given medication for pain.”
These have the same meaning but use a different syntax

Example of statements with same semantics-different syntax and same syntax-different semantics

Understanding semantic interoperability


The term semantic interoperability in hospitals does not just refer to the packaging of data (syntax),
but mainly focuses on the simultaneous transmission of their meaning (semantics). This is achieved by
adding metadata (data about the data) and linking each data element to a shared vocabulary. The
meaning of the data is transmitted with the data itself, in an "information package" independent of
any information system. This shared vocabulary, and its associated links to an ontology, provide
the basis for machine interpretation and understanding of the logic of the message.
Syntactic interoperability is a prerequisite for semantic interoperability and refers to the packaging
and transmission mechanisms for data. To understand this concept, imagine having a code (for
example 486) which comes from the International Classification of Diseases (ICD) version 9. This
code has no clinical meaning at all for a system which uses a newer version of ICD, unless you send
alongside the code a supplementary index indicating that ‘486’ has been coded using
ICD-9. This information makes it possible to look up the code in a supplementary file (in our case
an ICD-9 dictionary) which explains what the code ‘486’ means. The receiving system
will now know that the information refers to the disease pneumonia.
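The ICD example above can be sketched in a few lines of Python. The dictionaries below stand in for the supplementary files; they are tiny illustrative extracts (one entry each), and the function name interpret as well as the code system labels are assumptions. The point is that a bare code is rejected, while a code sent together with its code system can be looked up.

```python
# Tiny stand-ins for the supplementary dictionaries of each code system.
CODE_SYSTEMS = {
    "ICD-9": {"486": "Pneumonia, organism unspecified"},
    "ICD-10-CM": {"J18.9": "Pneumonia, unspecified organism"},
}


def interpret(code, code_system=None):
    """Return the description of a code, using the code system sent along with it."""
    if code_system is None:
        # Without metadata the receiver cannot know which dictionary to use.
        raise ValueError(f"code {code!r} received without a code system")
    dictionary = CODE_SYSTEMS[code_system]   # KeyError if the system is unknown
    return dictionary[code]                  # KeyError if the code is unknown


print(interpret("486", "ICD-9"))
```

Calling interpret("486") without the second argument raises an error, which mirrors the situation of a receiving system that gets the code with no metadata.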
I will ask you what the meaning of the word ‘gift’ is. Most of you will give me the English definition
because you assume that my reference is English. But what if it is not? Did you know that in German
‘gift’ means poison?

Interoperability needs to include semantics (meaning)


Many healthcare applications are not semantically interoperable with each other, which is another
major barrier to reaching interoperability in healthcare and across providers. For example, one
system might record a disease as “diabetes mellitus” in its Electronic Health Record, while another one
might call it “diabetes”.

What is required for interoperability? A business oriented approach


There must exist a market demand for interoperable products
Standards and rules defining what interoperability means in the health organization
must be set
Business conditions and market drivers must urge health software companies to make
interoperable products
Guidelines must exist to make the often-complicated standards usable and easy to interpret
Interoperability must be actively promoted by the management, the hospital administration
and the IT professionals
Compliance with standards must be verified by independent testing (by third parties)
Interoperability requirements in a healthcare enterprise

The “Gender” example


Something that seems “so obvious” is not really obvious! Information sharing within a hospital is not easy,
and information sharing across different hospitals is even harder: it is not easy to manage seemingly “valid” data.

“Gender = 1” “Gender = nil”? “Sex = 2”? “Sex = m”? “Sex = female”?
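A minimal sketch of how a receiving system might map such local values onto one shared vocabulary is shown below. The mapping itself is an assumption: here 1 and 2 are taken to mean male and female, but another hospital may use the same numbers differently, which is exactly why the mapping has to be agreed upon and documented rather than guessed.

```python
# Hypothetical mapping from local codes to a shared vocabulary.
# The meaning of "1" and "2" is an assumption and must be confirmed per source system.
GENDER_MAP = {
    "1": "male", "m": "male", "male": "male",
    "2": "female", "f": "female", "female": "female",
}


def normalize_gender(raw):
    """Map a local gender/sex value to 'male', 'female' or 'unknown'."""
    if raw is None:
        return "unknown"
    key = str(raw).strip().lower()
    return GENDER_MAP.get(key, "unknown")   # e.g. 'nil' or '' become 'unknown'


for value in [1, "nil", 2, "m", "female", None]:
    print(repr(value), "->", normalize_gender(value))
```

Values that are not covered by the mapping are reported as 'unknown' instead of being silently assigned to a category.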

Controlled Terminologies: they express and define concepts and allow for the navigation of
concept-to-concept relationships. They provide a basis for determining whether two items are the
same or different. Controlled terminologies also provide a basis for knowledge representation, capture,
discovery, management and use.
Interoperability is not yet resolved in healthcare
Information interoperability is a key ingredient for modern health information technology. It is
therefore essential for healthcare information systems to communicate critical data. Another
dimension of the importance of interoperable environments is that they allow vast amounts of data
to be gathered for research and trend analyses. The absence of a robust set of standards to resolve
data incompatibility issues is becoming increasingly costly to the U.S. healthcare system. Savings
of ~$78 billion could be achieved every year if data exchange standards were used across the
healthcare sector in the United States. It is therefore important to identify the problems related to
the standardization and interoperability of healthcare information systems, as well as issues for
future research that can address these problems. One component of interoperability is the
availability of data standards.
Problems with current standards include:
Gaps: information that cannot be utilized, because the standards cannot accommodate it
Redundancies: the same information is represented more than once in a standard, in different ways
Data exchange requirements: the same standards need to be used by both parties during data exchange,
and this is not always the case since different systems might be using different standards

Electronic Health Record Interoperability


From our discussion so far, you can see how significant it is for different systems to exchange patient
data seamlessly. A complex health care system requires diverse electronic health record (EHR)
products. To realize their potential, EHR products must share information seamlessly, and an
interoperable health information technology environment makes this possible. EHR
interoperability enables better workflows, reduces ambiguity, and allows data transfer among
EHR systems and health care stakeholders. Can you discuss the strong statement in the box below?

With interoperable Electronic Health Records we will be able to improve health care delivery by
making the right data available at the right time to the right people.

Challenges for Interoperable Electronic Medical Records


Technological challenges
• Patient identification: using common patient identification codes (IDs) across the whole
spectrum of the healthcare system could resolve this
• Being able to handle data coming from external entities (e.g. insurance companies) which might
be using different standards or no standards at all
• Many considerations need to be made for security and information exchange on mobile devices
Business challenges
• It is sometimes unclear who really owns the Electronic Medical Record. This is
also a data privacy issue. We should never forget that the medical
• Patient treatment coordination from multiple providers, often in different healthcare systems

Some important interoperability standards

Health Level 7 (HL7): a collection of message formats and related clinical standards that define
an ideal presentation of clinical information; together, the standards provide a data exchange
framework. HL7 is a standard for healthcare-specific data exchange between computer applications.
The name "Health Level 7" refers to the seventh (top) layer of the Open Systems Interconnection
model, applied to the health environment.
Cross-enterprise Document Reliable Interchange (XDR): It is used for the exchange of health
documents between health enterprises using web-based, point-to-point push network
communication, permitting direct interchange between electronic health records, patient health
records and other systems without the need for a document repository. Example: A nurse at Hospital
A enters a patient's information in the local EHR, and then sends the CCD (a clinical document
exchange standard) directly to Hospital B's system.
Picture Archiving Communication Systems (PACS): These are devoted to the storage,
retrieval, distribution, and presentation of images. The medical images are stored in an independent
format, most commonly DICOM.
Logical Observation Identifiers Names and Codes (LOINC): LOINC applies universal code
names and identifiers to medical terminology related to electronic health records and assists in
the electronic exchange and gathering of clinical results (laboratory tests, clinical observations,
outcomes management, research).
Electronic Data Interchange (EDI): A standard format for exchanging business data, widely
used in healthcare too. Each element in an EDI message represents a singular fact, such as a price or
a product model number. A transaction set often consists of what would usually be contained in a
typical business document or form. Parties who exchange EDI transmissions are referred to as
trading partners.

Open Content: HL7, CCR/CCD, SNOMED, LOINC, HSSP


Open Source: VistA, CONNECT, caBIG, Open Health Tools, Protégé, RIMBAA
Open Standards: openEHR, HL7 DCM, DSS knowledge, Clinical protocols
Continuity of Care Record (CCR) is an XML-based standard for the movement of "documents" between clinical
applications. Furthermore, it responds to the need to organize and make transportable a set of basic information about a
patient's health care that is accessible to clinicians & patients.
Clinical Context Object Workgroup (CCOW): an HL7 standard protocol designed to enable disparate applications to
synchronize in real-time at the user-interface level. It is vendor independent and allows applications to present information
at the desktop and/or portal level in a unified way.
Clinical Document Architecture (CDA) HL7: uses XML for encoding documents and breaks down the document in
generic, unnamed, and non-templated sections. Documents may include discharge summaries, progress notes, history and
physical reports, prior lab results, etc. HL7's CDA defines a very generic structure for delivering "any document" between
systems. CDA was previously known as the Patient Record Architecture (PRA).
Many interoperability standards and controlled terminologies exist, addressing specific purposes

The Office of Standards & Interoperability
The goal of the Office of Standards & Interoperability (OSI) is to help build nationwide Electronic
Health Record interoperability. The office is under the umbrella of the U.S. Department of Health
& Human Services. The main goals of OSI are summarized below:
Achieve seamless exchange of health data across: federal agencies, governments, private sector
Encourage the further development of health IT standards
To achieve these goals, OSI's roles include:
Enabling stakeholders to utilize simple, shared solutions to common information exchange
Overseeing a set of standards, services, and policies that accelerate information exchange
Enforcing compliance with validated information exchange standards, services, and policies
Additional Study Material
Healthcare Interoperability Glossary: a very good source for further reference
www.corepointhealth.com/resource-center/healthcare-interoperability-glossary

An interesting paper: “A framework for interoperable healthcare information systems”


http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=5643522

The Health Level 7 (HL7)


"Level seven" refers to the highest level of the International Organization for Standardization's (ISO)
communications model for Open Systems Interconnection (OSI)¹²: the application level. The
application level defines the data to be exchanged, the timing of the interchange, and the
communication of certain errors to the application.
Health Level Seven (HL7) is an ANSI-accredited Standards Developing Organization operating in
healthcare. As we discussed earlier, HL7 provides standards for data exchange to allow
interoperability between healthcare information systems and focuses on clinical and
administrative data. The key goals of HL7 are both syntactic and semantic interoperability. Over
90% of US hospitals have implemented some version of HL7 messages. In healthcare, HL7 has been
used since the 1980s (pre-internet era). Without a data dictionary to translate the contents between the
delimiters, the data remain meaningless. While there have been many attempts at creating data
dictionaries, none have been practical to implement. This has perpetuated the ongoing data
"babelization" and the inability to exchange data with meaning.
In its more recent version (v3), HL7 uses an object-oriented development methodology and is based on
a Reference Information Model (RIM) to create messages. The goal of HL7 is semantic
interoperability. HL7 develops conceptual standards (RIM), document standards (CDA),
application standards (CCOW) and, of course, messaging standards (HL7 v2.x and v3.0).

¹² OSI is a conceptual model that characterizes and standardizes the communication functions of a telecommunication or
computing system without regard to their underlying internal structure and technology. Its goal is the interoperability of
diverse communication systems with standard protocols. The original version of the model defined seven layers.
Vocabulary in HL7: the set of all concepts that can be used as valid values in an instance. For
example, the Living_subject class has a coded attribute called administrative_gender_code. When a
message instance is subsequently created as part of an implemented interface, one would expect the
administrative_gender_code attribute to convey male or female. Male and female are concepts, and
there may be several coding schemes that contain concepts for male and female.

If you want to browse through an HL7 vocabulary, you can download a vocabulary tool here:
http://hl7-vocabulary.pilotfishtechnology.com/HL7/index.html

The Reference Information Model (RIM) of HL7


The Reference Information Model (RIM) is the cornerstone of the new HL7 version. It is the
fundamental model from which all v3 messages are derived. The RIM expresses the data content
needed in a specific clinical or administrative context and provides an explicit representation of the
semantic and lexical connections that exist between the information carried in the fields of HL7
messages. RIM follows an object oriented methodology and is a deliberately abstract model that
expresses the information content of all health areas. It is based on categories named classes and
their attributes. RIM defines all the information from which the content of HL7 messages describing
a healthcare domain is drawn. The RIM diagram uses a different color for each type of backbone class.

RIM is an Information Model: an information model is a structured specification of the
information in a specific domain of interest. An Information Model consists of:
Classes, their attributes, and relationships between the classes
Data types for all attributes and vocabulary domains for coded attributes
State transition models for some classes
Specifically, HL7 information models are based upon the Unified Modeling Language (UML)¹³, and
may be represented graphically.
The Classes of the Reference Information Model
Entity: A person, animal, organization or thing. The classes represent health care stakeholders and
other things of interest to health care. The class ‘Entity’ has the following sub-classes: Container -
Device - Language Communication- Living Subject – Manufactured Material – Material –Non
Person Living Subject – Organization – Person - Place
Role: A responsibility played by an entity. A collection of classes related to the Role class and its
specializations. These classes describe the roles of participants in health care. The class ‘Role’ has
the following sub-classes: Access - Employee - Licensed Entity - Patient
Role Link: A connection between 2 roles, to express a dependency between the two roles.

¹³ UML is a modeling language based on object-oriented modeling methods
Participation: An association between an Act and a Role with an Entity playing that Role. The
class ‘Participation’ has only one possible sub-class: Managed Participation
Act: Describes the actions and events in health care services. Examples from healthcare include a
clinical observation, an assessment of a health condition (e.g. a diagnosis), treatments (e.g. medication),
and patient education. The class ‘Act’ can be one of the following sub-classes: Account - Control Act -
Device Task – Diagnostic Image - Diet - Financial Contract – Financial Transaction – Invoice
Element – Observation – Participation – Patient Encounter - Procedure- Public Health Case -
Substance Administration – Supply - Working List
Act Relationship: An association (with direction) between one Act (source) and another Act
(target). It may be an association of a later instance to an earlier instance OR an association from
collector instance to component instance.
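To make the relationships between the backbone classes more concrete, the sketch below models them as plain Python dataclasses. This is a teaching simplification and not the HL7 specification: the attribute names (class_code, type_code, description) and the participation labels "subject" and "performer" are assumptions chosen for illustration, and Role Link is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Entity:                 # a person, organization, place or thing
    name: str
    class_code: str           # e.g. "Person", "Organization"


@dataclass
class Role:                   # a responsibility played by an Entity
    player: Entity
    class_code: str           # e.g. "Patient", "Employee"


@dataclass
class Act:                    # an action or event in health care
    class_code: str           # e.g. "Observation", "Procedure"
    description: str


@dataclass
class Participation:          # links an Act to a Role (and through it to an Entity)
    act: Act
    role: Role
    type_code: str            # hypothetical labels: "subject", "performer"


@dataclass
class ActRelationship:        # directed association from a source Act to a target Act
    source: Act
    target: Act
    type_code: str


patient = Role(Entity("John Doe", "Person"), "Patient")
nurse = Role(Entity("Mary Roe", "Person"), "Employee")
measurement = Act("Observation", "Diastolic blood pressure measurement")

participations = [
    Participation(measurement, patient, "subject"),
    Participation(measurement, nurse, "performer"),
]
for p in participations:
    print(p.role.player.name, "participates as", p.type_code, "in", p.act.description)
```

Notice that the same Act is shared by two Participations: the measurement has one patient as its subject and one nurse as its performer, each in their own Role.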

The Backbone classes of RIM


Syntax of HL7 messages
HL7 messages use a human-readable (ASCII) encoding syntax based on segments and one-character
delimiters. Segments have composites (fields) separated by the composite delimiter. A composite can
have sub-composites (components) separated by the sub-composite delimiter, and sub-composites
can have sub-sub-composites (subcomponents) separated by the sub-sub-composite delimiter.
Main Delimiter Characters
0x0D  Marks the end of each segment (carriage return).
|     Composite delimiter.
^     Sub-composite delimiter.
&     Sub-sub-composite delimiter.
~     Separates repeating fields.
\     Escape character.

Take a look at the HL7 message below. The segments contain the following information:
The MSH (Message Header) segment contains information about the message itself. This
information includes the sender and receiver of the message, the type of message this is, and the
date and time it was sent. Every HL7 message specifies MSH as its first segment.
The PID (Patient Identification) segment contains demographic information about the
patient, such as name, patient ID and address.
The NK1 (Next of Kin) segment contains contact information for the patient's next of kin.
The PV1 (Patient Visit) segment contains information about the patient's hospital stay, such as
the assigned location and the referring doctor.

MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|
PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254
MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|
NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||
PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN MYLASTNAME^BONNIE^^^^||||||||||
||2688684|||||||||||||||||||||||||199912271408||||||002376853

In the PID segment, the fifth composite is the patient name, which is DOE^JOHN^^^^. (The four ^
characters at the end of this composite indicate that it has a total of six sub-composites, and that
only the first two of the sub-composites are defined.) In this composite, DOE represents the family
name of the patient, and JOHN is the patient's given name.
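The splitting described above can be tried out with a few lines of Python. This is a minimal sketch, not a full HL7 parser: it uses only the MSH and PID segments of the sample message, assumes the default delimiters, ignores escape sequences and repeating fields, and keeps only one segment per name. For real work a dedicated HL7 library should be preferred.

```python
# Segments are joined with a carriage return (0x0D), the segment terminator.
message = "\r".join([
    r"MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|",
    r"PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|"
    r"254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|",
])

# Split the message into segments, then each segment into composites (fields).
# Note: a dict keeps only the last segment of each name, which is fine for this sample.
segments = {}
for segment in message.split("\r"):
    if segment:
        fields = segment.split("|")
        segments[fields[0]] = fields

pid = segments["PID"]
name_components = pid[5].split("^")       # fifth composite: the patient name
family_name, given_name = name_components[0], name_components[1]

print(len(name_components), family_name, given_name)
```

Index 0 of the split list holds the segment name itself, so the fifth composite of the PID segment is found at index 5.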

Determining the HL7 Message Type


Each HL7 message is of a particular message type. This HL7 message type indicates what health-
related information is being provided in this message. The message type also determines what
segments can be included as part of the message. To determine the message type of an HL7 message,
examine its MSH segment. The message type is normally the ninth field of this segment. In the
message header below, the HL7 message type is ADT^A04, which is "Register a Patient".
MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|
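As a small sketch of this lookup, the function below reads the message type from an MSH segment. The function name message_type is hypothetical. One detail deserves attention: in the MSH segment the field separator character itself counts as the first field, so after splitting on | the ninth field is found at index 8 of the resulting list.

```python
def message_type(msh_segment):
    """Return (message code, trigger event) from an HL7 v2 MSH segment."""
    fields = msh_segment.split("|")
    if fields[0] != "MSH":
        raise ValueError("not an MSH segment")
    component_separator = fields[1][0]     # first encoding character, normally '^'
    # MSH-1 is the '|' separator itself, so MSH-9 sits at index 8 after the split.
    code, _, trigger = fields[8].partition(component_separator)
    return code, trigger


msh = r"MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|"
print(message_type(msh))
```

The component separator is read from the segment instead of being hard-coded, since the sender declares its delimiters right after the segment name.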

Additional Information about the structure of a message


There are two main types of coded values in an HL7 message:
Specialized codes that are used for structural attributes and are defined by HL7 itself.
Externally defined terms and codes such as SNOMED CT (Clinical Terms).
The vocabulary is a set of allowed values for a coded field. All attributes with a code data type have
an assigned vocabulary. The ‘code system’ must be identified for any coded value in a message.

Questions for discussion


Some of the statements below are incorrect. Can you spot the incorrect statements and discuss them?
1. Technical interoperability refers to data exchange between different electronic medical records
2. Business interoperability defines specific evidence based methods that the nurses have to follow
when they perform various interventions to the patient
3. Information exchange is crucial because it helps maintain the continuity of care
4. Patient monitoring devices used in rehabilitation should be plug and play
5. Billions of dollars could be saved annually if interoperability standards were utilized
6. The Office of Standards & Interoperability develops new interoperability standards
7. A code from a classification system can be transferred without the need of any metadata
8. The RIM is based on classes, their attributes and their relationships
9. The role class of RIM defines a physical thing, group of physical things or an organization
10. Is this a valid example of RIM entries?
Entity: Patient Role: Member Participation: Transportation Act: Observation
11. It is optional for classes of RIM to have subclasses
12. In HL7, a vocabulary is the set of allowed values for a coded field

Chapter 10. Security and Privacy of Data in Healthcare
Personal Protected Health Information (PHI)
Personal Protected Health Information should be protected and considered private, since its
acquisition by unauthorized parties may reveal the health status of individuals. Below is a list of
information that is considered PHI and should be protected.

Patient Name
Address: city, county, zip code (more than 3 digits) or other geographic codes
Dates directly related to patient
Telephone, Fax Number
E-mail addresses
Social Security Number
Medical Record Number
Health Plan Beneficiary Number
Account Number
Certificate/License Number
Any vehicle or device serial number
Web URL, IP Address
Finger or voice prints
Photographs
Any other unique id number, characteristic, or code (generally available or not)
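To illustrate how such a list can be applied in practice, here is a minimal sketch that removes PHI fields from a patient record before it is shared. All field names are hypothetical, and the sketch covers only part of the list (dates and free-text fields are not handled), so it should not be read as a complete or compliant de-identification procedure. The zip code is shortened to its first three digits, following the "more than 3 digits" rule in the list.

```python
# Hypothetical field names that correspond to items on the PHI list.
PHI_FIELDS = {
    "patient_name", "street_address", "telephone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_number", "account_number",
    "license_number", "device_serial", "url", "ip_address", "photo",
}


def strip_phi(record):
    """Return a copy of the record without PHI fields and with a shortened zip code."""
    cleaned = {key: value for key, value in record.items() if key not in PHI_FIELDS}
    if "zip_code" in cleaned:
        cleaned["zip_code"] = str(cleaned["zip_code"])[:3]   # keep 3 digits only
    return cleaned


record = {
    "patient_name": "John Doe",
    "ssn": "000-00-0000",
    "zip_code": "44123",
    "diagnosis_code": "486",
}
print(strip_phi(record))
```

The original record is left unchanged, so the hospital keeps the full data while only the reduced copy leaves the organization.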

Healthcare is a High Security Environment


Healthcare is at high risk of attack or data exposure. Health information systems contain
confidential information (e.g. patient records, financial and administrative information) or have
important, sensitive organizational roles (e.g. accounting, payrolls, quality indicators and
reporting). New security concerns have been raised during the past decades. Those concerns are due
to the fact that the various parties who provide services exchange data through potentially
insecure networks. There are many examples indicating how technological evolution creates
security concerns and new considerations that need to be made. During patient care, authorized
users nowadays have instant access to current data from anywhere, but at the same time there are
still family doctors who are not networked or whose computers are not protected.
Laboratories and other departments are equipped with systems of different architectures and
therefore it is not easy to implement a universal security infrastructure. Insurance and billing
business processes use large amounts of health data which are communicated from hospitals through
secure networks. Data are transferred to other external health organizations for public health
surveillance and other purposes. The notification of infectious diseases to state and federal
authorities is such an example. In addition, all medical prescriptions are nowadays electronic and
data are transferred from the provider to the pharmacists, while telemedicine and remote homecare
services are becoming popular; huge amounts of patient data are sent to case managers via local
home networks and then via the internet.

US Government Healthcare Security Regulations
In the United States there are acts and regulations describing the use of electronic patient
information, the privacy of personally identifiable information and accountability issues in electronic
records. In addition, various state security and privacy laws and regulations are in effect.
Privacy Act (1974): a United States federal law, establishes a Code of Fair Information Practice
that governs the collection, maintenance, use, and dissemination of personally identifiable
information about individuals that is maintained in systems of records by federal agencies. A system
of records is a group of records under the control of an agency from which information is retrieved
by the name of the individual or by some identifier assigned to the individual. The Privacy Act
requires that agencies give the public notice of their systems of records by publication in the Federal
Register. The Privacy Act prohibits the disclosure of information from a system of records absent
the written consent of the subject individual, unless the disclosure is pursuant to one of twelve
statutory exceptions. The Act also provides individuals with a means by which to seek access to and
amendment of their records, and sets forth various agency record-keeping requirements.
Health Insurance Portability and Accountability Act, HIPAA (1996): enacted by the
United States Congress and signed into law in 1996. Title I of HIPAA protects health insurance coverage
for workers and their families when they change or lose their jobs. Title II of HIPAA, known as the
Administrative Simplification (AS) provisions, requires the establishment of national standards for
electronic health care transactions and national identifiers for providers, health insurance plans,
and employers. You can visit the address below to learn more about HIPAA security and privacy
rules and requirements: http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/index.html
Electronic Signature Act (2000): facilitates the use of electronic records and electronic signatures
in interstate and foreign commerce by ensuring the validity and legal effect of contracts entered into
electronically.

Health Insurance Portability & Accountability Act (HIPAA) Privacy & Security
HIPAA aims to protect the confidentiality, integrity, and availability of electronic Protected Health
Information (ePHI). The so-called Security Rule of HIPAA addresses three areas of ePHI protection:
(i) administrative, (ii) physical and (iii) technical. The rules apply to the security (keep secure) and
integrity (keep intact) of electronically created, stored, transmitted and processed personal health
information.
According to HIPAA, healthcare facilities will monitor logon attempts to the network. Inappropriate
logon attempts should be reported to the respective departmental level security designee. All
computer systems that have been installed in the hospital are subject to audit. The same applies to
the access to the hospital intranet, which will also be monitored. As far as the access to protected
health data is concerned, this should only be granted to authorized individuals. Installation of
software without prior approval is prohibited and disclosure of ePHI via electronic means is
forbidden without authorization. Lastly, every computer should be manually logged off
whenever the authorized user is not in front of it, for any reason (for example, returning from the
screensaver to the desktop should require a password).
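The monitoring of logon attempts described above can be illustrated with a minimal Python sketch. The log format, the user names and the threshold of three failed attempts are assumptions made for this example; they are not values prescribed by HIPAA.

```python
from collections import Counter

# Hypothetical logon log: (username, success) pairs in chronological order.
LOGON_LOG = [
    ("nurse_a", True),
    ("dr_b", False),
    ("dr_b", False),
    ("dr_b", False),
    ("clerk_c", False),
    ("clerk_c", True),
]

MAX_FAILED_ATTEMPTS = 3  # assumed local policy, not a HIPAA-defined value


def users_to_report(log, threshold=MAX_FAILED_ATTEMPTS):
    """Return the users whose number of failed logons reaches the threshold."""
    failures = Counter(user for user, success in log if not success)
    return sorted(user for user, count in failures.items() if count >= threshold)


print(users_to_report(LOGON_LOG))
```

In such a scheme the returned user names would be passed on to the departmental security designee for follow-up.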

The four dimensions of Security


“Systems and applications in healthcare should operate effectively and provide appropriate
confidentiality, integrity, and availability”. Both security experts and users should realize the level
of risk of harm resulting from unauthorized access, loss, misuse or modification.
Confidentiality is commonly applied to conversations between doctors and patients. Data or
information is not disclosed to unauthorized persons or processes. The information contained in the
message is kept private and only the sender and the intended recipient can read it. The rule of
confidentiality dates back to at least the Hippocratic Oath, which reads: “Whatever, in connection
with my professional service, or not in connection with it, I see or hear, in the life of men, which ought
not to be spoken of abroad, I will not divulge, as reckoning that all such should be kept secret”. Legal
protections prevent physicians from revealing discussions with patients. This physician-patient
privilege only applies to physician-patient secrets during medical care. This confidential
relationship should therefore be preserved when such data is communicated by electronic means.
Availability is achieved when health data or information is accessible and usable upon demand by
an authorized person without any delay. Data must be protected against threats and hazards that
may deny access to data or render the data unavailable when needed. For this purpose there must
exist appropriate backup in the event of a threat, hazard, or natural disaster. Health systems should
also provide appropriate disaster recovery and business continuity plans for departmental
operations involving ePHI.
Integrity is the assurance that the information contained in a message has not been tampered with,
accidentally or deliberately, during transmission. To achieve this, health organizations
should ensure that health data is protected against improper destruction or alteration and
provide appropriate backup in the event of a threat, hazard, or natural disaster.
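One common way to detect that a message was altered is to compare cryptographic digests. The sketch below uses SHA-256 from the Python standard library; the record contents are hypothetical and the approach is only an illustration, not a method prescribed by the text.

```python
import hashlib


def digest(record: bytes) -> str:
    """Return the SHA-256 digest of a record as a hexadecimal string."""
    return hashlib.sha256(record).hexdigest()


original = b"patient_id=123;allergy=penicillin"  # hypothetical record
sent_digest = digest(original)                   # computed by the sender

received_ok = b"patient_id=123;allergy=penicillin"
received_bad = b"patient_id=123;allergy=none"

print(digest(received_ok) == sent_digest)   # unchanged message
print(digest(received_bad) == sent_digest)  # altered message
```

A plain digest only helps if the digest itself reaches the recipient over a trusted channel; otherwise an intruder could replace both the message and its digest.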
Authenticity verifies that the people with whom we are corresponding actually are who they claim
to be. An “authentic” system should ensure that the data, transactions, communications or
documents produced and transmitted (electronic or physical) are genuine. It is also important for
authenticity to validate that both parties involved (sender-recipient) are who they claim to be.
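As a hedged illustration of how a recipient might check that a message really comes from the expected sender, the sketch below uses a keyed hash (HMAC) from the Python standard library. It assumes that sender and recipient already share a secret key; the message text is hypothetical.

```python
import hashlib
import hmac
import secrets

# Assumption: this key is known only to the sender and the recipient.
shared_key = secrets.token_bytes(32)


def sign(message: bytes, key: bytes) -> bytes:
    """Compute an authentication tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes) -> bool:
    """Check the tag; compare_digest avoids timing side channels."""
    return hmac.compare_digest(sign(message, key), tag)


message = b"lab result: potassium 4.1 mmol/L"  # hypothetical message
tag = sign(message, shared_key)

print(verify(message, tag, shared_key))               # genuine sender
print(verify(message, tag, secrets.token_bytes(32)))  # someone with a different key
```

Because the tag depends on both the key and the content, a valid tag indicates that the message is genuine and unaltered.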

Access to EHR systems and proper use of passwords


We use usernames and passwords to authenticate and authorize the user (the “two A’s”). Risks
related to passwords include theft, sniffing, brute force attacks and
others. For these reasons, “One Time Password Devices”, like the “RSA SecurID” system, have been
used by health professionals for access to health information systems. Such small devices
address many username/password concerns using various techniques (time based, event
based etc.), but they only serve to authenticate the user. User id and password are critical
to ePHI security. Users must access the healthcare facility information utilizing their username and
password; password sharing is prohibited. Users are personally responsible for access to information
utilizing their password and are subject to disciplinary action.
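The event based and time based techniques mentioned above can be sketched with the openly published HOTP (RFC 4226) and TOTP (RFC 6238) algorithms. This is an assumption made for illustration: commercial tokens such as RSA SecurID use their own algorithms, which are not shown here.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Event based one-time password (RFC 4226)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, step: int = 30, now=None) -> str:
    """Time based one-time password (RFC 6238): HOTP over a time counter."""
    now = time.time() if now is None else now
    return hotp(secret, int(now // step))


secret = b"12345678901234567890"  # test secret from RFC 4226, never use in practice
print(hotp(secret, 0))  # changes with every use of the counter
print(totp(secret))     # changes every 30 seconds
```

The server holds the same secret and computes the same code, so a captured password is useless once the counter or the time window has moved on.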
The most important rules that should be taken into consideration when handling passwords are
summarized in the frame below:
Do not keep an unsecured paper record of passwords
Do not post passwords in open view e.g. on your monitor
Do not share passwords with anyone
Do not include passwords in automated logon processes
Do not use “weak” passwords. Passwords should be at least 6 characters long and must contain
characters from at least 3 of the following 4 categories: upper case, lower case, numerals, keyboard symbols
Passwords must be changed every 90 days
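The length and character category rule in the frame above can be expressed as a short check. This is a minimal sketch; the function name and the sample passwords are hypothetical.

```python
import string


def meets_policy(password: str, min_length: int = 6) -> bool:
    """Check the minimum length and the '3 of 4 categories' rule."""
    categories = [
        any(c.isupper() for c in password),              # upper case
        any(c.islower() for c in password),              # lower case
        any(c.isdigit() for c in password),              # numerals
        any(c in string.punctuation for c in password),  # keyboard symbols
    ]
    return len(password) >= min_length and sum(categories) >= 3


for candidate in ["hospital", "abc1!", "Nurse2024", "Mfhib!"]:
    print(candidate, meets_policy(candidate))
```

Such a check only enforces the formal rule; a password that passes it can still be easy to guess if it is built from a dictionary word.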

Strong passwords should be “meaningless”, and should contain a combination of numbers, letters
and symbols, including one or two capital letters. They should also be longer than eight
characters. In contrast, weak passwords are too short and/or are real dictionary words.
To remember a password, you can apply your own rules: e.g. ‘my favorite hobby is baseball’ gives
this password: Mfhib!
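The rule used in this example (first letter of every word, capitalize the first letter, add a symbol) can be written down as a small function. The function name and the choice of “!” as the added symbol are assumptions that follow the example above.

```python
def mnemonic_password(phrase: str, suffix: str = "!") -> str:
    """Take the first letter of each word, capitalize the first one, add a symbol."""
    initials = "".join(word[0] for word in phrase.split())
    return initials.capitalize() + suffix


print(mnemonic_password("my favorite hobby is baseball"))
```

A longer phrase yields a longer password, which helps to reach the recommended length of more than eight characters.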

Very often access to a system is based on the role of the employee. For example, all nurses share the
same role, therefore they are granted the same access rights to the Electronic Health Record, while
each nurse still logs in with individual credentials. Each user, as an individual, can belong to a group,
and access rights are granted to groups. Different categories of healthcare professionals are granted
different access rights, and specific policies should be established for regular audits and updates of
group membership, for example on a yearly basis.
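Group based access can be sketched as two lookup tables: one that maps groups to rights and one that maps individual users to groups. All group names, rights and user names below are hypothetical.

```python
# Hypothetical groups and access rights; real policies are site-specific.
GROUP_RIGHTS = {
    "nurses": {"read_vitals", "write_vitals", "read_medications"},
    "physicians": {"read_vitals", "read_medications", "write_orders"},
    "billing": {"read_billing"},
}

# Each user has an individual account and belongs to one or more groups.
USER_GROUPS = {
    "nurse_a": {"nurses"},
    "dr_b": {"physicians"},
    "clerk_c": {"billing"},
}


def is_allowed(user: str, right: str) -> bool:
    """A user holds a right if any of the user's groups grants it."""
    return any(right in GROUP_RIGHTS.get(group, set())
               for group in USER_GROUPS.get(user, set()))


print(is_allowed("nurse_a", "write_vitals"))
print(is_allowed("clerk_c", "read_vitals"))
```

With this design, an audit of group membership only has to review the second table, and removing a user from a group withdraws all related rights at once.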
Administrative directors are responsible for informing the IT administrator of any employment
status changes. Upon termination of employment the employee’s network and PC access has to be
terminated without any delay; that means that the former employee can no longer access the system
using their passwords. All ePHI & computer equipment of the former employee (laptops, PDAs)
should be retrieved. For those reasons, the use of a prior employee’s user-ids and passwords should
be strictly forbidden. “Generic” user-ids are strictly forbidden and a new clean account should be
provided to the new employee. According to HIPAA, known and suspected security violations must
be reported to the Administrative Director or their designee. Security incidents must be fully
documented to include time/date, personnel involved, cause, mitigation, and preventive measures.

Malicious Software, Backup and Reporting


Electronic messages and e-mail attachments should be tested for malicious software. Nowadays
most security software has dedicated features to ensure safe email. All software installed in
hospitals must be approved by a department level security officer. Pirated programs and files
downloaded from peer-to-peer file exchange software are sometimes infected by viruses, worms,
trojans and spyware. Installation of personally downloaded software is not allowed since it may
open security holes for a potential intruder, while suspicious software should be reported
to the IT technical support personnel immediately. Approved anti-virus software must be installed and
kept up to date on all computer systems and portable devices of the hospital, as well as on any home
computer system or portable device of an employee which may be used to access the hospital network
remotely.

Did you know?


Online discharge summaries are available to everyone in a hospital. A little information is enough to learn
more about a person. Criminals use so-called social engineering methods to trick people into revealing
information, and may use this stolen patient info to blackmail the patient’s relatives; in extreme cases, staff
may use patient data to hunt victims.
Have you heard of packet sniffers and honeypots? A packet sniffer is a program or a piece of hardware that can
intercept and log traffic passing over a digital network or part of a network; it can reveal a lot about health
networks and HIPAA compliance. It is possible to lure potential intruders with a honeypot: a trap to
detect or counteract attempts at unauthorized use of information systems. It consists of a computer, data, or
network site that appears to be part of a network, but is actually isolated and monitored.

Backup and physical damage


A system must ensure recovery from any hardware or software damage within a reasonable time
(based on how critical the affected function is). Each department must determine data criticality and
potential threats and must have a plan for backup, disaster recovery, and business continuity in
case of an emergency. In healthcare, health professionals cannot be disconnected from their patients’
data even for a single second. Backup data must be stored at an off-site location, so that if
a disaster (e.g. flood, fire) destroys all equipment, the data remain intact somewhere else. It is also assumed
that data coming from the backup must be maintained with the same level of security as the original
data. Electronic assets must be protected from physical damage and theft. Electronic devices
containing ePHI should be secured behind locked doors, when possible. Special security
consideration should be given to portable devices (laptops, smart cell phones, digital cameras, DVDs,
USB “drives,”) to protect against damage and theft. ePHI must never be stored on mobile devices or
storage media unless there exist power-on or boot passwords and auto-log off of the system or
encryption of stored data using specific technologies such as TrueCrypt®. Physical safeguards must
also provide appropriate levels of protection against fire, water, and other environmental
hazards such as extreme temperatures and power outages/surges.

Data Encryption
Technological solutions are required to protect ePHI where applicable. Examples include data
encryption and secure data transfer over the network. All wireless networks require security
protocols and encryption and any electronic transmission of ePHI must be encrypted. Encryption
must be achieved through software approved by the IT Department Security designee. Data
encryption is the method of using algorithms and mathematical calculations to transform plain text
into ciphered text, to make it non-readable for unauthorized parties. To decrypt an encrypted
message the recipient must use a special key that transforms the text back to the original version. No
particular encryption technology, no matter how ‘strong’ it may be, can by itself ensure that information
remains secure. Instead, a variety of circumstances need to be taken into consideration to ensure
that personal information is protected against unauthorized access. Data encryption is a
requirement for many data transactions. In the pre-internet era, people rarely used encryption.
Nowadays, with banking, online shopping and other services, data encryption is a primary
requirement. Connecting to a secure server with a web browser automatically encrypts data to
protect it from intruders. If someone does manage to capture encrypted information, it will be
scrambled and unreadable, since the intruder does not have the key needed to decrypt the data.
Data encryption algorithms are constantly advancing. Until recently, 64-bit encryption was
considered strong enough, but nowadays at least 128-bit solutions are used. A newer standard
called Advanced Encryption Standard (AES) allows keys of up to 256 bits.
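The effect of key length can be made concrete with a little arithmetic: a key of b bits allows 2^b different keys, so every additional bit doubles the work of an exhaustive search. A minimal sketch:

```python
def key_space(bits: int) -> int:
    """Number of distinct keys of the given length."""
    return 2 ** bits


for bits in (64, 128, 256):
    print(f"{bits}-bit key: {key_space(bits):.3e} possible keys")

# Going from 64 to 128 bits multiplies the search space by 2**64,
# it does not merely double it.
print(key_space(128) // key_space(64) == 2 ** 64)
```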
Encrypted data should always stay encrypted when not in use, and encryption keys must be of
sufficient length to resist attempts to break the encryption. Secure authentication of users is
required. Prior to decrypting, authorized users must be securely authenticated with the use of robust
passwords; only authorized users can decrypt data. No file containing decrypted data should remain
on the device after a user has accessed encrypted data and viewed or updated it in decrypted form. Health
information security professionals determine which users have access to encrypted information on
a given mobile device or on mobile media.
Encryption implementations for ePHI should take into account that ePHI must be accessible 24 hours
a day. If an encryption system makes data unreadable when a user is unavailable (e.g. death,
illness etc.), or when a user forgets a password, then that encryption is unsuitable for healthcare
environments. Products from well-known vendors provide centralized management of passwords,
remote password resets etc. to facilitate the efficient management of media without fear of data
loss. Encryption systems must back up the encrypted data files on a regular basis. Poorly designed
encryption systems may leave temporary file copies of encrypted data in unencrypted form on
disks/mobile devices.
Symmetric Encryption. This method uses a single key that is shared by the pair of users who
want to communicate a message. It is also called ‘Secret Key encryption’, since the key has to remain
secret, because its acquisition is enough to retrieve the original message. A limitation of symmetric
encryption is that it does not scale very well, since every pair of users needs its own key. There is
also the risk of intruders ‘grabbing’ the key by trespassing into a network or the internet, and this key
alone is enough to reveal all the data. When many people hold the key, the risk that one of them loses
it increases. On the other hand, symmetric encryption has important advantages, such as
its low computational overhead, which makes it a very fast encryption process. The method can also
be used together with other encryption methods.
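The defining property of symmetric encryption, one shared key for both directions, can be shown with a toy example. The sketch below uses a simple XOR with a random key as long as the message (a one-time pad); it is an illustration only and is not AES or any cipher used in real products. The message is hypothetical.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the matching byte of the key."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"BP 120/80"                          # hypothetical message
shared_key = secrets.token_bytes(len(plaintext))  # the single secret key

ciphertext = xor_bytes(plaintext, shared_key)  # sender encrypts
recovered = xor_bytes(ciphertext, shared_key)  # recipient decrypts with the same key

print(recovered == plaintext)
```

Anyone who obtains `shared_key` can run the second step, which is why the key has to be exchanged and stored securely.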
Asymmetric Encryption. It is also called “Public key cryptography” and is a relatively newer
technology. The idea of asymmetric algorithms was first published in 1976 (Diffie and Hellman).
Asymmetric encryption uses two different keys per user. The first is a private key that must be kept
secret; nobody needs it but its owner. The second is a public key that can be seen by anyone.
To send a confidential message, the sender encrypts it with the recipient's public key, and only the
recipient's private key can decrypt it. There is no practical way to “retrieve” or reverse engineer the
private key from the public key. In the opposite direction, data transformed with the private key can be
checked by anyone who has the public key, which is the basis of digital signatures. To use
asymmetric encryption, there must be a way for people to discover other users' public keys. The typical
technique is to use digital certificates. A certificate is a package of information that identifies a user
(e.g. name and e-mail address) and contains the user's public key. During a
secure encrypted communication, each end sends a query over the network to the other party, which
sends back a copy of its certificate. The other party's public key can be extracted from the certificate.
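The relationship between the two keys can be illustrated with textbook RSA on very small numbers. This is a toy sketch under stated assumptions: the primes, the exponent and the message are chosen for readability, and real systems use much longer keys together with padding schemes that are omitted here.

```python
# Toy RSA with tiny primes, for illustration only.
p, q = 61, 53
n = p * q                # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                   # public exponent
d = pow(e, -1, phi)      # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)      # may be published, e.g. inside a certificate
private_key = (d, n)     # kept secret by its owner


def apply_key(value: int, key: tuple) -> int:
    """Raise the value to the key's exponent modulo n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65                                    # a number smaller than n
ciphertext = apply_key(message, public_key)     # anyone can encrypt
recovered = apply_key(ciphertext, private_key)  # only the key owner can decrypt

print(ciphertext, recovered)
```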

Symmetric vs. Asymmetric encryption in terms of complexity


“How many keys?” Assumption: 6 users, one by one transactions

Symmetric: each pair of users shares its own secret key, so one secret key has to be exchanged
for every pair that communicates. Total keys to exchange = (n-1) + (n-2) + … + (n-(n-1)) = n(n-1)/2 = 15

Asymmetric: each new user needs just 2 more keys (a public and a private) to securely communicate
data with all the rest. So total keys = 2*n = 12

Symmetric and asymmetric encryption methods can be used in combination, to provide fast, efficient
and secure encryption of sensitive health data
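The two key counts in this comparison can be reproduced with a short calculation; the function names are hypothetical.

```python
def symmetric_keys(n: int) -> int:
    """One secret key per pair of users: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def asymmetric_keys(n: int) -> int:
    """One public and one private key per user: 2 * n."""
    return 2 * n


for users in (6, 100):
    print(users, symmetric_keys(users), asymmetric_keys(users))
```

For 6 users the difference is small, but the symmetric count grows quadratically while the asymmetric count grows linearly, which is what is meant by symmetric encryption not scaling well.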

Common Encryption Protocols and Algorithms
Strong encryption protocols like TLS (Transport Layer Security) and its predecessor SSL (Secure Sockets
Layer, now deprecated) keep data private in transit (but they cannot always ensure its security). A website
that uses this type of encryption can be verified by checking the digital signature on its certificate,
which in turn must be issued by an approved Certificate Authority.
The Advanced Encryption Standard (AES) is based on a “substitution-permutation network” and
operates on a 4×4 column-major order matrix of bytes. Most AES calculations are done in a
special finite field. AES has a fixed block size of 128 bits, and a key size of 128, 192, or 256 bits. The key
size specifies the number of rounds of transformations that convert the input (plaintext) into
output (ciphertext): 10, 12 or 14 rounds respectively.
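The link between key size and the number of transformation rounds can be written as a one-line rule: the key length in 32-bit words plus 6. A minimal sketch, with a hypothetical function name:

```python
AES_BLOCK_BITS = 128  # the block size is the same for every key size


def aes_rounds(key_bits: int) -> int:
    """Number of AES rounds: key length in 32-bit words plus 6."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192 or 256 bits long")
    return key_bits // 32 + 6


for key_bits in (128, 192, 256):
    print(key_bits, aes_rounds(key_bits))
```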

Questions for Discussion

1. The need for health data privacy has existed since the early steps of medical science. In your opinion,
which are the two major privacy challenges that are driven by the extensive use of networks
and computer technology in healthcare?
2. Outline the possible negative effects that an unauthorized disclosure of protected personal
information would have for a patient.
3. Rank the five passwords below in terms of strength (weak-average-strong).
(i) newyork
(ii) happymanie
(iii) etEs!$pr99
(iv) logon
(v) 01081985
4. In your own words describe the symmetric and asymmetric encryption in brief. Which method(s)
would you implement to ensure the protection of ePHI in your organization and why?

Chapter 11. Big Data in Healthcare and Emerging Challenges
There are many sources of information which could be used by clinicians for decision making, but
these are not always available. This lack of complete information affects decision making, treatment
and patient outcomes. Healthcare information systems often fail to recognize clinicians as the main
users and, especially, fail to foresee the clinicians’ need for complete and up-to-date
information. Even when systems use multiple sources of rich data, these sources often include
outdated, incomplete or disorganized information. Healthcare costs clearly continue to increase;
therefore, simply implementing new systems without considering how to integrate
diverse and distributed datasets will not solve the problem.
It is also evident that the adoption of new systems in healthcare is slow and clinicians continue to lose
valuable time hunting for information. It has been estimated that healthcare professionals actually
waste 20-40% of their time on such tasks. Patient registration systems are not connected, and
the data are scattered and disconnected across the healthcare system. As a result, each time
a patient visits a new setting, their information has to be reentered into the system. Re-entering
demographic and other registration information is error prone and time consuming, and it places an
additional burden on the employee. Consequently, communication of information is not timely and can
be inconsistent, with a negative effect on the quality of care.

What is Big Data


The volume of global digital data is increasing exponentially, from 130 exabytes in 2005 to 7,910
exabytes in 2015. By 2020, it is expected to be 35 zettabytes. This is almost four piles of CDs reaching
Mars from Earth! Specifically for healthcare, worldwide digital healthcare data was estimated at
500 petabytes in 2012 and is expected to reach 25,000 petabytes in 2020. In other
words worldwide healthcare data is expected to grow to 50 times the 2012 total.
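The growth factor quoted for healthcare data, and the yearly growth rate it implies, can be checked with a short calculation. The compound annual growth formula is an addition for illustration; only the two petabyte figures come from the text.

```python
def growth_factor(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by the start and end values."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the text, in petabytes
healthcare_2012, healthcare_2020 = 500, 25_000

print(growth_factor(healthcare_2012, healthcare_2020))
print(f"{annual_growth_rate(healthcare_2012, healthcare_2020, 8):.0%} per year")
```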

  Low storage medium prices
+ Faster CPUs
+ Easier and automated access to Information
------------------------------------------------------------------
= Big Data
An interesting equation of what contributes to the big data phenomenon

Big data is a collection of large and complex data sets which are difficult to process using common
database management tools or traditional data processing applications. The challenges related to
big data management include capturing, storing, searching, sharing and analyzing data.
O’Reilly, a multi-faceted media company and publisher defines big data as: “the data that exceeds
the processing capacity of conventional database systems. The data is too big, moves too fast, or
doesn’t fit the limitations of database architectures”. Big data refers to the tools, processes and
procedures allowing an organization to create, manipulate, and manage very large data sets and
storage facilities. It is not appropriate to quantify the meaning of big data in terms of storage size.
Big data is not just about storing huge amounts of data; it is the ability to mine and integrate data,
extracting new knowledge from it to inform and change the way providers, even patients, think
about healthcare. An organization facing hundreds of gigabytes of data for the first time may be
confronted with a big data challenge and a need to reconsider data management options. But for a larger
site which utilizes a distributed computing framework and uses advanced data management methods,
it may take tens or hundreds of terabytes before data size becomes a significant consideration.
Extremely large data volumes were originally an issue for supercomputers, nuclear physics,
meteorology, and space travel. Late in the 20th century, airline and bank operations entered the ‘big
data family’, while in 1990 the Human Genome Project was initiated, and this was
the first large scale project to use big data in healthcare. Later big data started to be used in finance,
research, marketing and entertainment. Nowadays big data is considered both a challenge and an
opportunity and provides great potential for most industry sectors. Data sets grow in size because
of information-sensing mobile devices, remote sensing, software logs, cameras and microphones
storing audio and video digitally, radio-frequency identification readers, and wireless sensor
networks. Data also grow because nowadays almost all information is captured in digital format and
stored in databases. Most data entry procedures nowadays are paperless. Big data is difficult to
handle with relational databases, desktop statistics and traditional visualization packages. It
requires instead massively parallel software, running on tens to thousands of servers.
In healthcare the modern trend is to utilize larger datasets in health analytics and during decisions,
because a single large set of related data yields additional information compared to separate smaller
sets with the same total amount of data. This makes it possible to find correlations and to identify
trends in the health of an individual or of a specific population, prevent diseases and organize health
promotion activities, as well as to determine and improve the quality of health related research. The
box below lists some of the most common uses and sources of big data.

Data for genetic research
Radio Frequency Identification (RFID) implementations
Data from sensor networks
Social networks and their data
Internet text, web logs, internet indexing
Storage and data warehouse
Risk management and modeling
360 View of the Customer
Mass amounts of e-mails for e-mail analysis

The data we need in healthcare research is actually unstructured


Despite the fact that Electronic Health Records store patient data in a very well defined and
structured way, the large combined datasets from many hospital providers, population
based surveys and other contextual resources are not well organized and contain many inconsistencies,
missing values and other problems.
In healthcare more and more data come from two processes. The first one is the conversion of existing
data to electronic form: personal medical records, radiology images, clinical trial data, FDA
submissions, human genetics and population data. The second is the generation of new types of data
such as 3D imaging, sensor readings, and genomics. Lots of data in healthcare are so called
‘unstructured data’. Historically, the point of care generated mostly unstructured data: office
medical records, handwritten nurse and physician notes, hospital admission and discharge records,
paper prescriptions, radiograph films, MRI, CT and other images. Structured data can be easily
stored, queried, recalled, and analyzed. Structured data include electronic accounting and billings,
actuarial data, some clinical data, laboratory instrument readings and data generated by the
ongoing conversion of paper records to electronic health and medical records.
When we refer to unstructured data we mean sources of data which share the following
characteristics: (i) they are full of unneeded or confusing information, (ii) they are often unclear and
“dirty”, (iii) they often contain valuable information hidden within massive amounts of data and
(iv) they have the potential to be useful only when combined.
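The practical difference between the two kinds of data can be sketched in a few lines. The records, the clinical note, the glucose threshold and the text pattern below are all hypothetical and chosen only to make the contrast visible.

```python
import re

# Structured data: the fields are named, so a query is a simple filter.
lab_results = [  # hypothetical records
    {"patient_id": 1, "test": "glucose", "value": 5.4},
    {"patient_id": 2, "test": "glucose", "value": 9.8},
]
high_glucose = [r["patient_id"] for r in lab_results if r["value"] > 7.0]

# Unstructured data: the same kind of fact is buried in free text and
# must first be extracted, here with a deliberately simple pattern.
note = "Pt seen at 10:30, c/o headache. BP 150/95, advised rest."  # hypothetical note
match = re.search(r"BP\s+(\d{2,3})/(\d{2,3})", note)
blood_pressure = (int(match.group(1)), int(match.group(2))) if match else None

print(high_glucose, blood_pressure)
```

A pattern like this breaks as soon as a clinician writes the value differently, which is one reason why unstructured data is described as unclear and “dirty”.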

Big Data and Healthcare Organizations


As already discussed, big data is emerging because hospitals and health systems are
collecting large amounts of data on patients every single day. These data come from a variety of
settings such as clinical, billing, and scheduling. In the past, much of that data was not leveraged to
make patient care and hospital operations better, but recently there has been a shift to change that.
The explosion of electronic healthcare information has several causes.
Healthcare systems nowadays store records electronically with the use of Electronic Health Records,
and new technologies such as capture devices, healthcare sensors and mobile applications
that communicate and collect data have emerged.
As a result, more electronic data is being generated. New challenges have appeared for the
better understanding of the health status of a population, since public health decisions should be
based upon different health, environmental and socioeconomic characteristics. New reimbursement
models need large amounts of information to accurately understand what occurs with patients, while
the clinical practice becomes evidence-based and predictive. There are many research studies where
historical data have been used in predictive models to facilitate clinical and administrative decision
making. These models will become more accurate when they are built on larger datasets
with a diverse set of variables taken into consideration. The current infrastructure of health
organizations reflects the importance of the issue. Without aggregating, managing and analyzing
big data, the healthcare industry would be in information overload, and the information would be
sometimes meaningless and often unexploited. Healthcare organizations will keep collecting massive
volumes of data, so aggregating and analyzing it will be a continual challenge.
Big data can enable more than $300 billion savings per year in US healthcare with two-thirds of
that through reductions of around 8% to national healthcare expenditures. Clinical operations and
research and development are two of the largest areas for potential savings, with $165 billion and
$108 billion in waste respectively14.

14 Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014; 2: 3.
Those who are directly and indirectly involved in healthcare will benefit from the use of big data
and data-driven systems. Clinicians want real-time access to patient, clinical and other relevant
data to support improved decision-making and facilitate better quality of care. Researchers and
epidemiologists want to have, at their disposal, novel data-driven tools to improve the data workflow
e.g., predictive modeling, statistics and algorithms that improve the design and outcome of their
experiments and epidemiologic studies. Pharmaceutical companies want to better understand the
causation of various diseases and the factors related to the pharmaceutical response, in order to
find more targeted drugs and design successful clinical trials to introduce new medicines into the
market. Medical device companies collect data from hospital based and home based devices to
monitor patient safety and to predict possible adverse events. They therefore need this
data, integrated with old and new forms of personal data, to make safer and more accurate medical
devices for diagnosis and therapy. Finally, patients themselves want their everyday use of
technology to flow into their medical care and to have better control of their own data.

Benefits of Big Data in Healthcare


Big data can be used to convert health data into useful and meaningful information utilizing more
usable formats and appropriate visualizations. During everyday practice, physicians have more
information at their disposal and can provide personalized care to their patients, while useful
visualizations and predictive functions facilitate better point of care decisions. Nurses also
benefit from big data, since nursing care is related not only to the assessment of the patients’ clinical
needs, but also to the understanding of the psychological and social problems of the patient. In
clinical statistics, examples include the estimation of the patient admission count and the
readmission rate. In financial statistics, big data can be utilized to calculate and predict the total
direct cost by total admission and by readmission.
In operational statistics there are many applications, such as counting stays of different lengths.
An important issue which big data comes to address in the United States
is the cost and quality of services. Rather than just collecting data, hospitals can analyze and use it
to inform decisions in the organization. Big data also provides hospitals with performance metrics with
which it is possible to compare the operational efficiency of healthcare organizations. An example is
the case of individual employees, who can now see how their performance ranks among many others,
which can increase their motivation to achieve better results.
Big data also informs population health management, as findings
from predictive models can be shared with providers across the care continuum. This diverse data
offers providers the ability to use information and discover patterns in patient populations that may
not have been detectable before. Big data also advances clinical research towards new knowledge
discovery, since more information is analyzed regarding patient care and disease. Therefore, studies
can be completed faster. Recent growth of biomedical research data from genomics, imaging and
electronic health records opens up great research possibilities.
The list below presents specific examples of useful indicators that can be calculated with higher
accuracy when a healthcare organization utilizes big data platforms.
Accurate estimation of the average direct cost by total admission, by readmission, or by different
case-mix groups and hospital departments.
Risk assessment of patients, to predict cases with an increased probability of developing
hospital-acquired conditions and complications, or cases at risk of being readmitted to the hospital
soon after their discharge.
With the application of statistical process control methods to large datasets, it is possible to
recognize epidemics in a population in a timely manner.
Estimation and projection of the mortality rate (number of deaths from a specific disease in a given
population over a given period). This information can be extremely useful for population health
planning and is often provided in population health surveillance systems.
Decision making support for physicians. Predictive models provide information about patient
treatment, diagnosis and prognosis, using large datasets from historical patient
admissions. These tools extend the decision capacity of physicians, who can make the right
decisions at the right time.
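
The statistical process control item in the list above can be illustrated with a minimal sketch. It assumes a simple Shewhart-style rule (flag any period whose case count exceeds the baseline mean plus three standard deviations); the function name, the threshold and the weekly counts are all made up for illustration and are not taken from a real surveillance system.

```python
from statistics import mean, stdev

def flag_out_of_control(baseline, recent, sigmas=3.0):
    """Return (index, count) pairs in `recent` that exceed the upper control limit."""
    center = mean(baseline)
    upper_limit = center + sigmas * stdev(baseline)  # assumed 3-sigma rule
    return [(i, count) for i, count in enumerate(recent) if count > upper_limit]

# Hypothetical weekly case counts for one disease in one region.
baseline_weekly_cases = [12, 15, 11, 14, 13, 12, 16, 14]
recent_weekly_cases = [13, 15, 29, 41]

print(flag_out_of_control(baseline_weekly_cases, recent_weekly_cases))
```

With these invented numbers, only the last two weeks fall above the control limit and would be raised as a possible outbreak signal.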

Prediction of patients who develop nosocomial infections


Northshore University Health System, Evanston, implemented predictive modeling in clinical decisions. Using large data
sets, models were developed to identify which patients are most likely to be carriers of a threatening microorganism,
Methicillin-Resistant Staphylococcus aureus. With the results of that modeling built into the Electronic Medical
Record, the health providers working in the hospital received alerts when a patient who met the
characteristics of a high-risk carrier was admitted. This implementation had very high success rates, reaching
around 90% of correct predictions.
Reduced readmissions
Predictive models for the likelihood of readmission within 30 days have also been developed, using data moved from the
EMR into a data warehouse. Each patient's risk of being readmitted within 30 days is computed and fed back into the EMR
as a risk level (high, medium or low). Messages are then sent to primary care practices, alerting the
patients' primary care providers to the high risk and indicating whether any follow-up appointments are scheduled.
Real examples of applications of the use of big data for predictive modeling in clinical settings

The four Big Data “V”s


Volume: The most obvious characteristic that makes data “big” is the sheer volume. There are very
large healthcare datasets and data warehouses, often with sizes counted in Exabytes. For your
reference, one Exabyte has 1 million Terabytes.
Velocity: this is the rate at which incoming data arrives and needs to be processed. As we discussed in
earlier chapters, healthcare is a data intensive process and healthcare data are not static, sitting in a
slowly updated database; they instead move fast through networks and are shared across the healthcare
system in complex real-time transactions. Velocity therefore describes both the constant flow of new data
accumulating at unprecedented rates and the speed needed to retrieve, analyze and compare the data and
to make decisions using the output. In some medical situations, real-time data (trauma
monitoring for blood pressure, bedside heart monitors, operating room monitors etc.) are actually a
matter of life or death. Future applications of real-time data in the ICU (detecting infections as early
as possible) could reduce patient morbidity and mortality or even stop hospital disease outbreaks.
Being able to perform real-time analytics against such high-volume data in motion could
revolutionize healthcare.

An example of big Velocity is the case of “Credit Suisse”. Their business included the processing of 1,000,000,000
transactions during working hours. These very high volumes of transactions raised the demand for an in-memory
architecture for performance, on-disk resiliency for availability and a distributed architecture for data coherency.

Variety: the data sources are many and diverse, often located in different databases across the
healthcare system. The challenge is the integration of data, structured or not, to make them a useful
knowledge discovery resource. Data that are merged together for problem solving include not only text
and numbers, but often also clinical orders, online interactions, medical images, videos and recorded
messages.
Veracity: this dimension of big data refers to the correctness, meaningfulness and relevance of
the data in relation to their intended use. It is challenging to evaluate whether the data are
accurate enough to be useful for decision making. Traditional data management assumes
that data stored in warehouses is certain, clean, and precise; but data is
sometimes uncertain, imprecise or wrong. Data quality issues are a particular concern in healthcare.
Veracity issues specific to healthcare concern the correctness of diagnoses, treatments,
prescriptions, procedures, and outcomes.

Technologies in healthcare related with big data


There are many technologies and sources in healthcare contributing to the big data phenomenon.
Data from sensors provide repeated accurate measurements with no need for humans to make the
measurements themselves. Digital medical images are communicated across the healthcare system
according to the DICOM standard and are stored in Picture Archiving and Communication Systems
(PACS). The medical imaging systems are interoperable with the Electronic Health Record, to support
a shared enterprise research warehouse. Knowledge databases maintain a vast wealth of existing
medical knowledge, while robotic devices such as surgical robots and intelligent adaptive patient
rehabilitation robots operate using artificial intelligence algorithms that have been
developed with the use of large datasets.

Data security and data ownership considerations


Concerns about data security and about unintentional exposure or loss of data to unauthorized parties
are to be expected, since large amounts of multi-centric data move across networks, and private
health information must therefore be protected. There is still resistance to moving healthcare data
to the cloud. The practice of putting protected health information (PHI) in the cloud is still
immature: use of the Internet, cloud computing and pooling of data all raise data security risks.
Healthcare data contains details of a person’s life and it must be protected with the highest
security possible. Another concern in relation to the use of big data is data ownership. Although
most people would assume that they own their own healthcare data, this may not always be the case.
These concerns have led to patient
groups (e.g. the e-patient movement), where patients help each other to become active participants in
their own care alongside doctors. There is a peer-reviewed, open access journal titled ‘Journal of
Participatory Medicine’ that aims to advance participatory medicine among healthcare
professionals and patients. The Society for Participatory Medicine promotes participatory medicine,
a cooperative model of healthcare that encourages active involvement by both patients and healthcare
professionals.

Big Data frameworks and technologies


Many companies, regardless of whether they specialize in healthcare or not, develop and advance data
management platforms, data storage solutions and frameworks. These include traditional vendors
like IBM, Cisco Systems and Oracle, smaller organizations, individual developers, platform companies
like Google and Amazon, and finally many open source groups (Linux Foundation, Apache Foundation
with Hadoop, and Mozilla Foundation). Many of the above mentioned solutions and architectures are
based on NoSQL databases, which differ from traditional relational databases in their
flexibility and distributed nature. NoSQL is a large family of databases which do not all use the
same architecture but do share some basic principles: they are based on more flexible data
models, can be easily scaled horizontally and vertically, and allow easier and faster development.
Therefore they deal better with the requirements of big data. Big Tables are used in many NoSQL
systems. An example of an open source technology based on Big Tables is HBase. Big Tables are
distributed storage systems, which spread huge arrays of data across distributed servers. These tables
map three values, namely the row key, column key and timestamp, to a byte array. Tables are
split into chunks of roughly 200 MB in size.
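
The mapping described above, from (row key, column key, timestamp) to a byte array, can be sketched as a toy in-memory structure. This is only an illustration of the data model under simplifying assumptions (a single process, no distribution, no splitting into chunks); the class name, keys and values are hypothetical and this is not the HBase API.

```python
from collections import defaultdict

class TinyBigTable:
    """Toy in-memory map: (row key, column key, timestamp) -> bytes."""

    def __init__(self):
        # For every (row, column) cell, keep a dict of timestamp -> value.
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value: bytes):
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at `timestamp`, or the newest version if omitted."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

# Hypothetical example: two heart rate readings for the same patient.
table = TinyBigTable()
table.put("patient#001", "vitals:heart_rate", 1, b"72")
table.put("patient#001", "vitals:heart_rate", 2, b"88")
print(table.get("patient#001", "vitals:heart_rate"))     # newest version
print(table.get("patient#001", "vitals:heart_rate", 1))  # older version
```

Keeping the timestamp as part of the key is what lets several versions of the same cell coexist, which suits repeated measurements such as vital signs.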
There are many software frameworks available that can help with the analysis of big data. One of
the best known is Apache Hadoop. It is a software framework (not a database) that supports
data-intensive applications and enables applications to work with thousands of nodes. It uses both
the CPU and disk of individual commodity boxes (or nodes). Boxes can be combined into clusters, and
new boxes can be added as needed without changing data formats, the way data is loaded etc.
Finally, it is worth mentioning the IBM Big Data Platform, which includes traditional Big Data
technologies (e.g. Netezza) that have been used to address the more traditional Big Data problems,
and NoSQL-like technologies that add velocity and variety capabilities.

Examples of Big Data Applications in Healthcare


The National Institutes of Health (NIH) aims to play a leading role in addressing the
complex issues of big data. Its strategic plan includes the involvement of stakeholders in the research
community, government agencies and private organizations. NIH is involved in scientific data
generation, management and analysis.
An example of a healthcare data repository is tranSMART. tranSMART is a public domain
data repository with clinical observations, adverse events, patient demographics, clinical trial
outcomes, gene expression and metabolism data. Practice Fusion is a cloud-based EMR platform for
medical practices that also aggregates population data across multiple sites to improve clinical
research and public health analysis. Practice Fusion includes e-prescribing, labs, meaningful use,
charting and scheduling. Recent projects of Practice Fusion are on cancer and heart disease. Practice
Fusion analyzes aggregated data from the EMR and public health sources to monitor health on a
population level. These data cover population health surveillance and education (e.g. flu, asthma),
drug surveillance, public health research, care plans and best practices. Healthx develops and manages
online cloud based portals for healthcare companies, focusing on enrollments, claims
management and business intelligence. The company uses vast data coming from benefits,
physician, prescription and other information.
The Institute for Health Metrics & Evaluation (IHME) gathers large distributed data sets globally
for data analysis, drawing health measurement data from disparate sources including censuses,
surveys, vital statistics, disease registries and hospital records. The aim is to support policy
decisions and improve population health. The most recent project of IHME is the ‘Global Burden of
Disease’, which seeks to identify the world’s major health problems, assess the response of society
to these problems and identify optimal methods to dedicate resources and maximize health improvement.
The University of California, Santa Cruz Initiative started a large scale $10.5 million project in 2012
to create the world’s largest repository for cancer genomes: a huge, structured database of biomedical
information that will allow a complete molecular characterization of cancer.
Sickweather LLC scans social media (Facebook, Twitter) to track outbreaks of disease, offering
forecasts to users, similar to weather forecasts, to keep individuals aware of outbreaks in their area.
Humedica, a medical informatics company, connects clinical and patient information across varied
settings and time periods to generate longitudinal views of patient care and to provide accurate and
detailed predictive models over longer periods of time.
Humetrix’s iBlueButton is a mobile health information exchange app to access and exchange
medical records. It combines the convenience of mobile phones with Big Data and gathers medical
information, tracks sleep, manages diabetes, heart disease and asthma, to understand behavior
patterns and motivations for prevention. Asthmapolis collects patient data and provides them with
feedback to better manage their asthma. A mobile sensor tracking device attaches to asthma
inhalers to monitor the time and location of events. Asthmapolis aggregates real-time data for
epidemiologic and public health use. Finally, ZEO is a personal sleep coach device which analyzes
over a million nights of data to help consumers improve their sleep. The device tracks the quality of
sleep and gives personalized advice on how to improve sleep. ZEO shared sleep data with
universities, for a 360 degree understanding of sleep. Its limitation is that the sleep data is not
combined with blood pressure, weight, heart rate, and other sleep related measures.

Questions for Discussion

1. Explain how the use of big data can improve health outcomes.
2. Discuss a clinical scenario and explain how the big data Vs are relevant to your scenario
3. Discuss how big data can contribute to high quality population health surveillance systems.
Chapter 12. Data Mining for Decision Making in Healthcare
As we discussed in Chapter 11, healthcare data is expanding. Huge amounts of data are generated
during healthcare transactions and large datasets are available from many different sources of data,
which are often not directly health related, but are useful to assess the health status of an individual
patient or a population as a whole. These may include environmental datasets, labor data, and other
socioeconomic datasets. Human beings exist in balance with their environment and this is the reason
why external factors affecting human health should be considered when we try to understand the
health dynamics and the causation of health conditions.
The aforementioned datasets and healthcare transactions are too voluminous and complex to be
processed by traditional methods. Recent advances in computer and information sciences make it
possible to analyze large amounts of data in order to find useful
information and to create predictive models for clinical and administrative decision making. Recent
advances in computers are more than evident in everyone’s life. Computer systems are becoming cheaper,
hard disk drive capacity is increasing and processors are getting faster. In addition, parallel
computing architectures and advanced affordable networks make it possible to apply advanced data
analytics methods to large distributed data files.

Definitions of Data mining


Data mining is the process of finding previously unknown patterns and trends in data, to build
predictive models. It involves data selection and exploration and building models using vast data
stores to reveal unknown patterns. The analysis is often performed on large observational data sets
to find unsuspected relationships and to summarize the data in novel ways that are both
understandable and useful. Data mining algorithms are increasingly used to support both
clinical decision-making (e.g. diagnosis, treatment, prognosis prediction) and administrative
decision-making (e.g., staffing estimates, insurance, demographic trends, quality assurance, etc.) in
healthcare delivery.
Unlike statistics, data mining explores data that have been collected in advance, without a
predefined hypothesis, and discovers hidden patterns in the data. In short, data mining is a process
of producing the general (i.e., knowledge or an evidence based hypothesis) from the specific data.
Data mining is one of the major steps of the Knowledge Discovery in Databases (KDD) process, which
starts with data selection, continues with the data preprocessing and data transformation steps, then
the data mining phase, and finally the interpretation of results.
Data mining emerged in the middle of the 1990’s. In fact, the term ‘Data Mining’ was only
registered as a 2010 Medical Subject Headings (MeSH)15 term in late 2009. In other words, it is a
relatively new approach to data analysis and knowledge discovery, although it has already been
used extensively by financial institutions, marketers, retailers and manufacturers. Data mining
originated from work in statistics and machine learning and, as an interdisciplinary field, has
advanced since to include the areas of pattern recognition, database design, artificial intelligence,
visualization and others.

15 MeSH is a comprehensive controlled vocabulary for the purpose of indexing journal articles and books in the life sciences

How Data Mining compares to traditional Statistics


Statistical methods have been in existence and extensively used in research well before the age of
computers. They were, therefore, created around the assumption that the mathematical calculations
need to be feasible to perform manually. Clearly, very large datasets were close to impossible to
handle with conventional methods back in the day. For this reason, statistical methods rely on the
selection of a sample from the entire population, and the analysis of the sample data; the results
are then generalized to the entire population with a given level of measurable statistical
confidence. In statistics, it is the user who specifies the variables, the questions that will be included
in a data collection tool and the method of data collection (e.g. self-administered questionnaire vs
interview). These choices may influence the resulting models negatively. We need to mention, though,
that in everyday life, data are not collected in order to perform statistical analysis, but to support
primary functions, such as providing healthcare services to patients.
Statistics are based on predefined hypotheses (deductive), while data mining is inductive. In
statistics, a hypothesis is built and then data is collected to test the hypothesis, as modern science
does. A properly selected collection of data provides almost the same statistical significance as
would be obtained if the researcher used the data of the entire population. This dataset is called a
sample in statistics and is typically collected for the sole purpose of that study. In the past, pursuing
research by collecting and analyzing sample data was unavoidable for two reasons. Firstly, it is
humanly impossible to actively collect data from the entire population for the purpose of a study in
a reasonable timeframe. Secondly, even if the entire population data became available, most
legacy computer systems or manual calculations could not handle it
due to technology limitations.
Statistics use ‘conservative’ analysis strategies, while data mining is flexible in the methods
that may be used. Traditional statistical methods mainly handle numeric data, while
data mining can handle other kinds of data such as medical images and text. In terms of the
underlying methodologies, statistics are primarily based on mathematics, and assumptions
such as linearity and a given probability distribution need to be satisfied before a statistical test
may be used; many data mining approaches also adopt heuristics to resolve problems, especially in
discrete data.

Sampling versus Entire Population


Statistics uses a sample of data from a population
Data mining typically uses data for entire population
Since data mining discovers hidden patterns, data mining should use all the population data.

Need for Data Mining in Medicine
Medical data are often noisy, incomplete and uncertain. Lots of data (text, graphs, and images) are
collected due to computerization, and many disease attributes are available for
decision making. There is nowadays an increased demand for health services because of
the greater awareness of citizens and increased life expectancy. At the same time, overworked
physicians and facilities, and stressful work conditions in ICUs and other settings, are often a
reality. Providing health care professionals and administrators with advanced data-driven tools to
help them with decisions can therefore contribute to a more effective, evidence based,
and time and resource efficient healthcare model.
Data mining methods can support the whole spectrum of medical procedures: from patient
diagnosis, by classifying disease patterns, to treatment, by selecting the most suitable of the
available treatment methods, and prognosis, by predicting future outcomes based on previous
data and present conditions. Information gained from data mining is expected to help maintain a high
level of care and to improve organizational planning. This is evident from the fact that healthcare
organizations that perform data mining have better predictions about their mid and long-term
requirements. There are numerous financial motivators in the health industry for the use of
data-driven, predictive tools; healthcare organizations have slowly but steadily started to recognize
the need to make decisions based on the analysis of clinical and financial data. Health insurance
companies try to reduce losses due to fraud by using data mining methods, while prospective payments
in hospitals may be based on classifying patients into case-mix groups.
Data mining can also help advance research, since healthcare data analysts and researchers gain
up-to-date biomedical knowledge and they can more easily understand large biomedical datasets. It
is therefore possible, with the use of data mining, to generate scientific hypotheses from large
experimental data, clinical databases, and biomedical literature.

Data mining methods


There are two main categories of data mining, namely descriptive (or unsupervised learning)
and predictive (or supervised learning) data mining. Descriptive data mining methods group data
by measuring the similarity between objects and discover unknown patterns so that users can
understand huge amounts of data. Descriptive data mining is of an exploratory nature. The most
commonly used descriptive (unsupervised) data mining methods are clustering, association, and
data summarization. Predictive data mining, on the other hand, infers prediction rules
(classification/prediction models) from training data and applies the rules to unclassified data.

Clustering
Clustering is an unsupervised learning method that observes only independent variables.
There is no target variable to be specified before a clustering experiment. For this reason, clustering
may be best used for studies of an exploratory nature, especially with large amounts of data about
which little is known. Clustering methods group objects into a specific number of clusters: objects in
a cluster are similar and objects from different clusters are not similar. Clustering algorithms are
generally categorized into partitional and hierarchical methods.
Partitional (or centroid based) clustering algorithms require the user to select a desired number of
clusters (k), and then relocate objects into the k clusters. Partitional
clustering methods are categorized according to how they relocate objects, how they select a cluster
centroid among objects within a cluster or how they measure similarities between objects and cluster
centroids. The advantages of partitional clustering methods are that they are, in general, very fast,
often provide superior clustering accuracy compared with hierarchical clustering algorithms,
and can handle large data sets which hierarchical algorithms cannot (better scalability). Their
major drawback is that their clustering results depend on the initial cluster centroids (which are
random); in other words, clustering results may differ each time the partitional algorithm
runs. K-means is the most widely-used partitional algorithm and generates clusters in a two-step
process. First it randomly selects k objects as centroids; then it partitions the objects into k disjoint
groups (based on the similarity between centroids and objects). The centroids are then recomputed
and the assignment is repeated until the clusters stabilize. A cluster centroid is the mean value
of the objects in the cluster.
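
The k-means procedure described above can be sketched in a few lines. This is a minimal illustration, assuming squared Euclidean distance as the similarity measure and a fixed random seed for reproducibility; the function names and the six two-dimensional points are made up, and a production analysis would normally rely on an established library instead.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Centroid of a cluster: the coordinate-wise mean of its objects."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: pick k objects as initial centroids
    for _ in range(max_iterations):
        # Step 2: assign every object to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster ended up empty.
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-dimensional observations forming two obvious groups.
points = [(1, 1), (1.5, 2), (1, 0.6), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

Changing `seed` changes the initial centroids, which is the source of the run-to-run variation mentioned in the text; on well separated data such as this toy example the final clusters are the same.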
Hierarchical clustering algorithms repeatedly merge the two most similar groups of objects, based on
the pairwise distances between groups, so that objects are hierarchically grouped. There are
different methods of hierarchical clustering, based on how the representative object of each group
is selected for the similarity calculation (e.g. single-link, complete-link, and average-link). The
main advantage of hierarchical clustering is its visualization capability, which shows how similar
objects are to one another. By reviewing dendrograms, researchers can reasonably guess the
number of clusters. The disadvantages of this family of clustering methods are that the algorithms
are computationally very expensive and require a huge amount of system memory to store the distances
between objects. If it takes a hierarchical algorithm approximately 60 seconds to cluster 1000
objects (records), then clustering 2000 objects takes about 480 seconds ((2000/1000)^3 × 60), since
the running time grows with the cube of the number of objects, provided
that there is enough system memory available. Because of this complexity, the algorithms are of very
limited use for very large data sets. Among the hierarchical algorithms the average-link algorithm
provides the best clustering accuracy in most cases.
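
The running time arithmetic above can be reproduced with a short sketch. It assumes, as the example in the text implies, that running time grows with the cube of the number of objects and that 1000 objects take 60 seconds; the function name and the reference values are illustrative only.

```python
def estimated_runtime_seconds(n_objects, reference_objects=1000, reference_seconds=60.0):
    """Scale a measured running time, assuming cubic growth in the number of objects."""
    return (n_objects / reference_objects) ** 3 * reference_seconds

print(estimated_runtime_seconds(2000))          # 480.0 seconds, as in the text
print(estimated_runtime_seconds(10000) / 3600)  # hours needed for 10,000 objects
```

Under the same assumption, a tenfold increase in records multiplies the running time by a thousand, which is why these algorithms are of limited use for very large data sets.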

Left: Centroid based clustering, Right: hierarchical based clustering

Another type of clustering that follows a different approach is distributional clustering, where data
is modeled with a fixed number of Gaussian distributions that are initialized randomly. The best-known
distributional clustering algorithm is Expectation-Maximization (E-M). The ‘E’ phase
estimates the expectation of the log-likelihood using the current estimate of the parameters, while the
‘M’ phase computes the parameters that maximize the expected log-likelihood found in the ‘E’ step. To
obtain a hard clustering, objects are then assigned to the Gaussian distribution they most
likely belong to; for soft clusterings this is not necessary.
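
The alternation between the ‘E’ and ‘M’ phases can be sketched for the simplest possible case. The example below is a minimal illustration that assumes one-dimensional data, exactly two Gaussian components and fixed starting means (instead of random initialization) so that the result is reproducible; the function names and the ten values are made up.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a one-dimensional Gaussian at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_gaussians(data, initial_means, iterations=50):
    means = list(initial_means)
    sigmas = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E step: probability (responsibility) of each component for each value.
        responsibilities = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], sigmas[k]) for k in range(2)]
            total = sum(joint)
            responsibilities.append([j / total for j in joint])
        # M step: re-estimate the parameters from the weighted data.
        for k in range(2):
            n_k = sum(r[k] for r in responsibilities)
            means[k] = sum(r[k] * x for r, x in zip(responsibilities, data)) / n_k
            variance = sum(r[k] * (x - means[k]) ** 2
                           for r, x in zip(responsibilities, data)) / n_k
            sigmas[k] = max(math.sqrt(variance), 1e-3)  # floor avoids a zero-width Gaussian
            weights[k] = n_k / len(data)
    return means, responsibilities

# Hypothetical measurements drawn from two groups, around 5 and around 10.
values = [4.8, 5.1, 5.0, 5.3, 4.9, 9.7, 10.2, 10.0, 9.9, 10.4]
means, soft = em_two_gaussians(values, initial_means=(4.0, 11.0))
# Hard clustering: assign each value to its most probable component.
hard = [max(range(2), key=lambda k: r[k]) for r in soft]
print(means, hard)
```

The list `soft` is the soft clustering (one probability per component for each value), while `hard` is the hard clustering obtained from it.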
When performing clustering analysis, repeated sampling and analysis is a good practice, to avoid
sampling bias and to determine the correct number of clusters. After determining k,
sophisticated partitional algorithms can be used. During a clustering experiment, data
outliers can influence the number of resulting clusters. Some datasets contain only a few outliers
while some samples are noisy and contain many outliers. In each case, even a single outlier can form a
cluster. Such outliers should be eliminated, especially when partitional clustering methods are used,
since these rely on the mean, which is mathematically sensitive to outliers. Sometimes though,
outliers carry useful meaning. It should be noted that most clustering algorithms only handle numeric data.
However, most healthcare databases have a number of categorical attributes. Although it is possible
to convert categorical into numerical data for clustering, this conversion distorts distances between
categories. Imagine three discrete values, A, B, and C, being converted into 1, 2, and 3, respectively.
After the conversion, the distance from A to B is 1 point and from B to C is also 1 point, but
the distance between A and C is 2 points. The conversion thus suggests that A and C are more
dissimilar than A and B, or B and C, which is not true in the real world. There are a few clustering
algorithms that can handle categorical data, such as FarthestFirst (found in Weka) and two-step
cluster analysis (found in SPSS).
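
The distortion in the A, B, C example can be checked numerically. The sketch below compares the integer conversion from the text with one-hot encoding, which is a common alternative that is not discussed in this chapter and is shown here only as an assumption for contrast; the helper names are hypothetical.

```python
def ordinal_encode(value, categories):
    """A -> 1, B -> 2, C -> 3, as in the example in the text."""
    return [float(categories.index(value) + 1)]

def one_hot_encode(value, categories):
    """One indicator column per category (assumed alternative encoding)."""
    return [1.0 if value == c else 0.0 for c in categories]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

categories = ["A", "B", "C"]
for encode in (ordinal_encode, one_hot_encode):
    a, b, c = (encode(v, categories) for v in categories)
    print(encode.__name__, euclidean(a, b), euclidean(b, c), euclidean(a, c))
```

With the integer conversion, A and C end up twice as far apart as A and B, while with one-hot encoding all three categories are equally distant from each other.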
Clustering has been widely used to study genes when very little information is available and when
microarray data can be used for clustering genes. Gene clustering information is valuable for
researchers who study genes. When very little or no information about data is known, hierarchical
clustering algorithms should be used first because they do not require the input of k (# of clusters).

Association Rule Mining


Association rule mining methods are often known as market basket analysis when used to
discover customers’ hidden purchasing patterns or relationships among items purchased. A
customer who buys bread and butter is likely to buy milk as well. Using this information,
grocery managers can increase their sales with more targeted advertising campaigns and pricing.
Because association mining discovers relationships among all the attributes, it can generate a huge
number of association rules.
In healthcare, the same methodology is used to discover underlying relationships among
symptoms, diseases and other inter-related factors. The algorithm requires two user inputs: support
and confidence (%), which serve as filters. The algorithm retains the association rules whose item
sets occur frequently in the transactions (support) and which are accurate (confidence). The
importance of the support metric is to limit
the number of rules for very infrequent transactions. This property also significantly limits the
search for frequent item sets and considerably improves the efficiency of the algorithm. In healthcare
applications, though, the selection of support levels needs to be done carefully, since too high a
threshold will exclude possible strong rules for rare events (e.g. male breast cancer).
Techniques to make searches faster in huge datasets include the use of hash tables, sampling
techniques and transaction reduction (transactions without frequent items are not read further). A
good association mining system should provide tools that help domain experts eliminate
meaningless association rules (e.g., prostate cancer → male) and organize raw association rules
using the concept of hierarchy.
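
The two filters, support and confidence, can be computed directly from a list of transactions. The sketch below is a minimal illustration in which each transaction is the set of findings recorded for one hypothetical patient visit; the function names and the five records are made up.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical visit records.
visits = [
    {"fever", "cough", "influenza"},
    {"fever", "cough", "influenza"},
    {"fever", "cough"},
    {"cough"},
    {"fever", "rash"},
]

rule_support = support(visits, {"fever", "cough", "influenza"})
rule_confidence = confidence(visits, {"fever", "cough"}, {"influenza"})
print(rule_support, rule_confidence)
```

For the rule {fever, cough} → {influenza}, two of the five visits contain all three items (support 0.4), and two of the three visits with fever and cough also record influenza (confidence about 0.67). A minimum support set above 0.4 would discard this rule, which mirrors the caution about rare events above.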

Classification for predictive modeling


Classification methods are used to classify data into predefined categorical class labels. To classify
data, a classification algorithm creates a model consisting of classification rules. We use the term
‘class’ in classification to describe the variable in a data set that we are interested in predicting.
The ‘class’ is better known as the ‘dependent variable’ in statistics. In healthcare, classification
methods are very often used to predict the medical diagnosis and prognosis based on health
conditions.
The creation of a model with the use of a classification method is called the training phase. This
step builds a classification model by analyzing a portion of the dataset, which is called training data.
The models can range from simple rules such as ‘IF BloodPressureHigh=yes AND Smoking=yes THEN
HeartFailureRisk=yes’ to equations deriving from statistics or more advanced computational
methods. Classification rules are never 100% accurate; rules with 90-95% accuracy are generally
considered solid rules. After the training phase, there is a second step that cannot be skipped,
called the testing phase. Testing is performed to evaluate the performance of the developed
model. Without this step we cannot know whether the model performs well enough
to be useful for clinical and administrative predictions. Testing is very simple and computationally
lightweight compared to the training step. Typically, about 70% of a dataset is used as training data
and the remaining 30% as testing data.
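The split into training and testing data can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the records are made up, the attribute names (bp_high,
smoking) are hypothetical, and the rule is written by hand in the style of the example above rather
than learned from the training data.

```python
import random

def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(rule, test_records):
    """Fraction of records whose class label the rule predicts correctly."""
    hits = sum(1 for features, label in test_records if rule(features) == label)
    return hits / len(test_records)

def heart_failure_rule(features):
    """Hand-written rule in the style of the example in the text (not learned)."""
    both = features["bp_high"] == "yes" and features["smoking"] == "yes"
    return "yes" if both else "no"

# Made-up records: (features, HeartFailureRisk). Two labels are flipped on
# purpose so that the rule is not perfectly accurate.
records = []
for i in range(20):
    features = {"bp_high": "yes" if i % 2 == 0 else "no",
                "smoking": "yes" if i % 4 < 2 else "no"}
    label = heart_failure_rule(features)
    if i in (3, 8):
        label = "no" if label == "yes" else "yes"
    records.append((features, label))

train, test = train_test_split(records)
print(len(train), "training records,", len(test), "testing records")
print("accuracy on the testing data:", accuracy(heart_failure_rule, test))
```

In a real experiment the model would be built from the training part only, and the testing part
would be kept aside until the evaluation.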
Sometimes, the records in the testing data are either very hard or very easy to classify, and in such
cases classification accuracies are not reliable. For this reason, a method called cross-validation is
applied, so that every record is used for both training and testing across rounds. These multiple
rounds of cross-validation, using different partitions each time, aim to reduce variability. In
cross-validation, the evaluation experiments are normally performed 10 times (10-fold cross-validation).
Cross-validation enables classification accuracy comparisons between classification algorithms and is
used as a technique to assess how the results of a statistical analysis generalize to an independent
data set. One round of cross-validation is done by partitioning a sample of data into subsets,
performing the analysis on one subset (training set), and validating the analysis on the other subset
(validation set). This process is repeated multiple times (e.g. ten rounds in 10-fold cross-validation)
and the validation results are averaged over the rounds.
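The rounds of cross-validation can be sketched as follows. This is a minimal illustration, not a
reference implementation: the records are made up, the folds are formed by a simple deterministic
interleaving (in practice the records are usually shuffled first), and the ‘learner’ is a
hypothetical baseline that always predicts the majority class of its training data.

```python
def k_fold_indices(n_records, k=10):
    """Yield (train_idx, test_idx) pairs so that every record is tested exactly once."""
    folds = [list(range(i, n_records, k)) for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_validate(records, train_fn, k=10):
    """Train on k-1 folds, test on the remaining fold, and average the accuracies."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(records), k):
        model = train_fn([records[j] for j in train_idx])
        hits = sum(1 for j in test_idx if model(records[j][0]) == records[j][1])
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)

def train_majority(training_records):
    """Hypothetical baseline learner: always predict the most frequent class."""
    labels = [label for _, label in training_records]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

# Made-up records: (features, class label)
records = [({"age": 40 + i}, "yes" if i % 5 == 0 else "no") for i in range(50)]
mean_accuracy = cross_validate(records, train_majority, k=10)
print(f"10-fold mean accuracy: {mean_accuracy:.2f}")
```

Any other training function that returns a callable model can be passed in place of
train_majority, which makes it possible to compare algorithms on the same partitions.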

Classification methods belong to supervised data mining, since the user needs to provide a target
variable of interest. Before applying classification, redundant attributes and attributes that are
irrelevant to the class (e.g., the sex attribute for prostate cancer) should be discarded, and the
features that are relevant to the classification process should be chosen using feature selection
algorithms. If irrelevant attributes are kept, they increase the noise and slow performance. Feature
selection is, in most cases, performed with the use of statistical methods (such as correlation
analysis) to find the most important attributes, which are those correlated in a statistically
significant way with the class under investigation. Noise reduction should be used carefully: a
drawback of eliminating variables during feature selection based on simple correlations is that we
may miss an important relationship between a set of independent variables and a dependent
variable. Individually, smoking and an infection might not appear to affect stomach cancer, so they
might be eliminated. However, their combination may be significantly related to the cancer. This is
why more advanced feature selection methods are available, which investigate feature relevance
following multivariate approaches.
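The risk of relying on simple correlations can be shown with a deliberately extreme, made-up data
set. In the sketch below the class depends only on the combination of two binary attributes
(hypothetical names feature_a and feature_b): each attribute alone has zero correlation with the
class and would be discarded, while the combined attribute is perfectly correlated with it.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return 0.0 if sd_x == 0 or sd_y == 0 else cov / (sd_x * sd_y)

# Made-up binary attributes and a class that is 1 only when exactly one
# of the two attributes is present (an idealized interaction pattern).
feature_a = [0, 0, 1, 1] * 5
feature_b = [0, 1, 0, 1] * 5
target = [a ^ b for a, b in zip(feature_a, feature_b)]

# A derived attribute that encodes the combination of the two attributes.
combined = [int(a != b) for a, b in zip(feature_a, feature_b)]

for name, column in [("feature_a", feature_a),
                     ("feature_b", feature_b),
                     ("combined", combined)]:
    print(name, round(pearson(column, target), 3))
```

Real interactions are rarely this clean, but the example shows why a univariate filter can miss
attributes that are only informative together.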
Classification is the core data mining method in bioinformatics. Researchers can
distinguish between similar diseases when they have DNA expression microarray data of sample
cells affected by those diseases and can correctly classify the microarray data. Golub et al. correctly
distinguished acute myeloid leukemia from acute lymphoblastic leukemia by applying classification
algorithms to gene expression data. In another study, Harper compared the performance of
classification algorithms such as Discriminant Analysis (DA), regression models (multiple and
logistic), tree-based algorithms (CART), and artificial neural networks on healthcare datasets.

The well-known classification methods


Weka is a software package that incorporates a collection of machine learning algorithms for data
mining tasks. Weka (version 3.x) provides more than 50 classification algorithms. There is no single
best classification algorithm for every biomedical data set. The output (classification model) of each
algorithm is tested with testing data to measure accuracy, and the top classification algorithms should
then be selected for future prediction. The next section discusses some of the most well-known
classification algorithms.
Naïve Bayes is a probabilistic statistical classifier widely used in medical data mining. The term
“naïve” indicates the assumption of conditional independence among features, which greatly reduces
the computational complexity. The assumption that all attributes are independent of one
another is considered to be the main drawback of the method, but at the same time, this
characteristic makes Naïve Bayes a surprisingly computationally efficient option. Because of this
simplicity, it can handle a dataset with many attributes and needs only a small set of training data
for accurate estimations, because it only requires the calculation of the frequencies of attributes in
the training data set. Generally, Naïve Bayes produces good accuracy even when the independence
assumption is violated. Many researchers use the classifier to estimate a baseline performance, which
in many cases can be surpassed by more advanced classification algorithms.
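The frequency counting behind Naïve Bayes can be sketched as below. The records, attribute names
and class labels are made up for illustration, and the add-one (Laplace) smoothing is an assumption
of this sketch, used so that an attribute value never seen with a class does not force a zero score.

```python
from collections import Counter, defaultdict

def train_naive_bayes(records):
    """Count class frequencies and, per class, the frequency of each attribute value."""
    class_counts = Counter(label for _, label in records)
    value_counts = defaultdict(Counter)  # (attribute, label) -> Counter of values
    values_seen = defaultdict(set)       # attribute -> set of observed values
    for features, label in records:
        for attribute, value in features.items():
            value_counts[(attribute, label)][value] += 1
            values_seen[attribute].add(value)
    return class_counts, value_counts, values_seen

def predict(model, features):
    """Return the most probable class and the unnormalized score of every class."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior probability of the class
        for attribute, value in features.items():
            count = value_counts[(attribute, label)][value]
            # Add-one smoothing (an assumption of this sketch).
            score *= (count + 1) / (n_label + len(values_seen[attribute]))
        scores[label] = score
    return max(scores, key=scores.get), scores

# Made-up training records: (features, class label)
records = [
    ({"fever": "yes", "cough": "yes"}, "flu"),
    ({"fever": "yes", "cough": "no"}, "flu"),
    ({"fever": "yes", "cough": "yes"}, "flu"),
    ({"fever": "no", "cough": "yes"}, "cold"),
    ({"fever": "no", "cough": "no"}, "cold"),
    ({"fever": "no", "cough": "yes"}, "cold"),
]
model = train_naive_bayes(records)
label, scores = predict(model, {"fever": "yes", "cough": "yes"})
print(label, scores)
```

The scores are multiplied attribute by attribute, which is exactly where the independence
assumption enters the calculation.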
Neural networks (NN) mimic the neurologic functions of the brain using computational nodes. The
goal of artificial neural networks is good, or human-like, predictive ability. Each node/neuron is
interconnected with other nodes via weighted links. The link weights are adjusted when the NN is
being trained. Nodes are organized into three categories of layers: the Input, the Hidden and the
Output layers. The most widely used NN is the ‘multi-layer perceptron with back-propagation’
algorithm. It is often among the best-performing classification algorithms and compares well with
newer methods such as decision trees and the Support Vector Machine. On the downside, neural
networks require many parameters that are empirically determined, and their classification
performance is sensitive to the parameters selected. Clinicians also find it difficult to understand how
a network reaches its classification decisions and cannot interpret the results easily. Neural networks
require an extremely slow training process and are computationally heavy (~100 times slower than
regression when using the statistical analysis package SPSS). Most importantly, in some scenarios
neural networks provide classification accuracy inferior to more recent classification algorithms
(decision trees and support vector machines).
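The flow of information through the input, hidden and output layers can be sketched as a single
forward pass. The weights and biases below are arbitrary, hypothetical numbers chosen for
illustration; in a trained network they would be the result of back-propagation, which is not shown
here. The sigmoid activation is also an assumption of this sketch.

```python
from math import exp

def sigmoid(z):
    """Squash a weighted sum into the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def layer(inputs, weights, biases):
    """Each node sums its weighted incoming links, adds a bias and applies the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
            for node_weights, bias in zip(weights, biases)]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

# Hypothetical network: 2 input nodes, 2 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]  # one list of link weights per hidden node
hidden_b = [0.0, -0.1]
output_w = [[1.2, -0.7]]              # link weights into the single output node
output_b = [0.05]

patient = [1.0, 0.0]  # made-up, already numeric input values
risk = forward(patient, hidden_w, hidden_b, output_w, output_b)[0]
print("predicted risk score:", round(risk, 3))
```

Even in this tiny network the output depends on nine numbers at once, which hints at why the
decisions of larger networks are hard to interpret.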
Decision trees are classification methods with an easy-to-understand visualization of the output. The
algorithm C4.5 is the most widely used decision tree algorithm. Decision tree classifiers construct a
hierarchical, tree-like structure during the training step of classification. The method for the
construction of the tree is called the Attribute Selection Method (ASM). At each step, ASM finds the
attribute whose split of the records comes closest to pure partitions, i.e. partitions in which all
records share the same class value. Selected attributes become nodes in the decision tree. A drawback
occurs when a data set contains many attributes: in this case the decision tree may be too complex to
be easily understood. To resolve the problem, statistical tree pruning approaches are applied to such
decision trees.
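Attribute selection can be sketched with information gain, a purity measure based on entropy. Note
the simplification: C4.5 itself uses a normalized variant (the gain ratio), whereas the sketch below
uses plain information gain for brevity. The records and attribute names are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Impurity of a list of class labels; 0 means the partition is pure."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(records, attribute):
    """Reduction in entropy obtained by splitting the records on one attribute."""
    labels = [label for _, label in records]
    partitions = {}
    for features, label in records:
        partitions.setdefault(features[attribute], []).append(label)
    remainder = sum(len(part) / len(records) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

def best_attribute(records, attributes):
    """The attribute whose split is closest to pure partitions."""
    return max(attributes, key=lambda a: information_gain(records, a))

# Made-up records: (features, HeartFailureRisk)
records = [
    ({"bp_high": "yes", "smoking": "yes"}, "yes"),
    ({"bp_high": "yes", "smoking": "no"}, "yes"),
    ({"bp_high": "no", "smoking": "yes"}, "no"),
    ({"bp_high": "no", "smoking": "no"}, "no"),
]
root = best_attribute(records, ["bp_high", "smoking"])
print("attribute chosen for the root node:", root)
```

Applying the same selection recursively to each partition would grow the full tree; pruning is
then used to cut back branches that add complexity without improving accuracy.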
Support Vector Machine (SVM) algorithms are based on statistical learning theory and are
designed to solve two-class classification problems (e.g., safe therapy vs. risky therapy). When a
dataset is represented in a high dimensional feature space, the SVM searches for the optimal
separating hyperplane, for which the margin between the two classes of objects is maximal.
Hyperplanes are decision boundaries between two different sets of objects. The margin is determined
by the support vectors, the training records of each class that lie closest to the hyperplane. The major
advantage of SVM is its classification accuracy. Although SVM is designed for 2-class classification
problems, a multiclass problem can be reduced to multiple binary problems. Different SVM kernel
functions yield different classification accuracy on each data set, so an appropriate kernel function
has to be selected. The main drawback of SVM is that its training step is extremely slow and requires
extensive computational resources.
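How a linear SVM uses its hyperplane once training is finished can be sketched as follows. The
hyperplane parameters (w, b) are hypothetical numbers, not the result of an actual optimization. The
margin formula 2/||w|| assumes the usual scaling in which the support vectors satisfy |w·x + b| = 1,
and the multiclass reduction shown is the one-vs-rest strategy, one of several possible reductions.

```python
from math import sqrt

def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Two-class decision: +1 on one side of the hyperplane, -1 on the other."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Width of the margin, assuming support vectors satisfy |w.x + b| = 1."""
    return 2.0 / sqrt(sum(wi * wi for wi in w))

def one_vs_rest(hyperplanes, x):
    """Multiclass by several binary SVMs: pick the class with the largest score."""
    return max(hyperplanes, key=lambda c: decision_value(*hyperplanes[c], x))

# Hypothetical trained hyperplane for 'risky therapy' (+1) vs 'safe therapy' (-1).
w, b = (3.0, 4.0), -5.0
print(classify(w, b, (3.0, 4.0)), classify(w, b, (0.0, 0.0)))
print("margin width:", margin_width(w))

# Hypothetical one-vs-rest hyperplanes for three risk classes.
hyperplanes = {
    "low": ((-1.0, 0.0), 1.0),
    "medium": ((0.0, 0.0), 0.0),
    "high": ((1.0, 0.0), -1.0),
}
print(one_vs_rest(hyperplanes, (3.0, 0.0)))
```

Non-linear kernels replace the dot product in decision_value with a kernel function evaluated
against the support vectors, which is where the choice of kernel affects accuracy.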
Ensembling methods combine multiple classifiers to achieve better classification accuracy than
the use of one classifier. A number of studies show that ensembling improves classification
performance in the biomedical and healthcare fields. For example, if Classifiers A, B, and C predict
that the patient has lung cancer and only two classifiers, D and E, predict with lower accuracy that
the patient doesn’t have lung cancer, then, using a voting strategy, the patient is determined to
have lung cancer. It is also possible for each classifier of an ensembling method to be differently
weighted. A very famous ensemble algorithm is AdaBoost (Adaptive Boosting). AdaBoost received
attention in biomedicine because it has very high classification performance and normally
outperforms even SVM. AdaBoost uses weighted majority voting; classifiers with good classification
results during the initial training process have higher weight in final decisions. Examples of using
ensemble methods in healthcare include mining of MRI neuroimaging data obtained for observation
of persons engaged in a bivariate task (i.e., belief vs. disbelief), mining of prostate cancer images,
the detection of Alzheimer’s disease through hippocampi segmentation, and others.
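The voting strategy described above can be sketched as follows. The classifier names follow the
example in the text, while the error rates are made-up numbers. The weight formula
0.5 * ln((1 - error) / error) is the one used by the standard two-class AdaBoost; here it is applied
to fixed, hypothetical error rates rather than computed during a real boosting run.

```python
from math import log

def adaboost_weight(error_rate):
    """Vote weight of a classifier in two-class AdaBoost (larger when the error is smaller)."""
    return 0.5 * log((1.0 - error_rate) / error_rate)

def weighted_vote(predictions, weights):
    """Sum the weights behind each predicted label and return the winning label."""
    totals = {}
    for name, label in predictions.items():
        totals[label] = totals.get(label, 0.0) + weights[name]
    return max(totals, key=totals.get), totals

predictions = {"A": "cancer", "B": "cancer", "C": "cancer",
               "D": "no cancer", "E": "no cancer"}

# Simple majority voting: every classifier has the same weight.
equal_weights = {name: 1.0 for name in predictions}
print(weighted_vote(predictions, equal_weights)[0])

# Weighted voting with hypothetical error rates (D and E are less accurate).
error_rates = {"A": 0.10, "B": 0.10, "C": 0.10, "D": 0.30, "E": 0.30}
weights = {name: adaboost_weight(e) for name, e in error_rates.items()}
print(weighted_vote(predictions, weights)[0])
```

If the two dissenting classifiers were much more accurate than the other three, the weighted
vote could go the other way even though they are in the minority.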

Applications of data mining from health industry


Data mining is widely used in healthcare because of its descriptive and predictive power. Many
examples of applications have been published, such as the prediction of healthcare costs, the
detection of health insurance fraud, disease diagnosis and prognosis, and the prediction of length of
stay (LOS) in a hospital. With the use of data mining methods, it is possible to obtain frequent
patterns from biomedical and healthcare databases, such as relationships between health conditions
and a disease, relationships among diseases, and relationships among drugs. There are many
examples in the literature describing the application of data mining methods for healthcare
management.
Group Health Coop. stratifies patients by demographic characteristics and medical conditions to
determine which groups use the most resources, while the Arkansas Data Network uses readmission
and resource utilization data and compares them with scientific evidence to determine the best
treatments. Blue Cross uses emergency department and hospitalization claims data, pharmaceutical
records, and physician interviews to identify unknown asthmatics and develop interventions. Seton
Medical Center uses data mining to decrease patient length of stay, avoid clinical complications, and
provide information to clinicians. Sierra Health Services has used data mining to identify areas for
quality improvements, including treatment guidelines, disease management groups, and cost
management. Finally, the ‘Lightweight Epidemiological Advanced Detection Emergency Response
System’ (LEADERS) analyzes data and statistics to search for patterns that might indicate
bio-terrorist attacks.
There are many data mining applications to evaluate treatment effectiveness. By comparing and
contrasting causes, symptoms, and courses of treatments, data mining can deliver an analysis of
which courses of action prove effective. United HealthCare mined its treatment record data to
cut costs and deliver better medicine. It also developed profiles of doctors’ practice patterns to
compare these with industry standards. Another example dates back to 1999, when Florida
Hospital launched the clinical best practices initiative to develop standardized paths of care.
The application of data mining can support customer relationship management by determining
preferences, usage patterns, and patient needs in order to improve satisfaction. As an example, using
data mining, the Customer Potential Management Corp. developed an index that indicates an
individual’s propensity to use specific healthcare services, defined by 25 diagnostic categories. The
index was based on millions of transactions and can identify patients who can benefit most from
specific healthcare services, encourage those who most need specific care, and reach out to audiences
for improved health and long-term patient relationships.
The use of data mining to detect fraud and abuse, by establishing norms and then identifying unusual
or abnormal patterns of claims by physicians and clinics, is not uncommon. The Medicaid Fraud &
Abuse Detection System recovered $2.2 million and identified 1,400 suspects for investigation after
operating for less than one year. ReliaStar Financial Corp. reported a 20% increase in annual
savings, while the Wisconsin Physician’s Service Insurance Corp. noted significant savings. Finally,
the Australian Health Insurance Commission estimated tens of millions of dollars of annual savings.
Highmark, a health insurance company, built classification models, based on claims, customer and
provider data, to identify potential fraud instances. Their fraud detection system aimed at real-time
analysis to build predictive models that can detect fraud and stop it before it occurs. Highmark has
found that decision making regarding fraud is carried out more quickly than before, as
the classification system is automated and avoids labor-intensive work. This updating cycle of data
mining led to savings of up to $11.5 million.
In another study, van ’t Veer used the DNA microarray data of 98 primary breast tumors to cluster
the tumors using a hierarchical algorithm. With respect to the development of distant metastases,
34% of the patients in the upper cluster (62 tumors) were “relapse” patients, compared with 70% of
the patients in the lower cluster (36 tumors). The lower cluster, with the higher relapse rate, is
therefore considered to contain the “poor prognosis tumors”, while the upper cluster contains the
“good prognosis tumors”. After clustering, the researchers also used classification to
predict poor prognosis; according to their results, the prediction of cancer outcomes using microarray
data was better than the prediction using clinical parameters.
In another case, a decision tree model (Security Blue Reimbursement Model) was built using patient
symptoms, health history, and patient demographics to predict the risk of developing diseases and
to rank patients based on the risk of developing one of 13 diseases. The objective was to enable proper
Medicaid and Medicare reimbursements. The cost of care for patients detected at an early stage can
be lowered as providers and insurers do not resubmit claims. These decision trees were annually
revised because of the growth of the number of diseases modeled.
The next example is an association rule mining application and comes from South Korea; scientists
used the Korea Medical Insurance Corporation (KMIC) database to identify relationships between
two drugs, or between diseases, to help formulate a government policy on hypertension management.
The KMIC used healthcare utilization data, demographic data, clinical data (e.g., blood glucose) and
lifestyle data (e.g., smoking and drinking) from a nationwide health promotion program. The
second example of association rule mining comes from Taiwan. Antacids are used to alleviate
gastric ulcers and relieve heartburn and do not require a prescription. Despite this, the Taiwanese
NHI reimburses them. Researchers were interested in how antacids are used with other drugs and
analyzed the use patterns of antacids using association mining. In total, 526,693 patient visits and
2,574,739 prescription records were analyzed. With a support level of 1% and a confidence level of
52.2%, the model output 36 association rules, out of which the researchers manually extracted the
five drug sets most frequently used with antacids.
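The roles of the support and confidence levels in such a study can be sketched on a handful of
made-up prescriptions. The drug names and visits below are hypothetical and are not taken from the
Taiwanese data; only the two thresholds (1% and 52.2%) come from the text.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def passes(transactions, antecedent, consequent, min_support, min_confidence):
    """Keep a rule only if it reaches both thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)

# Made-up prescriptions, one set of drugs per patient visit.
visits = [
    {"antacid", "nsaid"},
    {"antacid", "nsaid"},
    {"antacid", "antibiotic"},
    {"nsaid"},
    {"nsaid"},
    {"antibiotic"},
]

print("support(nsaid, antacid):", support(visits, {"nsaid", "antacid"}))
print("confidence(nsaid -> antacid):", confidence(visits, {"nsaid"}, {"antacid"}))
print("confidence(antacid -> nsaid):", confidence(visits, {"antacid"}, {"nsaid"}))
print("antacid -> nsaid kept:", passes(visits, {"antacid"}, {"nsaid"}, 0.01, 0.522))
```

The same pair of drugs gives two rules with different confidence, depending on which drug is
taken as the antecedent.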

Step by step business model


Organizational priorities: Before applying data mining to healthcare datasets, it is important to
outline the organizational priorities and objectives, taking into consideration which are the main
attributes, what questions need to be answered, and what available data can be utilized
to develop predictive algorithms.
Data preparation: as soon as priorities have been defined, the next step involves merging all the
data files that will be used for the analysis, which might come from different systems and databases.
In some cases, a random data sample is selected and data transformation methods are applied. The
above data preparation steps will contribute to forming the target dataset.
Modeling stage: this is the actual data analysis step, which can include one or more data mining
methods (cluster analysis, regression analysis, decision trees etc.). In this step, the development of
the model (training phase) will be followed by the testing phase to estimate the model accuracy.
Evaluation stage: when the models have been created and their performance is known to the
researchers, the comparison of the models will facilitate decisions on which model(s) should be kept
and incorporated into the decision support systems under development. The data mining
models will be compared by using a common yardstick, such as lift charts, profit charts, or diagnostic
classification charts. Factors to consider include not only the model performance but also the
computational efficiency.
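One point of a lift chart can be computed as sketched below: the records are ranked by the score of
the model, and the rate of positive cases among the top-scored fraction is divided by the rate in the
whole data set. The scores and outcomes are made up, and the sketch assumes that the data contain at
least one positive case.

```python
def lift_at(scored_records, fraction):
    """Lift of the top-scored fraction of records over the overall positive rate.

    scored_records: list of (model score, actual outcome) with outcome 1 or 0.
    """
    ranked = sorted(scored_records, key=lambda r: r[0], reverse=True)
    top_n = max(1, int(round(len(ranked) * fraction)))
    top_rate = sum(actual for _, actual in ranked[:top_n]) / top_n
    base_rate = sum(actual for _, actual in ranked) / len(ranked)
    return top_rate / base_rate

# Made-up scores of one model and the outcomes that were actually observed.
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
          (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0), (0.0, 0)]

for fraction in (0.2, 0.5, 1.0):
    print(f"lift in the top {int(fraction * 100)}%: {lift_at(scored, fraction):.2f}")
```

Computing the same points for each candidate model on the same testing data gives the common
yardstick on which the models can be compared.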
Deployment: this is where the data mining models are actually deployed and start to be used as
part of the healthcare function. The developed models are built around an intuitive graphical user
interface. Ideal data mining software should support intelligent data preprocessing that
automatically selects data for data mining and uses domain knowledge for various data processes.
It should also fully automate the knowledge discovery process, so that it understands and utilizes
existing knowledge in data mining processes for better knowledge discovery.

Limitations for the application of data mining in healthcare


Resources, particularly time, effort, and money need to be allocated when a new system is in use; in
the case of data mining, though, these investment considerations provide very attractive
opportunities, since they will clearly contribute to an evidence based, data driven medicine and
therefore to improved healthcare services. On the downside, as of 2016, there is still a lack of
complete commercial data mining packages for knowledge discovery. Interoperability has not yet been
achieved in its entirety and different parts of the hospital are still disconnected. This limitation makes
it difficult for complete applications to be incorporated into healthcare organizations. Another
technical limitation is that many hospitals do not maintain data warehouses, since this can
sometimes be costly and hospitals fail to see the added value of such an investment. It is important
that the hospital management is committed to supporting any effort in the right direction.
Data mining algorithms usually require user parameters. End-users usually do not have sufficient
information about the parameters and their selection. Domain knowledge, statistical and research
expertise, and IT and data mining knowledge and skills are all involved in the development of
data-driven healthcare predictive systems. For this reason, challenges for end-user applications
include the automation of the parameterization process and the translation of the parameters into
application options that are easy to understand and toggle. Data mining results are sensitive to these
parameter(s).
Another consideration in data mining relates to the acceptable model performance thresholds.
The accuracy is normally not high enough to be used in a clinical environment, due to the low quality
of patient data and, most importantly, because of the requirement for extremely high accuracy levels,
since decisions have a huge impact on human health and life. Health Information Systems have often
been designed primarily for financial and administrative/planning purposes and only secondarily for
clinical functions. It is therefore challenging to obtain high quality data for clinical data mining, since
there is a lot of missing data.
Mining medical data also involves privacy and legal issues. Health researchers must ensure
patient privacy and the anonymity of patient data. In addition, medical data mining may reveal
previously unknown medical errors (suspicious patterns in medical practice), which could lead to
lawsuits against doctors. Therefore, a balance should be kept between data quality and availability
on the one hand and the need to protect patient confidentiality on the other.

Questions for Discussion

1. Discuss how ‘the use of data mining in healthcare can contribute to improved patient
outcomes’.
2. Provide an example of knowledge discovery in healthcare data, with the use of supervised
methods and another one with the use of unsupervised methods.
3. Explain in what ways the model performance affects the potential feasibility of an algorithm
in a clinical environment.
4. Explain the training and testing phases of a classification experiment. Why is it unreliable
to use a model that you only trained, without further testing it?
5. You want to find case-mix groups of your patients. Which family of data mining algorithms
would you use and why?
6. You want to predict the patient length of stay using clinical and demographic information.
Which family of data mining algorithms would you use and why?
