Informatics
Digital Lecture Companion
Contents
Preface
7. Standards in Healthcare
8. Databases in Healthcare
9. Interoperability in Healthcare
Dimitrios Zikos: Data Driven Health Informatics
Preface
Data collection, maintenance, retrieval and analysis are especially important in healthcare,
since providers and facilities need immediate access to patient and administrative
information for clinical and administrative decision making. Health informatics provides the
appropriate technologies, tools and methods to efficiently support healthcare and optimally
utilize the wealth of available health care data in support of decisions in primary, hospital and
tertiary care. This book begins by introducing the reader to the discipline of health
informatics, and further discusses the fundamental concepts of data and information,
focusing on the nature of health care data and the data flow and sharing across health care
users. The principles of data collection in healthcare are covered, with reference to the
sensitive and complex nature of health data. The Hospital Information Systems and
Electronic Health Records are discussed, as they are very important tools that facilitate
information management and support clinical decisions. There is also extensive coverage
of the dimensions of interoperability in healthcare and of the main principles of database
design. We aim to give the reader a high-level understanding of how healthcare data are
organized. The book furthermore explores the challenges and opportunities arising
from the use of big data and covers the fundamental principles of data mining. This textbook
aims to give healthcare providers and administrators a thorough understanding of the
significance of new technologies and of the potential of healthcare data to help
healthcare professionals and decision makers make data-driven, evidence-based
decisions.
The author
Dimitrios Zikos is an assistant professor at Central Michigan University. He holds a
Bachelor of Science degree in Nursing and a Master’s and Ph.D. degree in Health Informatics
from the University of Athens, Greece. His research involves clinical and administrative
informatics, clinical statistics and data mining, healthcare delivery and clinical care policies
for decision making. In 2013, he arrived in the United States as a visiting professor to help
the University of Texas at Arlington develop new curricula in Health Informatics, and he
supervised undergraduate and graduate students in health analytics projects. Dr. Zikos has
been an investigator on nationally and regionally funded projects and co-organizer of the
NSF-sponsored annual conference PETRA (Pervasive Technologies Related to Assistive
Environments). While overseas, he was a researcher and project manager in large-scale European
Union multi-country e-health projects and a partner with an accreditation and research center
in Greece for the European Union Network for Patient Safety. This research resulted in
translational guidelines for health professionals and the public on patient safety. He is the
author of numerous peer-reviewed journal and conference papers and a reviewer for numerous
journals and conferences.
Chapter 1. Introduction to the Discipline
The Information in Healthcare
In the health sciences there are highly complex processes involving biological organisms
and their functions. A disease is very often the result of a patient’s interactions with
their environmental and psychosocial context. Therefore, when evaluating health, many
factors must be considered, and a large number of attributes need to be collected, stored
and retrieved during decision making.
Information is produced and communicated during the above-mentioned procedures, and
there are multiple users who share this information inside and outside the hospital.
Very often, the same data are accessed concurrently by different health providers, who need
this data for different purposes. Information changes dynamically during the care provision,
for example when a new exam is prescribed or when the treatment plan of a patient is altered.
It is also important to understand that the health care information derives from the
combination of many atomic health data, which have to be assessed together to produce
useful information for the clinical decision making.
The majority of health-related information comes from the analysis of numerical and Boolean
data, but also from image, video and sound data.
information that can be used by the clinician. If this measurement refers to the diastolic blood
pressure of a patient, measured in mmHg, then the health professional will have acquired
information about a patient’s blood pressure. This is a fact we have learned about this patient:
we can assess this information, compare it with clinical expectations for this specific
patient, and make evidence-based decisions about the treatment plan.
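The step from a raw value to clinical information can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of any EMR: the `Observation` class, the `interpret` function, the reading of 95 mmHg and the expected range of 60 to 90 mmHg passed in are all hypothetical choices for the example, not clinical guidance.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """A raw value plus the context that turns it into clinical information."""
    patient_id: str
    name: str      # what was measured, e.g. "diastolic blood pressure"
    value: float
    unit: str


def interpret(obs: Observation, expected_low: float, expected_high: float) -> str:
    """Compare an observation with an expected range supplied by the clinician.

    The range is an input chosen for this illustration, not a clinical guideline.
    """
    if obs.value < expected_low:
        status = "below expected range"
    elif obs.value > expected_high:
        status = "above expected range"
    else:
        status = "within expected range"
    return f"{obs.name} for {obs.patient_id}: {obs.value:g} {obs.unit} ({status})"


raw = 95  # on its own this is only data: a number without meaning
obs = Observation("P001", "diastolic blood pressure", raw, "mmHg")
print(interpret(obs, expected_low=60, expected_high=90))
# prints: diastolic blood pressure for P001: 95 mmHg (above expected range)
```

The number 95 only becomes information once it is tied to a patient, a measured attribute and a unit, and compared with what is expected for that patient.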
Access & delivery methods of information: Since the 1990s there has been a slow but steady
transition from traditional to electronic methods to store, access and communicate health
care information. Different architectures have been used to implement hospital
information systems over the past four to five decades, and the transition to new
architectures generally followed the evolution of information systems. The most recent
approach is based on dedicated private cloud services that can be accessed by health
professionals. This is an important evolutionary step, from traditional access methods to
personalized services.
The information in health care is dynamic: patient information constantly changes during a
hospitalization and needs to be up to date. Health professionals need to constantly reassess
any new input for informed decisions.
The medical knowledge base is acquired by studying, revisiting and reviewing the medical
knowledge of a health professional’s area of expertise. It is not limited to knowledge acquired
at university, but also involves continuing education, reviewing recent literature and
publications, and attending conferences and scientific meetings.
Information coming from the patient: includes, but is not limited to, the patient history,
medical exams, vital signs and radiology tests.
Experience, medical judgment: the clinician puts together all the different
pieces of a puzzle, including the three components above, to decide on a diagnosis for a patient
and, consequently, on the treatment options. As part of their medical judgment, doctors have
learned to perform a differential diagnosis, which is the process of
differentiating between two or more conditions that share similar signs or symptoms. For
nurses, the nursing assessment is part of the nursing process and is a systematic procedure
that nurses follow. It involves the collection and analysis of the available patient data, and
covers not only physiological but also psychological, sociocultural and lifestyle
factors.
Data mining for knowledge acquisition: algorithms are applied on large datasets to
predict health related events for a patient, like the diagnosis, the prognosis, and the optimal
treatment plan. This component is the most recent addition to the clinical decision making
mechanism, and it remains unexplored or, at best, underutilized in most hospitals. These
systems should serve as an additional input for the health professional, who in turn
is expected to use this extra input to make more accurate and less error-prone medical
decisions. It is generally agreed in the literature that these systems can neither act
autonomously nor dictate appropriate practices to the health professional. On the contrary,
decision making systems should be designed as an extension of the human cognitive
process that takes place during decision making.
Healthcare professionals combine their knowledge, experience and patient information, and will eventually
also consult model predictions based on historical data, to make clinical decisions.
that store, process and communicate information. Clearly, the term informatics does
not refer solely to computational methods and computer science, although many of the methods
that informatics employs come from computer science.
Information and Communication Technology (ICT) – The communication technologies
and services that are used in various applications. The term describes the computer,
communication and multimedia technologies that can be used to receive, process, store,
display and disseminate information. ICT is an umbrella term and is often used to describe
communication within applications in a specific domain, for example ‘ICT in Health
Care’.
2. Talking with the patient and with other health professionals. Interactions
between nurses and doctors are especially important, considering
that nurses spend a considerable amount of time with the patient and can make nursing
observations that provide invaluable input to the doctor. A nurse can typically notice
small changes in a patient’s condition, for example loss of appetite, skin colour
changes or changes in consciousness. These observations can be highly significant, and the physician
may not be in a position to notice them in time. There is evidence that a hospital environment that values
interprofessional collaboration between health professionals is a
significant contributor to quality health care services and patient satisfaction.
4. Laboratory tests and radiology examinations: laboratory tests involve the analysis of
samples extracted from patients (e.g. blood, urine, tissue). Typically, a laboratory test is part
of a regular check-up, but during a patient’s hospitalization, such tests are usually performed to
help shape a diagnosis. The analysis of enzyme concentrations, blood elements, antibody
tests and urine ketone concentration are some typical examples.
Storage (and retrieval): involves the storage of data in physical media, so that it can be
retrieved. The storage of healthcare-related data is nowadays achieved via (i) the direct entry
of information into Electronic Medical Records, and consequently the storage of data in
databases (relational, in most cases) on the system backend; (ii) sensors, which
perform measurements and then send the data via a communication module to interoperable
systems; and (iii) scanning handwritten documents and applying optical character recognition
(OCR) technologies.
Storing patient information in portable devices is not a recommended practice, for data
privacy reasons. These devices (e.g. bed monitors, tablets) should send the measurements to
the main system via wireless technologies, without storing the data locally.
Information retrieval during the clinical practice involves healthcare professionals
navigating and using search tools found in graphical user interfaces of Electronic Medical
Records. Healthcare professionals are end users and have no direct access to the data, and
no access to database querying engines. A typical functionality of modern systems is
support for advanced reports, which visualize clinical information and transform the data into
useful representations with longitudinal data insights.
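Storage in a relational backend and retrieval of a patient's history can be illustrated with Python's built-in `sqlite3` module. The sketch assumes a deliberately simplified, hypothetical `vital_sign` table and invented values; real EMR schemas are far larger. As noted above, end users never write this SQL themselves: the graphical interface issues queries of this kind on their behalf.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, enough for the illustration
conn.execute(
    """CREATE TABLE vital_sign (
           patient_id  TEXT NOT NULL,
           measured_at TEXT NOT NULL,  -- ISO 8601 timestamp, sorts chronologically
           name        TEXT NOT NULL,
           value       REAL NOT NULL,
           unit        TEXT NOT NULL
       )"""
)

# Storage: rows as they might arrive from direct EMR entry or a bedside sensor.
rows = [
    ("P001", "2024-03-01T08:00", "systolic blood pressure", 128, "mmHg"),
    ("P001", "2024-03-01T14:00", "systolic blood pressure", 135, "mmHg"),
    ("P001", "2024-03-01T20:00", "systolic blood pressure", 131, "mmHg"),
    ("P002", "2024-03-01T08:00", "systolic blood pressure", 118, "mmHg"),
]
conn.executemany("INSERT INTO vital_sign VALUES (?, ?, ?, ?, ?)", rows)


def history(patient_id: str, name: str):
    """Retrieval: one patient's measurements of one attribute, oldest first."""
    cur = conn.execute(
        "SELECT measured_at, value FROM vital_sign "
        "WHERE patient_id = ? AND name = ? ORDER BY measured_at",
        (patient_id, name),
    )
    return cur.fetchall()


print(history("P001", "systolic blood pressure"))
```

The parameterized query (`?` placeholders) keeps user input separate from the SQL text, which is the usual practice when an interface builds queries from search fields.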
Communication: data moves from the point of data collection to storage, for analysis, and
finally, back to the point of data use. Communication of health care data within and across
subsections of a hospital information system, involves the use of communication protocols
and interoperability standards. Interoperable systems should not only be technically
compatible but should also achieve seamless data exchange.
Manipulation: data usually needs to be manipulated, combined with other data, and
aggregated for statistical and healthcare analytics purposes. Data manipulation can be as
simple as calculating a patient’s age from their date of birth, or it can refer to more
advanced applications where data science methods are applied to data to generate prediction
models. Data manipulation may also refer to different representations of data for the end
user: the healthcare professional may view information about a patient
on a sequential time-line, which places each clinical intervention and event
according to the time it occurred. At any given point, though, the healthcare professional may
opt to view exactly the same information for that patient, organized by the location of the
clinical interventions and events. In that case, the output provides details about events
classified by location, e.g. events that occurred in a hospital ward, in the radiology
department, in the surgery, etc.
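Both kinds of manipulation described above, deriving age from a date of birth and presenting the same events either as a time-line or grouped by location, can be sketched as follows. The event list is invented for illustration and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime


def age_on(birth_date: date, on: date) -> int:
    """Completed years of age on a given date (one less if the birthday hasn't come yet)."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


# Hypothetical clinical events for one patient: (time, location, description).
events = [
    (datetime(2024, 3, 2, 9, 30), "radiology", "chest x-ray"),
    (datetime(2024, 3, 1, 22, 15), "ward", "admission assessment"),
    (datetime(2024, 3, 2, 14, 0), "surgery", "appendectomy"),
    (datetime(2024, 3, 2, 18, 45), "ward", "post-operative vital signs"),
]


def timeline(evts):
    """Same events, ordered by the time they occurred."""
    return sorted(evts, key=lambda e: e[0])


def by_location(evts):
    """Same events, grouped by where they occurred (each group in time order)."""
    groups = defaultdict(list)
    for when, where, what in timeline(evts):
        groups[where].append((when, what))
    return dict(groups)


print(age_on(date(1980, 6, 15), date(2024, 3, 1)))  # 43: birthday not yet reached
for when, where, what in timeline(events):
    print(when, where, what)
for where, items in by_location(events).items():
    print(where, [what for _, what in items])
```

Neither view changes the stored data; both are computed on demand from the same records, which is why the professional can switch between them at any time.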
Display: refers to the way that data may be displayed so that it can be easily understood and
used. Displaying information does not only refer to the physical output devices (monitors,
printers etc.), but primarily addresses the presentation of the information to the user via user
friendly interfaces, successful human computer interactions and functional dialogue systems.
The term ‘Medical Informatics’, although it predates health informatics, is still broadly used
today; it refers to the medical applications of health informatics and is regarded as
a subdomain of the latter. Today, there are many well-defined sub-areas of health
informatics, which independently develop methods and new knowledge for their own areas
of specialization: dental informatics, nursing informatics, etc.
A recent direction in health informatics is the integration of smart data
analytics and data mining algorithms into clinical and administrative decision making. This
direction is driven by technological and computer science advancements that make
it possible to analyze huge volumes of data with novel machine learning methods and to provide
accurate predictions and estimations that can be extremely useful during decision making.
In the coming years, we will witness the integration of predictive
algorithms into Electronic Health Records, which will provide recommendations for
patients at the point of care. These recommendations will result from the analysis of
enormous amounts of historical patient data to identify useful patterns for the
diagnosis, therapeutic plan and prognosis of a patient, with the ultimate goal of improving the
quality of care in a patient-centered health-care system.
Health Informatics uses information to improve health care. One needs to understand how
we define the concept of ‘improving healthcare’, and whether this concept is quantifiable.
This is a prerequisite for meaningfully introducing health informatics
applications. In the healthcare domain, we can understand ‘improvement’ as
making more informed decisions that drive better patient outcomes and
encourage the safe and error-free provision of healthcare services, but also as leading
to improved hospital efficiency and increased revenue. As an interdisciplinary field,
health informatics applies technology & information to enhance healthcare delivery and
biomedical research.
It is closely tied to the education of health professionals and the public. Health
informatics can provide tools and methods for e-health literacy, that is, for reaching
large populations for disease prevention and health promotion through targeted and personalized
interventions. Health informatics methods and systems are also becoming important for the
education of health professionals and healthcare administrators.
Health informatics studies the process where health data, information, and knowledge are
collected, stored, processed, communicated, and used to support health care delivery to clients
and providers, administrators, and health organizations. Each of these groups may
use the same data, transformed and processed to serve the
strategic goals of the different professional groups in health care, and of patients as well.
Related terms
Consumer Health Informatics: both healthy individuals & patients want to be informed on
medical topics. MediQoC is one example of such systems and will be discussed in the appendix.
This system allows the user to navigate through Medicare healthcare
providers by typing in their symptoms, in order to find appropriate and safe health care
services. The appendix describes the platform and the methods that have been used for the
development of MediQoC.
Health knowledge management: can prove extremely useful for keeping an overview of the latest
medical journals, best practice guidelines or epidemiological tracking. Nowadays there is a
wealth of new medical knowledge coming out every day. Hundreds of journal research papers
appear, and it is virtually impossible for a health care professional to keep up with this
knowledge flow, in its raw form. Knowledge management tools provide categorized content,
classified on the basis of areas of interest, by the nature of findings, or by the impact of the
published results, making it easier for the researcher and the healthcare professional to
navigate through medical knowledge.
Bioinformatics: a branch of biological science which deals with the study of methods for
storing, retrieving and analyzing biological data, such as nucleic acid (DNA/RNA) and
protein sequences, structures, functions and genetic interactions.
Biomedical Engineering: the application of engineering principles and design concepts to
medicine and biology.
Nursing Informatics: the "science and practice that integrates nursing, its information and
knowledge, with management of information and communication technologies to promote the
health of people, families, and communities worldwide." (IMIA Special Interest Group on
Nursing Informatics, 2009). The application of nursing informatics knowledge is empowering
for all healthcare practitioners in achieving patient centered care.
Public Health Informatics: Chapter 18 discusses the principles of Public Health Informatics
in detail.
Bioinformatics: an interdisciplinary field that develops methods and software tools for
understanding biological data. As an interdisciplinary field of science, bioinformatics
combines computer science, statistics, mathematics, and engineering to analyze and
interpret biological data.
Example of bioinformatics applications in basic research: the Human Genome Project
Scientists used fundamental research methods and techniques to map the complete human genome
Provides an enormous opportunity to understand the human body in ways not previously possible
Relied heavily on IT to sort and manage the data needed to map the human genome
Supports the ability to identify and treat human diseases
E-Health
E-Health is a broad term for healthcare practice which is supported by electronic processes
and communication. It is a relatively recent term and can encompass a range of services in
healthcare and information technology. It is not clearly defined (for example, some use it
instead of healthcare informatics, while others use it to describe healthcare practice using
the Internet). A study published in the Journal of Medical Internet Research found 51
definitions for e-health. E-health offers a broader coverage of electronic/digital processes in
health, often including m-health applications too.
Health informatics tools and methods
Health informatics is not just “Computers in Healthcare”. It develops advanced methods to
build, and integrate seamlessly into clinical practice, components such as clinical
guidelines, medical terminologies, clinical dictionaries and nomenclatures, information and
communication systems, and decision support and recommendation systems.
The IT sector is a very important enabling field for advanced health informatics applications and
methods. The introduction of ICT has greatly accelerated the growth of the discipline of health informatics.
Health informatics is not as new as you might think. Once healthcare services started to
become more systematic, the need for information management was eventually recognized
as an important priority. Here is some evidence:
The first version of the International Classification of Diseases (ICD) was initiated in the
year 1893. Ten revisions followed, and we still use the same classification system, in its
10th revision
The first structured clinical guidelines appeared several decades before the widespread
use of computers
Hospital information management methods were evident in hospitals before the 1970s,
and there are still hospitals in many developing countries around the globe that
do not use computerized Electronic Health Record systems. It needs to be understood,
though, that before the introduction of computers, these information management
methods were typically limited to file maintenance & life cycle management of paper-based
files, other media & medical records.
Health informatics: health care levels of interest
Single hospital department
The hospital needs
A large area or a district
The healthcare system at a national level
Health informatics also has application in the community health. The central objective of
community health is to improve the health characteristics of biological communities, in
geographical areas or within groups of people with common characteristics. Health
informatics therefore contributes specialized programs, methods and tools that are used
in population-based healthcare services for health promotion, disease prevention and
syndromic surveillance.
Top left: knowledge areas of health informatics. Top right: technologies in health informatics.
Bottom left: applications architecture. Bottom right: standards used in health informatics.
The improvement of the Quality of Care is the ultimate goal of Health Informatics: this requires the
establishment of interprofessional collaboration between different specializations and proper
education and training of users, so that they are prepared to use new technologies effectively.
Health Informatics presupposes interprofessional collaboration AND the education of healthcare professionals, and reinforces patient safety and quality of care.
Healthcare decisions are based on information
The figure below presents a typical example of a decision making process in a hospital. Every step
of this process produces data, which is then communicated and transformed into useful
information.
Telemedicine
Telemedicine (‘tele’ = at a distance) is the delivery of health-related services and information
at a distance. Telemedicine can be as simple as two health professionals discussing
a patient over the telephone, as sophisticated as videoconferencing between
providers at facilities in two countries, or even as complex as robotic technology.
Telehealth is an expansion of telemedicine. It encompasses preventive, promotive and
curative aspects. Today telehealth addresses an array of technology solutions, simple or more
complex. For example, physicians use email to communicate with patients, order drug
prescriptions and provide other health services.
Some important clinical uses of telehealth include (i) the transmission of medical images for
diagnosis (ii) groups or individuals exchanging real time health services or education live via
videoconference (iii) transmission of medical data for diagnosis or disease management
(remote monitoring) (iv) advice on disease prevention and promotion of good health by patient
monitoring and follow-up.
Chapter 2. The Nature of Data in Healthcare
Healthcare is Data-Intensive
As we already discussed in Chapter 1, healthcare is a data intensive process. Many processes
run at the same time producing new data, literally every single second. Some of these data
are high resolution, uncompressed images (x-rays, CT-scans) which take up a lot of storage
space and need to be further processed to become clinically useful. Multiple records of the same
data are often created for each patient, and these records are stored and maintained alongside
older measurements and observations. These longitudinal considerations are extremely
important for a patient evaluation, since they help clinicians reassess their treatment plan,
and make more accurate patient prognoses.
Reference data is data that defines the set of permissible values to be used by other data
fields. For example, the ICD-10 diagnosis attribute uses as reference data a table of approximately
68,000 diagnosis codes from which one can choose. Unfortunately, not all data are drawn from reference
data. In many cases, data produced during clinical care are free text, as in the case of a
nursing assessment.
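Reference data work like a whitelist of permissible values for a field. A minimal sketch, assuming a three-entry excerpt in place of the full ICD-10 table (descriptions follow the US ICD-10-CM wording); the helper names are hypothetical. A free-text field such as a nursing assessment has no comparable table to check against.

```python
# Hypothetical three-entry excerpt standing in for the full ICD-10 reference table.
ICD10_REFERENCE = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
    "J18.9": "Pneumonia, unspecified organism",
}


def normalize(code: str) -> str:
    """Trim whitespace and upper-case a code before lookup."""
    return code.strip().upper()


def is_permissible(code: str) -> bool:
    """True only if the code appears in the reference data."""
    return normalize(code) in ICD10_REFERENCE


def describe(code: str) -> str:
    """Look up the description; values outside the reference data are rejected."""
    key = normalize(code)
    if key not in ICD10_REFERENCE:
        raise ValueError(f"{code!r} is not a permissible diagnosis code")
    return ICD10_REFERENCE[key]


# A free-text field has no reference table: any string is accepted as-is.
nursing_assessment = "Patient anxious overnight; ate little at breakfast."

print(is_permissible(" e11.9 "), is_permissible("E11.99"))  # True False
print(describe("I10"))
```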
Very frequently, data come from the processing of other data: for example, clinicians may want
to know the in-hospital mortality ratio for a given disease when they need to provide
treatment to a patient with that same diagnosis. In our example the in-hospital mortality
ratio will be calculated by the formula NX’ / NX
where NX’ = Count of in-hospital deaths for patients with a medical diagnosis x and NX =
count of admissions of patients with a diagnosis x.
To make our example more interesting, let us now assume that the clinician wants to know
if this diagnosis x is a high risk one. In other words we now want to investigate if disease x
should lead the clinician to the conclusion of this patient being a high-risk case for in-hospital
mortality. For now, we will define a high-risk disease for in-hospital mortality as a disease
which causes deaths at a higher ratio than the death ratio of the whole patient
population.
The standardized mortality ratio [1] would be calculated by the formula
(NX’ / NX) / (NA’ / NA), where
NX’ = Count of in-hospital deaths for patients with diagnosis x
NX = Count of admissions of patients with diagnosis x
NA’ = Count of in-hospital deaths for all hospital admissions
NA = Count of all hospital admissions
[1] The Standardized Mortality Ratio (SMR) is the ratio between the observed number of deaths in a study population and
the number of deaths that would be expected, based on the age- and sex-specific rates in a standard population and
the age and sex distribution of the study population.
This information is easily calculated from data that has already been collected and stored
in the Electronic Medical Record. The data were collected earlier, for the needs
of the clinical care of past patients. The health care professional will therefore be using
historical data to understand and evaluate the health of a current patient at the
present time. When data are used for purposes other than those for which they were collected,
we call this secondary use of data.
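Because both ratios are simple counts over stored admissions, they can be computed directly from EMR data. The sketch below uses a hypothetical `Admission` record and invented counts. Note that it computes the crude ratio of ratios (NX’/NX) / (NA’/NA) defined in this section; it does not perform the age and sex adjustment that the formal SMR in the footnote requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Admission:
    diagnosis: str           # primary diagnosis (hypothetical labels below)
    died_in_hospital: bool


def mortality_ratio(admissions, diagnosis=None):
    """In-hospital deaths / admissions: NX'/NX for one diagnosis, NA'/NA if diagnosis is None."""
    selected = [a for a in admissions if diagnosis is None or a.diagnosis == diagnosis]
    if not selected:
        raise ValueError("no admissions to compute a ratio from")
    deaths = sum(a.died_in_hospital for a in selected)
    return deaths / len(selected)


def relative_mortality(admissions, diagnosis):
    """(NX'/NX) / (NA'/NA); a value above 1 flags diagnosis x as higher risk than average."""
    overall = mortality_ratio(admissions)
    if overall == 0:
        raise ValueError("no in-hospital deaths overall; the ratio is undefined")
    return mortality_ratio(admissions, diagnosis) / overall


# Invented counts: 10 admissions with diagnosis "x" (2 deaths), 90 others (3 deaths).
admissions = (
    [Admission("x", True)] * 2 + [Admission("x", False)] * 8
    + [Admission("y", True)] * 3 + [Admission("y", False)] * 87
)
print(mortality_ratio(admissions, "x"))     # NX'/NX = 2/10
print(mortality_ratio(admissions))          # NA'/NA = 5/100
print(relative_mortality(admissions, "x"))  # about 4: four times the overall ratio
```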
Discuss in class
“Reviewing the five most recent measurements of the blood pressure of a patient, to see if the new
medication regimen works well”
“An afternoon-shift nurse reading through the nursing assessment notes of the nurse who just finished
her morning shift, for a given patient”
“The estimation of bed availability in a specific hospital department, via the count of patient admissions
and dates, count of patient discharges and dates, and bed capacity, for that given department”
“A warning indicating that a given patient is at risk of developing in-hospital infections, through the
analysis of similar case-mixes of past patients”
Discussing with the patient and their caregivers: patient demographics, family history,
occupational history, allergies, pathology by system, past diseases and surgical operations
and information about the social health, all provide to the physician extremely important
input for the patient assessment
Routine ward measurements: e.g. vital signs (blood pressure, respirations per minute,
temperature, pulses per minute), fluid balance (from fluid intake-output)
Physical examination: including percussion, observation, auscultation, palpation
Laboratory tests: blood tests, urine analysis etc. These tests are ordered by the
medical doctor in charge and the Laboratory Information System (LIS) receives the order. Once
the samples arrive at the hospital laboratories, each test requires a variable amount of time
to be processed. When processing is complete, the results are uploaded via the LIS and
the Electronic Medical Record is updated with the laboratory test result. The
physician is then notified and can review the results in a timely manner, in order to make informed
decisions.
Radiology department: medical images, segmentation and handling of images using
DICOM systems, assessments from radiologists.
Pharmacy: including Rx, (re)-stocking and ordering
Patient assessment: medical diagnosis, ordering of laboratory examinations, decisions on
the appropriate medication. During a patient hospitalization, there is typically only one
primary patient diagnosis. This is the diagnosis which, in the majority of cases, is
considered to be the main reason that led to the decision to admit the patient to the
hospital. Secondary diagnoses are either pre-existing diseases (usually chronic conditions)
or conditions diagnosed during the hospital stay.
Tracking down medication: dosage, method of administration (e.g. intravascular,
intramuscular) and time intervals. Nurses are responsible for managing the
administration of medication to the patient, according to the physician’s orders.
Discharge data: discharge destination, discharge method, discharge outcome(s).
Data produced by the patient: patient experience surveys have recently become the norm,
and it is widely recognized that patient feedback matters. The “Hospital Consumer
Assessment of Healthcare Providers and Systems Survey” (HCAHPS) is the most commonly
used patient experience survey and is used by many healthcare providers. The author
of this book was a member of the working group that adapted the survey to other languages [2].
Staff records: including but not limited to information about personnel shifts, department
capacity, distribution of human resources, hours of leave and other.
[2] Squires A, Bruyneel L, Aiken L, Van den Heede K, Brzostek T, Busse R, Ensio A, Schubert M, Zikos D, Sermeus W.
Cross-cultural evaluation of the relevance of the HCAHPS survey in five European countries. Int J Qual
Health Care. 2012;24(5):470-5.
Hospital Budgeting: resources allocation, revenue projections, planning ahead for the fiscal
year. Hospital budgeting is not a trivial process by any means and requires multidisciplinary
work of people who know the healthcare market, understand health economics and
reimbursement challenges, and health professionals who can foresee the healthcare delivery
challenges.
Payments: including patient Diagnosis Related Groups (DRGs), insurance information,
Medicare or Medicaid information, and any other payer data. A Diagnosis-Related Group
(DRG) is a statistical system for classifying any inpatient stay into groups for the purposes of
payment. The DRG classification system divides possible diagnoses into more than 20 major
body systems and subdivides them into almost 500 groups for the purpose of Medicare
reimbursement [3].
Hospital Quality of Care Evaluation and Quality improvement data: we will devote a separate chapter to this very important topic, to discuss the strategic goals of the healthcare system, the dimensions of quality of care and patient safety, and strategies for the healthcare system to assess the quality of health care delivery.
3 Gillian I. Russell, Terminology, in FUNDAMENTALS OF HEALTH LAW 1, 12 (American Health Lawyers Association 5th ed., 2011).
The most obvious example of derived data is that of medical diagnoses. Since October 2015, healthcare systems in the United States have used the 10th revision of the International Classification of Diseases (ICD-10). This is an enormous list of approximately 68,000 different codes, organized hierarchically, in which each code represents a medical condition. The doctor needs to decide which code accurately describes the patient's condition and select the appropriate ICD-10 code for it. ICD-10 has several levels of depth, and a diagnosis often does not reach the deepest level for a condition.
ICD is not the only classification system in use. For the vast majority of in-hospital data, there are one or more classification systems that standardize the data entry and retrieval process. Chapter 7 will cover some of the most important classification systems and standards used by the major health care providers in the United States.
Numeric data, such as physiological measurements and laboratory test results, are captured and entered into the Electronic Medical Records without any modification and without the use of reference data. In this section, we will discuss health care data of various data types.
Numeric Data
Most numeric data are clinical data “produced” by those directly involved in healthcare. Numeric data allow much more efficient data manipulation and therefore a more effective use of data to produce aggregated information and to perform simple or more advanced calculations. Numbers often come from clinical measurements in hospitals, for instance vital sign measurements. These data have to be entered into the system with the exact value of the measurement. The acceptable decimal precision varies according to the nature of the data and the precision of the measurement device, if one is used.
Many numeric data in everyday practice are derived data. For example, the 24h fluid intake and fluid output are used to calculate the fluid balance of a patient. Laboratory examination results are very often numeric and are usually accompanied by the reference (normal) values. An Electronic Medical Record is nowadays expected to have these normal values integrated. Therefore, during data entry, a laboratory test result that is out of range should be highlighted in a different color, and a notification may be generated. It should be noted, though, that the normal value bounds very often differ across age groups and between genders. Such implementations should take this into account: they should automatically recognize the patient's demographics and present notifications personalized to patient attributes such as gender and age group.
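A minimal sketch of the two ideas in this paragraph: a derived value (24h fluid balance as intake minus output) and an out-of-range check that selects the reference limits by patient gender and age group. The `REFERENCE_RANGES` table, its numeric limits and the 18-year age cut-off are hypothetical placeholders for illustration, not clinical reference values; a real EMR would load them from the laboratory's own reference data.

```python
from dataclasses import dataclass

# Hypothetical reference table: (test, sex, age group) -> (low, high).
# The limits are illustrative placeholders, NOT clinical reference values.
REFERENCE_RANGES = {
    ("hemoglobin", "F", "adult"): (12.0, 15.5),
    ("hemoglobin", "M", "adult"): (13.5, 17.5),
    ("hemoglobin", "F", "child"): (11.0, 14.0),
    ("hemoglobin", "M", "child"): (11.0, 14.0),
}

@dataclass
class Patient:
    sex: str  # "F" or "M"
    age: int  # years

def age_group(age: int) -> str:
    # Hypothetical cut-off used only for this sketch.
    return "child" if age < 18 else "adult"

def fluid_balance(intake_ml: float, output_ml: float) -> float:
    """Derived value: 24h fluid intake minus 24h fluid output (mL)."""
    return intake_ml - output_ml

def flag_result(patient: Patient, test: str, value: float) -> str:
    """Return 'LOW', 'HIGH' or 'NORMAL' using the range that matches this patient."""
    low, high = REFERENCE_RANGES[(test, patient.sex, age_group(patient.age))]
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"

woman = Patient(sex="F", age=77)
man = Patient(sex="M", age=77)
print(fluid_balance(2400, 1900))               # 500
print(flag_result(woman, "hemoglobin", 13.0))  # NORMAL
print(flag_result(man, "hemoglobin", 13.0))    # LOW
```

The same value of 13.0 is flagged differently for the two patients, which is exactly why the check has to read the patient's demographics before choosing the bounds.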
Comparison of new examination results with previous results from the current and recent hospital stays is of utmost importance for health care professionals, so that they can better assess disease progression, re-evaluate the therapeutic plan and make a more informed assessment of the patient's prognosis. It is therefore important for such comparisons to be made and for their results to be communicated to the physician. As an example, we will use hypertension, a very common condition and the major risk factor for strokes. A 77-year-old female patient was recently admitted to the hospital for uncontrolled hypertension and was found to have a blood pressure of 210/120 mm Hg. She was therefore admitted as an emergency hypertension case. As soon as she was admitted, she received hydralazine IV and was monitored with a bedside monitor. Nurses checked the monitor every hour and updated the patient record with the blood pressure value. A few days later, her blood pressure was measured at 140/85 mm Hg, which, while still higher than normal, indicates a significant improvement when evaluated with a temporal insight.
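To make the temporal comparison concrete, here is a small sketch that compares a new blood pressure reading with an earlier one and produces a one-line summary for the physician. The two readings are those of the case above (210/120 and 140/85 mm Hg); the message format and the simple rule "both values fell, so improved" are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class BloodPressure:
    systolic: int   # mm Hg
    diastolic: int  # mm Hg

def compare(previous: BloodPressure, current: BloodPressure) -> str:
    """Summarize the change between two readings (illustrative format and rule)."""
    d_sys = current.systolic - previous.systolic
    d_dia = current.diastolic - previous.diastolic
    # Simplistic rule for a hypertensive patient: lower on both values counts as improvement.
    trend = "improved" if d_sys < 0 and d_dia < 0 else "not improved"
    return (f"{previous.systolic}/{previous.diastolic} -> "
            f"{current.systolic}/{current.diastolic} mm Hg "
            f"(systolic {d_sys:+d}, diastolic {d_dia:+d}): {trend}")

admission = BloodPressure(210, 120)
few_days_later = BloodPressure(140, 85)
print(compare(admission, few_days_later))
# 210/120 -> 140/85 mm Hg (systolic -70, diastolic -35): improved
```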
Examples of laboratory tests that produce numeric data are numerous. Numeric data are also often accessed and used by professionals indirectly involved in patient care. An example is the number of vacant beds in a hospital department. Again, this may be a number calculated from other primary data: ‘Bed capacity’ MINUS ‘Number of patients currently hospitalized’.
Some numeric data are used by the hospital quality department or healthcare policy makers. Usually these data are in the form of indicators: for example, the mortality and morbidity during June 2012, or the number of cases of infectious diseases divided by the number of patients admitted during a set time period.
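Both derived quantities mentioned above can be computed automatically from primary data. In the sketch below, the input numbers are made up, and expressing the indicator "per 100 admissions" is an assumption (the text only specifies cases divided by admissions); raising an error when patients exceed beds is likewise a design choice that flags inconsistent source data rather than reporting negative vacancies.

```python
def vacant_beds(bed_capacity: int, patients_hospitalized: int) -> int:
    """Derived value: 'Bed capacity' MINUS 'Number of patients currently hospitalized'."""
    if patients_hospitalized > bed_capacity:
        raise ValueError("more patients than beds; check the source data")
    return bed_capacity - patients_hospitalized

def rate_per_100_admissions(cases: int, admissions: int) -> float:
    """Indicator: cases divided by admissions in the same period, scaled per 100 admissions."""
    if admissions == 0:
        raise ValueError("no admissions in the period")
    return 100 * cases / admissions

print(vacant_beds(30, 26))               # 4
print(rate_per_100_admissions(12, 480))  # 2.5
```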
Boolean Data: Boolean data types in health care can only have two values (usually denoted true and false), which represent the truth values of logic and Boolean algebra. Boolean data should not be confused with categorical data with two categories (for example, gender: male/female).
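The distinction can be made explicit in a data model. In this sketch (the field names are hypothetical), the truth values are stored as `bool`, while the two-category variable is an `Enum`: storing it as a flag such as `is_male` would wrongly suggest that one category is the "false" version of the other.

```python
from dataclasses import dataclass
from enum import Enum

class Sex(Enum):
    """Categorical variable with two categories: neither value means 'true' or 'false'."""
    FEMALE = "female"
    MALE = "male"

@dataclass
class AdmissionRecord:
    sex: Sex             # categorical data with two categories
    has_allergies: bool  # Boolean data: a truth value
    is_smoker: bool      # Boolean data: a truth value

record = AdmissionRecord(sex=Sex.FEMALE, has_allergies=False, is_smoker=True)

# Boolean fields take part in logical expressions directly ...
needs_alert = record.has_allergies or record.is_smoker
# ... whereas a categorical field is compared against a named category.
is_female = record.sex is Sex.FEMALE
print(needs_alert, is_female)  # True True
```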
Alphanumeric Data: also frequently generated during the healthcare process. In most
cases, these are data produced after the “mediation” of the human brain (healthcare
professional) and also in most cases during the interaction between the healthcare
professional and the patient.
Data in healthcare can be images: the medical imaging methods below produce data in the form of images. Medical images are generated by medical imaging devices; nowadays they are produced digitally and stored on computer storage devices. Images are usually compressed with lossless or near-lossless methods and typically require large storage space. Six of the most common radiology tests are:
(i) Radiography
(ii) Computed Tomography (CT) and High Resolution Computed Tomography (HRCT)
(iii) Magnetic Resonance Imaging (MRI)
(iv) Ultrasound-Mammography
(v) Nuclear Medicine Imaging
(vi) Photoacoustic imaging
The most popular systems and standards used to store and transmit medical images are:
PACS (Picture Archiving and Communication System): a medical imaging technology that provides economical storage of, and convenient access to, images from multiple modalities (source machine types)4.
DICOM (Digital Imaging and Communications in Medicine): a standard for handling, storing, printing and transmitting information in medical imaging. It includes a file format definition and a network communications protocol.
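As a small illustration of the DICOM file format definition, the sketch below checks the signature of a DICOM Part 10 file, which begins with a 128-byte preamble followed by the four bytes "DICM". It only recognizes the file signature and does not parse the image or its metadata; full parsing would normally be left to a dedicated library (for example pydicom in Python). Note that some legacy DICOM files lack the preamble and would not pass this check.

```python
def looks_like_dicom_file(data: bytes) -> bool:
    """True if the bytes start with a 128-byte preamble followed by the 'DICM' prefix."""
    return len(data) >= 132 and data[128:132] == b"DICM"

def file_has_dicom_signature(path: str) -> bool:
    """Read only the first 132 bytes of a file and check its DICOM Part 10 signature."""
    with open(path, "rb") as f:
        return looks_like_dicom_file(f.read(132))

synthetic_header = bytes(128) + b"DICM"  # synthetic bytes, not a real image
print(looks_like_dicom_file(synthetic_header))      # True
print(looks_like_dicom_file(b"\x89PNG\r\n\x1a\n"))  # False
```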
Some healthcare organizations also store other data in the form of images, such as a patient photo uploaded into the Electronic Medical Record.
The trend is to reduce the use of free text data as much as possible: information is captured in the form of codes, each assigned to a separate concept, with significant benefits for data quality. The advantages of using classification systems are also significant to researchers, who can save a significant amount of data preparation time that would otherwise be spent manually merging descriptions of conditions with different wording (different syntax) but the same meaning (same semantics). Chapter 7 outlines the importance of classification systems and discusses a selection of critical standards.
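The following sketch shows why coded data save preparation time: differently worded free-text entries with the same meaning collapse onto a single code. The synonym table is a hypothetical, hand-written example (the ICD-10 codes I21.9, I10 and E11.9 are real, but production systems rely on curated terminology services, not a small dictionary), and unmatched text is returned as `None` so that it can be coded manually.

```python
import re
from typing import Optional

# Hypothetical synonym table for illustration only.
SYNONYMS = {
    "heart attack": "I21.9",           # acute myocardial infarction, unspecified
    "myocardial infarction": "I21.9",
    "acute mi": "I21.9",
    "high blood pressure": "I10",      # essential (primary) hypertension
    "hypertension": "I10",
    "htn": "I10",
    "type 2 diabetes": "E11.9",        # type 2 diabetes mellitus without complications
    "t2dm": "E11.9",
}

def normalize(text: str) -> str:
    """Lower-case, trim and collapse internal whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())

def to_code(free_text: str) -> Optional[str]:
    """Map a free-text description to a code, or None if it needs manual coding."""
    return SYNONYMS.get(normalize(free_text))

notes = ["Heart attack", "myocardial  infarction", "HTN", "High blood pressure", "chest pain"]
print([to_code(n) for n in notes])  # ['I21.9', 'I21.9', 'I10', 'I10', None]
```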
Important hospital data procedures and the method of information acquisition
Clinical procedure | Method of information acquisition
Nursing evaluation | Written nursing assessment, plan and follow-up
Diagnosis | Combination of a series of practices: clinical measurements, laboratory tests, clinical observation
Treatment plan | Review of the patient's condition and comorbidities, past and current medications, and allergies
Patient history | Discussion with the patient and/or their family
Various reports | Written analysis of events, sometimes required by existing legislation
4 Choplin R. Picture archiving and communication systems: an overview. Radiographics. 1992;12:127–129.
Classification of Hospital Data based on their Source
The table below outlines examples of data types organized by their source. In other words, the table gives you an idea of where new data are generated in a hospital. Discuss the importance of each of these hospital locations to the quality of healthcare services, and how each location contributes to clinical and administrative decision making.
The fundamental data acquisition methods (talking with patients, physical examination, clinical measurements, laboratory exams and radiology tests) constitute the main source of data for populating Electronic Medical Records. Typically, the patient history precedes the physical examination, which precedes the physiological measurements.
Derived Data
Derived data are data elements derived from other data elements using a mathematical, logical, or
other type of transformation, e.g. arithmetic formula, composition, aggregation. Modern Healthcare
Information Systems should calculate these data automatically. Derived data are useful for:
involves a medical doctor who puts together and combines the physical examination, laboratory test results and patient history data to make a diagnosis.
Glucose levels of 125 mg/dL are assessed differently when combined with different patient demographic information: for a 23-year-old patient with type 1 diabetes, this may be considered an acceptable value, but this would not be the case for a non-diabetic person.
Health care professionals should therefore have at their disposal tools that provide easy access to patient data and generate reports summarizing all the clinical information available for a patient at any given point of health care provision (e.g. patient history, clinical observations, laboratory test results, medical imaging).
Cognition: Health care data should be assessed with human cognitive skills. Differential diagnosis and other cognitive procedures based on the knowledge and skill sets of health professionals are always crucial when new data about a patient become known and need to be assessed. Clinicians acquire medical skills and knowledge and have a dynamic understanding of how the information in their hands can direct them towards specific clinical decisions.
This cognitive process is systematic and varies across different categories of health care professionals. Physicians perform differential diagnosis, that is, the process of differentiating between two or more conditions that share similar signs or symptoms, while for nurses, the nursing diagnosis is a clinical judgment about individual, family or community responses to the health problem. Medical education and continuing professional development are important success factors for this dimension.
Shareability: Health care data should be shared across the healthcare system and between different health care professionals to become more meaningful. In that respect, no health professional should ever work in isolation within the healthcare system.
An MRI scan should not be assessed solely by the radiologist; it should also be shared with the physician who will review it to make informed decisions about the patient.
One of the most important requirements for sharing data seamlessly is a highly interoperable environment. Business, technical and information interoperability are all invaluable requirements for the fundamental shareability property. Interoperability is the ability of a system to work with other systems without special effort on the part of the health professionals: data should be exchanged across the health care system seamlessly. Health Level 7 (HL7) is the most important interoperability standard nowadays and addresses the business, technical and information dimensions of interoperability. Interprofessional collaboration is also crucial, since it is not always sufficient for the information to be entered into the records. Health professionals often need to talk to each other to understand the qualities of the observations and to exchange their insight on the condition of a patient.
Longitudinality: Health care data should be assessed with a longitudinal insight. Neither the progression of a disease nor therapy outcomes are linear. In addition, many health care procedures are repeated during the course of a patient's hospitalization (e.g. measurement of vital signs, blood tests). When reviewing these data, health care professionals need to recognize longitudinal changes and patterns over time and assess disease progression and treatment effectiveness. Many tools are available to visualize such data; for example, nurses do not need to complete manual vital sign charts, since these are generated automatically from the recorded data. Longitudinal data can form the basis for predictive modelling of patient outcomes and of the effectiveness of medical treatments.
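One simple way to give a system this longitudinal insight is to fit a straight line through a patient's recent daily values and report the direction of change, rather than judging the latest value in isolation. The sketch below uses an ordinary least-squares slope; the glucose series is hypothetical (five higher mornings followed by 135 mg/dL, in the spirit of the blood glucose example that accompanies this section), and the 1 mg/dL per day tolerance for calling a series "stable" is an arbitrary assumption.

```python
from typing import Sequence

def daily_slope(values: Sequence[float]) -> float:
    """Least-squares slope (units per day) for one measurement per day, oldest first."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = (n - 1) / 2  # days are numbered 0, 1, ..., n-1
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def describe_trend(values: Sequence[float], tolerance: float = 1.0) -> str:
    """Label a series 'decreasing', 'increasing' or 'stable' (tolerance in units per day)."""
    slope = daily_slope(values)
    if slope < -tolerance:
        return "decreasing"
    if slope > tolerance:
        return "increasing"
    return "stable"

# Hypothetical morning glucose values (mg/dL), oldest first; today's value is 135.
morning_glucose = [180, 170, 160, 150, 140, 135]
print(round(daily_slope(morning_glucose), 1))  # -9.3
print(describe_trend(morning_glucose))         # decreasing
```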
Morning blood glucose levels of 135 mg/dL would seem elevated for a given patient, but the clinician would not worry if, for that patient, the five preceding daily measurements were higher and showed steadily decreasing blood glucose levels day by day.
Source: raywinstead.com/bp
The four health data properties, namely non-atomicity, cognition, shareability and longitudinality, are not unique to healthcare, but their existence is undeniable; indeed, all four coexist in virtually any clinical health care environment. Hospital administrators and policy makers can prioritize the organizational and technical requirements that need to be built around, and in support of, these fundamental properties, which are crucial to the clinical decision-making process and ultimately to the quality of health care services.
Information exchange across the healthcare system is non-stop; patients, being the main source of health data, are placed at the center of care.
Chapter 3. Data Workflow and Users of Health Data
The patient should be the main focus and is placed at the center of care. The majority of health care data are directly or indirectly acquired from the patient using many different data collection methods, and are used in favor of the patient in a constant effort to achieve optimal outcomes and improve the quality of care.
In-class discussion
Below is an extended list of various health care data users. Try to identify the group (out of the five
above) each health care data user belongs to and then discuss in the classroom:
(i) What is the data of primary interest for each type of user
(ii) How each type of user would access these data and
(iii) What is the main use of this data for each type of user
Data warehouses
Data warehouses and knowledge base systems are used to facilitate decision making. A common strategy to effectively govern hospital data is a data warehouse, which allows the hospital to merge individual databases into a central location for robust reporting and analysis. Data sets from Electronic Medical Records, disease registries etc. are stored in a data warehouse, from which they can be pulled and analyzed. One common use of these data is to help with decisions for current patients: by looking back at historical data, one can identify many past patient records with similar clinical profiles. These past data can, for example, be used to decide on an optimal treatment strategy for the present patient case. Hospital data warehouses offer an extremely useful data source for researchers who plan to conduct epidemiologic studies, and for health care administrators who want to study health services utilization and payment patterns in order to plan and implement more effective budgeting strategies.
Object Oriented Modeling of the Health and Safety Process in the Case of the Agricultural Work
Agricultural work is associated with a series of adverse health effects. The causes of work-related problems are discussed in many research papers. These factors fall into five major categories, namely exposure to hazardous agents, the type of agricultural work, the level of use of Personal Protective Equipment (PPE), specific demographic characteristics and, finally, cognitive factors. These are of great importance for Primary Healthcare, and specifically for Occupational Health and Safety. The five parameters are interrelated through a specific schema, which describes the occupational health and safety considerations. This is achieved through an object-oriented modeling procedure that primary healthcare professionals may use for an overall understanding of farmers’ health dynamics, and as a tool to support the development of Primary Healthcare and Occupational Health and Safety Information Systems in rural areas.
5 Diomidous M, Zikos D. Object Oriented Modeling of the Health and Safety Process in the Case of the Agricultural Work. AIM. 2009;17(4):205-208.
Object name | Object code
Prevalence of work-related diseases and symptoms among farmers | OBJ_01
Frequency of use of PPE in the farming population | OBJ_02
Farmers’ knowledge and perceptions on occupational health and safety | OBJ_03
Demographic characteristics of farmers | OBJ_04
Duration of exposure to the agricultural work | OBJ_05
Type of agricultural production | OBJ_06
In-class practice
Try to provide examples of primary data used to calculate the derived data examples above. For instance, to calculate the absolute difference between a patient's most recent blood glucose measurement and the previous one, we need to consider two temporally consecutive values of the same attribute. In other cases, we need to consider two or more different variables, which will be used in simple or more complex calculations.
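The two kinds of derivation just described can be sketched as follows: one function uses two consecutive values of the same attribute, the other combines two different variables. Body mass index (weight in kg divided by the square of height in m) is used here as an illustrative example of the second kind; it is not one of the derived data examples listed in the text, and the input values are made up.

```python
def absolute_change(previous: float, current: float) -> float:
    """Derived from two temporally consecutive values of the same attribute."""
    return abs(current - previous)

def bmi(weight_kg: float, height_m: float) -> float:
    """Derived from two different variables: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

print(absolute_change(123, 135))  # 12 (two consecutive afternoon glucose readings, mg/dL)
print(round(bmi(70, 1.75), 1))    # 22.9
```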
We will now explore a simple example: the table below shows the twice-daily blood glucose measurements of a specific patient. Have a look at the table for 30 seconds. Then write down your observations about any patterns of change in blood glucose levels and the linearity of any variations over the seven-day period.
Date | Time | Blood glucose (mg/dL)
1/20/2016 | 9 AM | 111
1/20/2016 | 6 PM | 123
1/21/2016 | 9 AM | 108
1/21/2016 | 6 PM | 135
1/22/2016 | 9 AM | 113
1/22/2016 | 6 PM | 137
1/23/2016 | 9 AM | 111
1/23/2016 | 6 PM | 142
1/24/2016 | 9 AM | 113
1/24/2016 | 6 PM | 145
1/25/2016 | 9 AM | 108
1/25/2016 | 6 PM | 162
1/26/2016 | 9 AM | 109
1/26/2016 | 6 PM | 177
We will now try something else, which will be most revealing: we will calculate the average daily change of the blood glucose levels, for the morning and the afternoon separately. Let M1 to M7 denote the seven morning values and A1 to A7 the seven afternoon values.
Average morning change of blood glucose value:
$$\bar{\Delta}_M = \frac{(M_2-M_1)+(M_3-M_2)+(M_4-M_3)+(M_5-M_4)+(M_6-M_5)+(M_7-M_6)}{6} = \frac{-3+5-2+2-5+1}{6} = \frac{-2}{6} \approx -0.3 \text{ mg/dL (no change)}$$
What about the afternoon?
Average afternoon change of blood glucose value:
$$\bar{\Delta}_A = \frac{(A_2-A_1)+(A_3-A_2)+(A_4-A_3)+(A_5-A_4)+(A_6-A_5)+(A_7-A_6)}{6} = \frac{12+2+5+3+17+15}{6} = \frac{54}{6} = 9 \text{ mg/dL}$$
This is an average increase of 9 mg/dL every afternoon. If we plot the values in the table above, this pattern is obvious. This patient has a substantial and progressing increase in afternoon blood glucose levels, which would require immediate attention.
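The same calculation can be automated so that the system computes it whenever a new reading arrives. This sketch uses the fourteen readings from the table, split by time of day. Because the day-to-day differences telescope, the average also equals (last value minus first value) divided by 6, which is a quick way to double-check the hand calculation: (109 - 111)/6 for the morning and (177 - 123)/6 for the afternoon.

```python
# Blood glucose readings from the table (mg/dL), 1/20/2016 to 1/26/2016.
morning = [111, 108, 113, 111, 113, 108, 109]    # 9 AM
afternoon = [123, 135, 137, 142, 145, 162, 177]  # 6 PM

def average_daily_change(values):
    """Mean of day-to-day differences; the sum telescopes to values[-1] - values[0]."""
    diffs = [later - earlier for earlier, later in zip(values, values[1:])]
    return sum(diffs) / len(diffs)

print([b - a for a, b in zip(morning, morning[1:])])  # [-3, 5, -2, 2, -5, 1]
print(round(average_daily_change(morning), 2))        # -0.33
print(average_daily_change(afternoon))                # 9.0
```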
This rudimentary example indicates that, in order to successfully understand and communicate knowledge about disease progression, we need to (i) have a longitudinal insight into the fluctuation of health parameters and (ii) identify interesting patterns and, often, seasonal trends. Interestingly, many measurable symptoms and conditions (such as pain level, mood and depression, and various physiological parameters) frequently fluctuate during the course of the day, often in clinically interesting patterns.
Questions for Discussion
1. Discuss four user groups of health data and the primary use of data for each group.
2. Describe with a diagram, how clinical decisions are made in a hospital. Your diagram
should present the users, procedures, interactions and data communication during these
hospital procedures and interactions.
3. Explain what primary data is and what derived data is. Provide examples.
4. To successfully assess the health status of the population and to identify risk factors, will the disease history and the patient demographics suffice?
Chapter 4. Data Collection Methods in Healthcare
How data are collected
In this chapter we will discuss the data collection methods in healthcare. In general, these methods
can be classified into three broad categories:
Direct intrusive (e.g. vital sign measurements with traditional means, asking the patient to
provide information, patient history)
Direct non-intrusive (e.g. use of sensors to measure physical properties, or, for instance, clinical observation)
Indirect intrusive (e.g. samples collected and analyzed in hospital laboratories)
As far as the means of data collection are concerned, we will start by discussing the interview, a common and fundamental technique for collecting data in healthcare. Interviewers ask respondents questions in person and write down the responses. The interviewing process requires many considerations: the interviewer needs to be aware of the context, the responder and a series of behavioral and psychological factors during the discussion. Building an environment of trust is very important for a successful interview. In healthcare, interviews are usually conducted one-to-one, between a health care professional (interviewer) and a patient or their caregiver (responder).
In some cases, interviewers call respondents by phone and write down or record the replies on semi-structured or well-structured questionnaires, depending on the scope of the data collection. In other cases, this is done by mail: paper questionnaires are mailed to respondents, who fill them in and mail them back. Very often, questionnaires are self-administered: the responder is given the questionnaire, and the interviewer returns later to collect the completed form. Web questionnaires have become increasingly popular; the interviewer can set up a questionnaire and disseminate it via online invitations to responders, who then complete the online forms. In some cases, OCR technologies, CATI (computer assisted telephone interviewing), TDE (touchtone data entry) or IVR (interactive voice response) may be used.
Approaching the patient to collect the medical history. Clinicians or therapists (interviewers) should introduce themselves in a friendly manner if the patient is a new case. The first impression is important, and any psychological defense mechanisms will be handled more easily once an environment of trust is built. The interviewer would then politely ask whether the patient is willing to answer some questions and acknowledge their willingness to help. It is important that the interviewer approaches the patient in a respectful manner while keeping the roles of the interviewer (clinician) and the patient (responder) very well defined.
This requires a careful balance, and there is a learning curve as clinicians gain experience in clinical practice. Identifying the core problem of the patient is then the starting point. Very often, patients overemphasize small aspects of their health problem while underestimating more important issues (not always deliberately).
Medical history
The medical history is the most common data collection process conducted through the interview method. It is typically collected by a physician who asks a series of questions, either of the patient directly or of someone else who can provide accurate information (e.g. family members). The aim of the medical history is to obtain information useful for making a diagnosis and deciding on a treatment plan. The responder usually starts by reporting symptoms and then responds to further questions. The physician will ask the patient about past medical problems, hospitalizations and surgeries, past injuries, medications, allergies, family history, social history (e.g. alcohol consumption, smoking, drug use, sexual life) and occupational history.
Review of Systems: Screen for symptoms in each body system that have not already been
discussed. Skin, eyes, ears, nose, mouth, sinuses and throat, lungs, heart, digestive system,
genitourinary, hematologic, endocrine, musculoskeletal, neurological system and psychiatric
history.
For the majority of interactions with a patient, the health care professional will capture the patient's response using numeric answers, closed questions with ordered response scales, or closed questions with categorical response options.
Questions with numerical data-type responses
How many fever waves do you have every day on average?
What is the maximum fever you had during the course of the disease?
How many days has it been since the symptoms appeared?
Closed questions with categorical response options
What is your ethnicity?
Closed questions with ordered response scales
How would you self-evaluate your health status? (Excellent-Very Good-Good-Fair-Poor)
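The three response types above map naturally onto different data types, and a form can validate answers accordingly. In this sketch, the `Question` class and its `kind` labels are hypothetical; the ethnicity options follow the two-category US OMB ethnicity standard, used here only as an example of a categorical item. Note that only the ordinal item has a meaningful rank (here 0 = Excellent, so a higher rank means a worse self-rating).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union

Answer = Union[int, float, str]

@dataclass
class Question:
    text: str
    kind: str                                # "numeric", "categorical" or "ordinal"
    options: Optional[Sequence[str]] = None  # allowed answers for closed questions

    def is_valid(self, answer: Answer) -> bool:
        """Accept a non-negative number for numeric items, else one of the listed options."""
        if self.kind == "numeric":
            return (isinstance(answer, (int, float)) and not isinstance(answer, bool)
                    and answer >= 0)
        return self.options is not None and answer in self.options

    def rank(self, answer: str) -> int:
        """Position on an ordered scale (0 = first option); undefined for categorical items."""
        if self.kind != "ordinal" or self.options is None:
            raise ValueError("only ordinal items have a rank")
        return list(self.options).index(answer)

days = Question("How many days has it been since the symptoms appeared?", "numeric")
ethnicity = Question("What is your ethnicity?", "categorical",
                     ["Hispanic or Latino", "Not Hispanic or Latino"])
health = Question("How would you self-evaluate your health status?", "ordinal",
                  ["Excellent", "Very Good", "Good", "Fair", "Poor"])

print(days.is_valid(3), days.is_valid(-1))           # True False
print(ethnicity.is_valid("Not Hispanic or Latino"))  # True
print(health.rank("Good"))                           # 2
```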
6 Pappas Y, Anandan C, Liu J, Car J, Sheikh A, Majeed A. Computer-assisted history-taking systems (CAHTS) in health care: benefits, risks and potential for further development. Inform Prim Care. 2011;19(3):155-60.
taking. There are, though, some noted advantages and potential limitations of using CAHTS to gather the medical history.
Advantages
Decreased social desirability bias: patients are less likely to shape their answers so that they will be viewed favorably, since the answers are given to a computer rather than to a person
Patients may be more likely to report unhealthy lifestyle behaviors, since there is no feeling of shame when reporting to a machine
Easy, high-fidelity transfer to a patient's electronic medical record, using plug-and-play computer-aided history-taking devices
Limitations
Computer-aided history taking systems still cannot successfully detect non-verbal
communication, which may be useful for elucidating anxieties and treatment plans.
Patients may feel less comfortable communicating with a computer than with a human, although this is becoming less of a problem with the widespread use of computers in the everyday life of most citizens.
from a patient, are some examples where Likert questions are frequently used. In general, these use cases involve questions that require responses of a subjective nature.
Discussion
Which of the questions below could use the Likert scale for the patient response?
Did the nurses respond timely to your request?
Did you also experience fever during the last three days?
Can you specify the level of pain that you experience using the 0-10 pain scale?
Did you take the evening pills?
Clinical Observation
Clinical observation is another important method for collecting data. Observations are susceptible to observer bias, but clinical practice relies on a typical, standardized observation procedure called the physical examination. Clinical observation is used by skilled clinicians to obtain information, usually about their patients. These are observations of behavioral and psychological characteristics that will be useful for making a diagnosis and deciding on an optimal treatment plan. Very often, the clinician takes notes during (or shortly after) the interaction with a patient. Clinical observations are widely recognized as the basis of therapy and treatment and as an extremely useful data collection technique in healthcare. The information obtained during the medical history, combined with the physical examination, forms the basis for a diagnosis and treatment plan.
It is important to stress, at this point, that the two methods of data collection we discussed so far,
namely the interview (medical history) and the observation (physical examination) are the
cornerstone of data collection for clinical decisions.
Data Standards. A lack of data standards that contain definitions and taxonomies can make data acquisition from electronic systems very difficult. Multiple disparate systems that coexist in a hospital without communicating with each other will require labor-intensive data mapping by health providers to link the systems. Similar challenges exist between different health care organizations.
Discussion
What is a sensor?
A sensor is a converter that measures a physical quantity and converts it into a signal which can be read by an observer or by an (today mostly electronic) instrument. A common example of a sensor that we use in everyday life is the mercury-in-glass thermometer, which converts the measured temperature into the expansion and contraction of a liquid that can be read on a calibrated glass tube. Some sensors respond to a signal when touched, as in touch-sensitive elevator buttons. Applications include cars, machines, aerospace, medicine, manufacturing and robotics. Sensors ‘sense’ and measure a property using different methods, based on which they can be classified as electrochemical, electromagnetic, electromechanical, photoelectric, thermoelectric and electroacoustic.
7 Waghorn G, Lloyd C. Population Health Surveys: an introduction to basic concepts. International journal of therapy and rehabilitation. 2009;14(4):191-8.
Sensors measure a physical property and transform an analogue (continuous) signal into a digital (discrete) signal.
A sensor has three layers: to sense, to measure and to communicate the measured property.
The multiplexing step (Mux) involves the selection of one of several analog input signals, which is forwarded into a single line.
A sensor's sensitivity indicates how much the sensor's output changes when the measured quantity changes. For instance, if the mercury in a thermometer moves 1 cm when the temperature changes by 1 °F, the sensitivity is 1 cm per °F.
Sensors that measure very small changes must have very high sensitivities.
We should minimize the impact sensors have on what they measure. Sensors need to be designed to have a small effect on what is measured; making the sensor smaller often improves this.
Ideally, a sensor:
is sensitive to the measured property only
is insensitive to any other property likely to be encountered in its application
does not influence the measured property
produces an output signal linearly proportional to the value of the measured property
has high sensitivity, so that a small input of the physical parameter creates a detectable change
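Sensitivity and linearity can both be estimated from calibration data: the sensitivity is the slope of the output plotted against the measured quantity, and the largest deviation from the fitted straight line is a simple measure of non-linearity. The sketch below uses an ordinary least-squares fit; the calibration points are hypothetical values chosen to match the thermometer example (1 cm per °F), so the fit is exact.

```python
from typing import Sequence, Tuple

def fit_line(inputs: Sequence[float], outputs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit output = intercept + slope * input; returns (slope, intercept)."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    den = sum((x - mean_x) ** 2 for x in inputs)
    slope = num / den
    return slope, mean_y - slope * mean_x

def sensitivity(inputs: Sequence[float], outputs: Sequence[float]) -> float:
    """Change in output per unit change of the measured quantity (slope of the fit)."""
    return fit_line(inputs, outputs)[0]

def max_nonlinearity(inputs: Sequence[float], outputs: Sequence[float]) -> float:
    """Largest deviation of a reading from the fitted straight line, in output units."""
    slope, intercept = fit_line(inputs, outputs)
    return max(abs(y - (intercept + slope * x)) for x, y in zip(inputs, outputs))

# Hypothetical calibration of the thermometer example.
temperature_f = [96, 98, 100, 102, 104]
column_cm = [0.0, 2.0, 4.0, 6.0, 8.0]
print(sensitivity(temperature_f, column_cm))       # 1.0 (cm per °F)
print(max_nonlinearity(temperature_f, column_cm))  # 0.0
```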
monitored for their health conditions on a 24/7 basis, so missed sensor readings and measurement errors are not acceptable. Energy-saving technologies would help extend the autonomy of a sensor before a battery recharge cycle, or even a battery replacement, is required.
In hospital buildings, walls and other obstacles can cause additional interference, which may decrease reliability. It is therefore important, before installing a hospital sensor network, to verify that the signal can be transferred seamlessly and without interruptions.
In addition, since patient data are sensitive information, data packets transferred via sensors should remain confidential and inaccessible to third parties. For this reason, sensor devices should integrate secure protocols and data encryption technologies so that data transmission complies with HIPAA requirements.
Some additional considerations include event ordering, timestamps, synchronization and quick response in emergencies. All these requirements are especially important in the health care domain, since at a given time point there might be a large number of sensor measurements from multiple patients: transmission needs to be prioritized, and the transferred data need to be labelled with the exact time at which each measurement occurred. Consider that sensor data will be used for longitudinal insight into the patient's condition: sensor data contribute significantly to the health professional's understanding of a patient's temporal patterns.
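Prioritized, time-labelled transmission can be sketched with a priority queue. In the minimal example below, the two priority levels, the `Reading` record and the patient readings are all hypothetical; emergency readings are forwarded first, and readings of equal priority keep the order in which they were measured.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timezone

EMERGENCY, ROUTINE = 0, 1  # hypothetical priority levels (lower is sent first)


@dataclass(order=True)
class Reading:
    # Only priority and measured_at take part in comparisons: urgent readings
    # go first, and readings of equal priority keep their measurement order.
    priority: int
    measured_at: datetime
    patient_id: str = field(compare=False)
    kind: str = field(compare=False)
    value: float = field(compare=False)


def transmission_order(readings: list[Reading]) -> list[Reading]:
    """Return readings in the order a gateway would forward them."""
    heap = list(readings)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


def at(second: int) -> datetime:
    # Build example UTC timestamps for readings taken a few seconds apart.
    return datetime(2013, 2, 21, 10, 0, second, tzinfo=timezone.utc)


readings = [
    Reading(ROUTINE, at(5), "P1", "temperature_F", 98.9),
    Reading(EMERGENCY, at(7), "P2", "spo2_percent", 84.0),
    Reading(ROUTINE, at(1), "P3", "heart_rate_bpm", 72.0),
]

for r in transmission_order(readings):
    print(r.measured_at.time(), r.patient_id, r.kind, r.value)
```

Here the emergency reading from P2 is sent first even though it was taken last; the two routine readings then follow in measurement order (P3, then P1). Because each reading carries its own timestamp, the receiving system can still rebuild each patient's timeline correctly.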
The integration of many types of sensors demands new node architecture approaches: multi-sensor
networks have generated the need to develop novel data collection and integration technologies to
respond to the bandwidth requirements and the dynamic synchronization of sensor data.
Furthermore, the trend for the coming years is to use sensor data for predictive modeling, to gain insight into how specific sensor data patterns relate to optimal treatment decisions. In addition, there is a widely recognized need to integrate available specialized medical technology with wireless networks (for example, wearable accelerometers with integrated wireless cards for patient monitoring). Many commercial applications, already accessible and affordable, have recently appeared on the market. They target (i) healthy citizens who want to monitor health and lifestyle parameters, (ii) chronic disease patients who want to monitor their disease-related parameters more efficiently and (iii) health care organizations, which use sensor networks for patient monitoring.
Some of the most obvious benefits of using sensors in healthcare include savings in medical expenses and time (fewer face-to-face appointments are required), while automated data collection allows more participants to take part in clinical trials. Measurement bias (footnote 8) is minimized, and no measurements are skipped due to possible negligence by a health care professional.
8 A systematic error that occurs when, because of the lack of blinding or related reasons such as diagnostic suspicion, the measurement methods (instrument, or observer of instrument) are consistently different between the groups in a study.
Jawbone UP: a flexible wristband packed with vibration and motion sensors to track and analyze exercise, diet and sleep data.
Basis: a wrist-worn device that measures the wearer’s heart rate, caloric burn and sleep patterns.
Medical sensor applications include patient monitoring, environmental tests and diagnostics. Some of the most popular medical sensors are used to measure:
Patient vital signs: temperature, blood pressure, heart rate, respiratory rate
Pulse oximetry: pulse oximeters are non-invasive devices used to measure a patient's blood-oxygen saturation level and pulse rate
Blood glucose (blood glucose sensors)
Physical pressure: many applications for orthopaedic patients and in neurology (e.g. monitoring of patients after stroke)
fMRI sensors: functional MRI is a neuroimaging procedure using MRI technology that measures brain activity by detecting changes associated with blood flow
Mobile phone sensors, such as accelerometers and gyroscopes
The box below discusses challenges and potential considerations that need to be made when a new
medical sensor technology is considered for purchase by the hospital management. Discuss the
significance and validity of the arguments below and suggest meaningful and feasible strategies to
address these points.
The “cost” challenge: discuss the following opinions about the considerations that drive investment decisions in sensor technologies by healthcare organizations.
“Too expensive sensors will never be accepted by the healthcare management”
“Health professionals do not have the knowledge to use this new technology”
“Staff making the measurements would be a cheaper manual alternative”
“Sensors can have a real impact only if they are of low cost”
“Devices using sensors have to be portable”
Classification of healthcare sensor systems based on the number of sensors
Single sensor data: one single sensor measures a specific physical property (e.g. temperature or
acceleration). Single sensors are often used to automate basic clinical requirements, such as
measuring the vital signs.
Multi-sensor data: multiple sensors are combined to measure a property or an event of interest that cannot be measured with a single sensor. For instance, in homecare, a device combining an accelerometer and an ECG sensor can be used to detect cardiac arrest: the patient falls (detected by the accelerometer) after losing consciousness seconds after a cardiac arrest (detected by the ECG). Multi-sensor data have recently been the focus of research, since the combined use of different sensors and metadata can help recognize health-related events.
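The accelerometer-plus-ECG example can be expressed as a simple fusion rule: report a possible collapse when a fall follows an ECG alarm within a short window. The sketch below is illustrative only. The threshold, the window and the function name `detect_collapse` are hypothetical and not validated clinical parameters, and the sketch only shows how the two data streams are combined into one event.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds for illustration only; they are not validated
# clinical parameters.
FALL_THRESHOLD_G = 2.5             # acceleration magnitude treated as a fall
MAX_DELAY = timedelta(seconds=30)  # max gap between ECG alarm and the fall


def detect_collapse(ecg_alarms, accel_samples):
    """Pair each ECG alarm with the first fall that follows it within MAX_DELAY.

    ecg_alarms: list of datetimes when the ECG sensor flagged an arrhythmia.
    accel_samples: list of (datetime, acceleration in g) from the accelerometer.
    """
    falls = sorted(t for t, g in accel_samples if g >= FALL_THRESHOLD_G)
    events = []
    for alarm in sorted(ecg_alarms):
        for fall in falls:
            if timedelta(0) <= fall - alarm <= MAX_DELAY:
                events.append((alarm, fall))
                break
    return events


t0 = datetime(2013, 2, 21, 22, 0, 0)
ecg_alarms = [t0 + timedelta(seconds=10)]
accel = [
    (t0 + timedelta(seconds=5), 1.0),    # normal movement
    (t0 + timedelta(seconds=18), 3.1),   # fall 8 s after the ECG alarm
    (t0 + timedelta(seconds=120), 2.8),  # later fall with no preceding ECG alarm
]
for alarm, fall in detect_collapse(ecg_alarms, accel):
    print(f"possible collapse: ECG alarm at {alarm:%H:%M:%S}, fall at {fall:%H:%M:%S}")
```

Neither signal alone produces this event. The fall at 120 s is ignored by this rule because no ECG alarm precedes it, which illustrates how combining sensors identifies an event that a single sensor cannot.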
Examples of wireless vital sign sensing devices are the wireless pulse oximeter and the wireless two-lead electrocardiogram (ECG). These devices collect heart rate (HR), oxygen saturation and ECG data and relay them over a short-range (100 m) wireless network to any number of receiving devices, including computer tablets, laptops and ambulance terminals. Data can be displayed in real time and integrated into the pre-hospital patient care record as it is being developed. The sensor devices themselves can be programmed to process the vital sign data, for example, to raise an alert condition when vital signs fall outside normal parameters. Any adverse change in patient status can then be signaled to a nearby Emergency Medical Technician (EMT) or paramedic.
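The on-device alert described above amounts to comparing each reading against configured limits. In the sketch below, the ranges are illustrative values chosen for the example and are not clinical alarm limits. `notify_paramedic` is a hypothetical stand-in for the wireless relay.

```python
# Illustrative adult resting ranges chosen for this sketch only; real devices
# use configurable, clinically validated alarm limits.
NORMAL_RANGES = {
    "heart_rate_bpm": (60, 100),
    "spo2_percent": (95, 100),
}


def check_vitals(vitals: dict[str, float]) -> list[str]:
    """Return an alert message for every vital sign outside its normal range."""
    alerts = []
    for name, value in vitals.items():
        if name not in NORMAL_RANGES:
            continue  # no limits configured for this measurement
        low, high = NORMAL_RANGES[name]
        if not low <= value <= high:
            alerts.append(f"{name}={value} outside {low}-{high}")
    return alerts


def notify_paramedic(alerts: list[str]) -> None:
    # Hypothetical stand-in for relaying the alert over the wireless network.
    for alert in alerts:
        print("ALERT:", alert)


notify_paramedic(check_vitals({"heart_rate_bpm": 72, "spo2_percent": 98}))   # no output
notify_paramedic(check_vitals({"heart_rate_bpm": 118, "spo2_percent": 91}))  # two alerts
```

Keeping the limits in a table rather than hard-coding them lets the same device be reconfigured for different patient groups.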
Biosensors
A biosensor is an analytical device used for the detection of an analyte; it combines a biological component with a physicochemical detector. The sensitive biological element (e.g. tissue, microorganisms, organelles, cell receptors, enzymes, antibodies, nucleic acids, etc.) is a biologically derived material or biomimetic component that interacts with (binds to or recognizes) the analyte under study. The biologically sensitive elements can also be created by biological engineering. Some biosensor applications include glucose monitoring in diabetes patients, the detection of pathogens, the routine analytical measurement of folic acid, vitamin B12 and other substances as an alternative to microbiological assays, drug discovery and the evaluation of the biological activity of new compounds.
Biosensors-an example
The blood glucose biosensor uses the enzyme glucose oxidase to break blood glucose down. In doing so, it first oxidizes glucose and uses two electrons to reduce FAD (a component of the enzyme) to FADH2. FADH2 is in turn oxidized by the electrode (the electrode accepts two electrons from FADH2) in a number of steps. The resulting current is a measure of the concentration of glucose. In this case, the electrode is the transducer and the enzyme is the biologically active component.
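Turning "the resulting current" into a glucose concentration requires a calibration curve. The minimal sketch below assumes the current is linear in concentration. It fits a line to hypothetical calibration standards by ordinary least squares and then inverts that line for a new reading. All numbers are invented for illustration.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration standards: known glucose concentrations (mg/dL)
# and the electrode current (microamperes) measured for each.
standards_mg_dl = [50.0, 100.0, 200.0, 300.0]
currents_ua = [1.1, 2.1, 4.1, 6.1]
slope, intercept = fit_line(standards_mg_dl, currents_ua)


def glucose_from_current(current_ua: float) -> float:
    """Invert the calibration line to estimate glucose (mg/dL) from a current."""
    return (current_ua - intercept) / slope


print(f"sensitivity: {slope:.3f} uA per mg/dL")
print(f"sample reading 3.1 uA -> {glucose_from_current(3.1):.0f} mg/dL")
```

The fitted slope is the sensor's sensitivity (change in current per unit change in glucose), so the same calibration step also checks the linearity property expected of a good sensor.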
9 Differential signaling is a method for electrically transmitting information using two complementary signals.
Wearable Smart Clothes
Using the rapidly improving wireless communication technologies and advanced sensors available
today, many companies and universities are proposing solutions for healthcare applications. The
smart clothes use e-textiles, which are fabrics that enable digital components such as small
electronics and sensors to be embedded in them.
Left: an example of wearable jacket with health monitoring sensors (Source: IFE Wearable Computing),
Right: Lifeshirt system by ‘Vivometrics’
Patient Compliance
Patient acceptance and compliance need to be considered, especially for elderly patients. Since seniors often tend to distrust, or even reject, technology, the systems should be intuitive and easy to operate. A study in which elderly residents of Sydney, Australia, participated in an open-ended discussion found an overall positive view of wearable sensor networks due to their implications for independence. Many users reported that they would feel uncomfortable wearing visible sensors; the design should therefore be as unobtrusive as possible. Compliance issues due to forgetfulness can be overcome by integrating alarms and reminders as part of the systems. Finally, users may report data privacy concerns; many do not feel comfortable with the idea that their data are collected by an automated process and transmitted within milliseconds to remote locations. Patient education should become an indispensable component of the strategy to successfully establish and use sensor networks for patient monitoring.
10 Plethysmography measures changes in volume within an organ or the whole body, usually resulting from fluctuations in the amount of blood or air it contains.
Chapter 5. Hospital Information Systems
Information management in healthcare today
One can identify many problems with information management in hospitals today. Healthcare is at least one decade behind other industries in the adoption of information technologies. Most of the evident problems are related to how the complicated and dynamic flow of health information is managed. The problems vary. Incorrect reporting, such as a wrong laboratory report, may lead to erroneous and even harmful treatment decisions. Repeated examinations, or missing results that should have been available, have serious implications for the cost of care, since procedures may be repeated unnecessarily. Information should be documented adequately, enabling health care professionals to access it as needed in order to make informed decisions. In general, any type of patient-related information should be available on time, up to date and correct. Dynamic information processing is the key factor for improving quality and reducing costs; therefore, information processing in a health care organization should become a strategic priority.
In the United States today, medical errors account for more deaths than breast cancer, AIDS and motor vehicle accidents, and the majority of medical errors are preventable with the use of advanced information management technologies. In terms of health outcomes, the United States lags behind other industrialized nations in many areas, such as infant mortality and life expectancy, while its healthcare system has among the highest healthcare costs per capita in the world. The management of health information and the wide integration of systems is only one consideration related to the above problems, but an important one.
What is a System
A system is a set of interacting or interdependent components forming an integrated whole with relationships to other elements or sets. Every process that involves entities functioning in a specific way and interacting with each other may be described as a system.
Systems may consist of subsystems, which may have their own subsystems, and so on. All subsystems work together to exchange data for a specific purpose. Typically, the elements of a system can be (i) humans, (ii) machines and (iii) procedures, and they make up the system’s internal environment. Everything outside the system is called the external environment. These two environments are in constant communication, exchanging data (input-output).
Figure: the components of a system (humans, data, software, hardware, processes) and their interaction.
Information System
An information system (IS) is any combination of information technology and people's activities that supports operations, management and decision making. The term is frequently used to refer to the interaction between people, processes, data and technology. It refers not only to the information and communication technology of an organization, but also to the way in which people interact with the technology to support business processes. The most important concept of information systems is that they transform data (input) into information (output). Although legacy information systems were not necessarily computer based, modern high-complexity information systems cannot easily be implemented without computer and telecommunications support.
A Hospital Information System (HIS) is an integrated computer system which stores, manages and retrieves information related to the clinical and administrative needs of a hospital. This collection of systems and equipment manages all hospital information with the goal to:
1. Support health professionals to be efficient in their everyday practice, by providing easy access to information and by connecting different departments seamlessly.
2. Improve the quality of health services provided to patients, by integrating standards that support evidence-based practice. At the same time, patient information is easier to access, so decisions for a patient can be made with all relevant considerations in place. Another dimension of improving quality is the continuous, uninterrupted data flow for a patient during a visit or at a follow-up appointment.
3. Reduce the cost of care, through more efficient management of patient information, by encouraging good practices and by eliminating the overuse of services.
Related terms
There are several other information systems for managing the flow and storage of information in routine hospital services.
Healthcare Information System: a broader term that can be applied to any health context. The term may be used to describe any system that manages and transmits information related to the health of individuals or the activities of health organizations.
Clinical Information System (CIS): information systems focused on the application of technology at the point of care, supporting the acquisition and processing of clinical information.
Patient Data Management System (PDMS): information systems which integrate all patient data. Such systems are often used in complex intensive care units and for high-care patient cases.
Figure: timeline of information systems in healthcare, 1960 to 2020 (central computers, personal computers; administrative automation). Priorities change as health systems and technologies change: moving from “automation” to “integration”.
Scope and Requirements of a Hospital Information System
Hospital Information Systems (HIS) provide a common registry for all patient information, and the different hospital departments have access to patient information from any place at any time. The availability of the right information without waiting times, and a guarantee that the right information reaches the right person at the right time, are very important in healthcare. HIS implementations must recognize the health professional as the main user while placing the patient at the center of care. An HIS integrates new information from multiple diverse sources and helps make healthcare services more cost efficient by facilitating better management of resources in healthcare settings. Some components of HIS implementations are specialized tools designed specifically for the assessment of cost and quality of care. The infrastructure of an HIS implementation should support the integration of enhanced applications, with add-ons and modules that facilitate the decision making process. An important consideration is that HIS implementations should be backwards compatible, so that old data can easily be migrated into the new system.
Table. Primary goals of Hospital Information Systems: improve patient care and efficiently manage resources
Primary goal: better care services and improved quality of care. Related secondary goals: improvement of communications; smaller waiting times; better decision making
Primary goal: cost management and improved cost-effectiveness. Related secondary goals: shorter length of stay; lighter administrative workload; better use of resources; reduction of staff costs
Humans (users): people who produce the information and use it for decision making during their
everyday practice. They can be healthcare professionals, hospital administrators, policy makers, or
even external parties, such as payers.
Data: raw data to be processed based on the needs of the above-mentioned user categories.
Healthcare data have been extensively discussed in Chapter 2.
Procedures: series of guidelines that describe how humans will act under specific circumstances. In
healthcare, examples of procedures include clinical interventions, patient transfers and admissions,
orders for examinations and various administrative processes.
Equipment: in healthcare a combination of hardware and software is used for the collection, storage
and communication of health related information.
Network technologies required for a Hospital Information System:
High speed wired networks
Wireless networks
Internet services
Cloud based services
Voice over IP (VoIP)
Web servers (client-server)
Intranets
Synchronous video conferencing
Structure (architecture) of a Hospital Information System. Notice that the clinical subsection is the core element.
Functions of Hospital Information Systems
The specific operations of an HIS and its dedicated components primarily focus on the management of all the clinical, patient admission, administrative, knowledge management and financial functions of a hospital.
Patient registration and the management of a patient's identification information, from admission to discharge, are supported by specialized subsystems.
Billing systems integrate algorithms that assign each patient's ICD-coded diagnoses to a Diagnosis Related Group (DRG) and a billing amount.
Appointment and scheduling systems help manage the workload and service availability, and provide patients with an informed estimate of their future care plan.
Computerized Physician Order Entry (CPOE) systems help health professionals enter medication orders or other instructions electronically instead of on paper charts. A primary benefit of CPOE is that it can help reduce errors related to poor handwriting or transcription of medication orders.
Electronic Health Records (EHR) refer to the systematized collection of patient and population electronically stored health information in a digital format (footnote 11). Chapter 6 is dedicated to Electronic Health Records.
Pharmacy systems manage prescriptions, pharmaceutical inventories, the prescription and ordering workflow and billing functions.
Telemedicine systems support the remote diagnosis and treatment of patients with the use of telecommunications technology.
Decision Support Systems are computer-based information systems that support clinical and administrative decision-making activities.
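The billing function above can be illustrated with a toy grouper. In simplified form, and ignoring the adjustments real payers apply, a DRG-based payment is the hospital base rate multiplied by the relative weight of the assigned group. In the sketch below, the group labels, relative weights and base rate are invented for illustration. Only the ICD-10 category meanings (J20 acute bronchitis, I50 heart failure) are real. A real grouper uses far more inputs than the principal diagnosis.

```python
# Toy grouping table: the group labels and relative weights are invented for
# illustration. Real groupers (such as MS-DRG) also use secondary diagnoses,
# procedures, age and discharge status.
HYPOTHETICAL_GROUPS = {
    "J20": ("Group A (bronchitis)", 0.75),     # ICD-10 J20: acute bronchitis
    "I50": ("Group B (heart failure)", 1.30),  # ICD-10 I50: heart failure
}
BASE_RATE_USD = 6000.0  # hypothetical hospital base payment rate


def group_and_bill(principal_icd10: str) -> tuple[str, float]:
    """Map a principal ICD-10 diagnosis to a group and a billing amount."""
    category = principal_icd10.split(".")[0].upper()  # e.g. "J20.9" -> "J20"
    if category not in HYPOTHETICAL_GROUPS:
        raise KeyError(f"no group defined for {principal_icd10}")
    group, relative_weight = HYPOTHETICAL_GROUPS[category]
    # Simplified payment: base rate x relative weight of the assigned group.
    return group, round(BASE_RATE_USD * relative_weight, 2)


print(group_and_bill("J20.9"))  # acute bronchitis, unspecified
```

Separating the grouping table from the payment formula mirrors how the base rate can differ between hospitals while the group weights stay shared.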
11 Gunter, Tracy D.; Terry, Nicolas P. (2005). "The Emergence of National Electronic Health Record Architectures in the United States and Australia: Models, Costs, and Questions". Journal of Medical Internet Research. 7 (1): e3.
the use of tracking and monitoring sensors which record the clinical interventions and verify the
successful execution of the clinical practice.
Design, implementation and level of integration were investigated using a questionnaire, based on literature evidence that success and failure factors are not only technical but also related to the existing organizational models, education, and managerial and evaluation issues.
The hospital workflow process was examined in depth to identify factors impeding the successful introduction of the IS. The case study was performed in two mid-sized general hospitals and one oncology hospital.
The qualitative assessment involves the analysis of employee perceptions, in two phases.
During phase 1, the researchers started with a literature review to find the critical items regarding the design and implementation of hospital information systems. As a result, an assessment questionnaire with closed-ended questions was created and distributed to nine IT department employees of the three hospitals. In phase 2, the researchers discussed the implementation process in open interview sessions with four different groups of hospital employees in one of the three hospitals of this study (hospital C).
The IT employees of the three hospitals believe that the most important problems during the implementation were the lack of central planning, difficulties in user acceptance and in the integration of the new system into everyday practice and, finally, the lack of use of standards.
In two of the three hospitals, there was a consensus that specific “financial or personal career interests” were involved in the purchase decisions and that there were too few IT professionals and health informatics experts to guarantee the success of the system.
Chapter 6. Electronic Health Records
The need for Electronic Health Records: continuum of care
The top priorities for healthcare systems today include continuity of care, the provision of high-quality healthcare services, equity of access to services and increased system efficiency, across all settings of care (primary, secondary, tertiary). Healthcare services are distributed across clinical environments where physicians, nurses and other healthcare professionals work together, but many functions are primarily of interest to administrative and public health service providers (such as hospital administrators, health authorities and epidemiologists). The patient information needs to be accessible to different providers, who have different roles within the system and need access to different views of the clinical information. This information should follow patients for their whole life, including disease prevention, treatment and rehabilitation aspects, and would provide the healthcare system with a dynamic, complete and longitudinal insight into the patient's health. To satisfy these requirements, healthcare organizations need to maintain high-quality healthcare records.
Research and education: when the healthcare data stored in records are transcribed into spreadsheets, they can prove to be a very useful data resource for clinical and epidemiologic studies.
Public health policies: when data from different healthcare providers are combined, they can be used as an enormous dataset with significant public health relevance, for the assessment of the population morbidity profile.
Financial management: the data kept in the record may be used to estimate the resources that the healthcare system has spent on each patient.
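The public health and financial uses above both start by pooling record extracts from several providers. The minimal sketch below uses invented extracts and assumes that both providers share the same patient identifiers. In practice, records must first be linked across providers. The sketch counts distinct patients per diagnosis, which is a crude morbidity profile, and sums recorded costs per patient.

```python
from collections import Counter, defaultdict

# Hypothetical extracts from two providers: (patient_id, diagnosis, cost_usd).
# The sketch assumes both providers use the same patient identifiers; in
# practice records must first be linked across providers.
provider_a = [("p1", "diabetes", 120.0), ("p2", "hypertension", 80.0), ("p1", "hypertension", 60.0)]
provider_b = [("p3", "diabetes", 200.0), ("p2", "hypertension", 90.0)]


def morbidity_profile(*extracts):
    """Count distinct patients per diagnosis across all providers."""
    patient_diagnoses = {(pid, dx) for extract in extracts for pid, dx, _ in extract}
    return Counter(dx for _, dx in patient_diagnoses)


def cost_per_patient(*extracts):
    """Sum the recorded resource costs for each patient across providers."""
    totals = defaultdict(float)
    for extract in extracts:
        for pid, _, cost in extract:
            totals[pid] += cost
    return dict(totals)


print(morbidity_profile(provider_a, provider_b))
print(cost_per_patient(provider_a, provider_b))
```

Deduplicating on (patient, diagnosis) matters: patient p2 appears at both providers with hypertension but is counted once in the morbidity profile, while both visits still count toward p2's costs.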
Records were kept in paper format during the 20th century, and it was not until the last decades of that century that the first Electronic Health Record systems appeared. Some of the disadvantages of traditional paper records are obvious, such as the fact that they make patient data available to only one user at a time and in one place. In addition, handwritten entry increases the possibility of transcription mistakes from the data source, while illegibility and misunderstanding of the information can occur too. With an increasing number of patients, hospitals had to deal with enormous volumes of paper, which were very difficult to use efficiently for adequate follow-up of the patient's health status. The physical security risk was huge, since any natural disaster could destroy the records forever. Finally, it is easy to see why paper-based records made it difficult, if not impossible, to gather data for research purposes: it would take countless hours to transform the paper records into digital files manually.
To support the management of patient data and other patient care related information, we will introduce Electronic Health Record (EHR) systems, which aim to make the management of records not only easier but far more effective. These systems are widespread in healthcare systems today and integrate information about the management of patient care to support clinical, administrative and financial requirements.
Table. Benefits coming from the use of EHR systems
Time: timely access to health data; quick data retrieval for research purposes
Money: better management of health resources; faster and more efficient reimbursement
Quality of care: decision making support; tools for distributed care
Research and education: clinical and epidemiologic research, patient education, training of health professionals
In-class discussion
Provide examples to show understanding of the EHR properties of ‘flexibility’ and ‘clinical views support’
1. How exactly might two different healthcare professionals (for example, a physician and a nurse) want to see the information of the same patient in different ways?
2. How would the output of an EHR system differ between two healthcare procedures for the same patient during the hospital stay?
Column 1: Electronic, Computerized, Computer-based, Digital, Distributed, Multimedia, Automated, Virtual
Column 2: Medical, Patient, Health care, Health
Column 3: Record, File, Folder
Other similar terms have been used, with different meanings in terms of their scope and direction. Try to pick one word from each column to create “new” terms!
Electronic Health Record (EHR) systems need to support both non-episodic data (e.g. patient history, patient allergies) and episodic data of care, which involve repeated measurements and data entry. Examples of such data include the patient problem list, patient history, physical examination information, allergies, vital signs, immunizations, medications, physician orders, diagnostic results and medical images. Most of these healthcare-related data, as well as the patient's functional status, are in coded form.
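The distinction between non-episodic and episodic data maps naturally onto a record structure. The minimal sketch below uses Python dataclasses. The class and field names are hypothetical, and the example patient is invented. Non-episodic items are held once per patient, while each encounter adds a new, date-ordered entry.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Encounter:
    """Episodic data: recorded again at every visit or procedure."""
    on: date
    vital_signs: dict[str, str]
    orders: list[str] = field(default_factory=list)


@dataclass
class PatientRecord:
    patient_id: str
    # Non-episodic data: entered once and updated only when it changes.
    allergies: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)
    # Episodic data: one entry per encounter, kept in date order.
    encounters: list[Encounter] = field(default_factory=list)

    def add_encounter(self, encounter: Encounter) -> None:
        self.encounters.append(encounter)
        self.encounters.sort(key=lambda e: e.on)


# Hypothetical patient, for illustration only.
record = PatientRecord("demo-001", allergies=["latex"], history=["appendectomy (2005)"])
record.add_encounter(Encounter(date(2013, 2, 28), {"BP": "125/80", "pulse": "70/min"}))
record.add_encounter(Encounter(date(2013, 2, 21), {"BP": "130/85", "pulse": "78/min"}, ["chest X-ray"]))
print([e.on.isoformat() for e in record.encounters])
```

Adding encounters never touches the allergy or history lists, which is the practical difference between the two kinds of data.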
An EHR facilitates the efficient entry of all orders and documentation by authorized clinicians and provides access to tools and displays that can be customized to end-user preferences. Ideally, this documentation includes the clinical reasoning and rationale for each decision, in an easy-to-follow way. An EHR enables the automation of the typical clinician’s workflow by tracking the clinical process pathway effectively. All decisions and interventions recorded for a patient are accountable, and therefore EHR systems should support electronic signatures to ensure non-repudiation. In addition, an EHR often provides tools to facilitate teamwork and coordination.
A modern EHR system provides additional support for data collection for non-clinical uses, such as billing, quality management, reporting and public health disease monitoring. This is possible by providing user-friendly back ends that enable data extraction, as well as data analytics functionality.
Many EHR systems provide easy access to knowledge sources at any point within the clinical workflow. The healthcare provider can have easy access to clinical knowledge such as guidelines and clinical recommendations.
Access to patient information is provided through a variety of integrated views, specialty-specific forms and diagrams, with flags on any patient information that lies outside normal limits. Modern EHR systems also provide tools for the management, communication and monitoring of the completion of a physician order process. For ambulatory (out-of-hospital) care, healthcare professionals use EHR systems to store data that support regulatory requirements.
There are many other functionalities of newer EHR systems, such as decision support tools to guide and critique medication administration, recommendations tailored to the condition of an individual patient, and real-time patient surveillance and alerts. Some systems provide evidence-based, data-driven information about expected patient outcomes based on the patient's condition, treatment plan and care delivery information.
A good EHR system can accept information from external systems and data capture devices (e.g. bar code scanners). It also supports reporting for the evaluation of healthcare services and of compliance with process standards. EHR systems connect financial information and other external data, such as patient satisfaction, for the analysis of process and practice performance, and support data modeling for the evaluation of potential organizational changes and the prediction of resource allocation.
The diagram below outlines all the important functions of an EHR for the patient care and the
hospital procedures. A modern EHR system would incorporate the majority of these functions.
Patient care
Tracking information about provided services (such as medications and treatments) becomes easy and effortless with the use of an EHR system. Physicians and nurses have decision support tools available for patient diagnosis and treatment decisions. Risk factors can be tracked, so the risk assessment for an individual patient can include risk indicators such as the risk of a hospital-acquired condition, readmission risk and others. EHR systems are often used to facilitate high quality healthcare in line with clinical guidelines, since the guidelines can now be easily incorporated into the EHR. Setting up guidelines for prevention is also important. Patient satisfaction can be tracked, with all patient experience survey responses stored in the EHR system.
With the use of EHR systems, the management and development of clinical care plans is easier, since health professionals have predefined options and care plan templates available. Since nurses spend most of their shift in clinical departments, in close proximity to their patients and their personalized needs, EHR systems should support all aspects of nursing care, with evidence-based nursing assessment tools, easy entry of vital signs, and management of other clinical measurements and nursing observations. EHR systems specifically help nurses create customized nursing plans.
In-class discussion
Explain how the use of EHR systems helps achieve the following three milestones:
Patients will be admitted to hospital when this is required
The unneeded laboratory tests and radiology examinations will be reduced significantly
The hospital length of stay will be reduced
Electronic Health Record systems provide significant benefits not only to those directly involved in healthcare, but also to other professionals, such as pharmacists. With the use of EHR systems, medical prescriptions (Rx) can be based on a specific predefined plan, and the right Rx is prepared the first time, with no delays or returns. On-site assessment of possible drug interactions becomes possible, reducing adverse drug effects, with implications for patient safety. In terms of prescription patterns, since everything is documented electronically, Rx extensions to EHR systems allow for improved drug utilization review.
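The on-site interaction assessment can be pictured as a lookup that runs whenever a new prescription is entered against the patient's current medication list. In the sketch below, the one-entry interaction table is illustrative only, although the warfarin and aspirin bleeding risk is a well-known interaction. A real system would query a maintained drug-interaction knowledge base. The function name is hypothetical.

```python
# Illustrative interaction table for this sketch only; a real system would
# query a maintained drug-interaction knowledge base.
INTERACTIONS = {
    frozenset({"warfarin", "aspirin"}): "increased bleeding risk",
}


def check_new_prescription(current_meds: list[str], new_drug: str) -> list[str]:
    """Return a warning for each current medication that interacts with new_drug."""
    warnings = []
    for med in current_meds:
        note = INTERACTIONS.get(frozenset({med.lower(), new_drug.lower()}))
        if note:
            warnings.append(f"{new_drug} + {med}: {note}")
    return warnings


print(check_new_prescription(["Warfarin", "Metformin"], "Aspirin"))
```

Using an unordered `frozenset` as the key means the pair is found regardless of which drug was prescribed first.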
A patient was admitted to the hospital on February 21st, 2013 with shortness of breath, cough, fever and very dark feces. Blood pressure was 150/90 mmHg, pulse 95/min and temperature 102.7 °F. The blood test showed an ESR of 25 mm/hr and an Hb of 7.8 g/dL, and the fecal occult blood test was positive. The patient was transferred to the radiology department for an x-ray. The exam showed no atelectasis and a slight sign of cardiac decompensation. The diagnosis made was acute bronchitis, and the patient was prescribed amoxicillin 500 mg twice a day. A week later the patient was clear of cough, with only slight shortness of breath and normal feces. Blood pressure was 160/95 mmHg and pulse 82/min, and the physical examination showed only slight rhonchi. Based on this assessment, the patient was prescribed aspirin at 32 mg per day. The blood test showed an Hb of 8.2 g/dL and occult blood in the feces.
Time-oriented
The patient information is presented in chronological order. For each date/time there is a list of all the
clinical interventions and decisions that have been completed or need to be made. The information in
our clinical scenario is therefore transformed as follows:
Feb 21, 2013
Shortness of breath, cough, and fever. Very dark feces
Exam: RR 150/90, pulse 95/min, Temp: 102.7 F, Rhonchi, ESR 25 mm/hr, Hb 7.8, occult blood feces +
Chest X-ray: no atelectasis, slight sign of cardiac decompensation
Medication: Amoxicillin caps 500 mg twice daily
Source-oriented
Now the patient information is transformed into a source-oriented representation, in which the
information is grouped by the department or location where each activity took place.
Clinical Department
Feb 21, 2013
Shortness of breath, cough, and fever. Very dark feces
Exam: RR 150/90, pulse 95/min, Temp: 102.7 F, Rhonchi, abdomen not tender
Feb 28, 2013
No more cough, slight shortness of breath, normal feces
Exam: slight rhonchi, RR 160/95, pulse 82/min
Medication: Aspirin 32 mg/day
Laboratory
Feb 21, 2013
ESR 25 mm/hr, Hb 7.8, occult blood feces +
Feb 28, 2013
Hb 8.2, occult blood feces.
Radiology Department
Feb 21, 2013
Chest X-ray: no atelectasis, slight sign of cardiac decompensation
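The time-oriented and source-oriented views are two groupings of the same entries. As a minimal sketch, assuming each entry of the scenario is encoded as a hypothetical (date, source, note) tuple rather than any real EHR data model, both views can be produced from one list of events:

```python
from collections import defaultdict
from datetime import date

# Hypothetical, simplified encoding of the scenario: (date, source, note).
EVENTS = [
    (date(2013, 2, 21), "Clinical Department", "Shortness of breath, cough, and fever. Very dark feces"),
    (date(2013, 2, 21), "Laboratory", "ESR 25 mm/hr, Hb 7.8, occult blood feces +"),
    (date(2013, 2, 21), "Radiology Department", "Chest X-ray: no atelectasis, slight sign of cardiac decompensation"),
    (date(2013, 2, 28), "Clinical Department", "No more cough, slight shortness of breath, normal feces"),
    (date(2013, 2, 28), "Laboratory", "Hb 8.2, occult blood feces"),
]


def time_oriented(events):
    """Group notes by date, in chronological order (one list per date)."""
    view = defaultdict(list)
    for day, source, note in sorted(events, key=lambda e: e[0]):
        view[day].append(f"{source}: {note}")
    return dict(view)


def source_oriented(events):
    """Group notes by source (department) first, then by date."""
    view = defaultdict(lambda: defaultdict(list))
    for day, source, note in sorted(events, key=lambda e: e[0]):
        view[source][day].append(note)
    return {source: dict(days) for source, days in view.items()}


for day, notes in time_oriented(EVENTS).items():
    print(day.isoformat(), notes)
for source, days in source_oriented(EVENTS).items():
    print(source, sorted(days))
```

Because the underlying entries are stored only once, switching between the two representations is a matter of regrouping, not re-entering data.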
Plan: includes the treatment plan and the medication information
We will now transform our patient scenario into a SOAP note.
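As a sketch, assuming the four standard SOAP sections (Subjective, Objective, Assessment, Plan), the Feb 21, 2013 data of the scenario could be organized as below. The SoapNote class and its field names are hypothetical, chosen only for this illustration:

```python
from dataclasses import dataclass, field


@dataclass
class SoapNote:
    """Minimal SOAP note: Subjective, Objective, Assessment, Plan."""
    visit_date: str
    subjective: list = field(default_factory=list)  # what the patient reports
    objective: list = field(default_factory=list)   # exam, laboratory and imaging findings
    assessment: list = field(default_factory=list)  # diagnosis / clinical judgement
    plan: list = field(default_factory=list)        # treatment plan and medication

    def render(self) -> str:
        """Return the note as text, one line per SOAP section."""
        lines = [self.visit_date]
        for label, items in (("S", self.subjective), ("O", self.objective),
                             ("A", self.assessment), ("P", self.plan)):
            lines.append(f"{label}: " + "; ".join(items))
        return "\n".join(lines)


note = SoapNote(
    visit_date="Feb 21, 2013",
    subjective=["Shortness of breath, cough, and fever", "Very dark feces"],
    objective=["RR 150/90, pulse 95/min, Temp 102.7 F",
               "ESR 25 mm/hr, Hb 7.8, occult blood feces +",
               "Chest X-ray: no atelectasis, slight sign of cardiac decompensation"],
    assessment=["Acute bronchitis"],
    plan=["Amoxicillin caps 500 mg twice daily"],
)
print(note.render())
```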
the flexible and evidence-based support of clinical care, and to facilitate decision making, research,
communication and data sharing, management and statistical surveys, healthcare systems need to
use common classifications, terminologies and codifications.
By using common classification systems across the healthcare system, an organization can produce
analytics and performance statistics that can be compared to a baseline performance. Standards can
therefore provide the basis for quality improvement in healthcare services. Classification systems also
have indirect uses: they make the statistical analysis of diseases and therapies possible, they allow
healthcare data to be used in knowledge-based and decision support systems, and they support the
direct surveillance of epidemic or pandemic outbreaks.
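Comparing coded data against a baseline can be sketched in a few lines. This minimal Python example assumes that the first three characters of an ICD-10 code identify its category (e.g. J20.9 belongs to J20, acute bronchitis); the monthly codes and the baseline shares are hypothetical numbers, not real statistics:

```python
from collections import Counter


def icd10_category(code: str) -> str:
    """3-character ICD-10 category of a code, e.g. 'J20.9' -> 'J20'."""
    return code.replace(".", "").strip().upper()[:3]


def compare_to_baseline(codes, baseline_share):
    """Observed share of each category minus its baseline share."""
    counts = Counter(icd10_category(c) for c in codes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no coded cases to analyse")
    return {cat: counts[cat] / total - share for cat, share in baseline_share.items()}


# Hypothetical month of coded discharges and a hypothetical baseline.
this_month = ["J20.9", "J20.8", "I10", "J18.9"]
baseline = {"J20": 0.25, "I10": 0.25}
print(compare_to_baseline(this_month, baseline))  # {'J20': 0.25, 'I10': 0.0}
```

A positive value means the category is more frequent than the baseline; the comparison is only meaningful because every record uses the same classification.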
Metathesauri
A metathesaurus brings together biomedical concepts and concept names from many different
controlled vocabularies and classification systems. A well-known metathesaurus in
healthcare is part of the Unified Medical Language System (UMLS). The UMLS is a set of files and
software that brings together many health and biomedical vocabularies and standards to enable
interoperability between computer systems. The UMLS enables the intelligent retrieval of
biomedical information from various sources. The Metathesaurus is one of the three UMLS
components: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon. The
National Library of Medicine (NLM) updates the UMLS twice a year, in May and November.
Figure: Meta, the UMLS Metathesaurus, and some of the medical vocabularies and classifications it includes
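To illustrate how a metathesaurus links vocabularies, here is a minimal Python sketch. It assumes a drastically simplified table of (concept id, vocabulary, code, name) rows. The real UMLS distributes this information in files such as MRCONSO.RRF with many more columns. The concept identifiers and the LOCAL_EHR code are hypothetical placeholders; only the ICD-10 codes J20.9 and I10 are real:

```python
# Simplified rows: (concept id, source vocabulary, source code, name).
# Concept ids and the LOCAL_EHR entry are hypothetical placeholders.
ROWS = [
    ("CUI_EXAMPLE_1", "ICD10", "J20.9", "Acute bronchitis, unspecified"),
    ("CUI_EXAMPLE_1", "LOCAL_EHR", "BRONCH-AC", "Acute bronchitis"),
    ("CUI_EXAMPLE_2", "ICD10", "I10", "Essential (primary) hypertension"),
]

# Index every (vocabulary, code) pair to the concept it expresses.
CONCEPT_OF = {(vocab, code): cui for cui, vocab, code, _name in ROWS}


def same_concept(a, b) -> bool:
    """True if two (vocabulary, code) pairs map to the same concept."""
    cui_a, cui_b = CONCEPT_OF.get(a), CONCEPT_OF.get(b)
    return cui_a is not None and cui_a == cui_b


def names_for(cui):
    """All names that the different vocabularies use for one concept."""
    return [name for c, _vocab, _code, name in ROWS if c == cui]


print(same_concept(("ICD10", "J20.9"), ("LOCAL_EHR", "BRONCH-AC")))  # True
print(names_for("CUI_EXAMPLE_1"))
```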
Standards do not refer solely to the standardization of healthcare information. There are numerous
standards addressing areas such as communications in healthcare, the characteristics of systems
design (for example, Electronic Health Records), information exchange, interoperability and others.
Some standards refer to the continuity of care and the management of clinical
transactions. Chapter 9 describes one of the best-known healthcare interoperability standards,
Health Level 7 (HL7).
The table below provides examples of some of the more popular standards used in
the United States.
ASTM Continuity of Care Record (CCR): a patient health summary standard. A record can be created, read and
interpreted by various EHR systems, allowing easy interoperability between otherwise disparate entities.
ANSI X12 (EDI): transaction protocols used for transmitting any aspect of patient data. X12 has become popular
in the United States for transmitting billing information, because several of its transactions became required
by the Health Insurance Portability and Accountability Act (HIPAA) for transmitting data to Medicare.
CEN CONTSYS (EN 13940): a system of concepts to support continuity of care.
CEN HISA (EN 12967): a services standard for inter-system communication in a clinical environment.
DICOM: a heavily used standard for representing and communicating radiology images and reports.
HL7: HL7 messages are used for interchange between hospital and physician record systems and between
EMR systems and practice management systems; HL7 Clinical Document Architecture (CDA) documents are
used to communicate documents such as physician notes and other material.
ISO: ISO/TC 215 defined the EHR and the technical specification of the EHR requirements architecture.
openEHR: next-generation public specifications and implementations for EHR systems and communication,
based on a complete separation of software and clinical models.
Questions for Discussion
1. Explain why ICD-10 is a single-level classification system while SNOMED is a multi-level
classification, giving examples of entries from each of these two systems.
2. How will standards in healthcare help transform the healthcare system to a more cost-efficient
model of practice?
3. Discuss how classification systems can act as enablers for hospitals to provide evidence based
services to patients.
4. Explain why it would not be possible to conduct large scale secondary data analysis research
without the use of classification systems.
5. What are the main characteristics of a good classification system?
Chapter 8. Databases in Healthcare
Databases and Types of Data Structures
Databases are collections of data with a specific, well-defined structure and purpose. In hospitals,
databases are the “spinal cord” of the hospital information system. Healthcare databases are
collections of health data. The programs used to create and manipulate these data are called Database
Management Systems (DBMS).
Which of these data collections are databases?
An Excel file with names and medication of patients within a hospital
A nurse’s agenda with to do’s
A schedule of the shifts for next week
A list of the medicines available
The medical record of a patient
There are many different data models for organizing databases: Flat Data, Hierarchical
Data, Relational Data, Object-oriented Data and, more recently, NoSQL databases. The most
commonly used database model is the relational model, which can be found in the vast majority of
healthcare databases. Generally, the data entered into an Electronic Medical Record are most often
stored in relational or object-oriented databases or, in fewer, more recent cases, in NoSQL databases.
able to locate patients and their information. Your empty “shell”, which you will use to start adding
patient data, would look like this:
Pat_id Pat_name Pat_DOB Pat_Address Pat_Gender ICD_n1 ICD_n2 ICD_n3 ICD_n4 ICD_n5
… ... ... … … … … … … …
At first glance, the above shell seems to be a functional approach: we have created the attributes and
are ready to start adding rows with the information for the hospital patients. In fact, the above
approach has some very serious limitations. The six questions below will help you identify and
understand each of these limitations.
1. What would happen if a patient has a sixth ICD-10 diagnosis?
2. What would happen if a patient is readmitted to the hospital for a second time?
3. What would happen if a patient with two different admissions changes address?
4. How many empty cells will you leave for a patient with fewer than five diagnoses?
5. What would you do if the hospital decides not to keep track of the patient address anymore?
6. What would you do to delete, for every patient, an ICD-10 code that was not supposed to be used?
Continue the in-class discussion of the disadvantages of flat data files. Database books
always mention that using flat files to organize complicated data can cause problems with data insertion,
data deletion and data update. Can you discuss some of these problems in a data collection scenario involving
new patients and healthcare professionals assigned to one or more of these patients?
Does this structure describe the healthcare process realistically?
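Two of these problems can be reproduced with a minimal Python sketch of the flat shell, assuming rows are plain dictionaries with the column names shown above. The patients, addresses and codes are hypothetical. A sixth diagnosis cannot be inserted (question 1), unused diagnosis cells stay empty (question 4), and a readmission repeats the patient details, so updating only one row leaves two addresses for one patient (question 3):

```python
from itertools import zip_longest

ICD_COLUMNS = ["ICD_n1", "ICD_n2", "ICD_n3", "ICD_n4", "ICD_n5"]


def make_flat_row(pat_id, name, address, diagnoses):
    """One row of the flat 'shell'; there is no column for a sixth diagnosis."""
    if len(diagnoses) > len(ICD_COLUMNS):
        raise ValueError("the flat row has room for at most 5 diagnoses")
    row = {"Pat_id": pat_id, "Pat_name": name, "Pat_Address": address}
    # Unused diagnosis columns are simply left empty (None).
    for column, code in zip_longest(ICD_COLUMNS, diagnoses):
        row[column] = code
    return row


# A readmission forces a second row that repeats the patient's name and address.
table = [
    make_flat_row(1, "DOE JOHN", "12 Oak St", ["J20.9"]),
    make_flat_row(1, "DOE JOHN", "12 Oak St", ["I10", "J18.9"]),
]

# Update anomaly: changing the address in only one row leaves the data inconsistent.
table[0]["Pat_Address"] = "7 Elm Ave"
addresses = {row["Pat_Address"] for row in table if row["Pat_id"] == 1}
print(len(addresses), "different addresses stored for patient 1")

# Insertion problem: six (hypothetical) diagnoses do not fit.
try:
    make_flat_row(2, "ROE MARIE", "3 Pine Rd", [f"CODE{i}" for i in range(6)])
except ValueError as err:
    print("insertion problem:", err)
```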
Conceptual design of healthcare using Entity-Relationship Diagrams (ERD)
Before the actual development of a database, we first need to design a conceptual schema which
presents a high-level model of how our healthcare mini-world functions. Designing an ERD is easy
and does not require prior computer science knowledge. These conceptual schemas are, however,
extremely important, since they act as a means of communication between the healthcare
organization and the database developer. The ERD can then easily be transformed into a computer
database by applying a transformation (mapping) algorithm.
An ER diagram has three components: Entities, Attributes and Relationships.
Entities: Specific objects or things in the mini-world that are represented in the database. For
example, the PHYSICIAN Dr. Willy, the Surgical DEPARTMENT etc.
Attributes: Properties used to describe an entity. For example, a PHYSICIAN entity may have the
attributes Name, SSN, Clinical Specialty, Sex, and Birthdate.
An attribute is simple (sometimes called atomic) when it cannot be divided into smaller parts, for
example the SSN or Sex of an employee. In some cases an attribute may be composite, i.e. made up
of several components. For example: Address (Apt#, House#, Street, City, State, ZipCode, and
Country).
Each entity may have one or more key attributes. A key attribute is unique: no two entities can have
the same value for it. The SSN is a typical example.
Relationship: a relationship relates two or more distinct entities with a specific meaning. For
example, the PHYSICIAN Dr. Willy works at the Surgical DEPARTMENT. For each relationship
we need to define two things:
A. The relationship cardinality which can be:
• One-to-one (1:1)
• One-to-many (1:N) or Many-to-one (N:1)
• Many-to-many (M:N)
B. The relationship participation which can be:
• zero (optional participation)
• one or more (mandatory participation)
For the relationship PHYSICIANS work at DEPARTMENTS, the cardinality is many-to-one, since
a PHYSICIAN can work in at most one DEPARTMENT, while a DEPARTMENT can have many
PHYSICIANS.
For this same relationship, the participation is mandatory on both sides of the relationship, since
every PHYSICIAN has to work at a DEPARTMENT and every DEPARTMENT has to
have at least one PHYSICIAN.
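When the ERD is mapped to relational tables, a many-to-one relationship usually becomes a foreign key on the "many" side. The sketch below uses Python's built-in sqlite3 module; the table and column names and the SSN values are hypothetical. NOT NULL on the foreign key expresses the mandatory participation of PHYSICIAN, and the database rejects a reference to a department that does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE Department (
    Dept_ID INTEGER PRIMARY KEY,
    Title   TEXT NOT NULL
);
CREATE TABLE Physician (
    SSN       TEXT PRIMARY KEY,   -- key attribute: unique per physician
    Name      TEXT NOT NULL,
    Specialty TEXT,
    -- N:1 side: each physician row references exactly one department;
    -- NOT NULL expresses the mandatory participation of PHYSICIAN.
    Dept_ID   INTEGER NOT NULL REFERENCES Department(Dept_ID)
);
""")
conn.execute("INSERT INTO Department VALUES (1, 'Surgical')")
# Many physicians may reference the same department.
conn.execute("INSERT INTO Physician VALUES ('000-00-0001', 'Dr. Willy', 'Surgery', 1)")
conn.execute("INSERT INTO Physician VALUES ('000-00-0003', 'Dr. Example B', 'Surgery', 1)")

rejected = False
try:
    # Department 99 does not exist, so this reference is invalid.
    conn.execute("INSERT INTO Physician VALUES ('000-00-0002', 'Dr. Example', 'Surgery', 99)")
except sqlite3.IntegrityError:
    rejected = True
print("invalid department rejected:", rejected)
```

The mandatory participation on the DEPARTMENT side (every department has at least one physician) cannot be expressed with a column constraint like this; it would need triggers or application logic.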
The table below shows the most important components of an ER diagram
An example of a relational database. In this schema there are three different entities: Patients,
Admissions and Diagnoses. There are, however, four tables. In relational databases, database designers
are often required to add extra tables to express specific relationship types. The most common
scenario is that of the many-to-many relationship type, where the extra (intermediate) table (see red
arrow) stores combinations of the unique identifiers of the two related tables. This is the only
way to match a specific admission id with a specific diagnosis id, since both the diagnosis ids and the
admission ids can be repeated in that intermediate table!
In the above schema, can you identify the referencing and the referenced tables for
each of the existing relationships?
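A many-to-many relationship with an intermediate table can be sketched with Python's built-in sqlite3 module. The names Patients, Admissions, Pat_ID, Pat_Name, Pat_Birth and Admission_Date follow the query in this chapter; Adm_ID, Diag_ID, Diagnoses columns and the intermediate table Admission_Diagnoses are hypothetical names, and all rows are made-up data. Either id may repeat in the intermediate table, but each (admission, diagnosis) pair is stored only once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Patients   (Pat_ID INTEGER PRIMARY KEY, Pat_Name TEXT, Pat_Birth TEXT);
CREATE TABLE Admissions (Adm_ID INTEGER PRIMARY KEY,
                         Pat_ID INTEGER NOT NULL REFERENCES Patients(Pat_ID),
                         Admission_Date TEXT);
CREATE TABLE Diagnoses  (Diag_ID INTEGER PRIMARY KEY, ICD_Code TEXT UNIQUE, Description TEXT);
-- Intermediate table for the M:N relationship: each row pairs one admission
-- with one diagnosis; either id may repeat, but the pair is unique.
CREATE TABLE Admission_Diagnoses (
    Adm_ID  INTEGER NOT NULL REFERENCES Admissions(Adm_ID),
    Diag_ID INTEGER NOT NULL REFERENCES Diagnoses(Diag_ID),
    PRIMARY KEY (Adm_ID, Diag_ID)
);
INSERT INTO Patients VALUES (1, 'DOE JOHN', '1948-02-03');
INSERT INTO Admissions VALUES (10, 1, '2013-02-21'), (11, 1, '2013-02-28');
INSERT INTO Diagnoses VALUES (100, 'J20.9', 'Acute bronchitis, unspecified'),
                             (101, 'I10', 'Essential (primary) hypertension');
INSERT INTO Admission_Diagnoses VALUES (10, 100), (11, 100), (11, 101);
""")

# Join through the intermediate table to list the diagnoses of each admission.
rows = conn.execute("""
    SELECT a.Adm_ID, d.ICD_Code
    FROM Admissions a
    JOIN Admission_Diagnoses ad ON ad.Adm_ID = a.Adm_ID
    JOIN Diagnoses d ON d.Diag_ID = ad.Diag_ID
    ORDER BY a.Adm_ID, d.ICD_Code
""").fetchall()
print(rows)  # [(10, 'J20.9'), (11, 'I10'), (11, 'J20.9')]

duplicate_rejected = False
try:
    conn.execute("INSERT INTO Admission_Diagnoses VALUES (10, 100)")  # same pair again
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

In this sketch Admissions references Patients, and Admission_Diagnoses references both Admissions and Diagnoses.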
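If we want the query to return the admission dates of patients born after January 1, 1970, we
need to use the Patients and Admissions tables. Our condition will be Pat_Birth > '1970-01-01'.
The date must be written as a date literal: unquoted, 1/1/1970 would be evaluated as an arithmetic
division. The exact date-literal syntax varies between database systems (for example, Microsoft
Access uses #1/1/1970#).
The query would look like this:
SELECT Pat_Name, Admission_Date
FROM Patients INNER JOIN Admissions ON Patients.Pat_ID = Admissions.Pat_ID
WHERE Pat_Birth > '1970-01-01';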
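The query can be tried end to end with Python's built-in sqlite3 module. This sketch assumes dates are stored as ISO-8601 text (YYYY-MM-DD), which compares correctly as strings in SQLite; the two patients and their admissions are hypothetical. Passing the date as a query parameter avoids date-literal syntax issues altogether:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients   (Pat_ID INTEGER PRIMARY KEY, Pat_Name TEXT, Pat_Birth TEXT);
CREATE TABLE Admissions (Adm_ID INTEGER PRIMARY KEY, Pat_ID INTEGER, Admission_Date TEXT);
INSERT INTO Patients VALUES (1, 'DOE JOHN', '1948-02-03'), (2, 'ROE MARIE', '1975-06-15');
INSERT INTO Admissions VALUES (10, 1, '2013-02-21'), (11, 2, '2013-03-02');
""")

query = """
SELECT Pat_Name, Admission_Date
FROM Patients INNER JOIN Admissions ON Patients.Pat_ID = Admissions.Pat_ID
WHERE Pat_Birth > ?
"""
rows = conn.execute(query, ("1970-01-01",)).fetchall()
print(rows)  # [('ROE MARIE', '2013-03-02')]

# Without quotes, 1/1/1970 is integer division in SQLite: 1/1 = 1, then 1/1970 = 0.
print(conn.execute("SELECT 1/1/1970").fetchone())  # (0,)
```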
BLOBs (Binary Large Objects): very frequent in healthcare settings, e.g. images (CT, MRI), audio
(heartbeat sequences) and video (ultrasound).
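Storing and retrieving a BLOB can be sketched with Python's built-in sqlite3 module. The Images table and its columns are hypothetical, and the 16 placeholder bytes stand in for a real image file. In practice, large images are often kept in a dedicated archive (such as a PACS) with only a reference stored in the database, so treat this as an illustration of the data type only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Images (
    Image_ID INTEGER PRIMARY KEY,
    Pat_ID   INTEGER,
    Modality TEXT,   -- e.g. 'CT', 'MRI'
    Content  BLOB    -- raw bytes of the file
)""")

fake_image = bytes(range(16))  # placeholder bytes standing in for a real image file
conn.execute("INSERT INTO Images (Pat_ID, Modality, Content) VALUES (?, ?, ?)",
             (1, "CT", fake_image))

# BLOB columns come back as Python bytes, unchanged.
(stored,) = conn.execute("SELECT Content FROM Images WHERE Image_ID = 1").fetchone()
print(type(stored).__name__, len(stored))  # bytes 16
```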
Data-less databases are distributed databases that are set up without any data until
such a need arises. They may be useful in healthcare because:
They are less expensive than centralized registries (they require no equipment and few personnel)
Their use does not require vague and time-independent patient consents
They do not require duplication of data in different databases
Object Oriented Data Models
Object-oriented data models use real-life objects (entities) for a more efficient organization of data. They can
use SQL, but they also give the user much greater programming flexibility, since the database can be integrated
with object-oriented programming languages (e.g. Java, C#). Object-oriented data models have not yet been
fully standardized.
Hands-on Practice: with the help of your professor, try to complete the following tasks in class.
There is a small clinic where patients can be admitted multiple times. When hospitalized, patients
are given medication during their stay. In addition, the hospital has various departments (e.g.
surgical, medical, orthopedic). Each department has a number of nurses working there.
These nurses take blood pressure measurements from the patients, and each measurement may be
performed by a different nurse. A doctor in charge is assigned to each patient upon
hospitalization. One or more diagnoses are assigned to each patient during every hospitalization.
Task 1. Define the entities for the process and, for each entity, the required attributes
For the patient, we only need to capture the patient name, gender and date of birth
For each hospitalization, we need to know the admission date, discharge date and discharge status
We only need to capture the diastolic blood pressure and the exact time of each measurement
The nurses and doctors have clinical specializations
Each department has a title and a bed capacity
We need to know the diagnosis code, the diagnosis description and the diagnosis date
The description implies the existence of patients (Entity 1) who can be admitted multiple times. Therefore,
admissions (Entity 2) have to exist in the database as a separate entity, too. Not all patients are admitted.
There are departments (Entity 3) and nurses (Entity 4) working in departments. Nurses take blood
pressure measurements (Entity 5) from patients. Medical doctors (Entity 6) are assigned to patients.
Task 2. Design an ER Diagram to conceptualize the clinical mini-world requirements. Use the
appropriate notations that we explained earlier in this chapter.
Task 3. Design the database schema using arrows to define the appropriate references
Task 4. Prepare SQL code to define the above mentioned database schema. Each relation
should have the appropriate fields. Think of the appropriate fields based on the information provided
in Task 1. Try not to over-do it with many attributes. We are only building a sample database, so
just include those attributes that have to be there.
Chapter 9. Interoperability in Healthcare
Interoperability is the ability of diverse systems and organizations to work together (inter-operate).
The term is often used in a technical systems engineering sense, or alternatively in a broad sense,
taking into account sociopolitical and organizational factors that impact performance between two
systems.
The IEEE Glossary defines interoperability as “the ability of two or more systems or components
to exchange information and to use the information that has been exchanged”
Interoperability is achieved when we are “able to accomplish end-user applications using different
types of computer systems, operating systems, and application software, interconnected by different
types of local and wide area networks” (O'Brien et al.).
The diagram below presents some of the recent forces that make reforms towards an
interoperable healthcare system a priority. The population is ageing, and more citizens suffer from chronic
conditions and therefore need to access the healthcare system frequently. The need for
continuity of care is inevitable, and the healthcare system should be connected to support the
health needs of these populations. At the same time, people today know more about
symptoms and healthcare resources and have access to online information portals. More
services for lifestyle management and rehabilitation are available, and there is a growing demand
for such services, which need to be interconnected with the electronic patient record.
Dimensions of Interoperability
Controlled Terminologies: they express and define concepts and allow navigation of
concept-to-concept relationships. They provide a basis for determining whether two items are the
same or different. Controlled terminologies also provide a basis for knowledge representation, capture,
discovery, management and use.
Interoperability is not yet resolved in healthcare
Information interoperability is a key ingredient of modern health information technology, since
healthcare information systems must be able to communicate critical data. Another
dimension of the importance of interoperable environments is that they allow vast amounts of data
to be gathered for research and trend analyses. The absence of a robust set of standards to resolve
data incompatibility issues is becoming increasingly costly to the U.S. healthcare system: savings
of ~$78 billion could be achieved every year if data exchange standards were used across the
healthcare sector in the United States. It is therefore important to identify problems related to the
standardization and interoperability of healthcare information systems, as well as issues for future
research that can address these problems. One component of interoperability is the availability of data standards.
Problems with current standards include:
Gaps: information is missing and cannot be used, because the standards cannot accept it
Redundancies: the same information can be represented in more than one way within a standard
Data exchange requirements: both parties must use the same standards during data exchange, and
this is not always the case, since different systems might be using different standards
With interoperable Electronic Health Records we will be able to improve health care delivery by
making the right data available at the right time to the right people.
Some important interoperability standards
Health Level 7 (HL7): a collection of message formats and related clinical standards that define
an ideal presentation of clinical information; together, the standards provide a data exchange
framework. HL7 is a standard for healthcare-specific data exchange between computer applications.
The name "Health Level 7" refers to the seventh and top layer (the application layer) of the Open Systems
Interconnection (OSI) model, applied to the health environment.
Cross-enterprise Document Reliable Interchange (XDR): used for the exchange of health
documents between health enterprises using web-based, point-to-point push network
communication, permitting direct interchange between electronic health records, personal health
records and other systems without the need for a document repository. Example: a nurse at Hospital
A enters a patient's information in the local EHR and then sends the CCD (Continuity of Care Document,
a clinical document exchange standard) directly to Hospital B's system.
Picture Archiving and Communication Systems (PACS): systems devoted to the storage,
retrieval, distribution and presentation of images. The medical images are stored in a vendor-independent
format, most commonly DICOM.
Logical Observation Identifiers Names and Codes (LOINC): LOINC applies universal code
names and identifiers to medical terminology related to electronic health records and assists in
the electronic exchange and gathering of clinical results (laboratory tests, clinical observations,
outcomes management, research).
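LOINC codes carry a check digit after the hyphen, computed with the Mod 10 algorithm described in the LOINC documentation (equivalent to a Luhn-style check), so a mistyped code can often be detected before it is exchanged. A minimal Python sketch; the sample codes 718-7 (hemoglobin in blood) and 2345-7 (glucose in serum or plasma) are used only as test inputs:

```python
def loinc_check_digit(base: str) -> int:
    """Mod 10 check digit for the numeric part of a LOINC code (before the hyphen)."""
    total = 0
    # Walk from the rightmost digit; double every other digit, starting with the rightmost.
    for position, ch in enumerate(reversed(base)):
        digit = int(ch)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return (10 - total % 10) % 10


def is_valid_loinc(code: str) -> bool:
    """Check the format 'digits-digit' and verify the check digit."""
    base, sep, check = code.partition("-")
    if not sep or not base.isdigit() or not check.isdigit() or len(check) != 1:
        return False
    return loinc_check_digit(base) == int(check)


print(is_valid_loinc("718-7"))   # True
print(is_valid_loinc("2345-7"))  # True
print(is_valid_loinc("2345-8"))  # False
```

A valid check digit only shows that the code is well formed; whether it denotes the intended test still has to be looked up in LOINC itself.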
Electronic Data Interchange (EDI): a standard format for exchanging business data, widely
used in healthcare too. Each element in an EDI message represents a single fact, such as a price or a
product model number. A transaction set often contains what would usually be found in a
typical business document or form. Parties who exchange EDI transmissions are referred to as
trading partners.
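The element/segment structure of an EDI message can be sketched in Python. This example assumes "~" as the segment terminator and "*" as the element separator; these are common choices in ANSI X12, where each interchange actually declares its delimiters in the ISA header. The two-segment message is a hypothetical fragment, not a complete, valid transaction set:

```python
# Assumed delimiters: '~' ends a segment, '*' separates elements.
SEGMENT_TERMINATOR = "~"
ELEMENT_SEPARATOR = "*"


def parse_edi(message: str):
    """Split an EDI string into segments, each a list of elements (element 0 is the segment id)."""
    segments = []
    for raw in message.split(SEGMENT_TERMINATOR):
        raw = raw.strip()
        if raw:
            segments.append(raw.split(ELEMENT_SEPARATOR))
    return segments


# Hypothetical fragment; the values are illustrative only.
sample = "NM1*IL*1*DOE*JOHN~AMT*T*125.50~"
for segment in parse_edi(sample):
    print(segment[0], segment[1:])
```

Each element holds one fact (a name part, an amount), which is what makes EDI data machine-processable by the trading partners.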
The Office of Standards & Interoperability
The goal of the Office of Standards & Interoperability (OSI) is to help build nationwide Electronic
Health Record interoperability. The office is under the umbrella of the U.S. Department of Health
& Human Services. The main goals of OSI are summarized below:
Achieve seamless exchange of health data across federal agencies, governments and the private sector
Encourage the further development of health IT standards
To achieve these goals, OSI's roles include:
Enabling stakeholders to use simple, shared solutions to common information exchange challenges
Overseeing a set of standards, services, and policies that accelerate information exchange
Enforcing compliance with validated information exchange standards, services, and policies
Additional Study Material
Healthcare Interoperability Glossary: a very good source for further reference
www.corepointhealth.com/resource-center/healthcare-interoperability-glossary
12
OSI is a conceptual model that characterizes and standardizes the communication functions of a telecommunication or
computing system without regard to their underlying internal structure and technology. Its goal is the interoperability of
diverse communication systems with standard protocols. The original version of the model defined seven layers.
Vocabulary in HL7 is the set of all concepts that can be used as valid values in an instance. For
example, the Living_subject class has a coded attribute called administrative_gender_code. When a
message instance is subsequently created as part of an implemented interface, one would expect the
administrative_gender_code attribute to convey male or female. Male and female are concepts, and
there may be several coding schemes that contain concepts for male and female.
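The distinction between a concept and the codes that represent it can be sketched in Python. Both coding schemes below are hypothetical (they are not the official HL7 tables); the point is that a coded value is resolved to a concept, and the concept can then be re-expressed in another scheme:

```python
# Two hypothetical coding schemes that both contain concepts for male and female.
SCHEME_A = {"M": "male", "F": "female"}
SCHEME_B = {"1": "male", "2": "female"}


def to_concept(scheme: dict, code: str) -> str:
    """Resolve a coded value to the concept it stands for."""
    try:
        return scheme[code]
    except KeyError:
        raise ValueError(f"code {code!r} is not a valid value in this scheme") from None


def translate(code: str, source: dict, target: dict) -> str:
    """Re-express a code from one scheme as the code for the same concept in another."""
    concept = to_concept(source, code)
    for target_code, target_concept in target.items():
        if target_concept == concept:
            return target_code
    raise ValueError(f"no code for {concept!r} in the target scheme")


print(to_concept(SCHEME_A, "F"))           # female
print(translate("F", SCHEME_A, SCHEME_B))  # 2
```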
If you want to browse through an HL7 vocabulary, you can download a vocabulary tool here:
http://hl7-vocabulary.pilotfishtechnology.com/HL7/index.html
13
UML is a modeling language based on object-oriented modeling methods
Participation: an association between an Act and a Role, with an Entity playing that Role. The
class ‘Participation’ has only one possible sub-class: Managed Participation.
Act: describes the actions and events in health care services. Examples from healthcare include a
clinical observation, an assessment of health condition (e.g. a diagnosis), treatments (medication etc.)
and patient education. The class ‘Act’ can be one of the following sub-classes: Account, Control Act,
Device Task, Diagnostic Image, Diet, Financial Contract, Financial Transaction, Invoice
Element, Observation, Participation, Patient Encounter, Procedure, Public Health Case,
Substance Administration, Supply, Working List.
Act Relationship: a directed association between one Act (source) and another Act
(target). It may be an association from a later instance to an earlier instance, or an association from a
collector instance to a component instance.
Take a look at the HL7 message below. The segments contain the following information:
The MSH (Message Header) segment contains information about the message itself. This
information includes the sender and receiver of the message, the type of message this is, and the
date and time it was sent. Every HL7 message specifies MSH as its first segment.
The PID (Patient Identification) segment contains demographic information about the
patient, such as name, patient ID and address.
The NK1 (Next of Kin) segment contains contact information for the patient's next of kin.
The PV1 (Patient Visit) segment contains information about the patient's hospital stay, such as
the assigned location and the referring doctor.
MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|
PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|
NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||
PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN MYLASTNAME^BONNIE^^^^||||||||||||2688684|||||||||||||||||||||||||199912271408||||||002376853
In the PID segment, the fifth field (composite) is the patient name, DOE^JOHN^^^^. (The four ^^^^
characters at the end of this field indicate that it has a total of six components (sub-composites), of which
only the first two are populated.) In this field, DOE represents the family
name of the patient, and JOHN is the patient's given name.
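Reading a field from an HL7 v2 segment amounts to splitting on the delimiters declared in MSH-2 (^~\&): "|" separates fields, "^" separates components and "~" separates repetitions. A minimal Python sketch using the PID segment of the example message; get_field and components are hypothetical helper names, and this is not a full HL7 parser (escape sequences and sub-components are ignored):

```python
# PID segment from the example message (one segment per line in the message).
PID = ("PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|"
       "254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|")

FIELD_SEP, COMPONENT_SEP, REPETITION_SEP = "|", "^", "~"


def get_field(segment: str, number: int) -> str:
    """Return field <number> of a non-MSH segment (index 0 is the segment id)."""
    return segment.split(FIELD_SEP)[number]


def components(value: str) -> list:
    """Split a field into its components."""
    return value.split(COMPONENT_SEP)


name = components(get_field(PID, 5))
print(name)                                  # ['DOE', 'JOHN', '', '', '', '']
family_name, given_name = name[0], name[1]
print(given_name, family_name)               # JOHN DOE
print(get_field(PID, 7), get_field(PID, 8))  # 19480203 M
print(get_field(PID, 18).split(REPETITION_SEP))  # a field with two repeated values
```

MSH needs special handling: its first field (MSH-1) is the field separator itself, so splitting MSH on "|" shifts its field numbers by one.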
Chapter 10. Security and Privacy of Data in Healthcare
Protected Health Information (PHI)
Protected Health Information should be kept private, since its acquisition by unauthorized parties
may reveal the health status of individuals. The tables below present the information that is
considered PHI and should be protected.
US Government Healthcare Security Regulations
In the United States there are acts and regulations describing the use of electronic patient
information, the privacy of personally identifiable information and accountability issues in electronic
records. In addition, various state security and privacy laws and regulations are in effect.
Privacy Act (1974): a United States federal law that establishes a Code of Fair Information Practice
that governs the collection, maintenance, use, and dissemination of personally identifiable
information about individuals that is maintained in systems of records by federal agencies. A system
of records is a group of records under the control of an agency from which information is retrieved
by the name of the individual or by some identifier assigned to the individual. The Privacy Act
requires that agencies give the public notice of their systems of records by publication in the Federal
Register. The Privacy Act prohibits the disclosure of information from a system of records absent
the written consent of the subject individual, unless the disclosure is pursuant to one of twelve
statutory exceptions. The Act also provides individuals with a means by which to seek access to and
amendment of their records, and sets forth various agency record-keeping requirements.
Health Insurance Portability and Accountability Act - HIPAA (1996): was passed by the
United States Congress and signed into law in 1996. Title I of HIPAA protects health insurance coverage
for workers and their families when they change or lose their jobs. Title II of HIPAA, known as the
Administrative Simplification (AS) provisions, requires the establishment of national standards for
electronic health care transactions and national identifiers for providers, health insurance plans,
and employers. You can visit the address below to learn more about HIPAA security and privacy
rules and requirements http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/index.html
Electronic Signatures in Global and National Commerce Act (E-SIGN, 2000): facilitates the use of electronic
records and electronic signatures in interstate and foreign commerce by ensuring the validity and legal
effect of contracts entered into electronically.
Health Insurance Portability & Accountability Act (HIPAA) Privacy & Security
HIPAA aims to protect the confidentiality, integrity and availability of electronic Protected Health
Information (ePHI). The HIPAA Security Rule addresses three areas of safeguards for ePHI: (i) administrative,
(ii) physical and (iii) technical. The rules apply to the security (keeping it secure) and integrity
(keeping it intact) of personal health information that is electronically created, stored, transmitted and
processed.
According to HIPAA, healthcare facilities should monitor logon attempts to the network. Inappropriate
logon attempts should be reported to the respective departmental security designee. All
computer systems installed in the hospital are subject to audit. The same applies to
access to the hospital intranet, which is also monitored. Access to protected
health data should only be granted to authorized individuals. Installing
software without prior approval is prohibited, and disclosure of ePHI by electronic means is
forbidden without authorization. Lastly, computers should be logged off or locked whenever the
authorized user is away from the computer, for any reason (for example, returning from the
screensaver to the desktop should require a password).
Strong passwords should be “meaningless” and should contain a combination of numbers, letters
and symbols, including one or two capital letters. They should also be longer than eight
characters. In contrast, weak passwords are too short and/or are real dictionary words.
To remember a password, you can apply your own rules: e.g. ‘my favorite hobby is baseball’ gives
the core Mfhib!, which still needs to be lengthened, for example with numbers, to more than eight characters.
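The password rules above can be expressed as a simple checker. This Python sketch assumes a tiny hypothetical word list in place of a real dictionary and checks only the rules stated in the text; it is not a complete password policy:

```python
import string

# A tiny stand-in for a real dictionary of common words (hypothetical).
DICTIONARY = {"password", "baseball", "hospital", "welcome"}


def password_problems(pw: str) -> list:
    """Return the rules from the text that the password breaks (empty list = passes)."""
    problems = []
    if len(pw) <= 8:
        problems.append("must be longer than eight characters")
    if not any(c.isdigit() for c in pw):
        problems.append("needs a number")
    if not any(c.isalpha() for c in pw):
        problems.append("needs a letter")
    if not any(c.isupper() for c in pw):
        problems.append("needs a capital letter")
    if not any(c in string.punctuation for c in pw):
        problems.append("needs a symbol")
    if pw.lower() in DICTIONARY:
        problems.append("is a dictionary word")
    return problems


print(password_problems("Mfhib!"))       # ['must be longer than eight characters', 'needs a number']
print(password_problems("Mfhib!1987#"))  # []
```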
Very often, access to a system is based on the role of the employee. For example, all nurses share the
same role and therefore have the same access rights to the Electronic Health Record, although each nurse
logs in with individual credentials. Each individual user belongs to one or more groups, and access rights
are granted to groups rather than to individuals. Different categories of healthcare professionals are
granted different access rights, and specific policies should be established for regular audits and updates
of group membership, for example on a yearly basis.
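Role-based access can be sketched in Python: each user keeps an individual account, but rights are attached to groups. The group names, rights and user ids below are hypothetical; real EHR permission models are far more detailed:

```python
# Hypothetical groups and their access rights.
GROUP_RIGHTS = {
    "nurses":     {"read_chart", "record_vitals"},
    "physicians": {"read_chart", "record_vitals", "write_orders"},
    "billing":    {"read_billing"},
}

# Individual accounts; rights come only from group membership.
USER_GROUPS = {
    "nurse.smith": {"nurses"},
    "dr.willy":    {"physicians"},
}


def rights_of(user: str) -> set:
    """Union of the rights of all groups the user belongs to."""
    rights = set()
    for group in USER_GROUPS.get(user, set()):
        rights |= GROUP_RIGHTS.get(group, set())
    return rights


def can(user: str, action: str) -> bool:
    return action in rights_of(user)


print(can("nurse.smith", "record_vitals"))  # True
print(can("nurse.smith", "write_orders"))   # False
```

Because rights live on groups, an audit of group membership reviews everyone's access at once, and removing a departing employee from USER_GROUPS revokes all of their rights in one step.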
Administrative directors are responsible for informing the IT administrator of any employment
status changes. Upon termination of employment, the employee’s network and PC access has to be
revoked without delay, so that the former employee can no longer access the system
using their passwords. All ePHI and computer equipment of the former employee (laptops, PDAs)
should be retrieved. For the same reasons, the use of a former employee’s user-ids and passwords must
be strictly forbidden. “Generic” user-ids are strictly forbidden, and a new, clean account should be
provided to each new employee. According to HIPAA, known and suspected security violations must
be reported to the Administrative Director or their designee. Security incidents must be fully
documented, including time/date, personnel involved, cause, mitigation and preventive measures.
Data Encryption
Technological solutions are required to protect ePHI where applicable. Examples include data
encryption and secure data transfer over the network. All wireless networks require security
protocols and encryption, and any electronic transmission of ePHI must be encrypted. Encryption
must be achieved through software approved by the IT Department Security designee. Data
encryption uses algorithms and mathematical calculations to transform plain text
into ciphertext, making it unreadable for unauthorized parties. To decrypt an encrypted
message, the recipient must use a key that transforms the ciphertext back to the original text. No
encryption technology, no matter how ‘strong’ it may be, can on its own ensure that information
remains secure. Instead, a variety of circumstances need to be taken into consideration to ensure
that personal information is protected against unauthorized access. Data encryption is a
requirement for many data transactions. In the pre-internet era, people rarely used encryption.
Nowadays, with banking, online shopping and other services, data encryption is a primary
requirement. Connecting to a secure server (HTTPS) with a web browser automatically encrypts the data
in transit, protecting it from intruders. Even if an intruder manages to capture encrypted information,
it will be scrambled and unreadable, since the intruder does not have the key needed to decrypt it.
Data encryption algorithms are constantly advancing. In the past, 64-bit encryption was
considered strong enough, but nowadays key lengths of at least 128 bits are used. The
Advanced Encryption Standard (AES) supports keys of 128, 192 or 256 bits.
Encrypted data should always stay encrypted when not in use, and encryption keys must be of
sufficient length to resist attempts to break the encryption. Secure authentication of users is
required: before decrypting, authorized users must be securely authenticated with robust
passwords, so that only authorized users can decrypt data. No file containing decrypted data should remain
after a user has accessed encrypted data and viewed or updated it in decrypted form. Health
information security professionals determine which users have access to encrypted information on
a given mobile device or on mobile media.
When implementing encryption for ePHI, it must be taken into account that ePHI must be accessible
24 hours a day. If an encryption system makes data unreadable when a user is unavailable (e.g. death,
illness etc.), or when a user forgets a password, then that encryption is unsuitable for healthcare
environments. Products from well-known vendors provide centralized password management,
remote password resets and similar features, to facilitate efficient management of encrypted media
without risk of data loss. Encryption systems must back up the encrypted data files on a regular basis. Poorly designed
encryption systems may leave temporary copies of encrypted data in unencrypted form on
disks/mobile devices.
Symmetric Encryption. This method uses a single key that is shared by the pair of users who
want to exchange a message. It is also called ‘Secret Key encryption’, since the key has to remain
secret: acquiring it is enough to retrieve the original message. One limitation of symmetric
encryption is that it does not scale very well, because every pair of users needs its own shared key.
There is also the risk that intruders ‘grab’ the key while it is being distributed over a network or the
internet, and since this single key is enough to reveal all the data, this is a serious threat. The more
people hold the key, the higher the risk that some of them will lose it. On the other hand, symmetric
encryption has important advantages, such as its low computational overhead, which makes the
encryption process very fast. The method can also be used together with other encryption methods.
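To make the idea of a single shared key concrete, the sketch below uses a one-time pad (XOR with a random key as long as the message). It was chosen only because it fits in a few lines of standard-library Python; it is a toy illustration, not the AES-based software an IT Department would approve. The message text and the helper name `xor_bytes` are hypothetical.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching key byte (toy one-time pad)."""
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


# Hypothetical ePHI fragment; both parties must already share `key` secretly.
message = b"Patient 0042: BP 120/80"
key = secrets.token_bytes(len(message))  # random key, never to be reused

ciphertext = xor_bytes(message, key)    # sender encrypts with the shared key
recovered = xor_bytes(ciphertext, key)  # receiver decrypts with the SAME key
print(recovered.decode())
```

The same function and the same key perform both encryption and decryption, which is exactly why the secrecy of the key is critical: anyone who intercepts `key` can recover the message.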
Asymmetric Encryption. Also called “public key cryptography”, this is a relatively newer
technology. The idea of asymmetric algorithms was first published in 1976 (Diffie and Hellman).
Asymmetric encryption uses two different keys. The first is a private key that must be kept
secret by its owner; no one else needs it. The second is a public key that can be seen by anyone.
Data encrypted with the public key can only be decrypted with the matching private key, and there is
no practical way to “retrieve” or reverse engineer the private key from the public key. Conversely,
the public key can be used to decrypt (verify) data that was encrypted (signed) with the private key,
which is the basis of digital signatures. To use asymmetric encryption, there must be a way for people
to discover other users’ public keys. The typical technique is to use digital certificates. A certificate
is a package of information that identifies a user (e.g. the user’s name, e-mail address and public key).
During a secure encrypted communication, each end sends a query over the network to the other party,
which sends back a copy of its certificate. The other party’s public key can then be extracted from the certificate.
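The relationship between the two keys can be illustrated with textbook RSA, one well-known public key algorithm (the text does not name a specific algorithm). The sketch uses the tiny, insecure primes 61 and 53 purely for readability; real systems use far larger keys, padding schemes and a vetted cryptographic library. The helper name `apply_key` is hypothetical, and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
# Toy textbook RSA: illustrative only, NOT secure (tiny primes, no padding).
p, q = 61, 53
n = p * q                    # modulus shared by both keys: 3233
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: modular inverse of e (2753)

public_key = (e, n)          # may be published, e.g. inside a certificate
private_key = (d, n)         # kept secret by its owner


def apply_key(m: int, key: tuple) -> int:
    """Raise m to the key's exponent modulo n (the same operation for both keys)."""
    exponent, modulus = key
    return pow(m, exponent, modulus)


m = 65                                   # a message encoded as an integer < n
c = apply_key(m, public_key)             # anyone can encrypt for the key owner
assert apply_key(c, private_key) == m    # only the private key decrypts

s = apply_key(m, private_key)            # the owner "signs" with the private key
assert apply_key(s, public_key) == m     # anyone can verify with the public key
print(n, d, c)                           # prints: 3233 2753 2790
```

Note that encryption for a recipient uses the recipient's public key, while signing uses the signer's private key; the two directions serve different purposes.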
Symmetric and asymmetric encryption methods can be used in combination to provide fast, efficient and secure encryption of sensitive health data.
Common Encryption Protocols and Algorithms
Encryption protocols such as TLS (Transport Layer Security) and its predecessor SSL (Secure Sockets
Layer), which is now deprecated, keep data private in transit (but they cannot always ensure its security).
A website that uses these protocols can be verified by checking the digital signature on its certificate,
which in turn must be validated by a trusted Certificate Authority.
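In Python's standard `ssl` module, this kind of certificate checking is what `ssl.create_default_context()` enables: it loads the system's trusted Certificate Authorities, requires a valid certificate chain and checks that the certificate matches the host name. The sketch assumes Python 3.7 or later; the helper `peer_certificate_issuer` is hypothetical and is defined but not called here, since calling it requires network access and a real host name.

```python
import socket
import ssl

# Default context: verifies the server certificate against trusted CAs
# and checks that it was issued for the host we connect to.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSL and older TLS versions


def peer_certificate_issuer(host: str, port: int = 443):
    """Open a verified TLS connection and return (TLS version, certificate issuer).

    Raises ssl.SSLCertVerificationError if the certificate cannot be validated.
    """
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            return tls.version(), cert["issuer"]


print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Disabling `check_hostname` or setting `verify_mode` to `ssl.CERT_NONE` would keep the traffic encrypted but skip the certificate validation described above.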
The Advanced Encryption Standard (AES) is based on a “substitution-permutation network” and
operates on a 4×4 column-major order matrix of bytes, called the state. Most AES calculations are done
in a special finite field. AES has a fixed block size of 128 bits and a key size of 128, 192, or 256 bits.
The key size determines the number of rounds, i.e. repetitions of the transformations that convert the
input (plaintext) into the output (ciphertext): 10, 12 and 14 rounds, respectively.
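The byte layout and the key-size/round relationship described above can be written down directly. This is not an AES implementation (the substitution, permutation and finite-field steps are omitted); it only sketches how a 16-byte block fills the 4×4 column-major state and how many rounds each key size uses, following the AES specification (FIPS-197). The helper name `bytes_to_state` is hypothetical; real ePHI should be encrypted with an approved, vetted library.

```python
# Number of AES rounds for each permitted key size (in bits).
AES_ROUNDS = {128: 10, 192: 12, 256: 14}
BLOCK_SIZE_BYTES = 16  # AES always encrypts 128-bit (16-byte) blocks


def bytes_to_state(block: bytes) -> list:
    """Arrange a 16-byte block as the 4x4 AES state in column-major order.

    Byte i goes to row i % 4, column i // 4, so the first four bytes
    fill the first column.
    """
    if len(block) != BLOCK_SIZE_BYTES:
        raise ValueError("AES blocks are exactly 16 bytes")
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


state = bytes_to_state(bytes(range(16)))
for row in state:
    print(row)  # first row: [0, 4, 8, 12]
print("AES-256 rounds:", AES_ROUNDS[256])
```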
1. The need for health data privacy has existed since the early steps of medical science. In your opinion,
what are the two major privacy challenges driven by the extensive use of networks
and computer technology in healthcare?
2. Outline the possible negative effects that an unauthorized disclosure of protected personal
information could have for a patient.
3. Rank the five passwords below in terms of strength (weak-average-strong).
(i) newyork
(ii) happymanie
(iii) etEs!$pr99
(iv) logon
(v) 01081985
4. In your own words, briefly describe symmetric and asymmetric encryption. Which method(s)
would you implement to ensure the protection of ePHI in your organization and why?
Chapter 11. Big Data in Healthcare and Emerging Challenges
There are many sources of information which could be used by clinicians for decision making, but
these are not always available. This lack of complete information affects decision making, treatment
and patient outcomes. Healthcare information systems often fail to recognize clinicians as the main
users and, especially, fail to foresee the clinicians’ need for complete and up-to-date information.
Even when systems use multiple sources of rich data, these sources often include outdated,
incomplete or disorganized information. Healthcare costs continue to increase, and simply
implementing new systems, without considering how to integrate diverse and distributed datasets,
will not solve the problem.
It is also evident that the adoption of new systems in healthcare is slow, and clinicians continue to lose
valuable time hunting for information. It has been estimated that healthcare professionals actually
waste 20-40% of their time on such searches. Patient registration systems are not connected, and
data are scattered and disconnected across the healthcare system. As a result, each time
a patient visits a new setting, their information has to be re-entered into the system. Re-entering
demographic and other registration information is error prone and time consuming, and it adds an
extra burden for the employee. As a result, communication of information is not timely and can
be inconsistent, with a negative effect on the quality of care.
Big data is a collection of large and complex data sets which are difficult to process using common
database management tools or traditional data processing applications. The challenges of big data
management include capturing, storing, searching, sharing and analyzing data.
O’Reilly, a multi-faceted media company and publisher, defines big data as: “the data that exceeds
the processing capacity of conventional database systems. The data is too big, moves too fast, or
doesn’t fit the limitations of database architectures”. Big data refers to the tools, processes and
procedures allowing an organization to create, manipulate, and manage very large data sets and
storage facilities. Big data cannot be meaningfully defined in terms of storage size alone.
Big data is not just about storing huge amounts of data; it is the ability to mine and integrate data,
extracting new knowledge from it to inform and change the way providers, and even patients, think
about healthcare. An organization facing hundreds of gigabytes of data for the first time may be
facing a big data challenge and need to reconsider its data management options. For a larger site
that uses a distributed computing framework and advanced data management methods, however,
it may take tens or hundreds of terabytes before data size becomes a significant consideration.
Extremely large data volumes were originally an issue for supercomputers, nuclear physics,
meteorology, and space travel. Late in the 20th century, airline and bank operations entered the ‘big
data family’, while in 1990 the Human Genome Project was initiated; this was
the first large-scale project to use big data in healthcare. Later, big data started to be used in finance,
research, marketing and entertainment. Nowadays big data is considered both a challenge and an
opportunity, and it offers great potential for most industry sectors. Data sets grow in size because
of information-sensing mobile devices, remote sensing, software logs, cameras, and microphones
storing audio and video digitally, radio-frequency identification readers, and wireless sensor
networks. Data also grow because nearly all information is now captured in digital format and stored
in databases; most data entry procedures are now paperless. Big data is difficult to handle with
relational databases, desktop statistics and traditional visualization packages. It requires instead
massively parallel software, running on tens to thousands of servers.
In healthcare, the modern trend is to utilize larger datasets in health analytics and decision making,
because a single large set of related data carries additional information compared with separate
smaller sets holding the same total amount of data. This makes it possible to find correlations and to
identify trends in the health of an individual or a specific population, to prevent diseases and organize
health promotion activities, and to determine and improve the quality of health-related research. The
box below lists some of the most common uses and sources of big data.
Data for genetic research
Radio Frequency Identification (RFID) implementations
Data from sensor networks
Social networks and their data
Internet text, web logs, internet indexing
Storage and data warehouse
Risk management and modeling
360 View of the Customer
Mass amounts of e-mails for e-mail analysis
14 Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014; 2: 3.
Those who are directly and indirectly involved in healthcare will benefit from the use of big data
and data-driven systems. Clinicians want real-time access to patient, clinical and other relevant
data to support improved decision-making and facilitate better quality of care. Researchers and
epidemiologists want novel data-driven tools at their disposal to improve the data workflow,
e.g., predictive modeling, statistics and algorithms that improve the design and outcome of their
experiments and epidemiologic studies. Pharmaceutical companies want to better understand the
causes of various diseases and the factors related to pharmaceutical response, in order to
find more targeted drugs and design successful clinical trials to introduce new medicines into the
market. Medical device companies collect data from hospital-based and home-based devices to
monitor patient safety and to predict possible adverse events. They need to integrate these data
with old and new forms of personal data to make safer and more accurate medical devices for
diagnosis and therapy. Finally, patients themselves want their everyday use of
technology to flow into their medical care and to have better control of their own data.
An example of big data Velocity is the case of Credit Suisse, whose business included processing 1,000,000,000
transactions during working hours. These extreme transaction volumes raised the demand for in-memory
architecture for performance, on-disk resiliency for availability and distributed architecture for data coherency.
Variety: the data sources are many and diverse often located in different databases across the
healthcare system. The challenge is the integration of data, structured or not, to make them a useful
knowledge discovery resource. Data that are merged together for problem solving are not only text
and numbers, but often clinical orders, online interactions, medical images and videos and recorded
messages.
Veracity: this dimension of big data refers to the correctness, meaningfulness and relevance of
the data in relation to their intended use. Evaluating whether the data are accurate enough to be
useful for decision making is challenging. Traditional data management assumes that the data
stored in warehouses are certain, clean, and precise; but data are sometimes uncertain, imprecise
or wrong. Data quality issues are a particular concern in healthcare. Veracity issues specific to
healthcare concern the correctness of diagnoses, treatments, prescriptions, procedures,
and outcomes.
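Veracity checks are often implemented as explicit validation rules applied before data are used for decisions. The sketch below flags records whose values are missing or implausible; the field names and plausibility ranges are hypothetical examples, not clinical reference values, and a real system would derive such rules from clinical experts and data standards.

```python
# Hypothetical plausibility ranges (illustrative only, not clinical guidance).
PLAUSIBLE_RANGES = {
    "age_years": (0, 120),
    "heart_rate_bpm": (20, 250),
    "systolic_bp_mmhg": (50, 260),
}
REQUIRED_FIELDS = ("patient_id", "diagnosis_code")


def veracity_issues(record: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    issues = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):          # absent or empty value
            issues.append(f"missing {field}")
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is None:
            issues.append(f"missing {field}")
        elif not low <= value <= high:     # outside the plausible range
            issues.append(f"implausible {field}={value}")
    return issues


records = [
    {"patient_id": "P1", "diagnosis_code": "I10", "age_years": 67,
     "heart_rate_bpm": 72, "systolic_bp_mmhg": 150},
    {"patient_id": "P2", "diagnosis_code": "", "age_years": 340,
     "heart_rate_bpm": 80, "systolic_bp_mmhg": None},
]
for r in records:
    print(r["patient_id"], veracity_issues(r))
```

Rules like these catch only obviously wrong values; a plausible but incorrect diagnosis still requires clinical review.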
groups (e.g. the e-patient movement), where patients help each other to become active participants in
their own care alongside doctors. There is a peer-reviewed, open access journal titled ‘Journal of
Participatory Medicine’ that aims to advance participatory medicine among healthcare
professionals and patients. The Society of Participatory Medicine promotes participatory medicine, a
cooperative model of healthcare that encourages active involvement by both patients and healthcare professionals.
charting and scheduling. Recent projects of Practice Fusion are on cancer and heart disease. Practice
Fusion analyzes aggregated data from the EMR and public health sources to monitor health at the
population level. These uses include population health surveillance and education (e.g. flu, asthma),
drug surveillance, public health research, care plans and best practices. Healthx develops and manages
online cloud-based portals for healthcare companies, focusing on enrollments, claims
management and business intelligence. The company uses vast amounts of benefits, physician,
prescription and other information.
The Institute for Health Metrics and Evaluation (IHME) gathers large, distributed data sets globally
for data analysis, including health measurement data from disparate sources such as censuses, surveys,
vital statistics, disease registries and hospital records. The aim is to support policy decisions and improve
population health. The most recent project of IHME is the ‘Global Burden of Disease’, which seeks to
identify the world’s major health problems, assess how society responds to these
problems and identify optimal methods to allocate resources and maximize health improvement.
The University of California, Santa Cruz launched a large-scale, 10.5 million initiative in 2012
to create the world’s largest repository for cancer genomes: a huge, structured database of biomedical
information that will allow a complete molecular characterization of cancer.
Sickweather LLC scans social media (Facebook, Twitter) to track outbreaks of disease, offering
forecasts to users, much like a weather forecast, to keep individuals aware of outbreaks in their area.
Humedica, a medical informatics company, connects clinical and patient information across varied
settings and time periods to generate longitudinal views of patient care and to provide accurate and
detailed predictive models over longer periods of time.
Humetrix’s iBlueButton is a mobile health information exchange app to access and exchange
medical records. It combines the convenience of mobile phones with Big Data and gathers medical
information, tracks sleep, manages diabetes, heart disease and asthma, to understand behavior
patterns and motivations for prevention. Asthmapolis collects patient data and provides patients with
feedback to better manage their asthma. A mobile sensor tracking device attaches to asthma
inhalers to monitor the time and location of events. Asthmapolis aggregates real-time data for
epidemiologic and public health use. Finally, ZEO is a personal sleep coach device which analyzes
over a million nights of data to help consumers improve their sleep. The device tracks the quality of
sleep and gives personalized advice on how to improve sleep. ZEO shared sleep data with
universities, for a 360 degree understanding of sleep. Its limitation is that the sleep data is not
combined with blood pressure, weight, heart rate, and other sleep related measures.
1. Explain how the use of big data can improve health outcomes.
2. Discuss a clinical scenario and explain how the big data Vs are relevant to your scenario.
3. Discuss how big data can contribute to high quality population health surveillance systems.
Chapter 12. Data Mining for Decision Making in Healthcare
As we discussed in Chapter 11, healthcare data is expanding. Huge amounts of data are generated
during healthcare transactions and large datasets are available from many different sources of data,
which are often not directly health related, but are useful to assess the health status of an individual
patient or a population as a whole. These may include environmental datasets, labor data, and other
socioeconomic datasets. Human beings exist in balance with their environment and this is the reason
why external factors affecting human health should be considered when we try to understand the
health dynamics and the causation of health conditions.
The aforementioned datasets and healthcare transactions are too voluminous and complex to be
processed by traditional methods. Recent advances in computer and information science make it
possible to analyze large amounts of data in order to find useful information and to create predictive
models for clinical and administrative decision making. Recent advances in computers are evident in
everyone’s life: computer systems are becoming cheaper, hard disk drive capacity is increasing and
processors are getting faster. In addition, parallel computing architectures and advanced, affordable
networks make it possible to apply advanced data analytics methods to large distributed data files.
15 MeSH is a comprehensive controlled vocabulary for the purpose of indexing journal articles and books in the life sciences
it is a new approach to data analysis and knowledge discovery. Data mining originated in statistics
and machine learning and has since advanced as an interdisciplinary field, including the
areas of pattern recognition, database design, artificial intelligence, visualization and others.
Need for Data Mining in Medicine
Medical data are often noisy, incomplete and uncertain. Computerization generates large amounts of
data (text, graphs, and images), and many disease attributes are available for decision making.
There is an increased demand for health services because of citizens’ greater awareness and
increased life expectancy. At the same time, overworked physicians and facilities, and stressful work
conditions in ICUs and other settings, are often a reality. Providing health care professionals and
administrators with advanced data-driven tools to support their decisions can therefore contribute to
a more effective, evidence-based, and time- and resource-efficient healthcare model.
Data mining methods can support the whole spectrum of medical procedures: diagnosis, by
classifying disease patterns; treatment, by selecting the most suitable of the available treatment
methods; and prognosis, by predicting future outcomes based on previous data and present
conditions. Information gained from data mining is expected to help maintain a high level of care and
to improve organizational planning; for example, healthcare organizations that perform data mining
are able to make better predictions about their mid- and long-term requirements. There are numerous
financial motivators in the health industry for the use of data-driven, predictive tools, and healthcare
organizations have slowly but steadily started to recognize the need to make decisions based on the
analysis of clinical and financial data. Health insurance companies try to reduce losses due to fraud
by using data mining methods, while prospective payments in hospitals may be based on classifying
patients into case-mix groups.
Data mining can also help advance research, since it helps healthcare data analysts and researchers
gain up-to-date biomedical knowledge and more easily understand large biomedical datasets. With the
use of data mining, it is therefore possible to generate scientific hypotheses from large
experimental data, clinical databases, and biomedical literature.
Clustering
Clustering is an unsupervised learning method: it uses only the independent variables (input
attributes), and no target variable has to be specified before a clustering experiment. For this reason,
clustering may be best suited to exploratory studies, especially on large amounts of data about which
little is known. Clustering methods group objects into a specific number of clusters, so that objects in
a cluster are similar to one another and dissimilar to objects in other clusters. Clustering algorithms are
generally categorized into partitional and hierarchical methods.
Partitional (or centroid-based) clustering algorithms require the user to select the desired number of
clusters (k); the algorithm then relocates objects among these k clusters. Partitional
clustering methods are categorized according to how they relocate objects, how they select a cluster
centroid among objects within a cluster, or how they measure similarities between objects and cluster
centroids. The advantages of partitional clustering methods are that they are, in general, very fast,
often provide superior clustering accuracy compared with hierarchical clustering algorithms,
and can handle large data sets that hierarchical algorithms cannot (better scalability). Their
major drawback is that their results depend on the initial cluster centroids (which are chosen
randomly); in other words, clustering results may differ slightly each time the partitional algorithm
runs. K-means is the most widely used partitional algorithm and generates clusters in a two-step
process. First, it randomly selects k objects as the initial centroids; then it partitions the objects into k
disjoint groups, assigning each object to its most similar centroid. A cluster centroid is the mean value
of the objects in the cluster, so the centroids are then recomputed and the assignment step is repeated
until the clusters no longer change.
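The two k-means steps, plus the repeated centroid update, can be sketched in plain Python. This is a minimal illustration assuming numeric attributes and Euclidean distance (`math.dist`, Python 3.8 or later); the six 2-D points are invented, and a production analysis would normally rely on an established library.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Minimal k-means sketch: returns (labels, centroids)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct objects at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments did not change: converged
        labels = new_labels
        # Recompute each centroid as the mean of the objects assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
print(labels, centroids)
```

Because the initial centroids are drawn at random, changing `seed` can change the result on less clearly separated data, which is the drawback of partitional methods noted above.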
Hierarchical clustering algorithms repeatedly merge the two most similar groups of objects, based on
pairwise distances between groups, so that objects are hierarchically grouped. There are different
methods of hierarchical clustering, depending on how the distance between two groups is calculated
(e.g. single-link, complete-link, and average-link). The main advantage of hierarchical clustering is its
visualization capability, the dendrogram, which shows how similar objects are to one another. By
reviewing dendrograms, researchers can reasonably guess the number of clusters. The disadvantage of
this family of clustering methods is that the algorithms are computationally complex and therefore
require a large amount of system memory to calculate the distances between objects. For example, if a
hierarchical algorithm takes approximately 60 seconds to cluster 1,000 objects (records), clustering
2,000 objects takes about (2000/1000)^3 × 60 = 480 seconds, because the running time grows with the
cube of the number of objects, provided that enough system memory is available. This complexity
severely limits the use of these algorithms on very large data sets. Among the hierarchical algorithms,
the average-link algorithm provides the best clustering accuracy in most cases.
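A naive average-link implementation makes both the dendrogram idea and the cost discussed above visible: every merge step rescans all pairs of clusters, so the total work grows roughly with the cube of the number of objects. This is a teaching sketch on invented 2-D points; the function names are hypothetical and it is not an optimized implementation.

```python
import math
from itertools import combinations


def average_link_distance(points, a, b):
    """Mean pairwise Euclidean distance between clusters a and b (lists of indices)."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def average_link_clustering(points, k):
    """Agglomerative clustering: merge the two closest clusters until k remain.

    Returns the final clusters and the merge history (a textual dendrogram).
    """
    clusters = [[i] for i in range(len(points))]  # every object starts alone
    history = []
    while len(clusters) > k:
        # Find the pair of clusters with the smallest average-link distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_link_distance(points, clusters[pair[0]], clusters[pair[1]]),
        )
        dist = average_link_distance(points, clusters[i], clusters[j])
        history.append((clusters[i], clusters[j], round(dist, 2)))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, history


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, history = average_link_clustering(data, k=2)
for step in history:
    print(step)
print(sorted(sorted(c) for c in clusters))
```

Running the loop down to k=1 and recording every merge distance would give the full dendrogram; stopping at a chosen k corresponds to cutting the dendrogram at that level.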
Another type of clustering, following a different approach, is distributional clustering, where data
are modeled with a fixed number of Gaussian distributions whose parameters are initialized randomly.
The most well-known distributional clustering algorithm is Expectation-Maximization (E-M). The ‘E’
step estimates the expectation of the log-likelihood using the current estimate of the parameters,
while the ‘M’ step computes the parameters that maximize the expected log-likelihood from the ‘E’
step. In order to obtain a hard clustering, objects are then often assigned to the Gaussian distribution
they most likely belong to; for a soft clustering, this is not necessary.
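A compact one-dimensional illustration of the E and M steps follows, assuming two Gaussian components and invented measurements. The initialization (means at the minimum and maximum value), the fixed number of iterations and the small variance floor are simplifying assumptions; the text describes random initialization instead.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with Expectation-Maximization."""
    # Deterministic initialization (an assumption): means at the data extremes.
    means = [min(xs), max(xs)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E step: responsibility of each component for each point (soft clustering).
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(2)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M step: re-estimate the parameters that maximize the expected log-likelihood.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] += 1e-6  # small floor to avoid a zero variance
    # Hard clustering: assign each point to its most likely component.
    hard = [max(range(2), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, resp, hard


xs = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
means, variances, weights, resp, hard = em_two_gaussians(xs)
print([round(m, 3) for m in means], hard)
```

The list `resp` is the soft clustering (a probability per component for every point); `hard` is the hard clustering obtained by picking the most likely component.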
When performing clustering analysis, repeated sampling and analysis is good practice, to avoid
sampling bias and to determine the correct number of clusters. After determining
k, sophisticated partitional algorithms can be used. During a clustering experiment, data
outliers can influence the number of resulting clusters. Some datasets contain only a few outliers,
while some samples are noisy and contain many outliers. In either case, even a single outlier can form
its own cluster. Such outliers should be eliminated, especially when partitional clustering methods are
used, since these rely on the mean, which is sensitive to outliers. Sometimes, though, outliers carry
useful information. It should also be noted that most clustering algorithms only handle numeric data,
whereas most healthcare databases have a number of categorical attributes. Although it is possible
to convert categorical data into numerical data for clustering, this conversion distorts distances between
categories. Imagine three discrete values, A, B, and C, being converted into 1, 2, and 3, respectively.
After this conversion, the distance from A to B is 1 point and from B to C is also 1 point, but
the distance between A and C is 2 points. This implies that A and C are more dissimilar than A and B,
which is not true for categories that are simply different from one another. A few clustering algorithms
can handle categorical data, such as FarthestFirst (found in Weka) and two-step cluster analysis
(found in SPSS).
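The distortion can be checked numerically. The sketch compares the ordinal coding from the example with one-hot coding, a common alternative that the text does not mention, in which each category becomes its own 0/1 attribute so that every pair of distinct categories is equally far apart. The helper name `one_hot` is hypothetical.

```python
import math

categories = ["A", "B", "C"]

# Ordinal coding from the example: A -> 1, B -> 2, C -> 3.
ordinal = {c: i + 1 for i, c in enumerate(categories)}


def one_hot(value):
    """Encode a category as a 0/1 vector with one position per category."""
    return tuple(1.0 if value == c else 0.0 for c in categories)


for x, y in [("A", "B"), ("B", "C"), ("A", "C")]:
    d_ordinal = abs(ordinal[x] - ordinal[y])
    d_one_hot = math.dist(one_hot(x), one_hot(y))
    print(f"{x}-{y}: ordinal {d_ordinal}, one-hot {d_one_hot:.3f}")
```

With one-hot coding, all three pairs are the same distance apart, while the ordinal coding makes A and C look twice as different. The cost is one extra attribute per category, which matters for high-cardinality fields such as diagnosis codes.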
Clustering has been widely used to study genes when very little information about them is available,
using microarray data to cluster the genes. Gene clustering information is valuable for
researchers who study genes. When very little or no information about the data is known, hierarchical
clustering algorithms should be used first, because they do not require the input of k (the number of clusters).
the number of rules for very infrequent transactions. This property also significantly limits the
search for frequent item sets and considerably improves the efficiency of the algorithm. In healthcare
applications, though, support levels need to be selected carefully, since an excessively high threshold
will exclude potentially strong rules for rare events (e.g. male breast cancer).
Techniques to make searches faster in huge datasets include the use of hash tables, sampling
techniques and transaction reduction (transactions without frequent items are not read further). A
good association mining system should provide tools that help domain experts eliminate
meaningless association rules (e.g., prostate cancer → male) and organize raw association rules
using concept hierarchies.
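Support and confidence, the two thresholds behind association rule mining, are simple ratios over transactions. The sketch below computes them for pairs of items in a handful of invented prescriptions (drug names and thresholds are hypothetical) and keeps only the rules that pass both thresholds. It enumerates pairs directly rather than implementing a full frequent-itemset algorithm such as Apriori.

```python
from itertools import combinations

# Invented prescription transactions (one set of drugs per patient visit).
transactions = [
    {"antacid", "analgesic"},
    {"antacid", "analgesic", "antibiotic"},
    {"antacid"},
    {"analgesic", "antibiotic"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for item pairs passing both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # infrequent pair: no rule from it can pass the support threshold
        for lhs, rhs in (({a}, {b}), ({b}, {a})):
            conf = confidence(lhs, rhs, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, round(conf, 2)))
    return rules


for rule in pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    print(rule)
```

Here the antacid/antibiotic pair appears in only one of four visits and is dropped by the 0.5 support threshold; raising the threshold further would hide more rules about rare items, the trade-off described above.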
Classification methods belong to supervised data mining, since the user needs to provide a target
variable of interest. Before applying classification, redundant attributes and attributes irrelevant to the
class (e.g., the sex attribute in a prostate cancer study) should be discarded, and the features that are
relevant to the classification process should be selected using feature selection algorithms. If
irrelevant attributes are not discarded, they increase the noise and slow down performance. Feature
selection is, in most cases, performed with statistical methods (such as correlation
analysis) to find the most important attributes, which are those correlated in a statistically
significant way with the class under investigation. Noise reduction should be used carefully: a
drawback of eliminating variables during feature selection based on simple correlations is that we
may miss an important relationship between a set of independent variables and a dependent
variable. Individually, smoking and an infection might not affect stomach cancer, so they might
be eliminated; however, their combination may be significantly related to the cancer. This is why
more advanced feature selection methods are available, which investigate feature relevance
using multivariate approaches.
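The point about combined variables can be reproduced with a deliberately extreme, invented dataset: each of two binary attributes alone has zero correlation with the outcome, yet a feature encoding their combination predicts it perfectly. The attribute names are hypothetical, and the interaction pattern (outcome present when exactly one factor is present) is chosen only because it makes the effect exact; it is not a claim about real smoking, infection or cancer data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented records: two binary attributes and a binary outcome.
factor_a = [0, 0, 1, 1, 0, 0, 1, 1]
factor_b = [0, 1, 0, 1, 0, 1, 0, 1]
outcome = [0, 1, 1, 0, 0, 1, 1, 0]

# A feature built from the combination of both attributes.
combined = [int(a != b) for a, b in zip(factor_a, factor_b)]

print("factor_a:", round(pearson(factor_a, outcome), 3))
print("factor_b:", round(pearson(factor_b, outcome), 3))
print("combined:", round(pearson(combined, outcome), 3))
```

A univariate filter keeping only attributes correlated with the outcome would discard both `factor_a` and `factor_b`, even though together they determine it completely.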
Classification is the core data mining method in bioinformatics. Researchers can distinguish between
similar diseases if they have DNA expression microarray data from sample cells affected by these
diseases and can correctly classify the microarray data. Golub et al. correctly distinguished acute
myeloid leukemia from acute lymphoblastic leukemia by applying classification algorithms to gene
expression data. In another study, Harper compared the performance of classification algorithms such
as Discriminant Analysis (DA), regression models (multiple and logistic), tree-based algorithms (CART),
and artificial neural networks on healthcare datasets.
Abuse Detection System, recovered $2.2 million & identified 1,400 suspects for investigation after
operating for less than one year. ReliaStar Financial Corp. reported a 20% increase in annual
savings, while the Wisconsin Physician’s Service Insurance Corp noted significant savings. Finally,
the Australian Health Insurance Commission estimated tens of millions of dollars of annual savings.
Highmark, a health insurance company, built classification models based on claims, customer and
provider data to identify potential fraud instances. Their fraud detection system aimed at real-time
analysis to build predictive models that can detect fraud and stop it before it occurs. Highmark
found that fraud-related decisions were made more quickly than before, as the classification system
is automated and avoids labor-intensive work. This updating cycle of data mining led to savings of up
to $11.5 million.
In another study, van ’t Veer used the DNA microarray data of 98 primary breast tumors to cluster
the tumors with a hierarchical algorithm. In the upper cluster (62 tumors), 34% of the patients were
“relapse” patients who developed distant metastases, compared with 70% in the lower cluster
(36 tumors). The upper cluster is therefore considered to contain “good prognosis tumors” and the
lower cluster “poor prognosis tumors”. After clustering, the researchers also used classification to
predict poor prognosis; according to their results, the prediction of cancer outcomes using microarray
data was better than the prediction using clinical parameters.
In another case, a decision tree model (Security Blue Reimbursement Model) was built using patient
symptoms, health history, and patient demographics to predict the risk of developing diseases and
to rank patients by their risk of developing one of 13 diseases. The objective was to enable proper
Medicaid and Medicare reimbursements. The cost of care for patients detected at an early stage can
be lowered, as providers and insurers do not have to resubmit claims. These decision trees were
revised annually as the number of modeled diseases grew.
The next example is an association rule mining application from South Korea: scientists used the
Korea Medical Insurance Corporation (KMIC) database to identify relationships between pairs of drugs,
or between diseases, to help formulate a government policy on hypertension management. The KMIC
used healthcare utilization data, demographic and clinical data (e.g., blood glucose), and lifestyle
data (e.g., smoking and drinking) from a nationwide health promotion program. The second example
of association rule mining comes from Taiwan. Antacids are used to alleviate gastric ulcers and
relieve heartburn, and they do not require a prescription; despite this, the Taiwanese NHI reimburses
them. Researchers wanted to know how antacids are used together with other drugs, so they analyzed
antacid use patterns with association mining on 526,693 patient visits and 2,574,739 prescription
records. Using a support level of 1% and a confidence level of 52.2%, the model produced 36 association
rules, from which the researchers manually extracted the five drug sets most frequently used with
antacids.
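The two thresholds in the antacid study can be made concrete with a small sketch. Support is the fraction of visits containing a drug set, and the confidence of a rule X → Y is support(X and Y) divided by support(X). Only the 1% and 52.2% thresholds come from the study; the visits and drug names are invented. The brute-force enumeration of drug pairs is a simplification of algorithms such as Apriori, which also handle larger itemsets efficiently.

```python
from itertools import combinations

# Hypothetical prescriptions: each set holds the drugs given at one visit
visits = [
    {"antacid", "nsaid"},
    {"antacid", "nsaid", "antibiotic"},
    {"antacid", "antispasmodic"},
    {"nsaid"},
    {"antacid", "nsaid"},
]


def support(itemset, transactions):
    # Fraction of transactions that contain every item in itemset
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    # Generate rules lhs -> rhs between single drugs that pass both thresholds
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


rules = pair_rules(visits, min_support=0.01, min_confidence=0.522)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Note that a rule and its reverse can have different confidence: here "antibiotic -> nsaid" passes the threshold while "nsaid -> antibiotic" does not.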
attributes, what questions need to be answered and what data are available to develop predictive
algorithms.
Data preparation: once priorities have been defined, the next step is to merge all the data files
that will be used for the analysis, which may come from different systems and databases. In some
cases, a random sample of the data is selected and data transformation methods are applied. These
data preparation steps produce the target dataset.
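The preparation steps above (merge, sample, transform) can be sketched with the standard library alone. The two source tables, their field names, the inner-join merge rule, and min-max scaling as the transformation are all illustrative assumptions; real projects typically use a database query or a data-frame library for the same steps.

```python
import random

# Hypothetical extracts from two source systems, keyed by patient ID
admissions = {
    "P1": {"age": 70, "los_days": 5},
    "P2": {"age": 45, "los_days": 2},
    "P3": {"age": 58, "los_days": 9},
    "P4": {"age": 33, "los_days": 1},
}
labs = {"P1": {"glucose": 180}, "P2": {"glucose": 95}, "P3": {"glucose": 140}}


def merge(left, right):
    # Inner join: keep only patients present in both systems
    return {pid: {**left[pid], **right[pid]} for pid in left if pid in right}


def min_max_scale(records, field):
    # Rescale one numeric field to the range [0, 1] (modifies records in place)
    values = [r[field] for r in records.values()]
    lo, hi = min(values), max(values)
    for r in records.values():
        r[field] = (r[field] - lo) / (hi - lo) if hi > lo else 0.0
    return records


merged = merge(admissions, labs)
rng = random.Random(42)  # fixed seed so the sample is reproducible
sample_ids = rng.sample(sorted(merged), k=2)
target = {pid: dict(merged[pid]) for pid in sample_ids}  # copy before transforming
target = min_max_scale(target, "glucose")
print(target)
```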
Modeling stage: this is the actual data analysis step, which can include one or more data mining
methods (cluster analysis, regression analysis, decision trees, etc.). In this step, the development of
the model (training phase) is followed by the testing phase, which estimates the model accuracy.
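The training and testing phases can be shown with a hold-out split. This sketch assumes a 70/30 split and uses a one-feature threshold rule (a decision stump) as a deliberately simple stand-in classifier; the “risk score” and “readmitted” data are randomly generated, not real. The model is fitted only on the training portion, and its accuracy is then measured on the held-out test portion, which estimates how it will perform on patients it has not seen.

```python
import random


def train_stump(train):
    """Training phase: pick the threshold that maximizes training accuracy,
    predicting 1 (positive) when x >= threshold."""
    best_threshold, best_acc = None, -1.0
    for threshold in sorted({x for x, _ in train}):
        acc = sum((x >= threshold) == bool(y) for x, y in train) / len(train)
        if acc > best_acc:
            best_threshold, best_acc = threshold, acc
    return best_threshold


def accuracy(threshold, data):
    # Fraction of cases where the rule's prediction matches the label
    return sum((x >= threshold) == bool(y) for x, y in data) / len(data)


# Hypothetical data: (risk score, readmitted 0/1) with label noise
rng = random.Random(0)
data = [(x, int(x + rng.gauss(0, 1.5) > 5)) for x in (rng.uniform(0, 10) for _ in range(200))]

rng.shuffle(data)
split = int(0.7 * len(data))
train, test = data[:split], data[split:]  # 70% training, 30% held-out testing

threshold = train_stump(train)
print(f"threshold={threshold:.2f}")
print(f"training accuracy={accuracy(threshold, train):.2f}")
print(f"testing accuracy={accuracy(threshold, test):.2f}")
```

Because the threshold was chosen to fit the training data, training accuracy tends to be optimistic; the testing accuracy is the more honest estimate.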
Evaluation stage: once the models have been created and their performance is known, comparing
them helps the researchers decide which model(s) should be kept and incorporated into the decision
support systems under development. The data mining models are compared using a common
yardstick, such as lift charts, profit charts, or diagnostic classification charts. Factors to consider
include not only model performance but also computational efficiency.
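A lift chart is built from the lift value at several cut-offs. For the top fraction of cases ranked by a model's score, lift is the positive rate among those cases divided by the overall positive rate, so a lift of 3 means the model finds positives three times as often as random selection. The two models' scores and the outcome labels below are invented for illustration; the sketch assumes at least one positive case, since otherwise the overall rate is zero.

```python
def lift_at(scores, labels, fraction):
    """Lift for the top `fraction` of cases ranked by model score:
    (positive rate among top-ranked cases) / (overall positive rate)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical test-set outcomes and scores from two competing models
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
model_a = [0.9, 0.2, 0.8, 0.3, 0.1, 0.7, 0.4, 0.2, 0.1, 0.3]
model_b = [0.6, 0.7, 0.5, 0.8, 0.2, 0.4, 0.3, 0.1, 0.2, 0.3]

# Points of a lift chart at the top 10%, 30% and 50% of ranked patients
for name, scores in (("model A", model_a), ("model B", model_b)):
    print(name, [round(lift_at(scores, labels, f), 2) for f in (0.1, 0.3, 0.5)])
```

Comparing the printed values at each cut-off is the tabular form of comparing two curves on a lift chart: the model with higher lift at the cut-off that matters for the intended use would be preferred.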
Deployment: this is where the data mining models are actually deployed and start to be used as
part of the healthcare function. The developed models are built around an intuitive graphical user
interface. Ideally, data mining software should support intelligent data preprocessing that
automatically selects data for data mining and uses domain knowledge in the various data processing
steps. It should also fully automate the knowledge discovery process, so that existing knowledge is
understood and utilized in the data mining processes for better knowledge discovery.
1. Discuss how ‘the use of data mining in healthcare can contribute to improved patient
outcomes’.
2. Provide an example of knowledge discovery in healthcare data, with the use of supervised
methods and another one with the use of unsupervised methods.
3. Explain in what ways the model performance affects the potential feasibility of an algorithm
in a clinical environment.
4. Explain the training and testing phases of a classification experiment. Why is it unreliable to
use a model that you have only trained, without testing it further?
5. You want to find case-mix groups of your patients. Which family of data mining algorithms
would you use and why?
6. You want to predict the patient length of stay using clinical and demographic information.
Which family of data mining algorithms would you use and why?