
QUALITY HEALTH CARE
A GUIDE TO DEVELOPING AND USING INDICATORS

Second Edition

Robert C. Lloyd, PhD


Vice President, Institute for
Healthcare Improvement
World Headquarters
Jones & Bartlett Learning
5 Wall Street
Burlington, MA 01803
978-443-5000
info@jblearning.com
www.jblearning.com

Jones & Bartlett Learning books and products are available through most bookstores and online booksellers. To contact Jones & Bartlett
Learning directly, call 800-832-0034, fax 978-443-8000, or visit our website, www.jblearning.com.

Substantial discounts on bulk quantities of Jones & Bartlett Learning publications are available to corporations, professional associations,
and other qualified organizations. For details and specific discount information, contact the special sales department at Jones & Bartlett
Learning via the above contact information or send an email to specialsales@jblearning.com.

Copyright © 2019 by Jones & Bartlett Learning, LLC, an Ascend Learning Company

All rights reserved. No part of the material protected by this copyright may be reproduced or utilized in any form, electronic or mechanical,
including photocopying, recording, or by any information storage and retrieval system, without written permission from the copyright owner.

The content, statements, views, and opinions herein are the sole expression of the respective authors and not that of Jones & Bartlett Learning,
LLC. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not
constitute or imply its endorsement or recommendation by Jones & Bartlett Learning, LLC and such reference shall not be used for advertising
or product endorsement purposes. All trademarks displayed are the trademarks of the parties noted herein. Quality Health Care: A Guide to
Developing and Using Indicators, Second Edition is an independent publication and has not been authorized, sponsored, or otherwise approved
by the owners of the trademarks or service marks referenced in this product.

There may be images in this book that feature models; these models do not necessarily endorse, represent, or participate in the activities
represented in the images. Any screenshots in this product are for educational and instructive purposes only. Any individuals and scenarios
featured in the case studies throughout this product may be real or fictitious, but are used for instructional purposes only.

This publication is designed to provide accurate and authoritative information in regard to the Subject Matter covered. It is sold with the
understanding that the publisher is not engaged in rendering legal, accounting, or other professional service. If legal advice or other expert
assistance is required, the service of a competent professional person should be sought.

Production Credits
VP, Executive Publisher: David D. Cella
Publisher: Michael Brown
Associate Editor: Danielle Bessette
Vendor Manager: Nora Menzi
Senior Marketing Manager: Sophie Fleck Teague
Manufacturing and Inventory Control Supervisor: Amy Bacus
Composition and Project Management: S4Carlisle Publishing Services
Cover Design: Timothy Dziewit
Director of Rights & Media: Joanna Gallant
Rights & Media Specialist: Merideth Tumasz
Media Development Editor: Shannon Sheehan
Cover Image: © Michal Steflovic/Shutterstock
Printing and Binding: Edwards Brothers Malloy
Cover Printing: Edwards Brothers Malloy

Library of Congress Cataloging-in-Publication Data


Names: Lloyd, Robert C., author.
Title: Quality health care : a guide to developing and using indicators /
Robert Lloyd.
Description: Second edition. | Burlington, MA : Jones & Bartlett Learning,
[2019] | Includes bibliographical references and index.
Identifiers: LCCN 2017022526 | ISBN 9781284023077
Subjects: | MESH: Quality Indicators, Health Care--organization &
administration | Program Development | Quality of Health Care
Classification: LCC RA399.A1 | NLM W 84.41 | DDC 362.1068/5--dc23
LC record available at https://lccn.loc.gov/2017022526

6048

Printed in the United States of America


21 20 19 18 17 10 9 8 7 6 5 4 3 2 1
“To Gwenn and Devon…thanks for giving my life
balance, quality and love!”
Contents

Acknowledgments
Foreword
Introduction to the Second Edition
About the Author

Chapter 1  Setting the Context for Measurement
    The Growing Demand for Transparency
    The Growing Focus on Patient-Centered Care and Service
    The Quality Funnel
    References

Chapter 2  Why Are You Measuring?
    Connecting the Dots!
    Types of Studies
    Research for Efficacy, Efficiency, and Effectiveness
    The Three Faces of Performance Measurement
    CASE STUDY #1: Being a Translator
    References

Chapter 3  Measuring the Voice of the Customer
    It All Starts with Listening
    Creating a Service Excellence Culture
    Who Are Your Customers?
    Defining Key Quality Characteristics
    Listening Three Times
    Understanding VOC Tools
    Focus Groups
    Observation
    Personal Interviews
    Leadership Walk-Rounds
    Unsolicited Feedback
    High-Tech Tools
    The Experiential Shopper
    Surveys
    References

Chapter 4  Milestones in the Quality Measurement Journey
    Developing a Measurement Philosophy
    Measurement Roadblocks
    Milestones in the Quality Measurement Journey
    Selecting a Specific Indicator
    Developing Operational Definitions
    Developing Data Collection Plans
    Probability Sampling
    Nonprobability Sampling
    The Indicator Development Worksheet
    CASE STUDY #1: Transcription Turnaround Time
    References

Chapter 5  Organizing Indicators into a Strategic Dashboard
    Evolution of the Strategic Dashboard
    Focusing on the Vital Few
    CASE STUDY #1: East London National Health Service (NHS) Foundation Trust’s Strategic Dashboard
    The Role of Benchmarking
    References

Chapter 6  Tapping the Knowledge That Hides in Data
    Data Versus Information
    Static Versus Dynamic Approaches to Data Analysis
    CASE STUDY #1: The Monday Morning Dilemma
    References

Chapter 7  Overcoming Numerical Illiteracy
    Understanding Variation Conceptually
    Distinguishing Common from Special Causes of Variation
    Making the Appropriate Responses to Common and Special Causes of Variation
    References

Chapter 8  Understanding Variation with Run Charts
    What Is a Run Chart?
    How Do I Construct a Run Chart?
    How Do I Analyze a Run Chart?
    CASE STUDY #1: Hand Hygiene Compliance
    A Few Closing Thoughts on Using Run Charts
    References

Chapter 9  Understanding Variation with Shewhart Charts
    Run Charts Versus Shewhart Charts
    What Is a Shewhart Chart?
    Key Questions About Shewhart Charts
    Deciding Whether a Special Cause Is Present
    Deciding Which Shewhart Chart Is Most Appropriate
    Types of Data
    Types of Shewhart Charts
    Defining the Key Terms
    You Make the Call
    Additional Shewhart Charts
    Using Shewhart Charts Effectively
    References

Chapter 10  Applying Quality Measurement Principles
    CASE STUDY #1: Predicting a Cardiovascular Event
    CASE STUDY #2: Sampling Central Line Infections
    CASE STUDY #3: Sampling Medicare Insurance Audits
    CASE STUDY #4: Tracking Patient Falls
    CASE STUDY #5: Pressure Ulcer Prevention
    CASE STUDY #6: Evaluating Staffing Effectiveness
    CASE STUDY #7: To Flash or Not to Flash—That Is the Question
    CASE STUDY #8: Clarifying the Operational Definition of Readmission
    CASE STUDY #9: Managing a Breast Cancer Patient’s Clotting Levels
    CASE STUDY #10: Group B Streptococcus in Pregnant Women
    CASE STUDY #11: Emergency Department Fast Track
    CASE STUDY #12: Tracking Patient Complaints
    CASE STUDY #13: Reducing Ventilator-Associated Pneumonia
    CASE STUDY #14: Pain Management for Hip and Knee Replacement Patients
    CASE STUDY #15: Hospice/911 Paramedic System Partnership to Improve Care
    CASE STUDY #16: Improving Access to Community Services for Mental Health and Community Health Patients
    References

Chapter 11  Connecting the Dots
    Adopting Quality as a Business Strategy
    Developing a Learning System to Support Improvement
    Linking Measurement to Improvement
    Building Capacity and Capability for Improvement
    References

Index

Acknowledgments
Anyone who thinks that writing a book is a solitary activity is sorely mistaken. Sure, authors do squirrel themselves away when developing the content, conducting research for the book, and writing chapters. But writing a book is more of a team sport than anything else. I have had the pleasure of being part of the Jones & Bartlett Learning team since the fall of 2000. That was when Mike Brown, publisher at Jones & Bartlett Learning, informed me that the group I thought I was writing a book for decided to get out of the healthcare publishing business. The rights for the first edition of this book were then transferred to Jones & Bartlett Learning and, as the expression goes, this was the beginning of a beautiful relationship.

Over the past 17 years, I have met many talented and dedicated professionals at Jones & Bartlett Learning. The one constant in this journey, however, has been Mike Brown. He has not only been a wonderful sponsor of my work but when you are collaborating with an individual for 17 years you move beyond a writer–publisher relationship and become friends. We have shared many professional and personal stories. Mike has had to gently and at times not so gently use his diplomatic motivational skills (and “sharp elbows” as he calls them) to get me to pick up the pace. There have been times when I missed deadlines because of international travel, speaking engagements, and family commitments. But Mike has always been able to maintain a positive approach to our work and be very tolerant of my rather hectic schedule. For your professionalism, support, and friendship, Mike, I thank you.

The team that Mike assembled to support this second edition has been very skilled and extremely patient. In particular, I want to recognize and thank Danielle Bessette, associate editor at Jones & Bartlett Learning, for her leadership in coordinating all aspects of this endeavor. Danielle organized a wonderful team of individuals skilled in marketing, media and design, production, rights clearance, and copyediting. Special recognition and thanks are extended to the Jones & Bartlett Learning team including Shannon Sheehan, media development editor; Sophie Fleck Teague, senior marketing manager; Merideth Tumasz, rights specialist; and Nora Menzi, vendor manager. In addition to the Jones & Bartlett Learning team, I also want to thank several independent contractors for their significant contributions to the editorial process. They include Palaniappan Meyyappan, project manager at S4Carlisle Publishing Services; Maria Leon Maimone, rights clearance editor; and Sandra Kerka, copyeditor, who painstakingly read and carefully edited the text. This team of well-coordinated and courteous professionals has allowed me to focus on the content and messages of this book while they attended to the technical and editorial details.

Other individuals who need to be acknowledged are the many dedicated professionals I have had the pleasure of working with over the years in many countries around the world. This group includes participants in the quality improvement (QI) workshops I teach, improvement teams that I have had the pleasure of coaching, senior leaders and board members who have shared their visions for what is possible as well as the challenges they face in terms of realizing these visions, and, most important, those who work tirelessly each day to deliver care and support to those in need. These are the individuals who have been kind enough to share their stories and experiences with me and helped to set the context for this book. The case studies presented in Chapter 10 all stem from interactions with these individuals. Those who have made substantive contributions to the content or case studies are recognized in various chapters and the footnotes.

I also want to thank Dr. Don Berwick for kindly writing the foreword to this second edition. I have known Don for almost 30 years. In 1992, he invited me and a colleague, Dr. Ray Cary, to teach a daylong program on the application of statistical process control (SPC) methods to healthcare situations at the Institute for Healthcare Improvement’s (IHI) National Quality Forum. This was one of the first exposures many healthcare professionals had to SPC. These initial workshops were so successful that Don invited us back to the forum each year and then encouraged us to write a book for healthcare professionals focused on quality measurement and in particular SPC applications. I have been dedicated to the quality measurement journey ever since. In 2005, Don invited me to join the IHI as a full-time employee. We have traveled many roads together over the years and I am grateful for his support, guidance, and willingness to contribute to this edition.

Finally, I want to thank my wife Gwenn and daughter Devon for their moral support, tolerance, and input to this process. Many evenings and weekends I disappeared into my office to work on “the book.” This happened so often that they would actually just nod at each other and say “the book” when I started to head to my office. Unlike the first edition, however, I was successful in making them part of this process. They provided reviews of various charts and graphs as well as played a major role in selecting the cover design. Gwenn, I especially appreciate all the time you spent reading the draft chapters and offering editorial comments and suggestions. Your outside view and practical perspectives contributed significantly to making the text flow and be easy to understand. I always looked forward to you saying, “Now this just does not make sense.” This provided an opportunity for improvement. Thanks to both of you for making my life a quality journey!

Foreword
In the now 30-year history of bringing modern quality methods into the control, improvement, and planning of health care, skeptics sometimes comment on the “religious” tone of that movement. Leaders and others in the workforce who get the quality “bug” seem to buzz with their enthusiasm. They adopt phrases like “joy in work,” “pursuing perfection,” and a “never-ending journey” and sprinkle their vocabulary with unfamiliar technical expressions, like “PDSA cycles,” “high reliability organizations,” and “statistical process control.” And, they seem to think they are right, lamenting together that too many others do not see what they, at last, see.

So it does, indeed, seem to newcomers as if a religion, or at least a cult, has arrived in town. The “immune reaction” can be strong.

If you are of that mind, think again. Imagine, maybe, that what these newfound enthusiasts are evincing is not religiosity but intellectual excitement. To overstate, what might Galileo have felt when, for the first time in human history, he saw those moons of Jupiter and realized that they must be orbiting a sphere? Or, more mundane, what did my 5-year-old grandson, Caleb, feel last weekend, when all of a sudden at a Sunday lunch he “got” the idea of letters spelling a word. (It was the word “bark.”) Like Galileo, maybe, he laughed out loud.

I do not know why for so many decades health care called itself “modern,” which it technically became in the era of bioscience, but remained distinctly “unmodern” in its understanding of its work as a system—complex, interdependent, and improvable. My teachers in medicine taught me to be a heroic individual problem solver. My mentors in organizational management taught me to use incentives, hierarchy, and accountability to extract excellence. The language and tools of improvement revealed the underlying theory that “trying harder” was the route to success and that metrics somehow—magically—led to results.

Ideas like that now seem to me to be a pervasive form of system illiteracy. They are not scientific. I simply did not know that for much of my early career because it was, before I studied systemic quality sciences, as evident to me that effort is the root of results as it was to most people before Newton that apples fell because they just moved toward the center of the universe.

It took breakthroughs in a number of sciences to reach today’s level of understanding of how things get better, or worse, in complex systems. That understanding—call it “quality sciences” if you need a name for it—came through eventually intersecting lines of progress in statistics, general systems theory, cognitive and behavioral psychology, epistemology, and more. It also continues to be dynamic. Like all sciences, quality sciences are in continual evolution and increasingly powerful.

Happenstance introduced me to these sciences in my mid-forties, and I have never looked back. By understanding systems better, by relearning how to interpret and learn from variation, by realizing how informative very small-scale, local tests of change can be, by rethinking my theories of human motivation and communication, I was able to see more clearly where defects were coming from and how to find and change their causes. Those subjects, mastered over time, gave me lenses and tools far more persuasive and helpful than the atheoretical approaches of the first part of my career.

Maybe it was an epiphany of sorts. But there was nothing at all “religious” about it. I just learned things I had not previously known—new guides to effective action. Someone showed me Jupiter’s moons.

That’s not comfortable, at least at first. It is not easy to let go of theories closely held, even when shown logically to be wrong. Galileo paid a huge price for that in a public that found misconception less disruptive than changed perception. And so, the jargon and excitement of the quality sciences are easy prey to those whose beliefs are time honored, though wrong.

To accept the change in understanding depends in part on teachers—people with the patience to meet learners where they are and walk them down the path of new perception. Like religion this takes empathy and compassion. But, far from faith-based change, this job also takes rigor and commitment to science.

I love the quotation from Albert Einstein at the entrance to the Keck Building headquarters of the National Academies of Sciences, Engineering, and Medicine: “The right to search for truth implies also a duty; one must not conceal any part of what one has recognized to be true.” If that be a religion, sign me up.

And that brings us to this book, Quality Health Care: A Guide to Developing and Using Indicators, and to its author, my longtime friend, mentor, and colleague, Bob Lloyd. No topic is more thoroughly a battleground between the older, and unscientific, methods of improvement and the newer, more theoretically grounded methods that I call “modern” than is the topic of measurement. And few topics generate more controversy at first.

The prior, hegemonic, view of measurement is that it causes improvement and is therefore a powerful tool for implementing a theory of exhortation, accountability, and incentive. Not so. Get in touch with the workforce and ask them how it is going with the metrics in their work lives, and they will tell you how scrutinized they feel and how demoralized, threatened, and misunderstood they feel by that scrutiny. They will equate measurement with waste and risk, not growth and learning.

The scientific approach to improvement also values measurement, but it is measurement for learning, not measurement for judgment. It knows, in the words of an African proverb, that “weighing a pig does not make the pig fatter.” But it also knows that careful, respectful metrics, linked with sound interpretation of variation, trust in the workforce, methods for local trials and tests, celebration, and supports for innovation, can be invaluable in continual improvement. And that all of this matters in the search for knowledge, put to use.

Bob Lloyd is the best teacher I have ever met in those vineyards of measurement-for-improvement. He is stunning in the classroom. I have teased him often about how relentlessly at the top his ratings are in the many Institute for Healthcare Improvement conferences where he is an instructor. It’s very hard for the rest of us, like watching a gymnast do what is for normal people impossible. We watch him in awe as he takes novices by the hand and in days shapes them into expert interpreters of variation and, therefore, far more helpful quality champions in their home organizations.

This book is a resource for that change, written by a master. Bob has been able to skillfully blend the quantitative aspects of the science of improvement with the more qualitative and strategic aspects that allow organizational transformation to flourish. In the final chapter he provides clear guidance on how to “connect the dots” by linking measurement efforts to improvement. As he points out, “Data without a context for improvement are useless!” We have a long way to go yet in grafting quality science into the core of our healthcare systems, but those who really want to do it, to help our patients, their families, and our communities, have no better place to turn for their development than to this book and this teacher.

It’s not a religion. It’s intellectual progress, personal and cultural. So, welcome it, and read on.

Donald M. Berwick, MD, MPP
President Emeritus and Senior Fellow
Institute for Healthcare Improvement
Cambridge, MA

Introduction to the Second Edition

The previous edition of this book outlined the foundation for applying statistical process control (SPC) methods to healthcare situations. As I have used this book to teach classes and seminars on SPC, however, it soon became clear to me that a number of measurement challenges were still facing healthcare professionals. Specifically, I discovered that the key concepts related to SPC, such as common and special causes of variation, variables and attributes data, and proper control chart selection, were quickly being grasped and understood. What healthcare professionals seemed to be struggling with most often, however, was something I had taken for granted. I had assumed that people knew what they wanted to measure and how to organize data collection efforts. It is difficult to apply control chart knowledge properly if you have not defined appropriate indicators and collected data that accurately represent the process being measured. SPC charts developed from incomplete operational definitions and poor data collection strategies may look nice graphically but they have limited utility for improving care processes and building knowledge. Helping healthcare professionals address issues related to indicator development, data collection, and statistical analysis is, therefore, the primary objective of this edition.

This edition is a completely revised version of the first edition. The first edition contained 8 chapters whereas this one has been expanded to 11 chapters. The context for this edition has also been expanded. In the first edition, the context was based on events occurring in the U.S. health system. My work over the past 12 years has allowed me to expand my perspectives by working in many different countries and experiencing a variety of health and social care delivery systems. In this current edition, therefore, I have included examples of my work and experiences in other countries.

Chapter 1 provides a context for understanding quality measurement with a central focus on four key issues: (1) the growing demand for greater transparency of healthcare performance indicators; (2) the increasing role that patients, families, and caregivers are playing in making healthcare decisions; (3) the increasing use of quality improvement (QI) concepts, tools, and methods; and (4) the definition of quality. This chapter provides an overview of some of the key historical as well as recent forces that have shaped the transformation of the healthcare industry throughout the world. It concludes by discussing three key definitions of the term quality that often lead organizations to send mixed messages about the role of quality and what they are trying to achieve.

In Chapter 2, a simple question is posed to challenge the reader. Specifically, “Why are you measuring?” There is constant and ever-growing demand to measure healthcare performance across the globe. But to what end? Is it to do academic studies, pass judgment on performance, and hold healthcare providers accountable for their actions or to improve the efficiency, effectiveness, and safety of healthcare delivery?

Measuring the voice of the customer (VOC) is the topic of interest in Chapter 3. This chapter is greatly expanded from what I covered in the first edition. All quality journeys should begin by listening to the VOC, the point at which you can learn what your customers want, need, and expect. The growing role of the customers and methods for documenting and analyzing their expectations are the central topics in this chapter. Three key listening points in the customer experience are identified (preservice, point-of-service, and postservice), and a variety of tools designed to capture data at these critical junctures are reviewed. Because surveys are used so much in health and social care settings, I have added a large section in this chapter to address the design, development, and use of surveys. This chapter ends with a brief example of how VOC measurement needs to be connected with voice of the process (VOP) measurement in order to make continuous QI a practical reality.

Chapter 4 provides details on the milestones in the quality measurement journey (QMJ). All too often organizations take a serious detour in their QMJ because they try to measure a broad concept (e.g., patient harm) rather than identifying specific indicators that could be used to measure actual performance (e.g., medication errors per 1,000 doses dispensed or the percentage of patients appropriately placed on bed alarms). This chapter provides specific guidance in selecting indicators, developing operational definitions, and executing data collection strategies (including stratification and sampling). An Indicator Development Worksheet is offered to assist the reader in working through all the issues involved with selecting and building a few good indicators. This chapter ends with a case study on transcription turnaround time to demonstrate how a successful QMJ can be completed.

After a number of specific indicators have been developed, there is value in organizing them into a cogent and parsimonious format that can be shared with those who do not need to know all the details involved with indicator development. Chapter 5 provides guidance on how to accomplish this by building strategic dashboards. The discussion begins by characterizing the similarities and differences between two popular ways of organizing indicators—report cards and dashboards. The argument is made that the dashboard concept is much more relevant to healthcare performance measurement than is the notion of the report card. A case study provides a practical example of how one healthcare organization in London has built a very effective strategic dashboard. Chapter 5 concludes with a review of the role of benchmarking in healthcare settings.

After collecting data and organizing your indicators, the next major milestone in the QMJ is to tap the knowledge that hides in the data. Chapter 6 addresses two fundamental issues related to this challenge: (1) the differences between data and information and (2) how static and dynamic approaches to data analysis provide fundamentally different perspectives to data analysis. A case study on presenting data to a management team is used to illustrate the importance of building a dialogue on these issues.

With these distinctions established, Chapter 7 provides a way to start immunizing yourself and others against numerical illiteracy. This requires that everyone in the organization have a clear understanding of what Dr. Walter Shewhart established as a fundamental principle for interpreting data aimed at QI. Specifically, his distinction between common and special causes of variation needs to be the primary way that everyone in the organization approaches data analysis and interpretation. Yet, the frequent use of aggregated data and summary statistics compounded by data presentations that rely on
red/yellow/green displays of the data create major challenges for most healthcare providers when it comes to understanding variation. Options for moving to a new way of thinking are offered in this chapter.

Chapters 8 and 9 provide details on how to understand variation statistically with SPC methods. Specifically, Chapter 8 focuses on the run chart and Chapter 9 dives into the details of the Shewhart control charts. The statistical and operational foundations for using these practical statistical tools are discussed as well as the basic elements of the charts, decision criteria for selecting the appropriate chart, and rules for detecting special causes of variation. These chapters provide case studies as examples and exercises to test the reader’s knowledge of selecting the most appropriate Shewhart chart for various indicators.

Chapter 10 presents 16 case studies designed to demonstrate how the various quality measurement principles and tools can be applied to healthcare situations. These examples span a wide range of topics, including indicator selection and development, sampling applications, and control chart analysis. The case studies address clinical as well as operational aspects of healthcare delivery. All of these case studies are grounded in actual experiences I have had in applying the QMJ principles. In some of the cases the organization or individuals who were kind enough to share their stories are identified. In other case studies I have respected individual wishes to remain anonymous. For all of those involved with sharing their stories, I am grateful.

In the concluding chapter, Chapter 11, I move away from the technical details of the QMJ and address how an organization can strategically and operationally build quality thinking and action into the very fabric of the daily life. Four key strategies for accomplishing this are discussed: (1) adopting quality as a business strategy, (2) developing a learning system to support improvement, (3) linking measurement to improvement, and (4) building capacity and capability for improvement.

My design for this book has been to start rather broadly by establishing the context for measurement. It then proceeds to become more detailed and specific as I describe the milestones in the QMJ and their technical components. It then concludes by moving back to a broad perspective describing how an organization can establish the structures, processes, and culture required to demonstrate that every day is truly a quality journey. Some readers may want to start at Chapter 1 and progress in a logical sequence through the chapters. Others should feel free, however, to jump around from chapter to chapter depending on their interests. Each chapter has been written to stand more or less on its own. So, follow your own path and enjoy the journey.
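
As a hedged illustration of the two indicator examples named in the Chapter 4 overview (medication errors per 1,000 doses dispensed, and the percentage of patients appropriately placed on bed alarms), the sketch below shows how each reduces to a ratio of counts. It is a minimal Python sketch, not a method taken from this book: the function names are hypothetical and the monthly counts are invented purely for illustration.

```python
def rate_per_1000(events: int, opportunities: int) -> float:
    """Events per 1,000 opportunities, e.g., medication errors per 1,000 doses."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return 1000 * events / opportunities


def percentage(numerator: int, denominator: int) -> float:
    """Share of cases meeting a criterion, expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100 * numerator / denominator


# Hypothetical monthly counts (month, medication errors, doses dispensed).
# These numbers are invented for illustration only.
monthly_counts = [("Jan", 12, 4800), ("Feb", 9, 4500), ("Mar", 15, 5000)]

for month, errors, doses in monthly_counts:
    print(f"{month}: {rate_per_1000(errors, doses):.2f} errors per 1,000 doses")

# Hypothetical audit: 45 of 60 eligible patients were on bed alarms.
print(f"Bed alarm placement: {percentage(45, 60):.1f}%")
```

Keeping the numerator and denominator explicit, rather than storing only the computed rate, is one way to keep an indicator's definition auditable; that choice is an assumption of this sketch.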
About the Author
Dr. Robert Lloyd is vice president at the Institute for Healthcare Improvement (IHI). Dr. Lloyd provides leadership in the areas of performance improvement strategies, statistical process control methods, development of strategic dashboards, and building capability for quality improvement (QI). He also serves as lead faculty for various IHI initiatives and demonstration projects in the United States, the United Kingdom, Sweden, Denmark, Norway, New Zealand, Australia, India, Malaysia, Qatar, Dubai, and Africa. Before joining the IHI, Dr. Lloyd served as the corporate director of Quality Resource Services for Advocate Health Care (Oak Brook, IL). He also served as senior director of quality measurement for Lutheran General Health System (Park Ridge, IL), directed the American Hospital Association’s Quality Measurement and Management Project (QMMP), and served in various leadership roles at the Hospital Association of Pennsylvania. The Pennsylvania State University awarded all three of Dr. Lloyd’s degrees. His undergraduate degree is in sociology, his master’s degree is in regional planning, and his doctorate is in rural sociology.

Dr. Lloyd has addressed over 2,500 national and international meetings of professional groups and associations. Over 200,000 participants from the United States and abroad have attended his classes and presentations on QI. He has served as faculty for the Harvard School of Public Health, the American College of Healthcare Executives, the American Society for Quality (ASQ), the University of Wisconsin’s graduate program in administrative medicine, the University of Chicago’s graduate program in health administration, the Healthcare Forum, the International Quality and Productivity Center, the American Health Information Management Association, the Joint Commission, the Group Practice Improvement Network, the Ontario Hospital Association, the Vancouver (British Columbia) Quality Forum, the Medical Group Management Association, the BMJ European Quality Forum (London, Barcelona, Berlin, Amsterdam, Goteborg, and Paris), and numerous QI organizations around the United States. Dr. Lloyd has also presented his seminars on statistical thinking to physicians and administrators from the Federation of County Councils in Stockholm and Jonkoping, Sweden, to leaders of the National Health Service throughout the United Kingdom and New Zealand, and to patient safety leaders throughout Denmark, South Africa, and Ghana.

He has published numerous articles, chapters, and reports on a wide range of topics, including QI theory and implementation, measurement and statistical methods, clinical outcomes, customer satisfaction, information systems, and parish nursing.

Dr. Lloyd is coauthor of the internationally acclaimed book Measuring Quality Improvement in Healthcare: A Guide to Statistical Process Control Applications (American Society for Quality Press, 2001, fifth printing). Dr. Lloyd’s second book, Quality Health Care: A Guide to Developing and Using Indicators was first published by Jones & Bartlett Learning in 2004.

Image Credit © Michal Steflovic/Shutterstock

CHAPTER 1
Setting the Context for Measurement

Believe it or not, there was a time when data and measurement were not at the center of the healthcare debate. For example, when a patient went to see a doctor back in the 1960s, 1970s, and 1980s, he or she usually said, “Thank you very much, doctor. Do I need to schedule a follow-up appointment?” Today it is more likely that the patient will ask questions such as: “Where did you get your medical degree? How many times have you done this procedure? What is your complication rate for this procedure? Do you have data to show that this procedure will improve my health? Have you had a malpractice claim filed against you in the past 5 years?” Then, if the patient is really actively involved in his or her own healthcare delivery, he or she will probably pull out a piece of paper and inform the doctor, “This is what I found on the Internet last night and here is what I think is wrong with me. . . .”

Over the past two decades the healthcare industry has moved away from being an industry based on high levels of trust and partnership between physicians and their patients to one of gentle (and at times not so gentle) tension between those who provide care, those who receive it, and those who pay for it. Whenever society begins to lose confidence in an institution, however, there is typically a demand for greater oversight of the institution and a related push for more data on the products or services it provides.1 This is precisely what has happened and is happening within the healthcare industry.2 As consumers, political leaders, the media, and the public in general have increasingly asked questions about the efficiency, effectiveness, and reliability of healthcare services, there has been a concomitant growth in the demand for more data on healthcare providers and results.

For decades, providers have been collecting what is classically referred to as administrative data (e.g., in a hospital these data included the number of admissions and discharges, percentage of patients who die in the hospital, percentage of patients discharged to home or nursing homes, number of lab tests performed, visits to the emergency department, number of setup and staffed beds, resource usage, average length of stay, and selected infection rates). These data have routinely been used internally to make
management decisions and then submitted externally to various oversight or regulatory bodies at the state, regional, or national levels at designated points in time for aggregation. Typically, these data are lagged anywhere from a few months to a year or more.3 Analysis of administrative data is usually performed using summary statistics (e.g., the mean, median, mode, minimum, maximum, range, and standard deviation) and comparisons by quarter or year are often the preferred approach to determine whether time period 1 is different from time period 2.

Healthcare administrative data are still collected on a regular basis around the world. Yet over the past 15 years, three key developments within the healthcare industry have had a profound impact on moving healthcare measurement and data collection in new directions: (1) the growing demand for greater transparency, (2) the growing focus on patient-centered care and concerns over poor service, and (3) the growing role of quality improvement (QI) concepts, tools, and methods.4

▸▸ The Growing Demand for Transparency

Even though the discussions on transparency in health care have increased significantly over the past few years many people fail to realize that this is not a new topic within the healthcare profession. Three individuals specifically played key roles in the earlier debates on data disclosure in healthcare settings: Florence Nightingale, Ernest Codman, and Francis Peabody. Florence Nightingale (1820–1910) is probably the best known of this threesome. She was one of the first advocates for data collection and release. Her work in the Crimean War is legendary. She wrote Notes on Matters Affecting Health, Efficiency and Hospital Administration of the British Army (1858), which detailed not only her experiences in caring for the war’s wounded but also provided detailed recommendations for reforming the organization and delivery of care within the British military hospitals. She developed statistical graphs to summarize the causes of mortality, known as Polar Area Diagrams (FIGURE 1-1) and then used these data to criticize the unsanitary conditions that prevailed in military hospitals and decried the incidence of preventable deaths. Although she did have her critics, she was not vilified nearly as much as the most ardent proponent of transparency, Dr. Ernest A. Codman.

[Figure: polar area diagram covering April 1854 to March 1855, with one wedge per month; the legend distinguishes death from wounds in battle, death from disease, and death from other causes; the labels BULGARIA and CRIMEA appear on the diagram.]
FIGURE 1-1 Florence Nightingale’s Polar Area Diagram Showing Mortality in the Crimean War by Cause (1855)
Nightingale, F., Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army. Founded Chiefly on the Experience of the Late War. Presented by Request to the Secretary of State for War. Privately printed for Miss Nightingale, Harrison and Sons, 1858.

Dr. Ernest Codman (1869–1940) is best known for his controversial “end results” system. “The end result idea is simply that doctors should follow up with all patients to assess the results of their treatment and that the outcomes actively be made public” (Swensen and Cortese, 2008, p. 233). As a prominent surgeon in Boston, Codman was a firm believer in making data not only more available to those who deliver care but also that it should be provided openly and freely to the public so they could make informed decisions on which doctors and hospitals they would use. His colleagues at Massachusetts General Hospital and Harvard Medical School (he graduated in 1895) favored many of his suggestions and recommendations for improving treatment procedures and outcomes as long as the information and data stayed within the medical profession. It was Codman’s insistence that the public have access to these results that created the major controversy. Codman held fast in his beliefs but it was at a price. He received stern criticism from his colleagues at Massachusetts General Hospital when he proposed the outlandish notion that data on physician and hospital performance should be released to the public:

    I am called eccentric for saying this in public; that hospitals, if they wish to be sure of improvement, must find out what their results are, must analyze their results, to find out their strong and weak points, must compare their results with those of other hospitals and must care for what cases they can care for well. Such opinions will not be eccentric a few years hence. (Codman, 1917, p. 183)

In 1914, his hospital refused his plan to evaluate the skill and outcomes of surgeons. Eventually, he was denied privileges at Massachusetts General Hospital, which led Codman to resign and establish his own hospital (interestingly enough called the Codman Hospital and also the End Results Hospital) where he could further his practices of transparency and disclosure of patient results. In 1916, he privately published a compendium of his hospital’s results from 1911 through 1916 including the “clinical misadventures” that he and others had made (Codman, 1916, 1924; Donabedian, 1989; Neuhauser, 1990). Dr. Codman was clearly a man ahead of his time. He saw the medical profession as a caring and compassionate discipline but one in which there was an obligation on the part of the providers of care to be open and forthright with the results of their work. Unfortunately, this was a message that was not appreciated in the early 1900s by his
colleagues. In 1914, Dr. Edward Martin wrote to Codman the following:

    Dear Codman:

    God bless you! I suppose I should hate you if I lived in the same town, but my feeling, being remote, is quite other. Indeed the very enemies who lurk in second story windows with muffled rifles are waiting your passing, are the ones who take off their hats in deepest respect as your cold, but beautiful, corpse is carried away. (Mallon, 2000, 63)

Codman remained steadfast in his beliefs and practices. In the end, however, the profession’s dissatisfaction with him and his insistence on the end results system continued to grow. Although no one actually took shots at him, he did spend the last years of his life with few patients, no referrals from his medical colleagues, and little money. During his final year of life, he wrote that in the future he hoped that he would receive more favorable reviews than he did while he was alive (Mallon, 2000). Berwick in a Milbank Quarterly article (1989, p. 266) offered a postscript on Codman’s life and work, “Codman looked ahead. He looked, indeed, beyond us. Seventy-eight years ago he began his life’s work; forty-eight years ago he died. Are we ready for him yet?”

The third individual who helped to set the stage for a dialogue on transparency is Francis Peabody (1881–1927). Although less well known than Florence Nightingale and certainly less controversial than Ernest Codman, Dr. Peabody made major contributions to the healthcare field as a clinician, teacher, and researcher. He is best known for stressing transparency with individual patients and involvement of the patient in making care decisions. His basic philosophy was captured in a series of lectures to students at Harvard Medical School in 1926 and subsequently published under the straightforward title, “The Care of the Patient” (Peabody, 1927). Several key quotations from Dr. Peabody’s classic 1927 article provide not only guidance but also a challenge for today’s medical professionals:

    The treatment of a disease must be completely impersonal; the treatment of a patient must be completely personal (1927, p. 878).

    The good physician knows his patient through and through, and his knowledge is bought dearly. Time, sympathy and understanding must be lavishly dispensed, but the reward is to be found in that personal bond which forms the greatest satisfaction of the practice of medicine. One of the essential qualities of the clinician is interest in humanity, for the secret of the care of the patient is in caring for the patient (emphasis added by author) (1927, p. 887).

The fact that Dr. Peabody stressed the need for transparency between the physician and the patient is probably one of the key reasons that he was highly regarded by his colleagues and Dr. Codman was not. Codman’s insistence on the public disclosure of all hospital and individual practitioner data challenged the existing status quo. Peabody on the other hand was seen as a compassionate and caring physician who was devoted to the individual patient’s needs and expectations. This was something that could be addressed quietly in the privacy of the doctor’s office and in one-to-one conversations with the patient. Such behaviors did not threaten the status quo of the medical profession as did public postings and disclosure of results proposed by Codman. Furthermore, Peabody did not stress that the individual physicians who did not engage in these open exchanges with patients should be publicly identified. It was, in his view, all a matter of personal taste and preference, and the results should be discussed only between the physician and the patient. For additional detail on the life and practice of Dr. Peabody, see Oglesby (1991) and Hurst (2011).
If we fast forward from the transparency pioneers of the early 1900s, we realize that many of the barriers they campaigned against are still with us today. Despite the growing push for greater transparency in healthcare data, there is still considerable confusion and lack of consensus surrounding this topic. The classic definition of transparency (Webster’s Dictionary, 1984) is that it is something that is easily detected; obvious; readily understandable; clear; without guile or cover, candid; clearly perceived; lucid. This seems to provide a reasonably straightforward set of definitions and criteria. But in a healthcare setting, the question of transparency has not been quite so straightforward. A colleague of mine, for example, told me he thought that transparency is “A frequently used term in healthcare settings that has no apparent consistent definition or meaning.”

Transparency has been and will continue to be a challenge conceptually, operationally, and politically. A few examples of recent transparency challenges are worth reviewing.

■■ An article in the Chicago Tribune (March 28, 2013) had the following headline, “Do Hospital Ratings Bring Clarity or Confusion?” The story led with the following conclusion: “Patients left to judge credibility of rankings groups and an array of data.” The article focused on St. Mary Mercy Hospital in Livonia, Michigan. The hospital received rankings in terms of quality and safety by four different organizations. The results showed that the Leapfrog Group gave the hospital an A rating. Healthgrades named the hospital as one of the top 50 hospitals in the United States. At the other end of the spectrum, however, the Joint Commission (JC) and U.S. News & World Report excluded St. Mary Mercy hospital from their lists of “best hospitals.” Finally Consumer Reports gave the hospital an average rating of 47 out of a possible 100 points. The article also describes how the Medicare program will be reducing the hospital’s payments owing to poor performance related to Medicare quality measures. If you were a patient living in the Livonia, Michigan area and trying to decide which hospital to select for your knee replacement surgery, how would you use these results? The data are made public, so this is not so much an issue about reporting data to the public but rather a lack of consistency in developing public reporting methods and systems, which is one of the serious issues related to transparency in the United States.
■■ The April/May 2013 issue of AARP: The Magazine made its first foray into the transparency arena when it published an exclusive report titled “America’s Safest Hospitals: Is Yours on the List?” When you read the article, however, you discover that AARP (formerly American Association of Retired Persons) did not do the assessments but partnered with the Leapfrog Group to prepare the report using the Leapfrog Group’s methodology and their 26 measures. A total of 66 “safety superstar” hospitals are listed in the AARP report. Although singling out 66 of the 5,724 hospitals in the United States may bring comfort to those who live near these facilities and recognition to the hospitals, it makes you wonder how well these 66 institutions would fare if they were assessed by the other organizations that rated St. Mary Mercy Hospital (i.e., Healthgrades, the JC, U.S. News & World Report, and Consumer Reports).
■■ Even though AARP The Magazine published the safest hospital list in their April/May 2013 edition, the July/August edition of another AARP publication, AARP Bulletin, included an article titled “Lifting the Veil on Hospital Rates.” This piece lauded the recent releases by the government showing what U.S. hospitals charge for various inpatient procedures, noting that this “may be ushering in a new era of transparency in healthcare costs, which have long been closely guarded by hospitals” (AARP Bulletin, July/August, 2013, p. 4). Although the release of such data is a step in the right direction, the challenge
is that what hospitals charge for a procedure may have little to do with what it costs them to produce this procedure or service and may be a far cry from what Medicare (or an insurance company) pays for the procedure. As the article points out the data show “wild disparity in billing rates for the same procedure.” One example cited references a Houston hospital that charged patients $126,157 for a hip replacement whereas a hospital in Appleton, Wisconsin charged $26,787 for the same procedure. They point out that Medicare pays a hospital about $14,000 for a hip replacement procedure. The article concludes with a quotation from Stuart Guterman of the Commonwealth Fund: “the data demonstrate that hospital charges have nothing to do with the cost or quality of care. It’s becoming less and less clear what hospital charges are based on. The long term goals should be to make hospitals charges relate to what the actual cost of care is” (2013, p. 47). A similar article in the Chicago Tribune (“Hospital Sticker Shock: Prices, It Turns Out, Can Vary Widely” May 18, 2013, 10) provides charges for different procedures by hospital and arrives at the same basic conclusion as the AARP Bulletin article, that charges show wide variation for the same procedure. Yet, they take a positive stance on the benefits of such attempts at transparency. They conclude that any data on doctor and hospital performance has value. They write: “The more information about doctor and hospital prices and quality of care the better the choice. No people won’t visit or shun a hospital only because of its price list. But the pressure is on hospitals. They need to explain why they’re charging double or triple for the same procedure as the hospital down the street.”5 So, even when there is a spirit of transparency the data and hence the conclusions being derived from the released data may in fact be confusing, especially for consumers.
■■ A more dramatic response to the release of hospital ratings comes from Dr. Ezekiel Emanuel, the University of Pennsylvania bioethicist and oncologist who helped the White House draft the healthcare reform law. In the July 25, 2014 edition of The Wall Street Journal Dr. Emanuel’s sharp criticism of the U.S. News & World Report’s “Best Hospitals” rankings starts with the claim that they are “flawed to the point of being useless.” He continues his critique by pointing out that the rankings are based on an overreliance on reputation and a failure to take into account quality indicators, such as hospital-acquired infections and the incidence of preventable falls and pressure ulcers. His conclusion is that hospital rankings are good only for “media hoopla and a few chest-thumping press releases from hospitals at the top of the list.” Reading this story caused me to reflect on when I worked for the Hospital Association of Pennsylvania during the mid-1980s. The first hospital ratings report issued by the Pennsylvania Health Care Cost Containment Council (PHC4) was received with mixed reactions. Yet, a few of the hospitals that were ranked in the top decile of about 300 hospitals took it upon themselves to place ads in local newspapers extolling their good outcomes and high quality. Interestingly enough the next report released by the PHC4 was not so kind to some of these top-ranking hospitals. In turn, there were a few ads in local newspapers placed by competing hospitals pointing out that those who were previously ranked in the top decile had now fallen from grace. The ranking sword can cut both ways! The Healthcare Business Blog from Modern Healthcare documents this story at http://www.modernhealthcare.com/article/20130725/BLOG/307259986.
■■ If we move from hospital transparency to physician transparency the challenges become even more pronounced. If Dr. Codman was alive today he would probably feel that not much progress has been made since he advocated for his end results
system over 110 years ago. Finding data on physician volumes and results can be a real challenge. Some of the state data commissions (e.g., the PHC4) do offer data on the volume of procedures for selected surgeries for surgeons by name. For hip and knee surgery, for example, the PHC4 lists individual surgeons by name and the number of procedures done, but their results on quality measures (i.e., infections, blood clots, readmissions, and length of stay) are not made public. Postings of doctor-specific results have been received well by the public but are still being resisted by many state medical societies.
■■ A recent issue for transparency in the physician arena is the manner in which physician payments are determined. The American Medical Association (AMA), the chief lobbying group for physicians in the United States, meets confidentially each year to determine what values (time and intensity of a procedure) should be assigned to the services and procedures performed by physicians. These value assessments are then turned over to the Centers for Medicare and Medicaid Services (CMS) to establish the actual payments being made to physicians for treatment of eligible Medicare patients. A recent investigation into this practice (Chicago Tribune, July 26, 2013, section 2, 10) revealed that the values and subsequently the payments to doctors are “so exaggerated that many doctors ‘averaged’ more than 24 hours of work per day.” In a study of 340 doctors at outpatient surgical clinics in Florida, it was discovered that the doctors performed at least 16 hours of procedures each day, even though most of the clinics were open for about 10 hours each day. The study also discovered that 78 of the 340 doctors actually fit 24 or more hours of work into a day. Twenty-one of the doctors actually filed reports (and claims for payment) that showed they fit over 30 hours of procedures into a 24-hour day. As a result of this new transparency in the way physician payments are derived, a bipartisan group of legislators has drafted a bill that would require the reshaping of the way Medicare pays doctors.
■■ The issue of transparency, as I mentioned earlier, is not restricted just to the United States. In England, for example, The Daily Telegraph (June 13, 2013, no. 49, 157) had the following headline on the front page: “How Bad Doctors Can Hide Failings.” The story details how the National Health Service (NHS) developed “league tables” that would show doctor-specific results for selected surgeries and treatments. All this seemed like a good idea designed to assist patients in selecting “top performing doctors.” But the newspaper’s reporters discovered that the NHS has been contacting doctors and asking whether “their” data can be released. Those who refuse to have their results released will not have data relating to their performance published and they will not be identified for refusing to participate in the release. A high-ranking government official told the newspaper that this was a “farcical situation.” Jeremy Hunt, the Health Secretary for the English NHS, stated that “The medical profession has closed ranks to stop patients finding the truth.” He concluded that “transparency and participation must be the operating principles of the NHS. They can lead to more effective health care, better outcomes, greater accountability and efficiency.” In response the Royal College of Surgeons said that doctors are covered by data protection laws and that therefore they must give their consent before their data can be released to the public. Yet Mr. Hunt indicated that the NHS is proceeding to release individual surgeon data within several months on 10 different surgical specialties. They then expect to release data on all doctors, including general practitioners, over the next year and a half. Referencing how the release of similar data on individual surgeons in New York State put pressure on poor performing doctors and led to improved
8 Chapter 1 Setting the Context for Measurement

treatment and outcomes, the NHS Department of Health concluded: “Patients should be able to see how individual senior doctors are performing. If there are legal grounds for individual doctors opting out, any patient and their family would be entitled to ask why and may prefer their operation to be carried out by someone who was prepared to be fully transparent” (Daily Telegraph, June 13, 2013, no. 49, vol. 157, 2).

In the midst of all this controversy and the challenges surrounding transparency, there are some guiding lights. There does seem to be a growing realization, even on the part of providers of care, that greater transparency in health care is not only a good thing but an inevitability. So, how can leaders become better prepared to address the transparency issue? Dr. Jim Reinertsen, a senior fellow at the Institute for Healthcare Improvement (IHI), developed guidance for increasing an organization’s awareness of and practice in transparency. He calls it “how to go naked” (Reinertsen, 2012). Reinertsen identifies four specific things leaders can do to motivate greater transparency:
■■ Start undressing at the top. By this he means that board reports are too often all about good news, which leads the board members to think that quality and safety are much better than they really are. He encourages senior leaders to clearly identify the good, bad, and ugly events for the board without rationalizations and language that “explains away suboptimal performance.”
■■ Don’t hide behind your lawyer. Dr. Reinertsen stresses that too many healthcare organizations’ lawyers “require that safety data discussions occur behind a thick veil in apparent belief that they are protecting the institution from legal risks” (p. 42). He suggests having a dialogue on this stance is critical to an organization’s orientation to transparency. He also points out that as one health system lawyer told him, “Patients don’t learn they got hurt by seeing a PowerPoint slide. They don’t sue us because of our data. They sue us because of broken relationships. How can we get on with the job of making care better if we can’t talk openly about what goes wrong?” (p. 42). Increasingly healthcare providers are being more open with their data despite the legal concerns of some. It is not uncommon, for example, to see data on the number of days since the last pressure ulcer or fall posted on the wall of a hospital unit. This is occurring more in countries outside the United States where malpractice claims are not as prevalent but there are plenty of organizations that are not hiding behind their lawyers.
■■ Keep it simple. This point is an extension of the previous recommendation. The essence of it is that you do not need to purchase and install an expensive, complex, and automated data repository in order to understand the current status of quality and safety within your organization. Plotting run or control charts by hand and posting them on the unit for staff and the public to see provide the foundation for improvement. Rather than wait for monthly or quarterly data to be aggregated in the expensive, complex, and automated data repository, verified, and then distributed back to the departments and units frequently weeks or months after the fact, this recommendation stresses (1) establishing a baseline on current performance, (2) posting data as close to production as possible (e.g., daily or weekly), and (3) making sure that even suboptimal performance is posted when it occurs rather than calculating averages that meld the anomalies into large aggregations and make them invisible.
■■ Shape up! The final recommendation on how to go naked is based on a phrase from Tapscott and Ticoll (2003), “If you’re going to be naked, it’s good to be buff.” Basically this refers to the fact that patients do not seem to make choices about healthcare providers (doctors or hospitals) based on publicly reported data, even when these data are available. But when staff sees data
The Growing Demand for Transparency 9

that show their department or unit is not performing well they do take notice and start to ask how they can shape up. Because staff and managers are frequently engaged in the details of day-to-day work, they frequently do not step back and ask, “How are we performing over time?” Caregivers struggle with the N = 1 challenge; that is, they think of the individual patient, surgery, or procedure and generally do not look at how their systems and processes are performing for groups of patients over time. This would be similar to a person who wants to get in shape and lose weight saying, “I ate only a salad yesterday and some fruit, so why do I weigh the same today?” The question of getting into shape is not answered by the detailed analysis of one meal or even one exercise session or the average hand hygiene compliance for the year. The answer lies in being able to track performance over time, not at a single point in time. Improvement is more like running a marathon rather than engaging in a sprint.

Some organizations have started to adopt some of these suggestions for “going naked.” For example, in an effort to promote price transparency Steven Sonenreich, president and chief executive officer of Mount Sinai Medical Center in Miami Beach, Florida, made a promise during an interview with a local radio station to post the contract rates Mount Sinai Medical Center pays private payers for diagnoses and treatments. Sonenreich then went a step further by challenging all other hospitals in the Miami area to do the same. Sonenreich gained national attention for his candid promise of transparency and his challenge to the other hospitals (Gamble, 2013). This example illustrates what can be done to change the status quo and move toward greater transparency not only of pricing structures but also results and processes. Hospitals that make public promises, set measurable goals, and execute strategies to share price information and results can use this transparency as a competitive advantage. If one hospital in a community demonstrates transparency with its data while other hospitals hold back, this will undoubtedly send a message to patients, political leaders, and groups that commission healthcare services that at least one provider is not hiding behind its lawyers or being secretive. The transparent hospitals will benefit from increased patient and public trust, which is especially valuable in this time of growing mistrust of healthcare providers.

Reinertsen’s recommendation on “how to go naked” provides a starting point for dialogue on where the organization stands with respect to transparency. To dive even further into the specifics of how transparency will play out in your organization, I suggest that you address three critical questions:
■■ Transparency of what?
■■ Transparency for whom?
■■ Transparency at what level?

I developed the Transparency Assessment Tool (TABLE 1-1) as a way to help build a dialogue around this topic. There is no right or wrong answer to each of the 15 questions in this assessment. Without such a dialogue, a provider of healthcare services will be placed in a defensive posture when asked to reveal the quality of the care they offer or their results. The time to think about transparency is not when the Channel 5 TV mobile unit parks outside your hospital or clinic, cranks up their satellite dish, and asks to “talk with someone in charge” about why your data placed you in the bottom decile of a recent rating and ranking of providers.

In order to gain a clear understanding of the full range of opinions on transparency within your organization, the following questions need to be addressed:
■■ Do you know your data better than anyone else?
■■ Do you use data that are made available to the public to identify opportunities for improvement?
■■ Or, do you look for ways to deny the public released data and develop rationalizations as to why you think you are actually better than the reported outcomes?

TABLE 1-1 Transparency Assessment Tool

Level Frequency of Transparency | Strongly Agree | Agree | Not Sure | Disagree | Strongly Disagree

1. Greater transparency is needed across all healthcare settings and providers.
2. Patients should be able to compare hospitals as easily as they do cars and other products.
3. Results on hospital outcomes (mortality, infections, falls, med errors, etc.) should be made public once a year.
4. Results on hospital outcomes (mortality, infections, falls, med errors, etc.) should be made public twice a year.
5. Results on hospital outcomes (mortality, infections, falls, med errors, etc.) should be made public four times a year.
6. Results on groups of doctors (surgeons, GPs, intensivists, dentists, etc.) should be made public once a year.
7. Results on individual doctors should be made public once a year.
8. All clinical outcomes on hospital performance should be made available to the public.
9. Operational outcomes on hospital performance (wait times, referral times, access) should be made available to the public.
10. Patient satisfaction results for each hospital should be made available to the public.
11. Financial results (including salaries) for each hospital should be made available to the public.
12. Mortality rates for individual surgeons should be made available to the public.
13. Infection rates for individual physicians should be made available to the public.
14. Errors and harm rates for individual physicians should be made available to the public.
15. Salaries of individual physicians should be made available to the public.

©2017 R Lloyd All Rights Reserved. Reprinted by permission of Robert Lloyd.
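
Completed copies of the tool can be summarized with very little tooling. The sketch below is one possible approach and is not part of the original tool: it assumes each completed assessment is recorded as a list of 15 answers drawn from the five response options in Table 1-1, and the names `tabulate_spread` and `text_bars`, as well as the sample responses, are hypothetical. It counts how many respondents chose each option for each statement and prints a plain-text bar chart so that a leadership team can see where opinions cluster or diverge.

```python
from collections import Counter

# The five response options from Table 1-1, in display order.
SCALE = ["Strongly Agree", "Agree", "Not Sure", "Disagree", "Strongly Disagree"]
NUM_STATEMENTS = 15


def tabulate_spread(assessments):
    """Count responses per option for each statement.

    `assessments` is a list of completed tools; each one is a list of
    NUM_STATEMENTS answers, and every answer must be one of SCALE.
    Returns a list of dicts, one per statement, mapping option -> count.
    """
    for answers in assessments:
        if len(answers) != NUM_STATEMENTS or any(a not in SCALE for a in answers):
            raise ValueError("each assessment needs 15 answers taken from SCALE")
    spread = []
    for i in range(NUM_STATEMENTS):
        counts = Counter(answers[i] for answers in assessments)
        spread.append({option: counts.get(option, 0) for option in SCALE})
    return spread


def text_bars(spread):
    """Render the spread as a plain-text bar chart, one block per statement."""
    lines = []
    for number, counts in enumerate(spread, start=1):
        lines.append(f"Statement {number}")
        for option in SCALE:
            lines.append(f"  {option:<17} {'#' * counts[option]} ({counts[option]})")
    return "\n".join(lines)


# Hypothetical responses from three leaders (illustrative data only).
sample = [
    ["Agree"] * NUM_STATEMENTS,
    ["Strongly Agree"] * 7 + ["Disagree"] * 8,
    ["Not Sure"] * NUM_STATEMENTS,
]
spread = tabulate_spread(sample)
print(text_bars(spread))
```

A text chart is used here only to keep the sketch dependency-free; the same counts could feed any charting tool. Statements whose bars are spread across several options are the ones most likely to need discussion.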



■■ Do you share all your results openly with staff?
■■ Do you share all your results with patients, family members, and caregivers?
■■ If not, why not?

The specific steps I recommend in using the Transparency Assessment Tool include:
■■ Distribute the assessment tool to the organization’s leadership team. This could include the governing board and senior and middle management groups.
■■ Ask each person to complete the tool without talking to other members of the leadership team.
■■ Have the completed assessments tabulated to show the spread of responses for each question and distribute the summary back to the participants for review and reflection. A graphic summary rather than a tabular or numeric summary is a preferable way to report the results.
■■ Convene a leadership meeting to discuss the results and the variation within and among each of the questions.
■■ Finally, create a work group to begin drafting an organizational position on transparency and how the organization will respond to the specific types and levels of transparency referenced in the assessment tool.

In summary, the work that Nightingale, Codman, and Peabody started years ago continues today. This issue will not be going away anytime soon and will only grow in intensity over the next 5 years. Organizations certainly have a choice in building a philosophy and strategy for dealing with the growing pressures for a more transparent healthcare industry. Some will meet this challenge and approach it from a proactive perspective. They will build measurement systems that enable them to understand their data better than anyone else and share it internally and externally before they are forced to do so. Others will sit back and “hope” that their data will at least meet standard expectations or that their performance is more or less “about average.” They will wait till data are released on them by external bodies and then circle the wagons to justify or rationalize why they end up where they do in the rating or ranking system. Life is full of choices. The choices you make over the coming year related to transparency of your performance will have a major impact on your organization’s future.

▸▸ The Growing Focus on Patient-Centered Care and Service

The second major development that has had and will continue to have a dramatic impact on the role of measurement in healthcare settings is the growing concern by both consumers and political leaders that healthcare professionals not only do not listen very well but in many instances act as if they just do not care about those they serve. This is a stark criticism for an industry that supposedly is designed to “care for people” and a far cry from what Francis Peabody was teaching back in the early 1900s about the “care of the patient.”

I am always intrigued at the response I get when I ask participants in one of my workshops, “Why did you get into health care?” The most frequent response is something along the lines of, “Because I want to help people and make a difference.” Although this is a fairly consistent response, irrespective of the country in which I ask it, it is interesting to follow up on such responses by unobtrusively walking the halls of a hospital, clinic, or nursing home and observing the interactions of staff and patients to gain a sense of the quality of these interactions. Even though most healthcare professionals will say that the patient and family are at the center of what we do, the casual observer walking around the hospital will probably conclude that there seems to be a conflict between what healthcare professionals frequently tell you about their reasons for entering the profession and the degree to which this intention is demonstrated moment to moment and patient to patient.

All too often, the patient is not really placed at the center of the healthcare universe. Instead, we frequently place ourselves in that center and expect the patient and family to rotate around the structures and processes that meet our wants and needs. For example, we tell a patient, “Oh, I’m sorry, we are open here at the outpatient testing and therapy center only from 9 a.m. until 4 p.m. Monday through Friday. We are closed from noon to 1 p.m. for lunch. No, I am sorry, we are not open during the evenings or on the weekends.” If we really placed the patient at the center of our universe, we would offer hours that accommodate patient needs and schedules. Just imagine if restaurants, movie theaters, or shopping malls took this approach and were open only during times that fit the workers’ schedules. They would be out of business very quickly. Customers would walk away from these businesses and look for those that accommodated their requirements.

It is actually rather surprising that it has taken the consumers of health care so long to become concerned about the lack of listening in health care and start to take action. Historically, the healthcare profession has been characterized as a profession that tells patients what they want, need, or can expect rather than asking them for their opinions or desires. The standard approach historically has been that healthcare providers do things to people or for people rather than with people. Francis Peabody argued that engagement with the patient was essential. But it has taken decades to chip away at the old paradigm that places the providers of care at the center of the universe.

A variety of books, reports, and consumer organizations supported by the growing push for transparency and the availability of data and information on the Internet, all have set the stage for creating greater involvement of the patient and family in making healthcare decisions. It has also forced healthcare providers to rethink how they measure the voice of the customer (VOC). Patient advocacy groups, in particular, have been very instrumental in driving the emergence of a more participatory model of healthcare delivery. One of the early leaders in this area has been a rather controversial physician by the name of Charles Inlander. Inlander, with backing from Robert Rodale, board chairman of Rodale Press and publisher of Prevention magazine, became president of the People’s Medical Society (PMS) in 1983. Inlander, Levin, and Weiner then wrote Medicine on Trial in 1988. This book was not merely a criticism of the medical status quo; it was a direct attack on the very foundations of medical training, practice, and the business of medicine. The foreword to Medicine on Trial states:

    It is a catalog of ineptitude, malfeasance, gross neglect, indifference, and incredible arrogance. The indictment is not ours. It is an unabridged exposé of medical mistakes taken directly from the annals of medicine and public research. No one can count on reform from within the medical establishment. There are simply too many vested interests in the system trying to protect the status quo. (p. 11–12)

The authors conclude, “It is amply clear that the medical care system is not capable of significant and sustained efforts to improve the quality of its services. It simply has too much at stake in preserving benefits to its own members. Their proposal for improvement lies with the people who are the recipients of care” (p. 15). They write, “The people who should—and, indeed, must—alter the practice of medicine are the people it is practiced on” (p. 19). The remainder of the book provides detailed examples of how the healthcare system has failed those it serves.

Chapter 12 of this book has an intriguing title: “What Did He Say? What Did I Hear? What Did We Do?” In this chapter, the authors cite case after case of how providers (especially physicians) have demonstrated a total lack of listening to patients. They cite a study at UCLA’s Cancer Rehabilitation Project, for example, that concluded, “9 out of 10 physicians had never received any formal training in how to disclose
a cancer diagnosis to an afflicted patient” (p. 192). Although many of the clinical recommendations of the PMS have been called into question for promoting “unscientific methods” (Barrett, 2011), Inlander et al. at the PMS did contribute to rejuvenating the early messages of Codman and Peabody about the role of the patient in medical practice.

In 1988, a second book was released that served as a complement to Medicine on Trial. Taking Charge of Your Medical Fate (Horowitz, 1988) provided a roadmap to guide the patient through the healthcare maze. This book actually received more recognition than Medicine on Trial especially when Senator Edward Kennedy claimed that “the Horowitz method can save your life—the way it saved my son’s.” The author, Lawrence Horowitz, MD, was director of the U.S. Senate Subcommittee on Health and is considered a very astute observer of the healthcare industry. He meticulously laid out the steps for patients to take control of their own healthcare decisions and how to behave when interacting with healthcare professionals. Despite Horowitz’s efforts, however, consumers of the late 1980s did not take charge of their own medical fate.

It was not until the early 1990s that the consumer movement in health care actually started to gain serious momentum. Television talk shows, magazines (e.g., on January 22, 1996, Time magazine had the following cover page: “Special Investigation—What Your Doctor Can’t Tell You”), investigative reports (e.g., the TV show 60 Minutes), and the Internet all contributed to the growing concern that healthcare providers really did not listen or pay much attention to those they serve.

A third book that made a significant contribution to the role of consumers and measurement in health care is Demanding Medical Excellence: Doctors and Accountability in the Information Age by Michael Millenson (1999). This book picks up where the works of Inlander et al. and Horowitz left off. Millenson provides a detailed account of the major historical and political events that have shaped the U.S. debate over cost and quality. He then proceeds to offer a critical review of the serious flaws and fallacies in contemporary medicine. Through many examples, quotations, and case studies, Millenson traces the growing role of data and outcomes research in health care. He concludes that “the largest barriers to systematically measuring and improving the quality of American medicine are not technical but cultural” (p. xvi). He argues that the healthcare profession lacks the will to objectively measure and monitor the quality of what it produces and that it is only a matter of time before the consumers become more engaged in changing the profession.

One of the more consistent advocates for patient involvement over the past 20 years has been Consumers Union and their related publication Consumer Reports. Starting in 1996, for example, Consumer Reports began publishing ratings and rankings on health maintenance organizations (HMOs). In January 2003, Consumer Reports added a new dimension to its focus on health care by leading with the cover page headline, “How Safe Is Your Hospital?” In 2013, Consumer Reports and AARP both started publishing what they consider to be the “top” hospitals in America.

The publications that probably have had the most impact worldwide on patient involvement as well as quality and safety have been those released by the Institute of Medicine (IOM). In 1999, the IOM released To Err Is Human: Building a Safer Health System, which addressed primarily the fact that health care is essentially an unsafe enterprise. The chair of the committee that produced the report, William Richardson, states in his preface: “This report describes a serious concern (i.e., medical error) in healthcare that, if discussed at all, is discussed only behind closed doors” (p. vii). After outlining the current state of medical error in health care, offering estimates of the number of patients experiencing needless death and harm while under care, and explaining error theory, the report provides a series of recommendations on ways to improve the healthcare system, several of which address measurement and transparency issues. In the second IOM report, Crossing the Quality Chasm: A New Health System for the
21st Century (2001), the focus was on how the healthcare system can be redesigned to promote innovation and improve care. Redesign refers to “a new perspective on the purpose and aims of the healthcare system, how patients and their physicians should relate, and how care processes can be designed to optimize responsiveness to patient needs” (p. ix). The report proposes six aims for healthcare improvement. These aims are aligned with key dimensions in which today’s healthcare system “functions at far lower levels than it can and should” (p. 5). The six IOM aims are that health care should be:

■■ Safe
■■ Effective
■■ Patient centered
■■ Timely
■■ Efficient
■■ Equitable

Although all six aims ultimately are directed toward the patient’s healthcare experience and outcomes, the third aim explicitly addresses the patient-centered issue. The IOM defines patient-centered care as “providing care that is respectful of and responsive to individual patient preferences, needs, and values and ensuring that patient values guide all clinical decisions” (2001, p. 6). Inclusion of this explicit aim on patient-centered care provided a much needed stimulus for rekindling the vision that Francis Peabody had for the healthcare profession.6

The IOM reports raised the consciousness of many healthcare leaders to realize that poor quality, unsafe processes, and indifference to patients and their situations were unacceptable. The reports moved the dialogues about transparency, service excellence, and measurement to the forefront of healthcare professionals’ minds. They also provided the basis for the third key development within the healthcare industry that has had a profound impact on moving healthcare measurement and data collection in new directions, namely, making QI concepts, tools, and methods part of the operating strategy of many healthcare providers around the world.

The Growing Role of Quality Improvement Concepts, Tools, and Methods

There is no doubt that the healthcare industry is under tremendous pressure to demonstrate that it can transform itself. We have responded extremely well in many arenas. For example, the technological advances in medicine have been dramatic.7 The industry has also been very creative in developing a variety of outpatient clinical and support services (e.g., mobile dental clinics, home care services for individuals with special needs, and collaborative initiatives that have brought together healthcare providers as well as community groups to improve population health). What has not occurred, in my opinion, is the full-scale adoption and diffusion (Rogers, 2003) of the concepts, tools, and methods of QI. The same approaches to QI that have made many manufacturing companies very successful and known throughout the world are not being fully embraced by the health, social services, and educational industries. There are numerous health and social service organizations that can be singled out for their adoption of QI strategies, concepts, and applications. But the widespread adherence to quality principles and constancy of purpose for QI has been spotty at best (Deming, 1992).

Over the years a number of national and international organizations have worked tirelessly to spread the word about QI and how it can contribute to long-term survival, increase profits, and add value for customers. Probably, the longest standing of these organizations is the American Society for Quality (ASQ). Founded in 1946, ASQ has been leading the application of QI thinking throughout the world and across industries. With 25 different topic and industry divisions in over 140 countries and nearly 300,000 members worldwide, ASQ can rightfully claim that they serve as the “Global Voice of Quality.”

In the 1980s, ASQ began expanding beyond manufacturing and working to get quality thinking into education, service industries, and
health care. Although the healthcare division is one of the smaller subgroups within ASQ, it has become a viable group of professionals who are keen to learn from other industries.8

Within the healthcare field the IHI has been a leading innovator in health and healthcare improvement worldwide. For more than 28 years, the IHI has partnered with national and local leaders as well as frontline practitioners to improve the health of individuals and populations. The IHI work is focused on five key areas to advance its mission: (1) building capability for improvement within the healthcare professions; (2) enhancing person- and family-centered care; (3) making patient safety more reliable; (4) addressing issues of quality, cost, and value; and (5) advancing the triple aim for populations (i.e., simultaneously enhancing the health of populations, reducing costs, and improving the quality and service components of the hospital experience).9

Another influential leader in promoting QI within the healthcare field, especially through quality measurement, has been the National Quality Forum (NQF). The NQF was created in 1999 in response to recommendations from the President’s Advisory Commission on Consumer Protection and Quality in the Health Care Industry. The commission recommended that a forum for healthcare quality measurement and reporting should be established to (1) develop a plan for implementing quality measurement, data collection, and reporting standards throughout the healthcare community; (2) establish measurement priorities focused on national aims for QI; (3) endorse quality measures and standardized methods for measurement and reporting; (4) ensure the public has access to quality measurement and performance data; and (5) support the development of health information technology systems to advance measurement efforts.10

Finally, it should be noted that the JC and its affiliate Joint Commission International (JCI) have a long history of establishing standards for quality and performance that can be traced back to the pioneering work of Dr. Codman. The JC was officially established as a not-for-profit organization in 1951. It emerged as a result of a collaborative effort by the American College of Physicians, the American Hospital Association (AHA), the AMA, and the Canadian Medical Association, who joined with the American College of Surgeons as corporate members to create the Joint Commission on Accreditation of Hospitals (JCAH) based in Chicago, Illinois. At the time of its creation, the JCAH’s primary purpose was to provide voluntary accreditation and standardization of practice grounded in the “end results” thinking of Dr. Codman. Over the years, the JCAH changed its name to reflect a broadening of its scope and coverage.

Although one of many accrediting bodies, the JC is a leading force in promoting quality and safety across a very broad spectrum of healthcare services. In the United States alone, it accredits over 21,000 healthcare organizations and related healthcare services. JCI is currently accrediting healthcare organizations in Asia, Europe, the Middle East, Africa, and South America. To learn more about the JC, its history, and work around the world visit their website at https://www.jointcommission.org/about_us/history.aspx.

In addition to the numerous organizations that focus on the concepts and methods of QI, there are many professional organizations that are oriented toward supporting specific interest groups (e.g., the AMA, the AHA, the Voluntary Hospital Association, the American Nurses Association, the Veterans Hospital Association). If you drop down one more level, you start to discover a myriad of organizations and groups that advocate for various clinical and medical specialties and subspecialties (e.g., the Society for Thoracic Surgery, the American Osteopathic Association, the Royal College of Surgeons [United Kingdom], the Royal College of Surgeons of Edinburgh, the Pharmacists’ Association of Saskatchewan).

My point in providing this brief historical review is to demonstrate that today there are many groups and organizations supporting, endorsing, and sponsoring QI meetings,
conferences, publications, and programs for healthcare professionals. Quality has become a very popular phrase from the boardroom to the frontline staff. Who does not want to support quality? In many ways, quality has become a very popular buzzword that lacks a consistent and universally accepted definition. Part of the challenge in defining quality is that everyone is essentially an expert in quality. We have all experienced good, bad, and downright ugly quality. We may not be able to define it or articulate the criteria precisely, but the general opinion is that “I know it when I see it or experience it.” When I teach introductory workshops in QI, I frequently start the class by asking everyone to take a sticky note and complete the following statement: “Quality is . . .” and let them fill in the blank. I then take all the notes and place them on a flipchart so that the participants can review them during a break. BOX 1-1 provides a summary of some of the definitions I have received over the years. As you review these comments you will notice that people participating in an introductory class on quality have keen insights into the nature of quality.

Although numerous formal definitions of quality have been offered (see Schultz, 1994), I favor the simple yet straightforward perspective

BOX 1-1 Definitions of quality

Quality is. . .
■■ a combination of value and outcome in the eyes of the consumer
■■ a process with minimal opportunities for improvement
■■ a product or service delivered with 100% satisfaction the first time, every time
■■ a product or service that provides an expected value
■■ a product that lasts, for the best price
■■ a very good product—one you would want again
■■ above-standard results
■■ accountability
■■ an excellent product delivered by professional, friendly, knowledgeable people in a timely manner
at the appropriate time
■■ an unending struggle for excellence
■■ anticipation and fulfillment of needs
■■ attention to detail, timeliness, competence
■■ compassion
■■ completing a job in an accurate, efficient, and timely manner
■■ customer-focused service at a reasonable price
■■ data driven
■■ difficult to define
■■ doing the job right the first time
■■ going above and beyond what is expected
■■ listening/responding
■■ listening to feedback and, it is hoped, making changes to meet customer needs
■■ exceeding expectations
■■ making others feel important
■■ meeting and exceeding customer needs
■■ meeting our customer/patient needs in a cost-effective manner
■■ providing a product/service marketed to your customer above and beyond his or her expectations
■■ providing the best we can to our customer through kind and understanding dealings

■■ striving for excellence


■■ superior performance
■■ taking care of customers’ needs
■■ the best possible service; the most durable goods; the presence of a program for customer-focused
education and service recovery when expectations fall short
■■ the high human motivation to perform their job at their best
■■ the highest form of any service with satisfaction reflected in outcome that is measurable
■■ timely, optimal service that provides the best outcome and satisfaction for the receiver of the service
Reprinted by permission of Robert Lloyd.

that Dr. Deming presented. He basically said that he refused to define quality in a few words or a sentence. What he did say was that “Quality begins with intent, which is fixed by management” (Deming, 1992, p. 5). Dr. Deming based his entire approach to QI on the assumption that quality has no meaning without listening to the VOC. He stated that “Quality can be defined only in terms of the agent” (i.e., the customer or end user of a process). “Who is the judge of quality?” he would often ask in his 4-day seminars. He clarified his position with the following statement:

The difficulty in defining quality is to translate future needs of the user into measurable characteristics, so that a product can be designed and turned out to give satisfaction at a price that the user will pay. This is not easy, and as soon as one feels fairly successful in the endeavour, he finds that the needs of the consumer have changed, competitors have moved in, there are new materials to work with, some better than the old ones, some worse; some cheaper than the old ones, some dearer. The quality of any product or service has many scales. (Deming, 1992, p. 169)

When you step back from all the theoretical and philosophical underpinnings offered by the recognized experts in quality (e.g., in Schultz, 1994), they all discuss, in one way or another, three fundamental activities that form the foundation of QI:
■■ Listening to the VOC
■■ Listening to the voice of the process (VOP)
■■ Using statistical process control (SPC) methods (i.e., using data to make decisions)

Organizations that clearly demonstrate quality and excellence (e.g., the Baldrige winners or organizations that have won state or international quality awards) are able to skillfully blend all three activities together. A singular focus on one or even two of these activities is not going to achieve QI. It is when all three activities are combined simultaneously and on a daily basis that quality, as envisioned by Deming and his contemporaries, will be realized. FIGURE 1-2 depicts the interconnectivity of these three activities.

FIGURE 1-2 Three key activities of quality improvement (figure labels: Using SPC methods; Listening to the VOP; Listening to the VOC; CQI)

When considering the implementation of QI, it must be remembered that it is a never-ending process of continuous improvement. Deming
made this point abundantly clear in his writings and seminars. “If I have to define it,” Deming would say in his 4-day seminars, “it would be meeting and exceeding the customer’s needs and expectations, and then continuing to improve” (Schultz, 1994, p. 47). It is acknowledging that past success is no guarantee of future success.

Quality does not happen by accident or because you want it to, wish it to, or hope that things will get better. Remember, hope is not a plan! Quality results from the deliberate and intentional actions of individuals within an organization. Quality is not a program or a single project, nor the responsibility of one individual (e.g., the director of quality) or those assigned to the quality department (Lloyd, 2016). In short, quality is a way of thinking about work, approaching its improvement, and getting everyone involved. Quality is about achieving and sustaining excellence, nothing less.

If quality is viewed as something that has to be done, “in addition to everything else I have to do,” then the organization will never understand quality, be able to achieve it, or demonstrate excellence over time (see Note 11). The essential ingredients, therefore, that enable an organization to achieve quality and excellence and sustain it over time include:
■■ A commitment to quality starting with the board and senior management
■■ A strategically defined role for quality throughout the organization
■■ A model or framework upon which to build QI strategies
■■ A plan for deploying quality thinking and application throughout the entire organization
■■ QI education and professional development at all levels of the organization including the board, senior leaders, middle managers, and frontline staff
■■ A measurement philosophy and the use of SPC methods
■■ Strategies to determine when a process or system needs to be improved (e.g., reducing variation) or redesigned because it is fundamentally broken
■■ Plans for testing, implementing, sustaining, and spreading improvement (Langley et al., 2009)

These are critical aspects of an organization’s quality journey. When they are in place and sustained over time, QI will be part of the very fabric of the organization. Anything less will relegate quality to just a word in the organization’s lexicon or a mere focal point of a banner or poster.

▸▸ The Quality Funnel

One final point concerns the use of the word “quality,” especially as it relates to the delivery of healthcare services. As I work with teams and organizations to help them develop capacity and capability for improvement, I usually ask, at an early stage of our work together, “What aspect of quality do you want to improve?” Quite often I get quizzical looks and interesting responses. I point out that quality is not a unidimensional concept and that there are three basic modifiers that historically have been associated with this word: quality assurance (QA), quality control (QC), and QI. After making this distinction, most healthcare professionals acknowledge the three approaches but they do not see how they relate to each other. Some even point out that they are separate and distinct and do not have any connections. This is when I show them the quality funnel (FIGURE 1-3). The characteristics of the three approaches to defining quality are summarized in this figure.

The three aspects are shown in a funnel because they do have close connections. In health care, there is often almost a singular focus on QA. QA has also been a central approach to the delivery of healthcare services for a longer period of time than either QC or QI, which in my opinion has created many of the challenges we face in our industry today. The problem is that QA should be done only when there is such a high level of QC that
you can afford to check or audit the process or system performance periodically. What I find is that QA, especially in the form of point prevalence audits, is conducted on healthcare processes and systems that have very low levels of QC and, therefore, low levels of reliability. Then when the results of the audit are not favorable the organization’s management team becomes very concerned and wants to know why there is poor performance. Well, the answer is easy and should be obvious, but it is frequently not acknowledged or discussed. Namely, if a process does not have high levels of QC, defective products and services will naturally be produced. The manufacturing industry knows this principle very well. Manufacturers of everything from cars to computers to ballpoint pens or processed lunchmeat first establish systems for QC before going ahead with ongoing production of the item. When they have a high degree of control of the production of the product (e.g., they are confident that 99% of the time the product is produced exactly as it was designed or planned), then they can periodically conduct audits on a sample of the product to check for consistency and reliability. QA is not the first thing they do. This is done only after they have put in place reliable QC standards and procedures. If they detect through a QA audit, however, that the product is starting to drift away from the production specifications or tolerances, then they initiate QI strategies and methods. This approach was laid out very nicely by Joseph Juran in his quality trilogy (Juran, 1992). Juran’s quality trilogy has served as a standard for decades and describes clearly the critical roles of quality planning, QC, and QI.

FIGURE 1-3 The quality funnel
■■ Quality Control is a process by which procedures and methods are established to review and standardize the reliability and quality of all factors involved in the production of products or services.
■■ Quality Assurance is any systematic process of checking or auditing periodically to see if a product or service being developed is meeting specified requirements, targets, or goals.
■■ Quality Improvement is the combined and unceasing efforts of everyone (e.g., healthcare professionals, patients and their families, researchers, payers, planners, and educators) to make the changes that will lead to better patient outcomes (e.g., health), better system performance (e.g., care), and better professional development.

In health care, we seem to have this sequence backwards. We engage heavily in QA and apply it to processes or systems that have very low levels of QC to begin with. Then we become surprised when the QA audits reveal that either internal or external expectations or targets are not being met. The result is that more QA is put into place when in fact what is needed is better QC. Even after a QA audit reveals poor performance, applying principles and methods of QI to the process frequently does not produce the desired results. This is because the QI approach is being applied to a process that is most likely not stable and therefore not predictable. Trying to improve a process that is not in a state of statistical control will only make things worse. So, as you begin your quality journey, make sure you have agreement on what approach to quality is guiding your journey.
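
The idea of a process being "in a state of statistical control" can be made concrete with a small sketch. One common screening tool is an individuals (XmR) control chart, which places limits at the mean plus or minus 2.66 times the average moving range. The Python below is only an illustration under that assumption: the function names and the turnaround-time values are hypothetical and are not taken from this text, and a point outside the limits is just one of several signals an analyst would normally look for.

```python
from statistics import mean

# 2.66 is the usual XmR scaling constant (3 / d2, with d2 = 1.128
# for moving ranges computed from pairs of consecutive points).
XMR_CONSTANT = 2.66


def xmr_limits(values):
    """Return (center_line, lower_limit, upper_limit) for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations for a moving range")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    spread = XMR_CONSTANT * mean(moving_ranges)
    return center, center - spread, center + spread


def points_outside_limits(values):
    """Return the indices of observations that fall outside the limits."""
    _, lower, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical weekly turnaround times in minutes (illustrative only).
    weekly_tat = [10, 12, 11, 13, 12, 11, 10, 12, 30]
    print(xmr_limits(weekly_tat))
    print(points_outside_limits(weekly_tat))
```

If a check like this flags points outside the limits, the process is probably not yet stable, which is the situation described above in which establishing QC should come before auditing or improvement work.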
Notes

1. Regardless of the industry in question, bad outcomes, unethical behavior, or scandal rapidly lead to increased scrutiny and the demand for more data to discover “what is going on.” A classic example in the healthcare industry can be found in the Mid Staffordshire NHS Foundation Trust scandal (http://www.midstaffspublicinquiry.com/home). From January 2005 through March 2009, hundreds of patients were routinely neglected and many have been classified as dying needlessly while in the Trust’s care. One of the conclusions in the Francis Report, as it is generally known, is that the Trust’s leaders were so preoccupied with cost cutting, achieving targets, and qualifying for incentives that they lost sight of Mid Staffordshire’s fundamental responsibility to provide safe care. One of the major outcomes of the inquiry is the increased demand for greater scrutiny of all the Trusts in England and the need for more data, so that the scrutiny bodies can track and assess performance of providers. Examples from other industries reveal similar response patterns. For example, one only has to look at the television evangelist scandals of the 1970s, the damaging claims (still disputed by the way) in 1978 that the gas tanks of Ford Pintos were more susceptible to fire and explosion than the gas tanks of other cars, the Enron financial scandal of 2001, and the ongoing investigation of Catholic priests. All of these examples have led to loss of trust in the product, service, or institution under the microscope, a demand for greater transparency of what the group does, and a push for “more data.”

2. It is important to note that this phenomenon is not limited to one country, state, region, or province. The move away from an absolute trust in the medical system and its providers to a perspective that demands data as proof of the quality, safety, competency, and value represents a worldwide concern. It does not matter how healthcare services are structured, organized, delivered, or financed (e.g., employer-based insurance plans as we have in the United States or national health systems as we find throughout a majority of the rest of the world). All countries today are concerned about the value being delivered by their healthcare systems and the money being spent on that delivery of services in light of the results being tracked. There is no such thing as “free health care” despite what some people think or say. Somebody pays for healthcare services whether it is through taxes to support a national health system, companies paying for the healthcare benefits of workers, or individuals paying out of their own pockets for coverage. Value, quality, safety, access, and service are the driving factors challenging healthcare providers today and they are concerns throughout the world.

3. The AHA DataViewer is a classic example of this type of administrative data: https://www.ahadataviewer.com/. Other examples include the annual reports produced by state data commissions in the United States (e.g., the PHC4, http://www.phc4.org/) and the Quality and Efficiency in Swedish Health Care reports published annually (e.g., http://www.socialstyrelsen.se/publikationer2011/2011-5-18).

4. In this text, I am using the term quality improvement (QI) as the generic reference to a broad set of terms that have been used over the years, including total quality management (TQM) and continuous quality improvement (CQI). Other terms that have also been used to refer to improvement in various forms are quality control (QC), performance improvement, performance management, quality assurance (QA), quality management (QM), quality circles,
quality of care, and clinical QI. If you study the history of the quality movement, you will find many terms used to describe and define this notion. I would encourage the reader to explore a bit of this history and become knowledgeable about the different paths that the discipline of quality has taken. It is a fascinating journey that is quite enlightening.

5. For more information on hospital charges in the United States, visit the CMS and Medicaid website: https://www.medicare.gov/hospitalcompare/. Similar profiles can be obtained for each U.S. state by visiting the state’s healthcare report card site. The actual titles of state organizations responsible for tracking and reporting healthcare services vary by state. A search under the general title of “state healthcare data commissions” or some variant of these terms, however, will provide a good starting point. In my home state of Illinois, for example, the data may be found at http://www.healthcarereportcard.illinois.gov/.

6. Since 2001, when Crossing the Quality Chasm was released, a myriad of initiatives both public and private have advanced the role of patients, families, and caregivers in making healthcare decisions. Some groups like AARP, the National Quality Forum, the Agency for Healthcare Research and Quality (AHRQ), and the CMS have developed programs and initiatives designed to increase the involvement of patients and their caregivers in making healthcare decisions. Individuals who have experienced either personal harm or the loss of someone close to them through medical error have also been playing major roles. The Josie King Foundation is one of the early leaders in this area. Josie’s Story, written by her mother Sorrel King (2009), tells the painfully honest story about how medical errors led to the premature death of a vibrant 18-month-old little girl. It recaps the family’s struggles to deal with their grief and how Sorrel started a journey into a world she never had thought about. Today, Sorrel and her family are patient safety advocates working tirelessly to improve quality and safety in Josie’s memory. More about the Josie King Foundation may be found at http://josieking.org/Home. At the IHI, we too have been heavily engaged in patient-centered work for over a decade. It is one of the four IHI strategic aims: (1) Optimize Healthcare Delivery Systems: Encourage, empower, and enable healthcare delivery systems to provide truly value-based care that ensures the best healthcare outcomes at the lowest costs; (2) Drive the Triple Aim for Populations: Strive to achieve the Triple Aim, simultaneously improving the health of the population, enhancing the experience and outcomes of the patient, and reducing per capita cost of care for the benefit of communities; (3) Build Improvement Capability: Build improvement capability into every organization, healthcare executive, and professional, while driving innovation to dramatically improve performance at all levels of the healthcare system; and (4) Realize Person- and Family-Centered Care: Usher in a new era of partnerships between clinicians and individuals where the values, needs, and preferences of the individual are honored; the best evidence is applied; and the shared goal is optimal functional health. Our focus on person-centered care recognizes that one of the reasons that the needs of individuals are often overlooked when they enter the healthcare system is that they are defined as patients, not as people. Webster’s dictionary (1984, p. 862) defines a patient as one under medical treatment or one who suffers. It also offers the alternative definition of patient as one who is capable of bearing affliction calmly or one who is capable of bearing delay. These too seem like good definitions of what happens to individuals when they enter the
healthcare system. So IHI is striving to get healthcare professionals to realize that person-centered care should address all the needs of the individual and do so in a manner that respects their values, beliefs, and culture. The individual seeking care is the primary focus of this IHI aim but it also is encompassing enough to include the individual’s family or caregivers as well as staff, who are often the individuals “bearing affliction and delay” as well. There is no doubt that progress has been made. But we still have a long way to go before healthcare systems and providers of care emulate the words and actions of Francis Peabody. A recent study (Lavizzo-Mourey, 2006) demonstrated that things have not changed much since 1988. The study set out to measure how well physicians are practicing patient-centered care. Between 80% and 90% of physicians surveyed said they favored patient-centered care, 83% of the respondents said they supported sharing medical records with patients, and 87% responded that they support team-based care. But upon observation, less than a quarter of doctors actually practice patient-centered care as defined by the study’s authors. The authors concluded that the physicians basically don’t “walk the talk.” I am reminded of an episode of the TV show House where the ever so smart and ever so rude Dr. House is telling a mother what he plans to do to her sick daughter. The mother is politely objecting and asking questions about his plan of care. In his usual brusque and arrogant manner, he cuts her off and turns to leave. She tells him to stop and delivers an absolutely marvelous line. She says with a very stern face and pointing a finger in his face, “Look, I am the mother and you’re the doctor. I outrank you. This is what I want for my daughter.” QED!

7. For example, today a patient can have laparoscopic gallbladder surgery at an ambulatory surgery center and be back to work in a day or two. In the 1970s, that same patient would have stayed in the hospital for at least a week and been off work for several more. Cataract surgery is routinely done on an outpatient basis. Many hospitals are getting patients with total hip replacements discharged in 2 days or 3 days max. I had a total hip replacement several years ago and was out of the hospital 2 days after surgery. In October 2016, my wife had a total knee replacement procedure that was done as outpatient surgery. At Rush St. Luke’s Presbyterian Hospital in Chicago, they are now performing same-day hip replacement surgery. Another technological advancement has been in the area of micro and robotic surgery. The use of the da Vinci robot to repair mitral valves in the heart, for example, is now common practice at many hospitals. We have the technology to look inside the patient without penetrating the skin. In the past, it was not uncommon to do “exploratory surgery” to determine whether there was a problem. Today, we rely on CT scans and MRIs as primary noninvasive diagnostic tools. The application of new technology seems endless. It will continue to change not only how we think about medicine but also how it is taught and practiced.

8. More on the ASQ may be found on their website: https://asq.org/about-asq/history.

9. A more detailed history of the IHI along with its current programs, resources, and initiatives may be found at http://www.ihi.org/Pages/default.aspx.

10. More about the NQF may be found at http://www.qualityforum.org/Home.aspx, including a listing of the more than 700 measures they have been assembling on healthcare topics, their programs, and other resources.

11. I actually had a participant in one of my workshops say this in class one day. I was speaking about the issue of making quality thinking and practice part of daily
work. This individual was sending body language messages that said “Boy this is really a lame idea” or something similar. So, I asked her if she would care to make a comment on my points or if she had a different perspective. She then came out with this classic response, “In addition to everything else I have to do I am now expected to work on quality. Where do they think I can find the time to do this?” I will never forget it. But before I could respond others in the class challenged her response. The best comment was from a quiet woman in the back of the class who raised her hand and simply asked, “Well if quality is not your job, why are you here?”

References

Barrett, S. “The Rise and Fall of the People’s Medical Society and Charles Inlander.” June 17, 2011. Retrieved from http://www.quackwatch.com/01QuackeryRelatedTopics/pms.html.
Berwick, D. M. “E. A. Codman and the Rhetoric of Battle: A Commentary.” Milbank Quarterly 67, no. 2 (1989): 262–267.
Codman, E. A. A Study in Hospital Efficiency (1917 original private printing). Reprinted. Oak Brook Terrace, IL: Joint Commission on Accreditation of Healthcare Organizations, 1996.
Codman, E. A. “Committee for Standardization of Hospitals [of the American College of Surgeons] Minimum Standards for Hospitals.” Bulletin of the American College of Surgeons, 8 (1924): 4.
Deming, W. E. Out of the Crisis. Cambridge, MA: Massachusetts Institute of Technology, Center for Advanced Engineering Study, 1992.
Donabedian, A. “The End Results of Health Care: Ernest Codman’s Contribution to Quality Assessment and Beyond.” Milbank Quarterly 67, no. 2 (1989): 233–256.
Gamble, M. “10 Things the Most Progressive Hospitals Do.” Becker’s Hospital Review July 8, 2013. Retrieved from http://www.beckershospitalreview.com/hospital-management-administration/10-things-the-most-progressive-hospitals-do.html.
Horowitz, L. Taking Charge of Your Medical Fate. New York: Random House, 1988.
Hurst, J. W. “Dr. Francis W. Peabody, We Need You.” Texas Heart Institute Journal 38, no. 4 (2011): 327–329.
Inlander, C., L. Levin, and E. Weiner. Medicine on Trial. New York: Prentice Hall, 1988.
Institute of Medicine. To Err Is Human: Building a Safer Health System. Washington, DC: National Academy Press, 1999.
Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: National Academy Press, 2001.
Juran, J. Juran on Quality by Design. New York: Free Press, 1992.
King, S. Josie’s Story. New York: Grove/Atlantic, 2009.
Langley, J., R. Moen, K. Nolan, T. Nolan, C. Norman, and L. Provost. The Improvement Guide: A Practical Approach to Enhancing Organizational Performance. San Francisco, CA: Jossey-Bass, 2009.
Lavizzo-Mourey, R. Remaking American Medicine: Health Care for the 21st Century. The Secret of Patient Care. Adapted from The Malcolm Peterson Honor Lecture, National Scientific Meeting, Society of General Internal Medicine, Los Angeles, California, presented April 28, 2006.
Lloyd, R. “Improvement Tip: Quality Is Not a Department.” Institute for Healthcare Improvement website, 2016. http://www.ihi.org/resources/Pages/ImprovementStories/ImprovementTipQualityIsNotaDepartment.aspx
Mallon, B. Ernest Amory Codman: The End Result of a Life in Medicine. Philadelphia: WB Saunders, 2000.
Millenson, M. Demanding Medical Excellence: Doctors and Accountability in the Information Age. Chicago: University of Chicago Press, 1999.
Neuhauser, D. “Ernest Amory Codman, M.D., and End Results of Medical Care.” International Journal of Technology Assessment in Health Care 6 (1990): 307–325.
Nightingale, F. Notes on Matters Affecting Health, Efficiency and Hospital Administration of the British Army. London: Harrison and Sons, 1858.
Oglesby, P. The Caring Physician: The Life of Dr. Francis W. Peabody. Cambridge, MA: Countway Library of Medicine, distributed by Harvard University Press, 1991.
Peabody, F. “The Care of the Patient.” Journal of the American Medical Association 88.8 (March 19, 1927): 877–882, reprinted in JAMA 252, no. 6 (1984): 813–818.
Rabin, P. L. and D. Rabin. “The Care of the Patient: Francis Peabody Revisited.” JAMA 252 (1984): 819–820.
Reinertsen, J. “How to Go Naked.” Leadership (Healthcare Financial Management Association) (Summer 2012): 42.
Rogers, E. M. Diffusion of Innovations. New York: Free Press, 2003.
Schultz, L. Profiles in Quality. New York: Quality Resources, 1994.
Swensen, S. J., and D. A. Cortese. “Transparency and the End Result Idea.” Chest 133, no. 1 (2008): 233–235. doi: 10.1378/chest.07-2101
Tapscott, D. and D. Ticoll. The Naked Corporation: How the Age of Transparency Will Revolutionize Business. New York: Free Press, 2003.
Webster’s II New Riverside University Dictionary. Boston: Riverside Publishing Company, 1984.

CHAPTER 2
Why Are You Measuring?

▸▸ Connecting the Dots!

We have all sat through meetings where a variety of tables, charts, and graphs were presented for far more minutes than we thought we could ever endure. The presenter drones on in a monotone voice describing the results of this month compared to last month or this quarter compared to the same quarter a year ago. You are bored but don’t have the nerve to get up and walk out. So you endure. But you hold on to a vague hope that some minor crisis will erupt that will cause someone to call or send you a page with an urgent request for you to leave and attend to the pending crisis. Unfortunately, the call never comes. So, you endure the meeting till the end. As you are leaving the meeting you turn to a colleague and ask, “Did you get anything out of that meeting?” She looks at you and mumbles something that you believe sounds vaguely like “No, but that was certainly typical of how we attempt to connect the dots at our monthly management meetings.” As you walk back to your office you realize that you and your colleague have just survived another journey into the “Data Zone”!

Okay, your Monday morning management meeting might not be quite like this one but it does bear a resemblance to many meetings that occur on a regular basis throughout healthcare settings. A frequent expression that arises in such meetings is, “You know what we need to do here? We need to connect the dots!” This is a rather popular expression that started me thinking. Why not give people a bunch of dots (data points) and ask them to predict what the image will be before we connect the dots? FIGURE 2-1A provides a series of dots. What do you predict will appear when the dots are connected? I have had many creative responses in class. The most frequent response is that the image will reveal a person riding a motorcycle. Others have seen a bear, a leaping animal, and even a rough image of their city, county, or country. Because no one has been able to accurately predict what the dots will produce when they are connected, I start to give them hints by connecting a few dots at a time. I show an image with dots 1–25 connected. I ask, “Now what do you see?” After a few more interesting guesses someone usually shouts out “an arm.” Although this is correct, it does not give much insight into predicting what the remaining 158 dots will produce when connected. By the time I reveal the connections for the next 25 dots people are realizing that the image they are trying to predict is not a bear. At this point, however,
they have narrowed down the range of possible outcomes but they still cannot predict what the dots will yield when connected. As I continue to connect more of the dots the class quickly comes to realize what the image is (see the End Notes of this chapter for the answer if you have not figured it out by now).1 They have now connected the dots and have produced an image of what the data points were hiding. So, the next image I show the class is Figure 2-1A again where we have all 183 dots but no lines. Of course they all see the image produced by the dots even though none of the dots are connected by lines. I then point out that once the image has been implanted in their brains it is hard not to see the final pattern created by the connected dots.

FIGURE 2-1A Connecting the dots (What image do you predict will emerge?)

This simple example illustrates precisely what happens when people enter the “Data Zone.” Data are presented, and the individual making the presentation explains what the dots are showing. In doing so, he or she is connecting the dots for you and in this process is imputing meaning to the dots that then sticks in your brain. You walk away thinking, “I guess the outpatient satisfaction scores are getting better.” Or, “Wow, I didn’t realize that there was an upward trend in the inpatient fall rate.” These impressions, however, may or may not be correct. The central point is that the dots have been connected for you by someone else so that when you walk out of the meeting you leave with an image and a conclusion. How many times have you been in a meeting where someone connected the data dots for you? When this happens, do you raise your hand and ask whether there is another way to look at the data? Do you ask whether the presenter is merely describing what has happened in the past as opposed to predicting what we might expect the process to produce in the future? Do you politely ask whether there are alternative ways to analyze the data instead of showing two bar graphs and calculating a percentage change from Time 1 compared to Time 2? In short, can you predict what the dots are trying to tell you before they are connected? This is a key question for anyone interested in quality improvement (QI).

Data should be used to help you predict where you will be going in the future. Customers are not concerned about the average wait time last month or last quarter. They want to know why they are not being served now. A mother waiting in the emergency area, for example, wants to know when her febrile child will be seen by a doctor. She does not care one iota about the average wait time in the emergency area last week, last month, or last year. Similarly, a physician waiting for a STAT (immediate) troponin result takes no comfort in being told that the average turnaround time (TAT) last week was statistically significantly different from the average TAT last month and that the standard deviation went from 10.3 minutes to 9.4 minutes. In this case, the physician would have every right to voice concern.

The level of quality for a particular product or service is determined by understanding current performance and predicting the future, not by how the process or system performed in the past. If you need an everyday example of how this works, all you have to do is ask sports fans about the current performance of their favorite team, especially if that team won a championship in the past. I live in Chicago. We have a variety of professional sports teams including football,
basketball, baseball, ice hockey, and soccer. Several of these teams have won a championship within their respective sport. In 1985, the Chicago Bears professional football team won their one and only Super Bowl championship. Even today fans remember and reflect on the Super Bowl victory and talk with nostalgia about where they were and with whom when the Bears won. But that moment in 1985 does not help us predict the performance of the Bears this season, which was the worst in the history of the franchise. In 2016, a major sporting miracle occurred here in Chicago. Yes, after 108 years of failed seasons the Chicago Cubs won the World Series. The city went wild with excitement. But now that the celebration has ebbed, some are already starting to talk about a repeat of this year’s performance. Although hope springs eternal, people who make these types of statements have little understanding of a basic principle of human performance: past success is no guarantee of future success. Just ask the Bears or the Bulls fans!

Frequently, statistical methods reinforce this focus on past performance. The average (or any other descriptive statistic for that matter) for last month or last quarter provides an aggregated summary statistic that merely describes a single characteristic of past performance.2 Dr. Myron Tribus, retired director of the Center for Advanced Engineering Study at Massachusetts Institute of Technology and a student of Dr. W. Edwards Deming, had a classic line about the problem with making comparisons based on monthly numbers: “Managing a company by means of the monthly report is like trying to drive a car by watching the yellow line in the rear-view mirror” (Wheeler, 1993, p. 4). Historical data by month, quarter, or year can be useful in helping you understand where you have been, but they provide no basis for determining where you are right now (i.e., your baseline) or, more important, how you will perform in the future. Consider pilots and how they analyze data. You are flying in a Boeing 777 over the Atlantic from Chicago to London. You are having dinner and watching a movie on the tiny screen in the back of the seat in front of you. You are completely unaware of how much data the pilots are processing and the hundreds of dials and gauges they need to constantly monitor. They are processing data in real time to make predictions about the future. They are not sitting up front in the flight deck having this conversation: “Gee, Bill, what was our average fuel consumption per hour when we flew this same route last month?” “I don’t know about fuel consumption, Anne, but I do have the average altitude and the standard deviation from projected altitude by quarter for the last 2 years.” Because the pilots are concerned about where they are right now, the current flying conditions, and their ability to predict how the rest of the flight will go, the aggregated summary statistics from the past have little or no value.3 From an improvement perspective, data are meant to predict the future, not describe past performance. The future can be understood only by analyzing repeated measures of the variables of interest over time and under a wide range of conditions.

Data and subsequently the interpretation of data require inquiry and dialogue. For example, when you are sitting in a meeting that feels like the one described at the beginning of this chapter you should realize that even though you are bored with the lengthy presentation of figures and statistics, the presenter is actually connecting the dots for you. When the presenter says, “And you can see from this summary table of numbers that the average wait time for registration, the average wait time to see the doctor, and the standard deviations for each of these measures have all dropped significantly over the past two quarters,” your bored brain is actually connecting the dots. You walk out of the meeting with an image of what the presenter chose to tell you about the dots. You have no ability to predict what next quarter’s wait times will be but you did make the connection that all the measures “dropped significantly over the past two quarters.” The presenter used data that are aggregated and dated to plant a little causal model in your head. The dots have been connected. Lower wait times are better than high wait times and this past quarter was lower
than the previous quarter; therefore, we have improved. Right? The correct answer is that you do not know whether things have improved or not. All you have is two dots. If I had connected only the first two dots in Figure 2-1A no one would ever have figured out the image. What about the other 181? Deming had a great line about two data points. In one of his 4-day seminars that I had the privilege of attending he said, “When you have two data points it is very likely that one will be different from the other.”

Being able to predict where a process or a system will go in the future builds much more knowledge than merely describing what has happened in the past. In order to be successful at connecting the dots and using the data to make predictions about future performance, however, two questions need to be addressed:

■■ What type of study have you set up and what action(s) will result from the study?
■■ Why are you measuring performance?

These two questions are addressed in the remainder of this chapter.

▸▸ Types of Studies

In 1938, Dr. Walter Shewhart delivered four lectures to the Graduate School of the Department of Agriculture in Washington, DC, on statistical thinking and quality control (QC). In 1939, he expanded on the content of these lectures and wrote Statistical Method from the Viewpoint of Quality Control. Shewhart outlined three components of knowledge (Shewhart 1939, 85–86):

1. The data of experience in which the process of knowing begins.
2. The prediction in terms of data that one would expect to get if he were to perform certain experiments in the future.
3. The degree of belief in the prediction based on the original data or some summary thereof as evidence.

For Shewhart, prediction was the central concept driving quality.

Dr. W. Edwards Deming (1942, 1950) built on Shewhart’s initial thinking about prediction. In the foreword to Quality Improvement Through Planned Experimentation, Deming wrote this about prediction:

    Why does anyone make a comparison of two methods, two treatments, two processes, or two materials? Why does anyone carry out a test or an experiment? The answer is to predict—to predict whether one of the methods or materials tested will in the future, under a specified range of conditions, perform better than the other one. The question is, What do the data tell us? How do they help us predict? (Moen, Nolan, & Provost, 1991, p. xiii)

One’s ability to predict, however, depends exclusively on the type of study designed and the actions to be taken as a result of the study. Deming provided a foundation for thinking about the types of studies that could be designed. He classified studies as being either enumerative or analytic (Deming, 1975). According to Deming (1975, p. 147), an enumerative study is one in which “action will be taken on the material in the frame being studied.” He defined the frame as “an aggregate of identifiable tangible physical units of some kind, any or all of which may be selected and investigated” (p. 146). A classic example of an enumerative study is a census conducted on a country’s population. In this case, the frame is the entire population of the country and the objective is to find out how many people live within the country’s geographic boundaries. Following this definition, a frame in a healthcare setting might be all inpatients, all hip and knee replacement patients, a nursing unit, a long-term care facility, or all attending physicians.

Three key points highlight enumerative studies:

1. The aim of an enumerative study is principally descriptive in nature
(Deming, 1975, p. 147; Wheeler, 1995, p. 18). Enumerative studies basically describe how many or how much and are essentially based on historical data. The focus of an enumerative study, for example, is not on explaining why there were more males than females receiving a particular medication but rather on merely quantifying how many males and females received the particular medication. Prediction of the future is not possible with enumerative studies.
2. The actions to be taken or the decisions to be made as a result of an enumerative study will be (or should be) directed only to the subjects in the frame.4
3. Random sampling methods based on probability theory and related techniques such as confidence intervals and tests of significance usually provide the statistical approach to enumerative studies.

In contrast to the enumerative study is an analytic study. Deming defined an analytic study as one “in which action will be taken on the process or causal system that produced the frame studied with the aim being to improve practice in the future” (1975, p. 147). A key distinction between an enumerative and analytic study, therefore, is that analytic studies focus on the future performance of the system or frame being studied, in other words prediction. A fundamental assumption of analytic studies is that the conditions that produce an observable outcome today will be different tomorrow and the next day and the next. Enumerative studies, on the other hand, look at data that have occurred in a defined period of time (e.g., last month, last quarter, or even last year) that is fixed or static in nature.

Provost (2011) and Provost and Murray (2011) provide a useful analogy for thinking about this important distinction of the role of time in study design. The analogy is to water quality in a pond versus a moving stream or river. They compare an enumerative study to a pond that is fixed and not moving. One or more random samples drawn from the pond are sufficient to determine the water quality of the entire pond. They point out that traditional statistical methods such as hypothesis testing (i.e., rejecting or not rejecting the null hypothesis), confidence intervals, and tests of significance can be used to make decisions about the quality of the water in the pond based on random sampling techniques. But note that the conclusions and subsequently the decisions about the pond in question must be restricted to only that pond. If there is a pond of similar size 15 yards away from the tested pond, no conclusions should be made about that pond even though they are relatively close to each other.

The writers’ analogy for an analytic study is a stream or river that is in constant motion. If you are standing along a rapidly moving stream, for example, the water quality and the properties in the water right in front of you are gone in a second and new water is now in front of you. Does the water now in front of you have the same properties and qualities as the water that just went past a few seconds ago? Thus water quality determined by a random sample of the water at a single point in time will not be able to reveal anything about the quality of the water that will pass in front of you 10 seconds later. The stream is in constant motion. It is not fixed or static like the pond. The present condition of the water in the stream will change over time. For example, imagine that a chemical plant discharges its waste into the stream at 2 a.m. every other day. If you drew a single random sample of water from the stream at 2 p.m. on the day between when the chemical plant dumps its waste and took this sample back to the laboratory for testing you would most likely miss the chemical waste that was dumped at an entirely different point in time. As Wheeler (1995, p. 18) points out, “there is no way to define a random sample of the future.” Without repeated sampling over time it is impossible to detect a change in the water quality. So, whereas enumerative studies rely heavily on random sampling methods, analytic studies use primarily judgment sampling techniques, both of which will be discussed in greater detail in Chapter 4.

Similar analogies can be found throughout the healthcare industry. Patients connected to telemetry in the intensive care unit (ICU), for example, are monitored in an analytic, not an enumerative, fashion. We connect ICU patients to a variety of electrical leads in the ICU that allow us to track their heartbeats, respiration rate, and blood pressure in real time and over time. Why do we do this? Why don’t we just take one random sample of their blood pressure at some random point during their 11-day stay in the ICU and use this one reading to make clinical decisions about the patient’s progress? It would save time for the nurses and most likely save the ICU money. Or maybe we could get their average of all blood pressure readings on the first day of admission to the ICU then hold off till they are discharged 10 days later and take the second average of all blood pressure readings on this last day. We could even go further and discover whether the two average blood pressure readings between admission and discharge are statistically different by applying a test of significance to determine whether the resulting difference between admission and discharge is significant at the 0.05 or 0.01 level of significance. If we took this enumerative approach to patient care we would end up causing considerable harm to the patients. Clinical decision making is essentially grounded in analytic study designs. We track patient conditions over time, especially when the need to know necessitates moment-to-moment monitoring as in the ICU. The medical model is clearly more analytic in nature than enumerative. Yet, the distinctions between these two approaches are typically not taught in medical, nursing, allied health professions, or healthcare administration programs.

One final distinction between enumerative and analytic studies needs to be reinforced before moving on. Specifically, it is critical to realize that each approach employs different statistical methods to arrive at conclusions. Enumerative studies rely on probability theory, descriptive and comparative statistics, and tests of significance. If you have ever had to take a basic or even advanced course in statistics, these are the methods and techniques you have been exposed to. In some circles these are referred to as “traditional statistics.” These include both univariate (descriptive) and multivariate statistical methods, which include everything from tabular analysis (i.e., crosstabs) to various forms of regression analysis, factor analysis, cluster analysis, discriminant analysis, and analysis of variance. A major reference from the social science field for these approaches can be found in the works of Hubert Blalock (1979) and in particular his classic textbook Social Statistics.

When conducting analytic studies, graphical methods of analysis are used most often. This branch of applied statistics is generally referred to as statistical process control (SPC) and was developed by Dr. Walter Shewhart in the early 1920s (more about this in Chapter 9). SPC methods, most notably the run chart and the Shewhart control chart, accommodate repeated sampling over time that in turn allows the researcher to understand the variation inherent in the process and thus predict the future performance of the process. More advanced statistical methods of prediction are incorporated in analytic studies by using multifactorial designs and planned experimentation (Moen, Nolan, & Provost, 2012). Because this book is concerned principally with analytic studies, these methods will be discussed in detail in the remaining chapters.

▸▸ Research for Efficacy, Efficiency, and Effectiveness

Closely aligned with the type of study you design is a very pragmatic question: Why are you measuring? This seems like a simple and straightforward question. Yet many healthcare professionals do not think about why they are actually measuring. You may hear managers or frontline workers, for example, say, “Look, we need to submit some data on our progress related to how much time patients have to wait
before actually seeing the doctor in the family practice clinic so find some recent wait time numbers and send them in.” Frequently this means the data submitted may not be the most recent data, may not be defined in the same way they were defined when they were first submitted last year, or may not be presented in a format that adequately answers the questions being posed. Anyone engaged in performance measurement, therefore, needs to be very clear about their reasons for starting the quality measurement journey (QMJ).

Brook, Kamberg, and McGlynn (1996) provided guidance for helping healthcare professionals think about why they are measuring. They differentiated two basic research paths that can help individuals determine the purpose of their measurement efforts: research for efficacy and research for efficiency and effectiveness. The choice of which path is to be followed should be based on the nature of the questions the researcher is trying to answer.5 All scientific inquiry begins with questions, not with statistical tools.

The efficacy road takes the researcher down the traditional enumerative path of scientific inquiry described previously. This path is grounded in experimental and quasiexperimental designs (Campbell and Stanley, 1966; Posavac and Carey, 1980; Weiss, 1972). In health care, efficacy studies are frequently used to test the ability of a particular drug, treatment, procedure, or protocol to improve medical conditions. For example, a researcher might pose the following questions about a new drug:

■■ Is this drug capable of producing the desired effect?
■■ Will this drug act differently with different types of patients (age, gender, race, etc.)?
■■ Does a 5-mg dose of the drug produce different results than a 10-mg dose?
■■ What does the literature tell us about the use of this drug?
■■ What does past research reveal about the drug’s application in experimental or quasiexperimental trials?

Every time healthcare researchers conduct a randomized clinical trial (RCT) to compare the impact of a new drug or a protocol, they are conducting research for efficacy purposes. A typical study might involve testing the effect of a new blood pressure drug on two groups of patients. One group would receive the drug and be labeled as the experimental group. The other group would receive a placebo and would be referred to as the control group. Baseline blood pressure readings would be obtained on each group, as well as demographic information and information about their current physiological condition, family history, activities of daily living, etc. After a period of time, the two groups would have their blood pressure levels measured again, and statistical tests would be performed to see whether there is a “significant” difference between the two groups. The demographic and other patient characteristics would be used as control variables (i.e., they are used in an effort to “hold constant” confounding or exogenous variables that might play a role in blood pressure variation).

If we really wanted to increase the precision of this type of study, we would establish matched samples of patients (i.e., we would make every possible effort to have the members of the experimental and control groups be similar in gender, age, race, socioeconomic status, and so on). The participants would be randomly assigned to the experimental and control groups, and then steps would be taken to make the study a double-blinded trial. The double-blinded component means that neither the participants nor the researchers know which group gets the drug and which one gets the placebo. This is done in order to minimize bias, increase the validity of the results, and thereby enhance the researcher’s ability to answer the efficacy question.

The study design I have just described is a classic approach to conducting a research study. In a design of this type, the goal is to test the null hypothesis (H0) that there is no difference between the experimental and control groups with respect to the blood pressure drug. Statistical analysis of the before and after blood pressure readings for
the two groups allows the researchers to reject or not reject the null hypothesis and test for statistical significance (Babbie, 1979; Morrison & Henkel, 1970; Selltiz, Jahoda, Deutsch, & Cook, 1959).

Although some traditional research studies are based on time-series analysis or interrupted time-series analysis (McDowall, McCleary, Meidinger, & Hay, 1980; Ostrom, 1978), the majority of the studies designed to test efficacy are not designed to monitor moment-to-moment fluctuations in an observed process or outcome. Instead, they typically obtain rather large sample sizes (e.g., 100 or more observations in two comparison groups), let weeks or months transpire as the research trial is allowed to run its course, and then see whether the two comparison groups show a statistically significant difference. The focus of most experimental and quasiexperimental designs, therefore, is on static comparisons (i.e., enumerative designs that compare datasets that are fixed at discrete points in time).

In summary, the primary purpose of research for efficacy is to build knowledge by testing theories against empirical evidence. The research results may not have immediate or even readily apparent practical outcomes, however. For example, research done in the vacuum of space by astronauts has produced amazingly pure crystals, but scientists are not sure what they can actually do with this new knowledge on earth. This new knowledge will be added to the rest of the body of knowledge about crystalline formation, and someday someone will figure out how to use this knowledge for practical purposes. In the meantime, articles will be written and new theories about the nature of crystals will be explored.

When healthcare professionals receive training in research designs and statistical methods, the approach described here is usually the standard frame of reference (i.e., research for efficacy). In my mind, this is one of the primary reasons why many healthcare professionals seem to have a strong proclivity to focus on comparing two numbers (e.g., this month compared to last month or this quarter compared to the previous quarter) and asking whether one time period is statistically different from the other. Research for efficacy purposes is designed to answer this type of question.

Brook et al. (1996) refer to the second research pathway as one that leads to insights about efficiency and effectiveness. This is the road that leads to QI research and is closely aligned with the analytic study designs described earlier in this chapter. Research related to efficiency and effectiveness is also consistent with what Westfall, Mold, and Fagnan (2007) and Khoury et al. (2007) call translational research.

Closely linked with the ideas of Shewhart (1931), Deming (1992), and Juran (1988, 1992), research for efficiency and effectiveness takes a very practical approach that seeks to answer questions about the variation in process or outcome indicators over time instead of whether one number is different from another (i.e., the efficacy question). The key characteristics of research for efficiency and effectiveness include:

■■ A focus on solving practical problems rather than testing or building theoretical models.
■■ A dynamic perspective rather than a static or aggregated perspective. This is the distinction discussed earlier about the difference between enumerative and analytic studies.
■■ Use of small samples of data selected continuously over time. For example, sample sizes could be as small as five patients a day or 10 each week. The frequency of sampling when conducting research for efficiency and effectiveness depends on the unit of time (i.e., the subgroup) to be used in the study. More will be said about this point in Chapter 4.
■■ More reliance on graphical displays of data than on descriptive statistics and tests of significance.

Since Brook et al. made the distinction between research for efficacy and research for efficiency and effectiveness in 1996, considerable
progress has been made to incorporate the latter form of research into the healthcare field. Today, what Brook et al. called research for efficiency and effectiveness is essentially known as the science of improvement (SOI). The rich history of the SOI has been summarized nicely by Moen and Norman (2010). Perla, Provost, and Parry (2012) provide additional depth in exploring what they call the “seven propositions of the SOI.” Taylor et al. (2013) provide a very nice review of how the SOI has been applied to improvement efforts in healthcare settings.6

The key messages Brook et al. (1996) helped to reinforce were that all measurement is the same and that the scientific method lies at the heart of all good research, whether it is done on large static comparison groups (i.e., efficacy research or enumerative studies) or in real-time settings (i.e., research for efficiency and effectiveness or analytic studies). All research designs and methods have utility. It is up to the individual researcher to build a knowledge base that allows him or her to know which design, method, and/or statistical technique is most appropriate for the questions that need to be answered. Consider an analogy from the surgical field. The surgeon has many tools and methods available for surgery. The challenge is not to use all of them just because they are there or to state unequivocally that one tool or method is superior to all the others. Instead, the real challenge is to have knowledge of all the tools and methods and know when to use the right tool at the right time to solve the problem at hand. It is the same way with research. The question or problem being addressed should be the primary driver for deciding which research design or methods are appropriate for turning data into information. Some research methods and tools are best suited to answer questions related to efficacy. Others are more appropriate for questions related to effectiveness and efficiency. The wise (and in my opinion humble) researcher should know enough about each approach to know when to use one approach and not the other. The questions, not the statistical tools or methods, should drive the research endeavor.

▸▸ The Three Faces of Performance Measurement

In 1997, Solberg, Mosser, and McDonald wrote an excellent article that provided a very practical complement to the more conceptual framework provided by Brook et al. (1996). Solberg et al. identified what they called the three faces of performance measurement: measurement for improvement, measurement for accountability (or what I refer to as judgment), and measurement for research. They further argue that these three approaches to measurement should not be mixed: “We are increasingly realizing not only how critical measurement is to the QI we seek but also how counterproductive it can be to mix measurement for accountability or research with measurement for improvement” (p. 135).

The authors describe the characteristics of each of the three faces and how each approach to measurement is based on a different purpose or aim and different methods. TABLE 2-1 provides my adaptation of the key points from this valuable article. The three faces are listed as the column headings and the rows identify the major aspects that need to be addressed during the measurement journey. The reader is encouraged to spend a few minutes reviewing the details in each cell of this table. Even a quick perusal of the table’s content will reveal that the three faces not only have very different aims but also use different methods and strategies to collect and analyze data. For each of the three faces the following key characteristics are briefly discussed:

■■ Measurement aim
■■ Testing methods and observability
■■ Data collection and sample size
■■ Determining whether the data demonstrate an improvement or change

TABLE 2-1 The three faces of performance measurement

Aspect | Improvement | Accountability (Judgment) | Research
Aim | Improvement of care (efficiency and effectiveness) | Comparison, choice, reassurance, motivation for change | New knowledge (efficacy)
Methods: | | |
■■ Test observability | Test observable | No test, evaluate current performance | Test blinded or controlled
■■ Bias | Accept consistent bias | Measure and adjust to reduce bias | Design to eliminate bias
■■ Data | “Just enough” data, small sequential samples | Obtain 100% of available relevant data | “Just in case” data
■■ Flexibility of hypothesis | Flexible hypothesis, changes as | No hypothesis | Fixed hypothesis (null hypothesis)
■■ Testing strategy | Sequential tests | No tests | One large test
■■ Determining whether change is an improvement | Analytic statistics (SPC); run & control charts | No change focus (maybe compute a percentage change or rank the order of results) | Enumerative statistics (t-test, F-test, chi square, p-values)
■■ Confidentiality of the data | Data used only by those involved with improvement | Data available for public consumption and review | Research subjects’ identity protected

Modified from Leif Solberg, Gordon Mosser and Sharon McDonald, Journal on Quality Improvement vol. 23, no. 3, (March 1997), 135-147.

As you review Table 2-1 notice that I have placed dashed lines as the vertical dividers between the three columns. This was done to help remind us that the three approaches should not be separate and isolated from each other. The learning from each column should be permeable and influence the other two columns. More will be said about this notion later.

The improvement column is where many healthcare professionals claim to be working. Yet, I often find that although it is quite popular to say you are committed to quality and safety these days, many organizations do not follow the measurement principles and practices that Solberg et al. laid out a number of years ago. How well do your QI measurement activities align with the following characteristics?

■■ Measurement Aim. As mentioned previously, the aim of measurement for improvement

is centered on enhancing the efficiency and effectiveness of care processes and outcomes. There is a direct connection between measurement for improvement and measurement for research. Improvement is basically the extension of the research act. Traditional research helps us determine “the what” (i.e., what might be a reasonable new idea to test?) whereas improvement research allows us to determine “the how.” How can we implement an efficacious idea, technique, procedure, or drug so that it performs 100% of the time in an efficient and effective manner every time it is administered or applied?
■■ Testing Methods and Observability. When a team is working on testing a new idea that they feel will improve a process or outcome they are usually following what Campbell and Stanley (1963, pp. 37–43) would classify as a quasiexperimental time series design. This is not as rigorous or complex a design as a true experimental design but it is quite good for addressing issues of internal validity, which is what QI is all about (i.e., comparing your current performance against a more desirable level of performance over time). The other characteristic of improvement measurement related to methods is that the work of the researchers (i.e., the team) is observable not only to the team and management but to anyone else who wants to see the team’s work. It is not uncommon, for example, to observe team members posting the results of their most recent tests in a location that the patients or the public can review.7
■■ Data Collection and Sample Size. Data collection for improvement is structured around small samples collected sequentially within time periods as close to the actual production of work as possible. Solberg et al. describe this approach to data collection as “good enough” data collection. They write, “Because a high degree of precision is not necessary for improvement purposes and because data collection needs to be simple and repetitive, small samples, for example, 10–20 cases per sample are usually appropriate” (Solberg et al., 1997, p. 142). This approach to data collection is referred to as gathering “just enough” data.
■■ Determining Whether the Data Demonstrate an Improvement or Change. The primary branch of statistics used to analyze improvement data is SPC. This branch of analytical statistics is a combination of graphical displays of data over time plus statistical estimates of the variation within the data display. This is achieved by analyzing the data with run charts or Shewhart charts. Note that statistical tests of significance (e.g., p-values) are not appropriate for improvement research. On this point Deming (1992, p. 312) wrote “Students are not warned in class nor in the books that for analytic purposes (such as to improve a process), distributions and calculations of mean, mode, standard deviation, chi-square, t-test, etc. serve no useful purpose for improvement of a process unless the data were produced in a state of statistical control. The first step in the examinations of data is accordingly to question the state of statistical control that produced the data.” Details on these statistical methods are provided in Chapters 8 and 9.

The second column in Table 2-1 addresses measurement for accountability or, as I refer to it, judgment. The role of measurement for judgment within the healthcare industry has increased dramatically over the past 10 years throughout the world (see Chapter 1 for additional detail on this topic). Measurement for accountability (judgment) should not always be labeled as negative. We all need to be accountable for our actions and the work we produce. It is important for both internal as well as external purposes. The problem from my perspective is when individuals are judged, rated, or ranked and such assessments are not linked to improvement. The key characteristics of measurement for accountability and judgment are:

■■ Measurement Aim. The primary purpose of measurement for accountability or judgment is to make comparisons, rate

and rank performance, pass judgment on performance, help customers make choices (as is done for example by Consumer Reports or the Leapfrog Group in the United States that rates and ranks healthcare providers), decide on the distribution of bonuses and incentives, and/or drive competition and stimulate a desire for change. This form of measurement is typically focused on comparing groups or individuals and asking a very simple question, “Is your performance better now than it was the last time we looked at you? Yes or no?” In most instances, the answer to this question is based on current performance compared to past performance or performance against targets or goals.
■■ Testing Methods and Observability. There is no rigorous testing occurring within this approach to performance measurement, let alone any application of experimental or quasiexperimental designs. Because the focus is on evaluating current performance against past performance, all that matters is whether there is a difference between time 1 and time 2. When the comparisons are made it is not uncommon to use national, regional, state or provincial norms as comparative reference markers (dare I say benchmarks?) for external comparisons and targets and goals for internal comparisons.
■■ Data Collection and Sample Size. The objective is to obtain 100% of available data for the defined period of time, which is usually by quarters or years and is usually aggregated into summary statistics. The major controversy that emerges from this approach is that the data are lagged and the aggregated results, therefore, may not reflect current performance. For example, hospital ratings and rankings at the state, province, or national level may lag for as much as a year or more. In these cases when the results are released hospital leaders are quick to point out that the current conditions of operation and their results are quite different than they were a year ago when the comparative data were collected. The final aspect of data collection for judgment is that most of the measures in this category are outcome measures (e.g., overall hospital mortality, infection rates, or patient responses to a general survey question such as, “How would you rate the overall quality of your care?”). The problem created by a singular focus on high-level outcome indicators is that a focus on outcomes alone provides little or no insight into the processes that produce the outcomes or, more important, what needs to be changed in order to move the outcomes to a more preferred level of performance. This occurs most often when external agencies or organizations pass judgment on healthcare providers. When accountability or judgment is done internally against targets or goals, however, there is usually more concern over hitting internal targets or goals that can be aimed at improving the processes that drive the outcomes.
■■ Determining Whether the Data Demonstrate an Improvement or Change. Statistical analysis for accountability and judgment is essentially a binary question. Are you better now than when we last looked at you? Yes or no? Do you have fewer infections now than when we posted your data a year ago? Has your inpatient mortality dropped? Yes or no? The answer to this question is usually based on comparing raw numbers, percentages, or rates at two points in time. Some comparisons will be based on calculating a percentage change to see whether it meets or exceeds a specified target or goal. One of the more popular methods used to determine whether a change has occurred from a judgment perspective is to use what is popularly known as the “traffic light” scorecard. This approach establishes three cut points against a target or goal and then

assigns red, yellow, or green colors to the units of observation based on where each unit falls against the targeted cut points. In this case, green is usually at or above target, yellow indicates that the unit of observation is not getting better or worse, and red indicates performance below target or goal. A variation on this same methodology is to assign stars rather than colors to indicate the level of performance. The better the performance the more stars you get. Such methods to determine whether a change has occurred are frequently tied with pay for performance bonus and incentive programs. Although the negative impacts of these approaches have been well documented by Deming (1992, 1994), Kohn (1986, 1993), Berwick (1995), and Herzberg (2003), the traffic light method of analyzing performance is still widespread in healthcare settings. Besides being antithetical to the basic tenets of QI, the problem with the traffic light approach is that it completely ignores the underlying variation in the processes and related outcomes being judged. Further details on understanding variation will be explored in Chapter 6.

The third face of performance measurement is measurement for research purposes. This approach serves a vital and extremely important function within the healthcare industry. Interestingly enough, however, although healthcare professionals frequently reference RCTs and research findings there are actually very few healthcare professionals who work full time in this area.8

■■ Measurement Aim. As has been stated, the primary aim of research (or efficacy, to use both Solberg’s and Brook’s term) is to develop new theories, test existing theories, and build knowledge.
■■ Testing Methods and Observability. We take elaborate steps to help ensure that exogenous variables that could confound or mess up our study and thereby invalidate the results are controlled or held constant. Experimental and quasiexperimental designs (Campbell & Stanley, 1963) are used to maximize the validity and reliability of the results and reduce threats to the study.9 Frequently, blinded tests are used so that neither the researchers nor the participants in the study know which participants receive the actual test intervention and which ones receive the placebo.
■■ Data Collection and Sample Size. One of the key points of difference between the three faces of performance measurement relates to data collection strategy. Solberg et al. point out that when conducting research studies we typically create large, complex datasets that are based on data that have occurred in the past (i.e., last quarter or more often last year). The authors refer to this type of data as “just in case” data. I believe this term is used because when doing research we usually collect more data than we may need “just in case” reviewers of our research, journal editors, or critics raise questions about our results or methods. If this happens, we can respond, “No problem. I collected additional data ‘just in case’ you raised that question.”10
■■ Determining Whether the Data Demonstrate an Improvement or Change. When we conduct statistical analysis for research we typically use methods that were mentioned previously in the description of enumerative studies. Usually these approaches compare groups or identify which variables are believed to have “significant” influence over the dependent or outcome variable. In a research context we usually employ statistical tests of significance and most notably a p-value to determine whether the null hypothesis (fixed hypothesis) is rejected or not and significant results are observed between the experimental and control groups. Although the statistical notion of “significance” has played a major role in making clinical decisions, there has been considerable debate in the literature about the “significance test controversy” (Ioannidis, 2005;

Zwarenstein & Oxman, 2006; Morrison & Henkel, 1970; Ziliak & McCloskey, 2011). Readers interested in this topic should make a special effort to read the works of Ziliak and McCloskey. They provide a wonderful historical review of the development of the test of significance, especially the p-value, and then provide a comprehensive review of the debate surrounding the test of significance controversy.

In their overview of the three faces of performance measurement Solberg et al. conclude “how counterproductive it is to mix measurement for accountability or research with measurement for improvement” (1997, p. 135). This may have been true in 1997 but I do not think it is quite true today or an appropriate approach for current time and challenges. In my opinion, healthcare organizations need leaders and frontline individuals who are comfortable and competent in being able to blend the three faces of performance measurement. Although I agree with the authors that it can send confusing messages to say you are doing research and then apply procedures and methods that characterize measurement for accountability (judgment) or improvement or any other combination of the three faces and the aspects of measurement, I do not view the three approaches as separate and distinct silos as shown in FIGURE 2-2. I also do not believe that it is “counterproductive to mix measurement for accountability or research with measurement for improvement.” Instead I believe that it is more effective to view the three faces as a Rubik’s cube. FIGURE 2-3 provides a graphic image of this analogy.

FIGURE 2-2 Viewing the three faces of performance measurement as silos and as a Rubik’s cube
© Georgi Nutsov/Shutterstock; © tinka’s/Shutterstock

FIGURE 2-3 Viewing the three faces of performance measurement as a Rubik’s cube

Healthcare leaders will improve their measurement capacity if they navigate the edges and interfaces of the three faces of performance measurement. Sometimes, a person becomes strongly invested in only one type of measurement. A person may talk about

measurement for research as the “only” valid type of measurement, for example. Similarly, others may talk about aggregate and summary data as being preferable because it allows them to compare hospitals, clinics, and groups of doctors, cities, or regions. In this case, they are invested squarely in the center of the measurement for judgment space of the cube. Others may say that although the other approaches to measurement must be addressed, the preferable way to proceed with measurement is for QI and all other approaches are of secondary importance. Instead of creating more silos within the healthcare industry we should be investing time in creating what Konan (1981) calls translation. According to Konan (p. 12), “The term ‘translation’ is chosen because the process is like translating prose from one language to another. A good language translation is not a literal rewriting of each sentence. Rather it captures the essence of the original copy and then uses the mentality of the second language to present the material to a different audience.” If we are to have translation between the three faces of performance measurement, then we need translators, not silo builders. Konan picks up on this notion by stating, “Being a good foreign-language translator requires more than the ability to read and write another language. Similarly, the skills required for translating research into everyday language are different from those required for doing research. The translator must be able to read and understand scientific research and then synthesize and interpret it in relation to policy perspectives” (p. 13). Healthcare organizations need individuals who can function as translators and be able to talk to individuals who are leading each of the three faces of performance improvement and find the linkages between improvement, judgment, and research. It is not that difficult but it does require the development of specialized skills as Konan points out.

For physician and nurse leaders especially, the edges of the cube where the three faces are connected are critical. They need to be able to walk comfortably back and forth between the three faces of the cube so as to understand how valid and reliable measurement is structured and organized within each of the three approaches. Translators are not stuck in any one of the measurement silos but work to break down barriers and then build on the strengths of each approach. All three approaches must be understood as a system. The problem is that individuals identify with one of the approaches and dismiss the value of the other two. This leads to duplication of effort and redundancy in data collection and places limitations on learning. A short case study demonstrating how a translator with the appropriate skills and knowledge can flip the cube around to address the aims of each of the three faces while using the same data follows.

CASE STUDY #1: Being a Translator


Situation
The board of a 310-bed community hospital is concerned that the hospital did not fare too well in a
recent citywide report on inpatient infection rates. This report compared 16 hospitals and showed that
the hospital ranked 13th out of the 16 hospitals. The report was recently highlighted in a newspaper
story and the hospital was singled out as one of the worst performers. An improvement team was
established to improve these results.


The Data
The QI department worked with the 11 inpatient units and 2 critical care units to establish historical
baselines on the number of infections using run charts with the number of infections (e.g., central line
bloodstream infections, Clostridium difficile, and urinary tract infections) plotted on the vertical axis and
week displayed as the unit of time on the horizontal axis. Each unit then initiated various improvement
strategies and continued to track the various infections on their run charts. Each time they tested a
new idea they annotated the run chart to identify when the test was conducted.
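
The run-chart bookkeeping described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weekly counts are invented for one hypothetical unit, the name `find_shifts` does not come from the text, and the shift rule used (six or more consecutive points on one side of the median, skipping points that fall exactly on the median) is one commonly taught run chart convention rather than something specified in this case study.

```python
from statistics import median

def find_shifts(values, centerline, min_run=6):
    """Return (start, end) index pairs for runs of at least `min_run`
    consecutive points on one side of the centerline.
    Points exactly on the centerline are skipped: they neither
    extend nor break a run (an assumed, commonly used convention)."""
    shifts = []
    run_start, run_side, run_len, last_idx = None, 0, 0, None
    for i, v in enumerate(values):
        side = (v > centerline) - (v < centerline)  # +1 above, -1 below, 0 on the line
        if side == 0:
            continue
        if side == run_side:
            run_len += 1
        else:
            if run_len >= min_run:
                shifts.append((run_start, last_idx))
            run_start, run_side, run_len = i, side, 1
        last_idx = i
    if run_len >= min_run:
        shifts.append((run_start, last_idx))
    return shifts

# Hypothetical weekly infection counts for a single unit (not data from the case study).
baseline = [8, 6, 9, 7, 8, 10, 7, 9, 8, 6, 9, 8]   # weeks 0-11, before any test
after_tests = [7, 5, 6, 4, 5, 6, 3, 5]             # weeks 12-19, after the first test
weekly = baseline + after_tests

# Annotation of when a test was run, as the teams did on their charts (hypothetical label).
annotations = {12: "first test of a new idea"}

centerline = median(baseline)            # baseline median used as the run chart centerline
shifts = find_shifts(weekly, centerline)
print(centerline, shifts)
```

With these invented numbers the only run long enough to count as a shift starts in the same week as the annotated test, which is the kind of signal an annotated run chart is meant to make visible.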

Discussion
As the teams tracked their improvement tests and plotted the weekly number of infections, they
noticed that a majority of the units demonstrated a gradual decline in the number of infections. The
three units that did not show declines in the number of infections visited the other units to determine
what they were doing that did produce better results. After several team meetings these units
also started to notice declines in their numbers of infections. So now it was time to report back to
management and the board. The staff in the QI department, functioning as translators, knew that the
board had a very simple question for them: “Do we have fewer infections now than when we first saw
the ranking of the hospital in the public report?”
The QI staff had all the detailed improvement data for the 13 units plotted by week on the run
charts. They knew, however, that this was too much detail for the board. So, they flipped the Rubik’s
cube around to the accountability face and aggregated the detailed run charts into two numbers: the
median number of infections for the entire hospital in the baseline period and the median number of
infections over the last 12 weeks. The baseline median for the entire hospital was 29 infections whereas
the more recent median after changing the infection assessment protocol and management processes
was 17. Both the baseline and the current number of infections were below the number that had been
printed in the public report. This allowed the QI director to go to the board meeting and provide a
simple answer to the board’s question: “Yes, we are safer now than we were when the public report
was released and we have continued to improve since then.” The QI translators have been able to take
the detailed improvement data from the units, flip it around, aggregate it, and provide the board with
an answer to their accountability question.
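
The aggregation step just described, turning detailed weekly data into two numbers for the board, can be sketched as follows. This is a hedged illustration, not the hospital's actual method: the weekly hospital-wide totals are invented so that their medians match the 29 and 17 reported above, the 12-week baseline length is an assumption, and the function name `accountability_summary` is hypothetical.

```python
from statistics import median

def accountability_summary(weekly_totals, baseline_weeks, recent_weeks=12):
    """Collapse a weekly series into the two numbers used for the
    accountability question: baseline median and most recent median."""
    baseline = median(weekly_totals[:baseline_weeks])
    recent = median(weekly_totals[-recent_weeks:])
    pct_change = (recent - baseline) / baseline * 100
    return {
        "baseline_median": baseline,
        "recent_median": recent,
        "pct_change": round(pct_change, 1),
        "fewer_infections": recent < baseline,   # the board's yes/no answer
    }

# Invented hospital-wide weekly infection totals (all 13 units combined).
baseline_period = [31, 28, 30, 29, 27, 29, 32, 29, 26, 30, 28, 29]   # assumed 12-week baseline
testing_period = [27, 25, 24, 22, 21, 20]                            # weeks while changes were tested
last_12_weeks = [19, 18, 17, 18, 16, 17, 17, 15, 17, 16, 18, 17]
weekly_totals = baseline_period + testing_period + last_12_weeks

summary = accountability_summary(weekly_totals, baseline_weeks=12)
print(summary)
```

The same weekly series that feeds the run charts is reused here; only the level of aggregation changes, which is the point of flipping the cube from the improvement face to the accountability face.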
The final step in translating these data would come about when the translators take the detailed
improvement data and the run charts from the units and meet with researchers to discuss and
possibly design a quasiexperimental research study to determine whether a totally new intervention that
had not been tested previously is efficacious.
In this way, the same dataset has been used to address questions relevant to all three faces of
performance measurement. The key, however, is having translators who are able to understand the
aims and methods of each of the three faces and then be able to develop linkages between the sides
of the Rubik’s cube.
We have covered a lot of ground in this chapter. We started by connecting the dots and thinking
about our ability to predict with data. This led us into a discussion on the types of studies with a focus on
the differences between enumerative and analytic studies. Next we thought about the roles of research
for efficacy as compared with research for efficiency and effectiveness. This led us finally to discussing the
three faces of performance measurement and the fundamental question “Why are you measuring?” With
these fundamental concepts in place we are now ready to start our own QMJ.

Notes

1. Figure 2-1B with all 183 dots connected. Did you predict that it would be a hurdler?

FIGURE 2-1B All 183 dots connected. Did you predict that it would be a hurdler?

2. You will remember from your basic statistics class (or “sadistics” as some might call them) that there are two basic groups of descriptive statistics. First, there are measures of central tendency characterized by the mean, the median, and the mode. The mean is simply the arithmetic average of all the numbers in a distribution. The median is the point where half the data are above this point and half are below it. Another name for the median is the 50th percentile. The mode is the most frequently appearing number in a distribution of numbers. In the classic picture of the theoretical normal curve these three numbers are shown as the vertical line in the center of the bell curve and depict the mean, median, and mode all at the same position. Typically this happens only when a theoretical normal curve is being shown or when computer simulations are run. In real life, however, it is rare for the mean, median, and mode to all be represented by the same number. The second group of descriptive statistics consists of measures that reveal the dispersion in a distribution of data. The simplest of these measures are the maximum and minimum values in a distribution. This is followed by the range, which is the absolute difference between minimum and maximum values. Finally there are a number of descriptive statistics that capture various aspects of dispersion. These include the sum of the deviations, the average deviation, the sample or population variance, and the standard deviation. Further details on any of these descriptive statistics can be found in any basic statistics text.
3. This is not to say that the historical data have no value at all. They have little or no value to the pilots making the flight you are on right now. But the historical data are of great value to the designers, engineers, and technicians at Boeing who built and maintain the 777s. The historical data on past performance are also of value if there is an accident and the Federal Aviation Administration is trying to determine when a possible failure started to occur. The key point to be made in this example is that the question you are trying to answer should guide your measurement journey. The measures you choose to track, your data collection plan, statistical analysis techniques, and the conclusions you can make all start with the question(s) you are trying to answer.
4. This is sometimes a challenge for individuals who conduct enumerative studies. Specifically they do not confine themselves to making inferences about the members or entities in the frame. As one of the professors in my doctoral program used to say, “They frequently want to drive beyond their headlights.” That is, they define a frame and then arrive at a conclusion about the members or entities in the frame but they don’t stop there as they should. If, for example, an enumerative study is conducted on all the hip replacement surgeries done

last year at Hospital A and the researchers report that 62% of the patients who participated in pre-hab (i.e., engaging in rehabilitation exercises prior to surgery to strengthen their muscles) were able to walk with the assistance of only a cane on the second day after surgery then all they can rightfully conclude is that this is the outcome for Hospital A. If there are two other hospitals in the system, however, that also do hip replacement surgery but were not part of the original frame would the researchers be justified in telling the other hospital orthopedic directors that they should have all their hip replacement patients engage in pre-hab because Hospital A had 62% of its patients walking with a cane on the second day after surgery? If they did arrive at this conclusion they would be “driving beyond their headlights.” Their study merely described a characteristic of a subset of patients in the frame. This conclusion provides no insights as to why 62% of the patients were walking with assistance of a cane on the second day. Who were these patients? All male? A mix of males and females? What ages were these patients? Did they have any comorbidities? When questions like these are being asked, the study needs to move from being an enumerative to an analytic design.
5. Note that I am using the term “researcher” in a fairly broad context in this text. For me, you do not need an academic degree or work for an organization that has the word research in its name to be called a researcher. I maintain an applied perspective when it comes to defining who might be a researcher. Anyone who is interested in answering a question, finding a better way to perform a task or activity, or generating new knowledge for themselves or others is in my mind a researcher.
6. Despite the rich history of the SOI and its very direct links to the scientific method, it is interesting that the SOI is frequently challenged as not being a “legitimate science.” When I hear this type of statement I always wonder if the person making the comment would ask sociologists, psychologists, economists, or chemists to prove that their disciplines are scientific in nature.
7. Observability or transparency of improvement tests and the related results are not always widely embraced, especially in a country as litigious as the United States. Some healthcare organizations in this country, for example, have been counseled by their legal staff to not reveal, report, or publicly post QI results. The concern is that if a patient experienced an adverse event or harm and data are posted publicly then the family and their lawyers might use the data to support their claim that the provider did not have quality processes in place at the time of the incident. This position has been shown to have little credibility, however, and the transparency of QI initiatives is increasing with fewer legal challenges each year (see, e.g., Baily et al., 2006).
8. I have arrived at this conclusion by asking participants in my workshops how many of them spend a majority of their time working on setting up and running experimental or quasiexperimental medical research projects. Even in audiences of several hundred I usually have either only a few or no hands go up. Most healthcare professionals do not spend a majority of their time in conducting academic research studies designed for efficacy purposes. On the other hand, when I ask audiences how many of them spend most of their time being accountable for results or trying to improve results nearly all the participants have their hands up in the air. The point is that although we rely on research to move the healthcare field forward, a majority of healthcare professionals, those delivering care on a day-to-day basis, are not engaged in conducting research. These people are

being charged with being accountable for and/or improving the efficiency and effectiveness of care.

9. One of the best resources on this point is Campbell and Stanley (1963). This is a classic reference relevant to any field of research and something that, in my opinion, should be required reading for any healthcare professional. The authors discuss three preexperimental designs, three true experimental designs, three quasiexperimental designs, counterbalanced designs, and four separate-sample pretest–posttest designs. They discuss each of these designs in terms of 12 threats to internal and external validity, which are a challenge to be addressed in any study design. Some of the designs handle these threats better than others. For example, the weakest of all designs is the one-shot case study. Campbell and Stanley point out that this design is used frequently in educational studies. I would add that it is also used, unfortunately, all too often in healthcare settings. You will probably recognize this study design. It is carried out with a single study group over two time periods. Time period 1 is the pretest marker, sometimes referred to as a baseline measure. The intervention is administered to the study group and then a posttest measurement of the variable of interest is taken. For example, imagine that your long-term care facility has been ranked in the bottom decile of a statewide study on falls in nursing homes. The board of your facility has made it clear to the management team that this outcome is unacceptable and must be improved. You have the pretest result showing that your facility is in the bottom decile. Someone comes up with the idea of placing posters up throughout the facility admonishing everyone to “Pay Attention! Only YOU can prevent patient falls!” The posters are in place for 1 week. Then the number of falls is recorded each day for the next week and the average is calculated. This serves as the posttest value. Although your pretest number of falls over the past year was on average 23 per month, there is great excitement when the posttest average (which, remember, is only for 1 week) reveals only 16 falls. What do you conclude from these two numbers? The wrong conclusion is that the posters “caused” fewer falls. The correct conclusion is that the comparison reveals absolutely nothing. Campbell and Stanley write that “such studies (i.e., one-shot case studies) have such a total absence of control as to be of almost no scientific value” (p. 6). Yet we see many pretest–intervention–posttest designs in health care in which completely incorrect conclusions are offered. Most healthcare study designs fall into the category of preexperimental designs. This is why I believe that all healthcare professionals, especially managers and leaders, need to spend a little time learning from Drs. Campbell and Stanley.

10. An example of this “just in case” approach to data collection comes from my own experience. When I was building the dataset for my doctoral dissertation, I spent the better part of a year designing, organizing, and executing the data collection plan. My dissertation research was designed to identify factors that led manufacturing firms to locate in rural and small communities in Pennsylvania (Lloyd, 1983). My data plan involved collecting data from Pennsylvania state archives, the U.S. Census, Dun and Bradstreet’s Market Identifiers File, County Business Patterns, the Pennsylvania Bureau of Economic Analysis, local government documents, and phone call flow data from AT&T. This was clearly a “just in case” dataset that enabled me not only to respond to additional analytic questions my committee posed to me but also to address editorial questions when it came time to publish the results of my research in a professional journal (Lloyd & Wilkinson, 1985).
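
A rough way to see why the two numbers in the falls example of note 9 support no conclusion is to ask how often a figure as low as 16 would turn up if nothing at all had changed. The sketch below is only an illustration and is not part of Campbell and Stanley's argument. It assumes that monthly fall counts vary like a Poisson count around the pretest average of 23, and that the posttest figure of 16 is expressed on the same monthly scale. Both assumptions, and the function name `poisson_cdf`, are mine.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """Return P(X <= k) for a Poisson-distributed count with the given mean."""
    return sum(
        math.exp(-mean) * mean**i / math.factorial(i)
        for i in range(k + 1)
    )


# Pretest average from the example in note 9.
baseline_per_month = 23
# Posttest figure; ASSUMED here to be on the same monthly scale as the baseline.
observed = 16

# Probability of seeing 16 or fewer falls when the true rate is unchanged.
p = poisson_cdf(observed, baseline_per_month)
print(f"Chance of {observed} or fewer falls with no real change: {p:.2f}")
```

Under these assumptions, a count of 16 or fewer turns up by chance alone in a non-trivial share of months, so a single low posttest value cannot be separated from ordinary variation, let alone attributed to the posters.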

References

Babbie, E. R. The Practice of Social Research. Belmont, CA: Wadsworth Publishing Company, 1979.
Baily, M., M. Bottrell, J. Lynn, and B. Jennings. “The Ethics of Using QI Methods to Improve Quality and Safety in Health Care.” Hastings Center Report 36, no. 4 (2006): S1–S40.
Berwick, D. “The Toxicity of Pay for Performance.” Quality Management in Health Care 4, no. 1 (1995): 27–33.
Blalock, H. Social Statistics, rev. 2nd ed. New York: McGraw-Hill, 1979.
Brook, R., C. Kamberg, and E. McGlynn. “Health System Reform and Quality.” JAMA 276, no. 6 (1996): 476–480.
Campbell, D., and J. Stanley. Experimental and Quasi-Experimental Designs for Research. Boston, MA: Houghton Mifflin, 1966.
Deming, W. E. “On Classification of the Problems of Statistical Inference.” Journal of the American Statistical Association 37 (1942): 173–185.
Deming, W. E. Some Theory of Sampling. New York: Wiley & Sons, 1950, reprinted in 1960.
Deming, W. E. “On Probability as a Basis for Action.” American Statistician 29, no. 4 (1975): 146–152.
Deming, W. E. Out of the Crisis. Cambridge, MA: Massachusetts Institute of Technology, 1992.
Deming, W. E. The New Economics. Cambridge, MA: Massachusetts Institute of Technology, 1994.
Herzberg, F. “One More Time: How Do You Motivate Employees?” Harvard Business Review 81, no. 1 (2003): 86–96.
Ioannidis, J. “Why Most Published Research Findings Are False.” PLoS Medicine 2, no. 8 (2005): 124. doi:10.1371/journal.pmed.0020124
Juran, J. Juran on Planning for Quality. New York: Free Press, 1988.
Juran, J. Juran on Quality by Design. New York: Free Press, 1992.
Khoury, M. J., M. Gwinn, P. W. Yoon, N. Dowling, C. A. Moore, and L. Bradley. “The Continuum of Translation Research in Genomic Medicine: How Can We Accelerate the Appropriate Integration of Human Genome Discoveries into Health Care and Disease Prevention?” Genetics in Medicine 9, no. 10 (2007): 665–674.
Kohn, A. No Contest: The Case Against Competition. New York: Houghton Mifflin Company, 1986.
Kohn, A. Punished by Rewards. New York: Houghton Mifflin Company, 1993.
Konan, M. “Translation: A Neglected Stage on the Research Process.” Rural Sociologist 1, no. 1 (1981): 11–18.
Lloyd, R. “Vertical and Horizontal Linkages on Manufacturing Employment in Rural Communities of Pennsylvania.” PhD diss., The Pennsylvania State University, 1983.
Lloyd, R., and K. Wilkinson. “Community Factors in Rural Manufacturing Development.” Rural Sociology 50, no. 1 (1985): 27–37.
McDowall, D., R. McCleary, E. Meidinger, and R. Hay. Interrupted Time Series Analysis. Beverly Hills, CA: Sage Publications, 1980.
Moen, R., T. Nolan, and L. Provost. Quality Improvement Through Planned Experimentation, 3rd ed. New York: McGraw Hill, 2012.
Moen, R., and C. Norman. “Circling Back: Clearing up Myths about the Deming Cycle and Seeing How It Keeps Evolving.” Quality Progress, November 2010, 22–28.
Morrison, D., and R. Henkel, eds. The Significance Test Controversy: A Reader. Chicago: Aldine, 1970.
Ostrom, C. Time Series Analysis: Regression Techniques. Beverly Hills, CA: Sage Publications, 1978.
Perla, R., L. Provost, and G. Parry. “Seven Propositions of the Science of Improvement: Exploring Foundations.” Quality Management in Health Care 22, no. 3 (2012): 170–186.
Posavac, E., and R. Carey. Program Evaluation: Methods and Case Studies, 4th ed. Englewood Cliffs, NJ: Prentice Hall, 1980.
Provost, L. “Analytic Studies: A Framework for Quality Improvement Design and Analysis.” BMJ Quality & Safety 20, supplement 1 (2011): i92–i96.
Provost, L., and S. Murray. The Health Care Data Guide. San Francisco, CA: Jossey-Bass, 2011.
Selltiz, C., M. Jahoda, M. Deutsch, and S. Cook. Research Methods in Social Relations. New York: Holt, Rinehart & Winston, 1959.
Shewhart, W. Economic Control of Quality of Manufactured Product. New York: D. Van Nostrand, 1931; reprinted by the American Society for Quality, 1980.
Shewhart, W. Statistical Method from the Viewpoint of Quality Control. Mineola, NY: Dover Publications, 1939.
Solberg, L., G. Mosser, and S. McDonald. “The Three Faces of Performance Measurement: Improvement, Accountability and Research.” Journal on Quality Improvement 23, no. 3 (1997): 135–147.
Taylor, M., C. McNicholas, C. Nicolay, A. Darzi, D. Bell, and J. E. Reed. “Systematic Review of the Application of the Plan–Do–Study–Act Method to Improve Quality in Healthcare.” BMJ Quality & Safety 23, no. 4 (2013): 1–9. doi:10.1136/bmjqs-2013-001862.
Weiss, C. Evaluation Research: Methods of Assessing Program Effectiveness. Englewood Cliffs, NJ: Prentice-Hall, 1972.
Westfall, J., J. Mold, and L. Fagnan. “Practice-Based Research—Blue Highways on the NIH Roadmap.” JAMA 297, no. 4 (2007): 403–406.
Wheeler, D. Understanding Variation: The Key to Managing Chaos. Knoxville, TN: SPC Press, 1993.
Wheeler, D. Advanced Topics in Statistical Process Control. Knoxville, TN: SPC Press, 1995.
Ziliak, S., and D. McCloskey. The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. Ann Arbor, MI: University of Michigan Press, 2011.
Zwarenstein, M., and A. Oxman. “Why Are So Few Randomized Trials Useful, and What Can We Do About It?” Journal of Clinical Epidemiology 59 (2006): 1125–1126.

CHAPTER 3
Measuring the Voice
of the Customer
▸▸ It All Starts With Listening

    Good evening, my name is Pam and I’m calling on behalf of Acme Storm Window Company. Recently we installed new windows in your home and I was wondering if you would take a few minutes to answer several questions about our service? Yes, I realize that it is dinnertime, but I have only eight brief questions to ask you and it will take less than 2 minutes. No, sir, I’m not trying to sell you our service. As I said, we installed new windows in your home last week and I merely want to find out if our staff met your expectations. OK, thanks; I’ll be brief. Here’s the first question.

Sound familiar? We receive one or more of these customer feedback calls each day or at least each week. In addition, you probably also receive periodic surveys in the mail asking for your opinions on everything from how you feel about your political representatives to whether or not your community should raise taxes so that more trees can be planted in the parkways. If you are like most people, these interruptions to your dinner hour, the added paper in your mailbox, or unsolicited feedback requests in your email are basically irritations. But if you look at these “irritations” from the viewpoint of those making the inquiry, they are usually sincere attempts to listen to the voice of the customer (VOC).

Quality improvement (QI) begins with listening to the VOC (Deming, 1992; Hayes, 2008; Langley, Nolan, Nolan, Norman, & Provost, 2009; Scherkenbach, 1991). Customers define quality and set the expectations for performance. So when Pam from Acme Storm Window Company calls asking for your opinions about the new windows you recently had installed, she is doing it to see whether you are delighted with their product and service, moderately satisfied, or totally turned off. Basically, she wants to listen to you, the VOC. It is the first step on the road to improvement.

In health care, however, it is ironic that we seem to be rather ambivalent about taking this first step and really listening to those we serve. Although we list the various things we offer to

© Michal Steflovic/Shutterstock


patients as “services” (e.g., outpatient services, laboratory services, emergency services, labor and delivery services), it is frequently brought to our attention that our services often fall short of even meeting expectations, let alone exceeding them. Webster’s II New Riverside University Dictionary (1984, p. 1066) defines service as “an act of assistance or benefit.” In order to provide such benefit, however, it is essential that the provider of a service understands the type of assistance needed or the nature of the desired benefit, all of which requires listening to those who receive the output of our work.1

The very name of what we provide, healthcare service, explicitly states that we intend to serve those who come to us in need. This requires listening, real listening, to actually hear the concerns and expectations of the patient and their caregivers. Too often, however, we listen to patients and families in a very passive and disengaged manner rather than engaging in reflective listening and dialogue (Bohm, 1990). We hear what patients and caregivers say, but we do not listen to what is behind the words. So instead of asking “What is the matter with you?” we should be asking, “What matters to you?” (Barry and Edgman-Levitan, 2012). What matters to you? Four simple little words that totally change the tone of an exchange between a provider of care and the recipient of that care. If the conversation begins with the question “What matters to you?” then rather than doing things to or for the patient we would be more inclined to do things with the patient.

I genuinely believe that the majority of healthcare providers do care about patients and their needs. When I ask the participants in my workshops, for example, why they decided to enter the healthcare profession, there is usually a resounding reply, “because I want to help people.” I always find this fascinating. The intent to listen and serve seems to be there, but we frequently fall short in actually delivering care that is characterized by listening. I continue by asking workshop participants the follow-up question, “Well, if you got into health care to help people, do you think that we meet, exceed, or fall short in meeting customer expectations?” An equally quick and resounding response is that at best we meet expectations, rarely exceed expectations, and more often than not fall short of customer expectations. When I then ask why they responded this way, the responses do not come as quickly. You can see the participants thinking about their responses. Eventually, a few brave souls offer comments. Some folks point to “the complexity of the system,” “not enough staff,” or “not having enough time to relate to the patients.” Others say that they do not have “accurate data on what customers want or need,” and therefore, the patient satisfaction survey results “do not reflect reality and are therefore not reliable.” You can see the growing conflict in their eyes as they recognize that their desire to “help people” is frequently compromised by the systems, processes, and cultures in which they have to work.

▸▸ Creating a Service Excellence Culture

The creation of a service excellence culture does not happen by merely telling everyone to smile and be nice to people. Health care is not like the old television show Fantasy Island (1978–1984). The host (Ricardo Montalban) would start each episode by making sure the staff were all properly aligned and polished. As the plane circled the island and made its final turn for the landing approach, Montalban’s character, Mr. Roarke, would recite the opening line, “Smiles, everyone.” A dazzling smile, however, will never replace behaviors that demonstrate that the workers really care about the guests on Fantasy Island or the patients waiting in a clinic reception area. What it takes is having structures and processes in place that are reliable and user friendly and exceed customer expectations.

FIGURE 3-1 depicts the organizational components that drive the creation of values, norms, and behaviors that in turn determine the organization’s culture. Service excellence

Culture (norms and behaviors), surrounded by four components:

Human resource issues
• Recruitment
• Training
• Development
• Attitude

Measurement and information
• Data for assessment
• Data for action
• Common tools and methods

Incentives
• Performance
• Evaluations
• Rewards
• Celebrations
• Compensation

Organizational design structures
• Communication
• Education
• Information
• Support
• Leadership

FIGURE 3-1 Components that build an organization’s culture

does not happen by chance. It also does not exist merely because your organization won a quality award 2 years ago or has a great mission statement. Service excellence results from a very deliberate focus on the four basic components shown in Figure 3-1. These factors will determine the culture your internal and external customers experience on a day-to-day basis.

Component #1: Human Resource Issues

It all begins with the individuals we hire, how we train them, and what we do to retain them for the long run. If the employees are our most valuable resource (which almost every organization says at one point or another), then a considerable amount of effort ought to be directed to this component. Many healthcare organizations have initiated programs to screen potential applicants to see whether they are in alignment with the organization’s mission, values, and philosophy. The Disney organization and Southwest Airlines are two well-known companies that have used initial screenings for years to determine this degree of alignment. Southwest refers to its process as targeted selection. Central to the Southwest selection process have been the characteristics of humor and the extent to which individuals seem capable of becoming part of an extended family of people who work hard and have fun at the same time (Freiberg and Freiberg, 1996, p. 67). If applicants do not meet certain criteria, they are told this up front so that they do not waste their time or the company’s. Failure to achieve a particular score on the profile exam is not an indication that the applicant is a “bad” person. It merely reflects the fact that the applicant’s perspectives and the organization’s expectations are not aligned. It is much better to find this out at the initial stages of the recruitment process than 6 months or a year after the person is hired. But this is not done in most healthcare settings. As a result, we hear comments such as, “How did she get her job?” “Who does he know?” and “He offends everyone in the department but no one [in management] seems to care.”

Every organization has employees who are notorious for being rude, domineering, and out of sync with what the organization claims to value. Yet they seem to remain, and sometimes they even get promoted while “good” people leave. How does this happen? I believe that several key factors create human resource (people)

challenges in healthcare organizations. First, healthcare organizations often hire strictly for technical skill and assume that the person’s people skills will somehow develop automatically over time. Herb Kelleher, the retired chief executive officer (CEO) of Southwest Airlines, made it very clear that he expected his People Department (what most organizations refer to as the human resources department) to “Hire for attitude, train for skill.” Kelleher spelled out his hiring policy this way: “We look for attitudes; people with a sense of humor who don’t take themselves too seriously. We’ll train you on whatever it is you have to do, but the one thing Southwest cannot change in people is inherent attitudes” (Freiberg and Freiberg, 1996, p. 67). Although healthcare professionals, especially those delivering direct care to patients, must be able to demonstrate technical knowledge, skill, and proficiency in their area of specialization, healthcare organizations frequently miss or ignore the “inherent attitudes” component of the hiring process.

Second, individuals who have done well in one role (e.g., as a bedside nurse, a family practice physician, or a home healthcare worker) are often assumed to be prime candidates for promotion to managerial or administrative roles. Furthermore, if they have been with the organization long enough, then it is a foregone conclusion that they should be “moved up.” Faced with the offer of more money, status, and responsibility, most individuals will say, “Thank you, I am ready for a promotion!” Rarely will you hear, “No, thank you, I am perfectly content to stay where I am.” So they accept the new managerial or administrative position. In many cases, however, this is a major mistake for the individual and the organization. The individual soon becomes the living embodiment of what Peter and Hull (1970) refer to as The Peter Principle: that is, people will be raised to their level of incompetence. Many extremely competent physicians, for example, have been placed in high-level administrative roles only to discover that they are incompetent when it comes to managing people, handling conflict, and dealing with the administrative and political aspects of the job. They might have been excellent as practicing physicians, but they quickly become another example of The Peter Principle in action.2

The third human resource consideration relates to retention of good staff. As mentioned at the beginning of this chapter, most organizations have a statement that goes something like this: “Our employees are our most valuable resource!” Yet, when you look at what many organizations actually do to retain, reward, and build a bond of loyalty with the employees, you start to wonder why anyone stays with the organization. Deming put it this way: “The greatest waste in America is failure to use the abilities of people” (1992, p. 53). Why do some people love going to work each day and others hate even to get out of bed? Part of it relates to paying fair and competitive wages. This is only part of the equation, however. The other part, which is a major factor in the healthcare industry, is not related to money and perquisites. It is related to how people are treated, how they feel about their coworkers, the respect they have for their managers, and the opportunities they have for self-expression and meaningful work. It relates to leadership and how organizations create forces to motivate their most “valuable resource.”

Component #2: Motivation

Any healthcare organization that does not give serious consideration to the broad range of factors that motivate its employees will never be able to develop and sustain a service excellence culture. Besides basic compensation and benefits, other key factors that lead to employee motivation include (1) the performance evaluation process; (2) reward and recognition systems; (3) the extent to which micromanagement exists within the organization; (4) professional freedom and autonomy to make decisions; and (5) organizational hypocrisy (i.e., the extent to which the organization’s leaders behave in a manner that is inconsistent with the organization’s mission, values, and philosophy).

Deming was keenly interested in the topic of motivation. In fact, one of his four areas of profound knowledge (Deming, 1994; Langley

et al., 2009; Scherkenbach, 1991; Schultz, 1994) is human behavior, or psychology, and the key role it plays in managing people. Although Deming acknowledged readily that he was not a psychologist, he pointed out constantly that if leaders do not realize that people (their employees) have differing desires, hopes, aspirations, ambitions, expectations about the role of work in their lives, and learning styles, the organization will never succeed.

Central to his interest in this subject was the question of whether individuals are intrinsically or extrinsically motivated. Deming believed very strongly in the value of a person’s intrinsic motivation. His writings and lectures all reinforced the notion that people are born with a natural inclination to learn and be innovative. His concern was that organizations knock intrinsic motivation out of individuals by making them compete with each other and by instituting extrinsic motivational factors. In The New Economics (1994, p. 121) he wrote: “We must throw overboard the idea that competition is a necessary way of life. In place of competition, we need cooperation.” To demonstrate how he believed that the present style of extrinsic rewards robs people of joy in work, he laid out what he called the forces of destruction. An adaptation of Deming’s portrayal of these forces of destruction (1994, p. 122) is shown in FIGURE 3-2. In this figure, intrinsic motivation is shown to be high during the early years of life. Over time, however, Deming maintained that competition driven by extrinsic motivation gradually crushes an individual’s internal or intrinsic motivational forces. Nelson (1994) provides a creative discussion on ways to establish reward systems that strengthen intrinsic motivation. I think Deming would have found great value in Nelson’s work.
[Figure: level of intrinsic motivation (vertical axis) plotted against time (horizontal axis), from “Life begins” to “Life ends,” with the forces of destruction pressing down on the curve.]

Intrinsic motivation
• Curiosity
• Cooperation
• Joy in learning
• Self-esteem, dignity

Extrinsic motivation
• Gradually replaces intrinsic motivation
• Competition and recognition drive actions
• Focus on the individual, not the system
• Problems attributed to individuals, not the system
• Resignation to external pressure (demotivational)

Intrinsic motivational factors are high at the beginning of life but are gradually crushed by the forces of destruction.


FIGURE 3-2 Forces that destroy intrinsic motivation


Reprinted courtesy of The MIT Press from The New Economics for Industry, Government, Education by W. Edwards Deming.

On this point Deming wrote (p. 121):

    What they (forces of destruction) do is to squeeze out from an individual, over his lifetime, his innate intrinsic motivation, self-esteem, dignity. They build into him fear, self-defense, extrinsic motivation. We have been destroying our people, from toddlers on through the university, and on the job. We must preserve the power of intrinsic motivation, dignity, cooperation, curiosity, joy in learning that people are born with.

The literature on this topic is legion. I will not go into depth on these materials at this time, but there are three important items that deserve mentioning. Probably the best known historical piece of research on motivation took place at Western Electric’s Hawthorne Works in Cicero, Illinois, from 1924 to 1932. Most readers will be familiar with the term the “Hawthorne effect” (aka the observer effect), which emerged as a result of these studies. It is interesting to note, however, that the term itself did not emerge until the 1950s, when Henry A. Landsberger (1958) began to reanalyze the outcomes of the numerous studies conducted at the Hawthorne Works. Essentially, the design of these studies was based on the hypothesis that if you made working conditions worse, productivity would go down. Over the roughly 8 years that spanned the numerous investigations at the Hawthorne Works, however, the researchers discovered that as they made both the environmental and managerial conditions worse, productivity actually improved. Then when the researchers stopped their observations and left the factory, productivity declined. The researchers’ major conclusion was that productivity increased, even under increasingly worse operating conditions, because the workers were being observed. But not all of the original researchers, or those who followed, have agreed on the interpretation of the original results. Yet the term “the Hawthorne effect” is widely used today to reflect motivational aspects of human performance.

A second critical piece of research on motivation was published in 1968 by the Harvard Business Review. The article, “One More Time: How Do You Motivate Employees?” by Frederick Herzberg, provides an essential review of social and psychological factors that motivate and demotivate workers. Herzberg noted in this article, “What has been unraveled [about the psychology of motivation] with any degree of assurance is small indeed” (1968, p. 86). He was very clear in his writings, however, that managers have only limited power to motivate employees. He also maintained that extrinsic motivators, like pay-for-performance (PFP) or incentive programs, do not necessarily make workers happy and productive. Like Deming, Herzberg maintained that only interesting and challenging work will result in motivating the workers. Deming referred to this simple principle as finding joy in work.

One of Herzberg’s key concepts is “motivating with KITA (i.e., a kick in the a _ _!).” He defines the components of both negative and positive KITA. On the negative side of the ledger he points out that historically (late 1800s and early 1900s) many organizations felt that the best way to motivate employees was through “negative physical KITA,” which included creating fear in the employees, intimidating them, and in some instances causing actual physical harm. He emphasizes that, fortunately, in a more modern context organizations have moved away from negative physical KITA to more subtle approaches characterized by what he calls “negative psychological KITA.” KITA, according to Herzberg, produces movement but not motivation. His analogy is that if he kicks his dog, the dog will move. If he wants the dog to move again, what does he do? Of course, he kicks the dog. But is the dog really motivated? Herzberg argues that by kicking his dog he achieves movement but not motivation. “I can charge a person’s battery and then recharge it, and recharge it again. But it is only when one has a generator of one’s own that we can talk

about motivation. One then needs no outside stimulation. One wants to do it” (1968, p. 88).

Herzberg (1968) proceeds to review positive KITA personnel practices that were developed as attempts by organizations to “motivate” employees. This is a very interesting list of nine practices that range from reducing time at work to increasing salaries and fringe benefits to requiring human relations training and even employee counseling. But in the long run, he maintains, such efforts produce only short-term movement, not motivation. Instead of employing any form of KITA, Herzberg argues that it is essential to understand what he calls the “motivation-hygiene theory of job attitudes.” This theory hinges on a key principle: the opposite of job dissatisfaction is not job satisfaction but rather no job dissatisfaction. Elaborating on this conclusion Herzberg writes: “The growth or motivator factors that are intrinsic to the job are: achievement, recognition for achievement, the work itself, responsibility and growth or advancement. The dissatisfaction-avoidance or hygiene (KITA) factors that are extrinsic to the job include company policy and administration, supervision, interpersonal relationships, working conditions, salary, status and security” (pp. 91–92).

He cites 12 different studies that interviewed a total of 1,685 employees from a variety of fields as well as countries to document his conclusions. In short, I think that anyone interested in this topic of employee motivation or human motivation in general needs to spend time reading and digesting Herzberg’s classic work. For some it may fly in the face of conventional thinking and organizational strategies about motivation. But if nothing else, Herzberg’s work provides a wonderful provocation for setting up a dialogue on a subject that many organizations avoid.

The final writer who deserves mention for his work on motivation is Alfie Kohn. Kohn has written two critically acclaimed books on this subject. No Contest: The Case against Competition (1986) provides a review of the role that competition has played in human development. Contrary to popular opinion, Kohn demonstrates that our fascination with being “number one” makes no one number one. He reviews studies from around the world to place the intrinsic–extrinsic debate into clear (and frequently controversial) context. His second book, Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A’s, Praises and Other Bribes (1993), continues his analysis of intrinsic and extrinsic motivation but is less academic and far more controversial (as you can probably surmise from the title). Kohn pulls no punches and takes a strong stance by claiming that “rewards simply do not work to promote lasting behavior change or to enhance performance.” He continues:

    Gradually it began to dawn on me that our society is caught in a whopping paradox. We complain loudly about such things as the sagging productivity of workplaces, the crisis of our schools, and the warped values of our children. But the very strategy we use to solve those problems—dangling rewards like incentive plans and grades and candy bars in front of people—is partly responsible for the fix we’re in (pp. xii–xiii).

Despite all the work that has been done indicating that extrinsic motivation does not motivate employees, there is still an extremely strong belief within many organizations and governmental bodies that incentives, bonuses, and rewards do in fact motivate employees. The extrinsic motivation camp is firmly grounded in the work of the behaviorists. Notable among this group are Ivan Petrovich Pavlov (1849–1936), Edward Lee Thorndike (1874–1949), John Broadus Watson (1878–1958), and of course, B. F. (Burrhus Frederic) Skinner (1904–1990). Skinner’s influence in the field of psychology is widely recognized. Yet his theories about human behavior and motivation have been quite controversial. There is little middle ground when it comes to discussing (debating?) Skinner’s (1974) “radical behaviorism.” He was a firm believer in the idea that human free will was actually an illusion and that any human action or reaction was the result of the consequences of that same
52 Chapter 3 Measuring the Voice of the Customer

action. If the consequences were bad, there was a high chance that the action would not be repeated. If the consequences were good, on the other hand, he maintained that the actions that led to the good outcome would be reinforced. He called this the principle of reinforcement (Schacter, 2011).

This fundamental belief of Skinner (that free will was an illusion) supported the extrinsic motivation theories that are still popular today. The proponents of “behavioral engineering,” as Skinner called it, maintain that people can and should be controlled through the systematic allocation of external rewards (Grenning, 1991). For organizations today, this translates into offering bonuses, incentives, and other inducements to the workers in order to get them to perform in ways that management wants them to. It assumes that the workers will do anything for the reward (just like Pavlov’s dog salivating even before the treat was given).

I believe this debate over intrinsic and extrinsic motivation is one of the more controversial and critical dimensions of QI and one that is central to an organization’s success. Yet most healthcare leaders have never had a serious dialogue with their management team about this topic. Too often, healthcare leaders believe that extrinsic motivators do in fact drive the workers. I have seen, for example, many not-for-profit healthcare organizations create elaborate incentive programs to dole out a few extra vacation days or dollars each year to the workers. For many employees (e.g., nurses), gaining an extra day or two is not of great benefit, because they are usually not able to use up the regular vacation time they have accrued. A major challenge for creating bonus/incentive programs in health care is that the actual monetary payouts or other rewards, such as extra vacation days, are small and trivial relative to what employees hear their counterparts receive in the for-profit sectors. As a result, many incentive programs in health care do less to motivate and more to provide fodder for employee humor and sarcasm.3

In contrast to extrinsic motivation methods, some healthcare organizations have decided to go the route of gain sharing (as opposed to PFP) as a way to reward the entire organization. Because operating margins are relatively small in the healthcare industry, a gain sharing approach, where everyone gets the same monetary share of the overall gain or operating margin, can receive favorable support. Gain sharing also has some inherent appeal to employees in not-for-profit industries, because true PFP systems, such as those found in the sales industry, are more difficult to tie directly to individual performance in healthcare settings. In a service industry like health care, it is often hard to decide which member of the team has outperformed the others. For example, the surgeon may have performed an excellent hip replacement surgery with no complications. Yet one of the floor nurses discovered that the patient was having a negative reaction to a particular medication, which could have severely compromised the patient’s recovery. Does the nurse deserve a bonus for catching a potential medication error? Or has the physical therapist, who convinced the patient that it was actually better to move the new hip than to stay in bed, performed better than the others on the team? In short, given the collaborative nature of the healthcare profession, it is often very difficult to determine which individuals have outperformed the others. An organization’s culture begins with the people it hires. The culture is cultivated, however, by the way in which the organization designs its motivational strategies (Berwick, 1995).

Component #3: Organizational Design

“Every system is perfectly designed to get the results it gets.” This phrase has been popular at the Institute for Healthcare Improvement (IHI) for many years. In fact, it is a quotation we have on one of the walls in our Cambridge, Massachusetts, office. Since it was first coined by Tom Nolan, PhD, statistical consultant to the IHI, this expression has been used by thousands of quality professionals to reflect the important contribution of organizational design. An organization’s structure and design are the

result of a direct and purposive set of decisions made by the leadership of the organization. Central among these decisions are the ways in which healthcare leaders design, integrate, adapt, update, and deploy the following five functions:

1. Education of staff and management
2. Communication (including both the actual flow of information and the ways in which it is disseminated)
3. Information flow (including the amount that is shared with the employees as well as the types of information shared)
4. Support structures for providing patient care (e.g., the admitting and scheduling processes and the physical layout of the office space)
5. Leadership structures (including the number of formal leaders, their ranges of responsibility, and their abilities to manage complex systems)

There is no doubt that health care is a complex enterprise. Yet from an organizational point of view there are several critical characteristics of the healthcare industry that are not as prevalent in other industries and that often contribute to organizational stagnation when it comes to being innovative and creative. First, innovation and rapid change do not come easily to healthcare organizations even when research has demonstrated that a new method or technique is efficacious. This is highlighted very clearly in the classic article by Balas and Boren (2000). They conclude that the healthcare industry has been extremely slow to transfer research findings to everyday practice. For example, they refer to the work of Oliver Wendell Holmes, who in 1843 delivered one of his famous papers on the “Contagiousness of Puerperal Fever” before the Boston Society for Medical Improvement. Holmes made a strong argument for doctors to wash their hands prior to examining a pregnant woman, which was quite novel at the time. Balas and Boren point out, however, that it took decades for Holmes’ recommendations to become a universally accepted practice and that even then change did not come without considerable resistance (p. 65).4 The most frequently cited and well-known quotation from the Balas and Boren article is a single sentence: “Studies suggest that it takes an average of 17 years for research evidence to reach clinical practice” (p. 66). It is not uncommon to be in a meeting about adoption and diffusion of new ideas in healthcare settings and hear someone reference this finding and the length of time it takes to get new ideas accepted by healthcare professionals.5

The second characteristic that I believe prevents healthcare organizations from being nimble and quick to respond operationally or managerially is the myriad of layers that we have created in this industry. Hierarchies are quite strong and play dominant roles in the healthcare profession. There are hierarchies within the ranks of physicians, nurses, and administrators. In many cases, these hierarchical layers are built on decades of tradition and reflect fairly conservative tendencies to avoid change. These hierarchies did not happen by accident or pop up overnight. The leaders of the organization created them and they are perpetuated because we allow them to continue. Yet nearly all of the healthcare professionals I encounter complain regularly about the lack of flexibility and innovation in their organization’s design and administrative structures. Test this when you are next in a meeting to discuss new or innovative ways to admit or discharge patients. Within minutes of someone presenting the new idea the “Yes Buts” arrive. These are the people who, when presented with a new way of discharging patients, respond, “Yes but, that’s not how we do things around here!” The other variation on this situation is when a recently hired person tenders an idea that she had seen work at her previous organization. After presenting the new idea she will quickly hear the voice of Yes Buts loud and clear: “Look, you’re new here. I’ve been here for 18 years and we tried that idea 5 years ago. It didn’t work then and it won’t work now. So if you plan to have a future at this organization don’t go around offering ideas that we know won’t work.” Although not explicitly stated, the goal

of many healthcare organizations seems to be to maintain the status quo rather than to challenge existing ways of thinking and/or to redesign the ways in which things are done.

Jack Welch, the flamboyant and outspoken former CEO of General Electric, provided a good example of how the leader of an organization sets the tone for organizational design and change. He made it very clear that an employee who wanted to be promoted within the GE ranks had better take all their courses in quality and be able to demonstrate the application of the concepts to make things better. He personally taught many of the classes on quality to the employees. Whatever you may think of his management style and methods, he was quite clear that the status quo was not going to be the modus operandi at GE. How many healthcare executives can even say that they have attended quality classes, let alone taught them? In an interview with the Chicago Tribune (April 14, 2002), Welch made the following statement about the role of education and learning within an organization:

    The challenge in the organization is to get the collective intellect raised every single day. People say to me as I go around the country, “Well, gee, this is a terrible time to be spending on education. We’ve got these problems.” My God, this is the time to be spending. You can’t stop educating employees.

Every system (organization) is perfectly designed to get the results it gets. I think Jack Welch would agree with this statement. But as Deming (1992, p. 53) pointed out, “Money and time spent for training will be ineffective unless inhibitors to good work are removed.”

It is very easy to overlook organizational design. For some it seems too abstract, for others too detailed and complicated. System thinking lies at the very heart of QI and at the core of organizational design. If the organization’s individual parts and related processes are not aligned toward a common aim, then the organization’s long-term results will always be at odds with what goes on within the organization on a day-to-day basis and the VOC will surely not be heard. What processes do you have in place for communicating your organization’s aim, strategic objectives, and performance? Can the employees articulate your mission and strategic objectives? Do you place your key indicators on an intranet or publish them in employee newsletters so that everyone can see how you are doing? I have frequently seen managers tell the employees that they need to improve the patient satisfaction scores, for example, but they never share the detailed reports with the staff. If an organization is to become truly a “learning organization” (Garvin, 1993; Senge, 1994; Watkins and Marsick, 1993; Wick and Leaon, 1994), then it needs to have a plan for continuously evaluating and adjusting the five functions of organizational design.

Component #4: Measurement and Information

The final component that builds an organization’s culture (Figure 3-1) is its approach to measurement and information. Organizational measurement can be classified into two basic categories: (1) measurement of the VOC and (2) measurement of the voice of the process (VOP). Within each of these major categories, management and staff must decide how they will assess performance and how they will use the data they collect to make improvements. They must also make sure that they are using the right tools and techniques to really understand the variation that exists in their processes. Good measurement begins with having an overall philosophy and approach toward measurement and monitoring. Doing what has always been done and measuring what has always been measured or what is convenient to track will not contribute to the organization’s long-term survival.

The role of measurement in health care will only increase. Organizations that do not develop strategies for continuously measuring the VOC and the VOP will most likely find themselves on the outside looking in. They will be very upset when external organizations begin making judgments about the care that they deliver. In the end,

they will not be sure of their way. In fact, they will be like Alice in Wonderland when she asks the Cheshire Cat how to get out of Wonderland. The exchange goes as follows:

    “Would you tell me, please, which way I ought to go from here?” asked Alice.
    “That depends a good deal on where you want to get to,” said the Cat.
    “I don’t much care where—” said Alice.
    “Then it doesn’t matter which way you go,” said the Cat.6

Measurement without a roadmap and a context for action is a fruitless journey. It creates activity but not accomplishments. The remainder of this book addresses the quality measurement journey (QMJ), so I reserve further comments on measurement for subsequent chapters.

These four components (human resource issues, motivation, organizational design, and measurement) form the key ingredients for creating a service excellence culture. Although people often think of organizations as physical structures and tangible assets, the truth is that people and the culture they create form the very essence of organizations. People form the basis of all organizations. Individuals with hopes, wishes, aspirations, and desires form the culture of an organization. How the leaders of an organization choose to manage and build this culture will determine the organization’s fate. In the last 50 years, there have been wonderful examples of how a variety of organizations have built positive and healthy cultures aimed at service excellence and examples of those that seem to have forgotten some very simple and basic principles. Service excellence does not happen by chance.

▸▸ Who Are Your Customers?

In the past, health care was fairly simple and direct. There was a provider of care, typically a doctor or nurse, and the recipient of care (i.e., the patient). This created a fairly straightforward supplier–customer relationship. Today, things are much more complex. Providers of care have to deal with many individuals and organizations in order to practice their profession. Similarly, administrators and managers must also deal with a myriad of both internal and external customers.7 For example, internal customers can include the board, management, the providers of care, and all those employees not involved directly with patient care but essential to the organization’s operation. External customers, on the other hand, are probably the most rapidly growing and demanding group of customers for healthcare providers to accommodate.

This means that today everyone in the healthcare profession must be deliberately aware of the myriad of customers they serve if they wish not only to survive but to flourish. We depend on others to supply us with various types of input. At this point, we are their customer. After we perform the work assigned to us, we pass it along to someone or some group that is in turn our customer. A supplier–customer flowchart, therefore, is a very useful tool to help individuals and teams appreciate how they and their customers form an interdependent chain of events. (See my short whiteboard videos on flowcharting methods for details on the supplier/customer flowchart and many other flowcharting methods at http://www.ihi.org/education/ihiopenschool/resources/Pages/BobLloydWhiteboard.aspx.) Two key components of everyone’s job, therefore, are to (1) know who your internal and external customers are and (2) take steps to actively listen to what these customers want, need, and expect (Lloyd, 2003).

The simple truth is that health care has become customer driven whether the providers of care like it or not. When I have discussed this topic with healthcare providers, managers, and support personnel, however, I have experienced a variety of reactions. The traditionalists in the group ask why all these “untrained patients and laypeople” are trying to make decisions about a specialized profession that they are not qualified

to address. Those with a more forward-thinking view of the world, on the other hand, are constantly looking for opportunities to include their customers in the analysis and delivery of care.8

In order to determine where you and your organization stand with respect to the VOC, consider your and your colleagues’ responses to the following questions:

■■ Who are our internal and external customers?
■■ Which group of customers is our highest priority?
■■ How deeply do we really listen to our customers?
■■ How often do we seek their input?
■■ What do our customers want, need, and expect from us?
■■ Do our customers really know what they want, need, or expect?
■■ We serve patients—what’s all this stuff about customers, anyway? Customers buy cars and refrigerators, not health care.
■■ Is the customer always right? Some of these people make unrealistic demands on us.
■■ The patients are not trained healthcare professionals—how do they know what they need?

I have had many interesting (and sometimes quite enthusiastic) conversations with healthcare professionals about these questions. There seem to be three primary schools of thought about the role of customers. The first group is the “You’ve got to be kidding” school. Members of this group basically cannot believe that they even have to listen to what the consumers of health care want, need, or expect. This group is primarily made up of individuals who long for the “good old days” when patients were seen but not heard. They maintain beliefs that are essentially grounded in the supposition that they are the center of the universe and that all the other objects ought to revolve around them. Those who come to these individuals for care and advice are assumed to have less medical knowledge and are, therefore, not qualified to be equal participants in the discussions about care. Fortunately, this category is not as dominant as it once was.

The second group is the “This, too, shall pass” camp. Here, we have a collection of folks who have been around long enough to become genuinely cynical about any new ways of thinking about customers. This group could also be classified as the “flavor of the month club.” They have seen management fads come and go and believe that if you merely wait long enough, some new jargon or acronym will replace last month’s quick fix. From this group’s perspective, nothing really changes except the new management fad or slogan of the month. Although skeptical, this group can become tremendous agents of change if they are given evidence that the organization has constancy of purpose (Scherkenbach, 1990, p. 13) and is sincere about turning management jargon into action.

The third group is the “eternal optimist group.” These people are true believers. They will get behind new ideas and support them because they fit with the organization’s mission and values and their own personal belief structures. They want to help and be team players, and they will go out of their way to be nice to their coworkers and especially patients. Frequently, this group irritates the first group because of their egalitarian optimism and receives benign indifference from the second group because they are not cynics.

Patients basically believe that the caregivers are competent and know the technical aspects of their jobs. For example, most patients do not know whether they should have stitches or staples in their incisions. They do not know whether they should have a pill for their pain, an IV drip, or an injection. What they do know, however, is that they want to be treated with dignity and respect. They want their questions answered, and they do not want to wait for hours on end before they receive service or get answers to their questions. In many ways, our patients and their families are asking for basic human courtesy. This point has been well documented over the years. In a study of more than 6,000 parents of children cared for in 38 hospitals across the country, Co, Ferris, Marino, Homer, and Perrin (2003) discovered that overall parental satisfaction ratings of the care their children received were most closely

associated with (1) the communication the parents received about their child’s condition and (2) the involvement they had in making decisions about their child’s care.

Another perspective on this issue comes from Dr. John Stone, a cardiologist and poet from Emory University School of Medicine. Dr. Stone’s classic book, In the Country of Hearts (1990), directly addresses the human side of the medical profession. In the section titled “Listening to the Patient,” he writes: “The inventor of the stethoscope, Laennec, knew that the medical history given by a patient is just as important—sometimes even more important—than the ‘lup-dup’ of the patient’s heart.” Among his advice to physicians is: “Listen! Listen to your patient! He is giving you the diagnosis” (1990, p. 65). This is the same advice given by Dr. Francis Peabody in 1927 in the article “The Care of the Patient.”

Service excellence stems from a constant and deliberate dedication to listening to the VOC. But listening is not enough. Listening provides the context for responding. If providers of healthcare services do not listen and respond appropriately, their customers will find alternatives. Establishing what Connellan and Zemke call “knock your socks off service” (1993) requires more than just listening. It requires finding out what customers care most about and then building the structures, processes, and culture that sustain service excellence as an ongoing part of every employee’s mindset and behavior.

▸▸ Defining Key Quality Characteristics

Having a customer service orientation or philosophy is only the beginning. I have observed numerous healthcare organizations that claim to have a “fully integrated service excellence program” but the program exists primarily on paper, within the pages of their annual report or in the methodology of the management incentive bonus program. When you look past the words and rhetoric, there is no substance in these approaches to actually listen to the VOC. A viable service excellence program, therefore, requires the deliberate creation of structures and processes aimed at making the VOC part of everyone’s job 24/7/365.

A critical step in this journey lies in turning this verbal orientation to service excellence into tactics and strategies for actions that guide behaviors. It begins by identifying what matters most to your customers. Classically, these have been referred to as quality characteristics (QCs). QCs represent a broad collection of things the customers care most about. For example, my wife has a floral design business. Her business has been growing (blossoming, if you will), and she decided it was time to buy a vehicle to carry all the arrangements to her customers. So we set out exploring our options. She quickly developed a list of QCs. She knew right away that she did not want a sport utility vehicle (too small and boxy). She also eliminated a station wagon for the same reasons. This quickly reduced the field to a van. As we explored the various QCs of vans, she soon got down to a list of key quality characteristics (KQCs), the things that she cared most about. Interestingly enough, she actually got down to a single KQC—sliding doors on both sides of the van. She really did not care about too many other options. She knew that she would need to be able to access the interior of the van from both sides. Some vans had only one door, which was unacceptable from her point of view.

Another example of defining a KQC comes from a friend who planned to buy a new car. Being a car buff, I asked her what she was thinking of buying. She said proudly, “Anything that is less than 190 inches long.” I told her that I had never heard of anyone buying a car strictly based on the length of the vehicle. She said that it was very simple. Both she and her husband are great Harley-Davidson fans. Between the two of them they own three Harley-Davidson motorcycles, which they keep in the front part of their garage. This leaves roughly, yes, 190 inches remaining in the garage for a car. KQCs are simple and straightforward when you listen to the VOC.

Defining KQCs enables you to start the VOC process. KQCs are the things customers care most about. They are the aspects of a process that are foremost in the minds of the customers. We all have KQCs irrespective of what the issue or process is. If we are not asked, however, our wants, needs, and expectations remain unknown. KQCs have the following characteristics:

■■ They reflect quality as defined or judged by the customer.
■■ They reflect aspects of the process that the customer cares most about.
■■ They represent to the customer the key measures of quality output.
■■ They will vary from customer to customer.
■■ They will create conflicts for you because you cannot meet all of your customers’ needs simultaneously.

Whenever you start to listen to the VOC, you should answer several basic questions:

■■ What is the process or system the customer experiences?
■■ When does the process or system begin and when does it end?
■■ Who is the primary customer? The secondary customer?
■■ Do we have consensus on the primary customer and what matters most to them?
■■ Can we segment the customers into reasonably homogeneous subgroups so that we are clear on what the KQCs are for each group?
■■ What aspects of the process (the KQCs) are most important to the primary customer?

If you do not at least have a dialogue on these questions, you will probably assume that you know what the customer wants, needs, or expects. In health care, we have a long history of being quick to prescribe and slow to listen. Historically, orders were given and the patients were expected to follow them. Yet the instructions or orders, which seemed perfectly logical to the physician, may not have been what the patient wanted, needed, or desired. Because most patients defer to the judgment of the physician, however, orders will usually be followed even if they are far from the patients’ KQCs.

We all have stories about our encounters with the healthcare system. Here is one of mine. In 2009, I had a total right hip replacement. I interviewed five surgical groups in my area to determine which one I wanted to surrender my hip to. I used the World Health Organization (WHO) surgical checklist as a starting point for discussion with each surgeon and then asked to see their data on operating time, revisions, infection rates, type of procedure used, and type of implant used. Yes, I know what you are thinking; I was not the usual type of patient. In fact, when I asked one of the surgeons all these questions he stopped and asked me, “Who are you and why are you asking me all these questions?” He did not get picked as my surgeon. One of the other groups I eliminated right away when the surgeon was “too busy” to meet with me and sent a physician’s assistant (PA) to interview me. When the surgeon did not show up and the PA explained that he was “tied up,” I politely thanked her for her time and left. Now I was down to three groups. The surgeon I finally selected (remember they had to convince me that I should give them my hip) started by asking me, “What can I do to improve the quality of your life?” This fellow had my attention right away. He wanted to listen to the KQCs that brought me to him. He focused on me and, instead of asking “what’s the matter,” took the VOC road and asked “what matters to you?” The procedure and the outcomes exceeded my expectations.

One of the most frequently overlooked aspects of KQCs is that they will vary from customer to customer. This requires that you once again realize that you do not have just one customer but many. Even within a group (e.g., hip or knee replacement patients or breast cancer patients) there will be variation. Treating hip replacement patients as if they all have the same KQCs is not only disrespectful to the patients but also a good indicator that the organization does not really listen to the VOC. The concept of mass customization is critical to understanding the

KQCs within segments of those you serve. Mass customization is defined by Tseng and Jiao (2001, p. 685) as “producing goods and services to meet individual customers’ needs with near mass production efficiency.” Manufacturing companies have used this notion of mass customization to deliver exactly what the customer wants. I saw this occur during a tour of the Harley-Davidson motorcycle assembly plant in York, Pennsylvania. As the bike frames come down the line, a laminated barcode card is scanned and all the specific customer KQCs are read and built into the assembly process. This means that the fenders, type of seat, size of engine, handlebars, and paint color are customized on the spot. Even in the paint shop each bike is painted the color preference of the customer. This means that they do not paint all black bikes on Monday and red bikes on Tuesday. As the barcode is read, the robotic paint machines switch out the colors for each bike. I know what you’re thinking. “But we are health care not Harley-Davidson!” True. Health care is different, but we have many opportunities for applying the principles of mass customization to our patients.

I experienced mass customization when I had my hip replacement surgery. The surgical group I selected to do my procedure had a meeting of all the hip and knee replacement patients and their respective caregivers 2 weeks before the procedure. In this “Joint Adventures” meeting (seriously, it was called Joint Adventures), they explained how the procedure would be performed, what would happen to the patient, the expectations for both patient and caregiver involvement in the recovery process, and rehabilitation plans. They also had us fill out a brief survey asking us about our exercise habits, expectations for how this would improve the quality of our lives, activities of daily living and current functional status, and what we hoped to be able to do once the procedure was over that we could not do now. All of this information was then used by the care management team and the nursing staff to customize a recovery plan for each of us. At 63, I was the youngest patient in the group. Sitting next to me was an 84-year-old woman who was frail and had a very different set of KQCs than those on my list. This was mass customization applied to a healthcare procedure.

Ultimately, if you cannot agree on the primary customer and their specific needs and expectations, then the KQC discussion will go round and round. This can also occur if you decide to change the primary customer midway through an improvement initiative. If this happens, then you should revisit the KQCs and see whether they are still relevant to the new customer group.

It constantly amazes me how often improvement teams cannot reach consensus on the primary customer. As an example, let me offer a story from another service sector—education. I was working with a local school district to assist them in defining indicators. They were participating in a national demonstration project that required them to identify “quality indicators” that could be tracked over time. They were ready to start defining indicators and looking at the data. I asked them to consider a simple question before we started to peer at numbers: “Who is the primary customer of the educational system?” The team was composed of administrative staff, teachers, and community representatives. As you can imagine, there were mixed responses to my question. The majority of the responses had something to do with the students and/or teachers as the primary customers. I commented that if the students are the primary customers and the school really listened to their wants, needs, and expectations, then school would probably begin around noon, lunch would be the first period, then free time would be scheduled for the next 2 hours before dismissal at 3 p.m. School systems are not designed to maximize the KQCs of students. As we explored this question further, some team members admitted honestly that the teachers were the primary customers. Others concluded that the community was the primary customer. Still others thought the business sector was the primary customer. Eventually, as the discussion slowed, the majority concluded that society was the primary customer of the educational system. Personally, I am not sure they ever really reached consensus on their primary

customer of the educational system. Periodically I run into the school superintendent and ask him if they ever resolved this question. He usually responds, “No, but we are still working on it.”

Now consider a healthcare example. Imagine that you are responsible for the billing process. If you define the patient as the customer, the KQCs might be an accurate bill, a timely bill, or a bill that is easily understood (i.e., no procedure codes, cryptic Latin phrases, or other medical jargon). If, on the other hand, the insurance company is the primary customer, then detailed procedure codes, insurance classification codes, and medical terminology might be the desired KQCs. My point is that if you change the primary customer, the KQCs will also change. But if you cannot agree on who actually is the primary customer, then the team’s discussion will have a lot of rhetoric but little practical value.

▸▸ Listening Three Times

FIGURE 3-3 shows three key points at which listening to customers should occur. Note that these three listening points are not mutually exclusive. In fact, relying on only one or even two of these approaches will place tremendous limitations on understanding what customers want, need, and expect. Establishing multiple listening points should be the standard for anyone who genuinely cares about customers. Multiple listening points not only enhance the amount of information received but also increase the validity and reliability of the findings (Campbell and Fiske, 1959). TABLE 3-1 provides a summary of the objectives of the three listening points.

[FIGURE 3-3 Three listening points. Diagram labels: Preservice (identify customer expectations); Point-of-service (solicit point-of-service feedback; manage unsolicited feedback); Postservice (measure organizational performance). Additional labels: design customer-friendly systems; identify opportunities for further improvement. Prompt: “How often do you listen to the VOC?”]

The first time to listen to our customers is before they experience our products or services. This is what is typically known as preservice assessment. For example, if a hospital is thinking about creating a new service for women and children, it should devise methods and techniques to find out what potential patients, families, and payers expect from this service before it is designed and implemented. How often do we decide to open a service or program before we determine whether there is a market for the service? Health care is not like the classic line from the movie Field of Dreams: “If you build it, they will come!” Asking people how they would like something to work is the first step



to a successful design. This type of listening is frequently overlooked in health care.

TABLE 3-1 Summary of the objectives of the three listening points

Preservice: Identify the needs and expectations of potential customers before they experience the actual delivery of care or service. This should be an essential part of designing new services.

Point-of-Service: Delivering and managing care while patients and families are experiencing care or service. This must be done at the site of care, for all levels of care and along all points of the continuum. These contact points become the “moments of truth.”

Postservice: Determining how well customer-focused care and service were delivered. Evaluation is after the delivery of care and should focus on how well we met customer expectations and needs and/or the experience of the customer.

A hospital I worked with in the Chicago area successfully applied preservice listening when they redesigned their outpatient testing and therapy area. Focus groups were used to collect data from patients, families, physicians’ office staff, and insurance companies. They were all asked to describe how and when they would like the testing and therapy process to work. Because each group had a slightly different perspective on the situation, the redesign team was able to gather a variety of insights that were used to design the new testing and therapy area. If they had not asked the customers before they designed the new service area, they would have built a facility and supporting processes that made sense to the staff, managers, and designers of the service but not necessarily to the customers.

The second time to listen to those we serve is while they are experiencing service or care. This is known as point-of-service (POS) assessment. Many healthcare providers are actively engaged in this type of listening. This can include patient interviews, leadership walk-rounds,⁹ focus groups, interactive TV surveys, recorded interviews (audio or video), and computer feedback kiosks. POS feedback can also come from unsolicited feedback and complaints.

The third and most frequently used method of listening to the VOC is postservice assessment. In this case, patients are asked to provide their opinions after they have been discharged. Most postservice assessments are obtained through mailed, telephone, or computer-based (email) surveys. Postservice assessments have four major advantages over face-to-face interviews:

■■ They can be more representative, reliable, and valid than asking patients for opinions while they are undergoing care.
■■ They provide patients with an opportunity to objectively reflect upon their experiences.
■■ They allow patients to respond according to their own time frames.
■■ They allow you to gather data from a larger segment of your total population of patients.

Because surveys are used so frequently in healthcare settings, I provide more detail on this method in the next section.

▸▸ Understanding VOC Tools

A comprehensive VOC measurement system should combine all three types of assessments. The first step toward building such a system is to become familiar with all the different tools and methods that can provide VOC insights. TABLE 3-2 summarizes the dominant methods
TABLE 3-2 VOC feedback methods and tools

Columns: Tool or Approach; Service Level (Pre, Pos, Post); Cost; Advantages; Disadvantages

Focus Groups
Service level (Pre / Pos / Post): X X
Cost: Moderate to high cost depending on the number conducted and who conducts them
Advantages:
■■ In-depth qualitative data
■■ Do not require large samples
■■ High flexibility
■■ Can identify new issues or concerns while conducting a focus group
Disadvantages:
■■ Low generalizability
■■ Requires skilled facilitators
■■ Not anonymous

Observation
Service level (Pre / Pos / Post): X
Cost: Low cost
Advantages:
■■ Easy to do
■■ Highly flexible
Disadvantages:
■■ Low generalizability
■■ Low value for comparisons
■■ Limited to publicly observable behavior
■■ Requires considerable time and effort

Personal Interviews
Service level (Pre / Pos / Post): X X X
Cost: High cost
Advantages:
■■ Very detailed data
■■ Easy to probe for additional data
■■ Effective with all socio-economic levels
Disadvantages:
■■ Very labor intensive
■■ Very time consuming
■■ Quality of data depends on skill of interviewer
■■ Not anonymous

Leadership Walk-rounds
Service level (Pre / Pos / Post): X
Cost: Low cost
Advantages:
■■ Provides real-time feedback from staff
■■ Identifies issues related to quality, safety and customer service
■■ Engages senior leaders and staff in dialogue
Disadvantages:
■■ Scheduling can be a challenge for senior leaders
■■ Focusing the walk-round discussions on key quality characteristics requires focus and commitment
■■ Taking action on the identified issues requires leadership commitment

Unsolicited Feedback
Service level (Pre / Pos / Post): X X
Cost: Very low cost
Advantages:
■■ Identifies extreme dissatisfiers/satisfiers
Disadvantages:
■■ Virtually no opportunity to generalize findings

High-Tech Tools (TV & Touchpads)
Service level (Pre / Pos / Post): X
Cost: Very high cost
Advantages:
■■ Provides real-time feedback
■■ User-friendly
■■ Flexibility
Disadvantages:
■■ May not be appealing to certain groups of respondents
■■ No control over sample of respondents

Experiential (The “Mystery Shopper”)
Service level (Pre / Pos / Post): X
Cost: High cost
Advantages:
■■ Tremendous depth of data
■■ Can cover all aspects of a customer’s experience
Disadvantages:
■■ Low generalizability for defining a “typical experience”
■■ Requires the mystery shoppers to be trained and articulate

Surveys
Service level (Pre / Pos / Post): X X X
Cost: Moderate to high cost depending on the methodology used
Advantages:
■■ Generalizability
■■ Offers continuous monitoring
■■ Provides comparative reference data
■■ Versatility
■■ Reasonably quick to implement
Disadvantages:
■■ Requires rigorous protocols, valid and reliable instruments
■■ May require sampling
■■ Not highly flexible
■■ Unwillingness of individuals to participate
■■ Inability of respondents to recall

Modified from Advocate Health Care. Used by permission of Robert Lloyd.

and tools used most frequently to gather customer input. This table indicates at which of the three measurement points (pre, POS, and post) the method or tool can be used, as well as some of the advantages and disadvantages of each approach. Note that some of these methods or tools can be used at multiple listening points.

▸▸ Focus Groups

Focus groups are typically the method of choice for gathering detailed information after the patient is discharged. The major drawback of focus groups, however, is that they are generally done with small groups (10–15 people) and the results cannot be generalized to larger populations (even though many people try to draw conclusions about the population represented by the focus group). However, focus groups do provide a wonderfully rich and detailed listening mechanism. Remember: focus groups provide depth but very little breadth or generalizability.

Another pitfall with focus groups is that the people running the focus group frequently think that they do not have to be very rigorous in the planning and implementation of a focus group session. Many of the same threats that challenge the validity and reliability of surveys also apply to focus groups. If focus group facilitators are not trained in how to form and manage a focus group, ask questions, and then build on the responses of the participants, they will end up with a rambling manuscript that provides little useful information. Focus groups are not informal chats with customers. Focus group feedback should not be used in isolation, however. It should be combined with the results from leadership walk-rounds and staff rounding and then compared with the preservice and postservice results to build a more comprehensive understanding of the VOC.

▸▸ Observation

Observation is an effective method for gathering data on the current experience of customers. This is actually one of the most insightful methods of obtaining POS feedback. This approach is relatively inexpensive (compared to surveys and focus groups) because the principal costs are essentially related to staff time. The observer merely selects a reasonably unobtrusive location and notes what occurs during a defined period of time. With a minimum level of training, the observer can record a fairly rich volume of information. The key to success with observational studies, however, is that there needs to be a standardized method for observing and, most important, a consistent method for recording and documenting the observations. Unfortunately, all too often observational studies are done as ad hoc and periodic exercises designed to see if there is “a problem.” As a result, the observational studies produce a few anecdotal insights but little in the way of data or information that can be used to make improvements. The major drawbacks with data collected through observational studies are:

■■ The observer can be in only one place at a time. If you want to cross-validate observational results, you will need to assign multiple observers to multiple groups.
■■ The observer sees events from a limited time perspective (e.g., the observer was in the emergency department [ED] for 3 hours on a Monday morning).
■■ The results, like those for focus groups, have very low generalizability (i.e., can the observations made on Monday morning in the ED be generalized to the rest of the day, to the other days of the week, or to the other shifts? The simple answer is “No.”).
■■ The observers need to be trained in a standardized methodology so that they are all looking for the same types of events, situations, or encounters. Observation frequently suffers from not only respondent bias but also situational bias (i.e., the very nature of the situation causes the observer to selectively focus on some events and ignore others).

■■ Finally, observational studies require having individuals skilled in conducting qualitative data analysis (Crabtree and Miller, 1992; Krippendorff, 1980; Patton, 1987; Webb, Campbell, Schwartz, & Sechrest, 1969).

▸▸ Personal Interviews

Personal interviews can be used at all three VOC listening points. Like observation, this method can provide very rich and detailed information. Interviews have the added advantage, however, of being able to provide direct one-on-one interaction with the customer. Interviews have a great deal of appeal because they seem so simple on the surface: find someone and go talk to him or her. The challenge comes in having skilled interviewers who are able to probe without guiding the respondents to the answers being sought. Personal interviews are not investigative reporting sessions. Interviews need to be well thought out and conducted with consistently applied methods of inquiry. Finally, if the results are to be reflective of a larger population, then the interviews need to be of sufficient volume to capture a representative sample. This can be quite time consuming, expensive, and more involved than just talking to a few people.

▸▸ Leadership Walk-Rounds

One of the more successful methods used to gather POS feedback is leadership walk-rounds, sometimes called leadership rounds (Budrevics and O’Neill, 2003; Frankel et al., 2005; IHI, n.d.; Thomas, Sexton, Neilands, Frankel, & Helmreich, 2005).⁹

In this approach to listening, managers and administrators review a very brief set of structured questions with patients and/or staff. The leadership walk-rounds process of gathering real-time feedback allows management to (1) get out of their offices and visit the places where work occurs, (2) listen to patients and staff to determine which topics or issues need to be improved, and (3) determine priorities. An additional benefit of conducting leadership walk-rounds is that the patients and family members find it very comforting to have someone from the management team spend a few minutes talking with them about their care and the processes they are experiencing. This form of feedback not only provides an opportunity for immediate service recovery but also the ability to start identifying patterns and opportunities for QI teams.

Two cautions apply to this type of feedback. First, it typically produces more positive results than postservice evaluations do. If you rely strictly on this form of feedback, therefore, you will probably start to think that you are much better than you really are. The reason for this positive bias is simple: patients are hesitant to be totally honest with you while they are still under your care. They are concerned that you may hold something back or retaliate against them if they are honest and tell you what they really think of the care and service they are receiving. As a result, it is not uncommon to have a frequent response from POS assessments be, “Oh, everything is just fine.” Second, if follow-up action is not taken on the identified issues or topics presented by patients or staff, management loses credibility. This is why I usually set up two key measures when a group wants to engage in leadership walk-rounds: (1) the number of issues identified during the walk-round and (2) the percentage of identified issues that were addressed (or better yet resolved) within 30 days of identification. Listening without responding is in most cases worse than not listening in the first place.

▸▸ Unsolicited Feedback

Unsolicited feedback is one of the most effective methods of gathering information on customer concerns. Regrettably, it is frequently overlooked in healthcare settings and in some instances ignored. Ignoring unsolicited feedback occurs most often when customers are so upset that they say, “I want to speak to someone in charge and I want to speak to them right now!” Most often this

happens when customers feel the need to voice dissatisfaction with the care, service, or outcomes they are receiving. Most unsolicited feedback is negative, which is why some organizations choose to bury it in administrative channels or ignore it. But unsolicited feedback, especially negative feedback, does provide an opportunity to identify potential areas for improvements. The central challenge with this type of feedback is not to overreact to single events. Ask yourself, “How many times has this happened in the past week or month?” If it is an isolated incident, engage in root cause analysis and service recovery strategies immediately. If the complaint happens repeatedly, consider collecting more data on the issue to see if there is a pattern and a need for widespread improvement work.

▸▸ High-Tech Tools

Increasingly, healthcare organizations are discovering high-tech tools as a way to listen to the VOC. These include interactive televisions in patients’ rooms, small handheld computers (personal digital assistants or PDAs) used by nurses to record patients’ experiences, and interactive kiosks. This approach to listening is the most expensive of all the methods, and the returns are frequently of marginal value. High-tech tools are quite flexible because they can be used 24 hours a day, 7 days a week, 365 days a year and they do not require direct staff interaction as is required by personal interviews or focus groups. They are also able to provide rapid-cycle feedback because the recorded interactions can be tabulated on demand. Besides being expensive, the major disadvantage is that high-tech tools appeal to a limited segment of the population. Elderly patients usually are not eager to engage with interactive TVs and computer kiosks. Typically, high-tech devices pique the interest of younger cohorts. I had one organization, for example, tell me that they discovered that one 13-year-old boy had completed a patient satisfaction survey on an interactive kiosk in the lobby 23 times. He was bored waiting for his mother. Related to this issue of limited appeal is the fact that it is hard to control who uses them, how many times the same person might interact with the same touch screen, or, most important, whether they provide a reliable estimate of the perspectives of the total population being served.

▸▸ The Experiential Shopper

The use of the experiential shopper has increased in popularity over the past 5 years. This is a classic POS method that stems primarily from the retail shopping industry (i.e., the “mystery shopper”). Its adaptation to the healthcare field has required a number of revisions, however. In healthcare settings, the experiential shopper is usually a hired external consultant or a staff person who poses as a patient to gain firsthand knowledge of what it is like to be a patient. Typically, the experiential shopper will call to see how long it takes to get an appointment or have an outpatient procedure done. I know of one medical group, for example, where each week the CEO spins a wheel similar to a roulette wheel that contains the names of each medical practice around the edges of the wheel. When the wheel stops, she calls the particular medical office indicated and asks how long it will take to get a prescription refilled, get an initial visit scheduled, or obtain a follow-up appointment. She then uses the responses she gets as input for the weekly practice manager meetings. A new wrinkle on this approach has been that some consulting groups will actually hire out their employees to schedule and experience selected procedures (e.g., blood draws, x-rays, CAT scans, and, believe it or not, some invasive procedures like colonoscopies) to see how well the administrative and clinical processes go. One of the major problems with the experiential shopper method is that it suffers from low generalizability. If staff pose as the mystery shopper, one of the side benefits of this method has been that they typically gain a new appreciation of how patients view healthcare encounters.
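Two of the tallies described in the preceding sections can be sketched in a few lines of Python: the pair of leadership walk-round measures (number of issues identified, and percentage addressed within 30 days) and the unsolicited-feedback question, “How many times has this happened in the past week or month?” This is only an illustration under stated assumptions. The record layouts, function names, sample dates, and the cutoff of two complaints for “repeated” are hypothetical choices made for the example; only the 30-day window comes from the walk-round discussion.

```python
from collections import Counter
from datetime import date, timedelta


def walk_round_measures(issues, window_days=30):
    """Return (issues identified, percent addressed within window_days).

    `issues` is a hypothetical list of (identified_on, addressed_on) pairs;
    addressed_on is None while an issue is still open.
    """
    total = len(issues)
    on_time = sum(
        1
        for identified_on, addressed_on in issues
        if addressed_on is not None
        and (addressed_on - identified_on).days <= window_days
    )
    percent = 100.0 * on_time / total if total else 0.0
    return total, percent


def recurring_complaints(complaints, today, lookback_days=30, threshold=2):
    """Count complaints per topic in the lookback period.

    `complaints` is a hypothetical list of (received_on, topic) pairs.
    Topics seen at least `threshold` times are returned as candidates for
    further data collection; the threshold is an assumption, not a rule.
    """
    cutoff = today - timedelta(days=lookback_days)
    counts = Counter(
        topic for received_on, topic in complaints if cutoff <= received_on <= today
    )
    return {topic: n for topic, n in counts.items() if n >= threshold}


# Made-up sample records for illustration only.
sample_issues = [
    (date(2024, 3, 1), date(2024, 3, 10)),  # addressed in 9 days
    (date(2024, 3, 1), date(2024, 4, 15)),  # addressed in 45 days
    (date(2024, 3, 5), None),               # still open
    (date(2024, 3, 8), date(2024, 4, 7)),   # addressed in 30 days
]
sample_complaints = [
    (date(2024, 3, 20), "billing"),
    (date(2024, 3, 25), "billing"),
    (date(2024, 3, 28), "parking"),
    (date(2024, 1, 5), "billing"),          # outside the 30-day lookback
]

if __name__ == "__main__":
    print(walk_round_measures(sample_issues))
    print(recurring_complaints(sample_complaints, today=date(2024, 3, 31)))
```

Keeping the window and the threshold as parameters makes the judgment calls visible: a team could track the walk-round percentage over time while deciding locally what counts as a repeated complaint.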

▸▸ Surveys

Surveys are essential tools for a majority of the VOC approaches outlined in Table 3-2. Focus groups, personal interviews, and interactive high-tech kiosks, for example, all require a survey format of some sort to lay out the questions presented to the respondents. The methods of distributing a survey may differ (e.g., structured interview, mailed, telephone interviews, an Internet survey, or an interactive high-tech device), but they all rely on a survey of some sort to start with. Surveys can be used at all three listening points (preservice, POS, and postservice) and, if they are properly constructed, administered, and analyzed, surveys can not only provide valuable insights about current customer experiences but also provide an opportunity to make comparisons over time. Carey (1999) provides good guidance on how to choose a survey system.

So what is a survey? Basically, a survey is a standardized measuring instrument that is designed to take concepts, ideas, or behaviors that may be vague in nature (e.g., satisfaction, excellent care, conservative, liberal, or quality) and attempt to quantify these concepts so that decisions can be made (inferred) from the data. Key features of a survey include that it:

■■ Is purposeful and is designed to build a systematic information collection system.
■■ Is focused on a question, problem of interest, or issue and is designed to be either exploratory (i.e., what’s wrong?) or confirmatory (i.e., did we fix it?) in nature.
■■ Involves question design, a data collection plan (including stratification criteria and a sampling strategy), analysis, and reporting.
■■ Should have a degree of validity and reliability. Even if a survey is considered to be valid and reliable, based on psychometric analysis, it must be realized that valid and reliable in one context or for one segment of the population does not necessarily mean that the survey will be valid and reliable in other contexts or with other segments of the population.
■■ Creates direct and indirect interactions between the survey instrument, modality of administration (e.g., mail, telephone, structured interview, focus group, or computer based), the person conducting the survey (especially if it is a structured interview or focus group survey), the survey format, and the respondent!
■■ Consists of individual questions (sometimes called items) and potentially subscales. Subscales (or scales) are groupings of individual questions, usually in the range of five to eight questions, that are thought to “hang together” and capture some underlying concept, function, or domain. Examples of subscales frequently formed for inpatient satisfaction surveys include nursing care subscale, physician communication subscale, pain management subscale, or comfort and cleanliness subscale. Not all surveys are constructed around subscales. External surveys developed by vendors or oversight bodies generally will group the individual survey questions into specific subscales, so that the various functions related to patient care can be compared. It is important to realize that subscale development is not an intuitive process (i.e., you do not create subscales because you or your colleagues “feel” that a particular number of questions more or less hang together and reflect a similar concept). Subscales are created through a fairly elaborate statistical analysis process that is based on factor and correlation analysis (Child, 1973; Harman, 1976; Kim and Mueller, 1978a, 1978b). Many people erroneously say they have “scales” on their surveys, but these were most often not derived by applying psychometric methods based on factor and related analytic techniques. Note that most commercial survey products have been put through factor analysis methods, but most ad hoc (locally developed) surveys have not. So, if someone uses the term subscale or even scale, you are well within your rights to politely ask

them what they mean by the term. In some instances, people use the terms subscale and scale synonymously. Again, subscales are groupings of individual questions that have been shown statistically to capture some underlying concept or domain. The term “scales,” on the other hand, is used in survey research primarily to refer to the type of measurement used in a response format (i.e., nominal, ordinal or rank order, interval, and ratio scales). More will be said about response scales shortly.

The details on these key features of surveys can be obtained from a myriad of books that have been developed on this subject over the years.¹⁰ To begin building your knowledge and skill in the design and use of surveys, I suggest reading a few of the classics in the field of psychometrics and then some of the more recent works.

Classic Readings in Survey Design and Use:
■■ Interviewing: Its Forms and Functions by Stephen Richardson, Barbara Dohrenwend, and David Klein. New York: Basic Books, 1965.
■■ Readings in Attitude Theory and Measurement edited by Martin Fishbein. New York: John Wiley & Sons, 1967.
■■ Social Psychology by Muzafer Sherif and Carolyn Sherif. New York: Harper & Row, 1969.
■■ Attitude Measurement edited by Gene Summers. Rand McNally & Co., 1970. (This is a wonderful collection of many of the classic articles on attitude measurement.)

More Recent Books on Survey Design and Use:
■■ Asking Questions: The Definitive Guide to Questionnaire Design—For Market Research, Political Polls, and Social and Health Questionnaires by Norman M. Bradburn, Seymour Sudman, and Brian Wansink. San Francisco, CA: Jossey-Bass, 2004.
■■ Measuring Customer Satisfaction and Loyalty: Survey Design, Use, and Statistical Analysis Methods, 3rd ed., by Bob E. Hayes. Milwaukee, WI: ASQ Press, 2008.
■■ Internet, Phone, Mail, and Mixed-Mode Surveys: The Tailored Design Method, 3rd ed., by Don Dillman, Jolene Smyth, and Leah Melani Christian. New York: Wiley & Sons, 2009.

The Bradburn et al. book on Asking Questions is a very good place to start learning about what constitutes a well-written survey question. Hayes’s 2008 book not only describes how to construct a good survey but also lays out very clear steps for incorporating surveys into a QI program, including the use of Shewhart control charts for analyzing the results. Dillman’s work (2009) is considered by most experts in this field to be essential reading. He not only provides guidance on creating what he calls the “total design method” but also discusses in a very practical manner the historical developments surrounding the use of surveys, questionnaire formats, open and closed-ended questions, implementation and administration of surveys, and the use of mixed-mode surveys. The first edition of Dillman’s work was published in 1978. It is no wonder that this book is now in its fourth edition. If you are going to read one of the books in this listing, I’d start with Dillman’s book. You won’t be disappointed. Finally, in addition to reading, you might also consider sending one or more individuals on your team to a class or workshop on survey design. Frequently, local colleges and universities provide a good resource for classes in psychometrics. These courses are usually taught in sociology, psychology, or marketing research programs.

Now that we have outlined the key features of a survey, let’s apply this knowledge to a survey. Spend a few minutes reviewing the survey shown in EXHIBIT 3-1. What is your opinion of this survey? Is each question clear and unambiguous? Are the response options appropriate for each of the questions? Do you think patients would complete this survey or find it bothersome and therefore not even finish it before leaving the clinic? Develop some of your ideas and impressions before reading my observations following the exhibit.

EXHIBIT 3-1 Survey assessment exercise

Happy Valley Medical Clinic


Please complete this survey and give it to the receptionist before leaving.
Date of visit: ___________ Time: ___________ Dr. ____________
1. Was your appointment on time?
Yes _____ No ____ Not Sure ____
2. Were you greeted in a friendly manner and did the receptionist answer your questions?
SA _____ A ____ Not Sure ____ D ____ SD
3. Did you have to wait longer than expected to see the doctor?
Not Too Long _____ Sort of Long ____ Very Long ____
4. Where the chairs comfortable?
Yes _____ No ____ Not Sure ____
5. Did we treat you with dignity and respect?
Definitely ____ A Little ____ Somewhat ____ Not at All ____
6. Did the doctor answer all your question in a thorough and friendly manner?
Yes _____ No ____ Not Sure ____
7. How long ago did you make this appointment?
Date: _____________
8. Did you use our valet parking service?
Yes _____ No ____ Not Sure ____
9. Will you recommend our services to friends and family?
Definitely ____ A Little ____ Somewhat ____ Not at All ____
Use the following space to tell us anything else that concerns you about your health or our services.
_______________________________________________________________________________

You’ve probably concluded that this survey has a number of problems. How many of the following issues did you identify?

■■ Why didn’t a staff person fill in the administrative items (Date of Visit, Time of Visit, and Doctor to be seen)? This is probably just going to annoy the respondent. They are thinking, “Why didn’t they fill this in? They knew I had an appointment.” Little things like this can create a negative context for respondents and can bias their responses.
■■ Question 1: Was your appointment on time? Yes, No, or Not Sure. This is a problem on several counts. First, what is it trying to ascertain? Is this trying to capture whether the patient showed up at the appointed time? Or is it asking about whether the doctor was on time for the appointment? The question is confusing. Second, the inclusion of Not Sure as a possible response is not appropriate. This question is basically an event question that should have simply a Yes or No response format. But again the entire question is confusing. An alternative would be to offer forced-choice response categories relating to how many minutes the patient had to wait in the waiting area, in the exam room, or until the doctor actually saw the patient. This could also be done by having the patient write down the time when each of these steps in the process occurred.
■■ Question 2: Were you greeted in a friendly manner and did the receptionist answer your questions? This is a classic example of what is known as a double-barreled question. It
70 Chapter 3 Measuring the Voice of the Customer

contains two parts. Part 1 is, “Were you greeted in a friendly manner?” Part 2 is, “Did the receptionist answer your questions?” If both of these aspects are important then they should be separate questions. When given double-barreled questions respondents are frequently caught in a quandary. What if I think they did greet me in a friendly manner but did not answer my questions? What response do I give? Most often if the respondent agrees with part of the question but not the other half they will either not answer the question or pick a neutral, middle-of-the-road response. The final problem with this question is the response choices. The use of a Strongly Agree (SA) to Strongly Disagree (SD) response format is not appropriate for this item. Also note that they did not spell out SA, A, D, and SD. Will the respondent know what these abbreviations mean? Typically a response scale based on level of agreement is reserved for attitudinal questions. The wording of Question #2 is more of an event-based item in which case a Yes or No response option would be adequate. Again, you do not need a Not Sure response.
■■ Question #3: Did you have to wait longer than expected to see the doctor? When a question is trying to capture some aspect of time the best approach is not to provide vague concepts like Not Too Long, Sort of Long, and Very Long but rather to ask the respondent to select categories of time (e.g., less than 5 minutes, 5 to 10 minutes, 11 to 15 minutes, or over 15 minutes). A preferable way to capture the time dimension is to ask a staff person or the patient to write down the actual times when key steps in the process occurred. This applies also to Question #1. With the actual times recorded you can now analyze the variation in the process, which you cannot do with response options such as Not Too Long, Sort of Long, and Very Long.
■■ Question #4: Were the chairs comfortable? The issue here is obvious. What chairs? In the waiting area, the exam room, in the lab area where blood samples are drawn? The lack of specificity will make it difficult for the respondent to provide an informed opinion. The Yes/No response options are appropriate. The Not Sure option is not applicable for this item. It would also be an option to have a response format that assessed the level of comfort (e.g., very comfortable, somewhat comfortable, not very comfortable, or very uncomfortable). As a general operating rule you should offer response choices that address the key concept being assessed. Respondents get confused and possibly irritated when you ask a question and then provide a response format that makes no sense. For example, if the question is, “Do you have any questions about the medicine the doctor prescribed for you today?” then a reasonable response option would be Yes or No. An inappropriate response option for this question would be to offer opinion-based responses such as the level of agreement. You may laugh because this seems so obvious but I have seen surveys that are asking event-based Yes/No type questions and then offer a SA to SD response option.
■■ Question #5: Did we treat you with dignity and respect? This is another double-barreled question. Break it into two questions if both concepts are important. The response options are reasonable for this item.
■■ Question #6: Did the doctor answer all your questions in a thorough and friendly manner? Yes, you spotted another double-barreled question that needs to be broken apart. The doctor may have been a very friendly person but provided incomplete answers to the patient. How do they respond to these two items when they are combined? The response options are appropriate and in this case the Not Sure option does have value.
■■ Question #7: How long ago did you make this appointment? This question is another annoying one because the office should know when the patient made the appointment. If

they do not know the patient will surely not know the date she called for an appointment.
■■ Question #8: Did you use our valet parking service? This may be of interest to the practice manager of the clinic but the patient will most likely find the question annoying. The best way to gather this bit of data is to ask patients when they check in at the registration desk if they used the valet parking service. Staff should be collecting the response to this item.
■■ Question #9: Will you recommend our services to your friends and family? This is a popular question on many patient surveys. The issue with this item is the word “services.” How is the patient to know which services are being referenced? Registration service? Doctor services? Nurse services? Phlebotomy services? Valet services? If each of these services is critical then list them separately and have the respondent provide opinions on each type of service. The problem then becomes the length of the survey and the time commitment you are requesting of the respondent. The other issue with this item is that there may be a recommendation patients might make to a friend that could be different from that which they would offer to a family member. For example, if the patient is a 42-year-old female she might recommend the practice to her friends but not to her 80-year-old mother. Again, this has the potential to be a double-barreled question. For the response options, I would ask what is the difference between “A Little” and “Somewhat”? These seem to both be dealing with the same or similar level of magnitude. Given the wording of this question the responses of Yes, Maybe, and No could be appropriate.
■■ The final invitation to “tell us anything else that concerns you about your health or our services” seems, on the surface, to be a friendly, open-ended request. But the limited space provided suggests that we really don’t want you to write too much. They should either make the survey card bigger or instruct the respondent to use the reverse side of the card for additional comments.

Now that you have had time to evaluate a survey and to start thinking about some key components of survey design it is time to dive into five topics that greatly affect the success of your surveying efforts:

■■ The growth of surveys in healthcare settings
■■ Writing survey questions
■■ Response formats for the questions
■■ The logistics of survey administration
■■ Linking survey results to improvement strategies

These five topics are addressed next.

The Growth of Surveys

The use of surveys in healthcare settings has grown faster than the use of any of the other VOC approaches and tools already discussed. There are so many internal and external surveys within the healthcare industry that a participant in one of my workshops on the design, development, and use of surveys asked at the beginning of the class, “How long will this survey madness go on?” My response was that the use of surveys is most likely to continue and probably increase. In the United States particularly, we have been survey crazy for decades. This has been the case not only in the healthcare industry but also in all sectors of our society (remember Pam from the Acme Storm Window Company whom you met at the beginning of this chapter?). Other countries are not as fixated on the use of surveys but there is a growing tendency in western European countries, especially in the United Kingdom, to use surveys in healthcare settings as the dominant approach to determining whether the service users’ expectations are being properly addressed.11

The growth in surveys has been characterized by two major types of surveys: formal and informal. Formal surveys come from two primary sources. First, there are surveys required by external agencies, ministries, inspectorates, departments of health, and other groups that

have oversight responsibility for the provision and delivery of healthcare services and outcomes. Second, there are surveys that are internally sponsored by the healthcare organization itself. Most of the time these surveys are designed, administered, and processed by a commercial vendor who then sends the analysis and interpretation of the results to the sponsoring organization. These surveys typically are aimed at gathering data on patient, employee, and/or physician satisfaction. The vendor is retained to provide or develop valid and reliable survey instruments, design sampling strategies, handle the distribution and collection of the survey results (usually by mail or telephone methods), process the completed surveys, and then produce statistical and graphical reports for management.

Although the formal surveys usually have a high degree of visibility and require a line in the organization’s budget to cover the costs, it is informal surveys that are creating the most challenges within healthcare organizations. These informal or ad hoc surveys are usually conducted in response to a problem or a need to answer the ever-popular questions: “How are we doing with (fill in the blank)?” Or, “We have a problem with (fill in the blank) so we better develop a survey and get some views on how extensive the problem is.” Popular topics for these informal surveys include handwashing compliance, handoff at shift change, communication among and between staff, medication availability, discharge processes, clinic wait times, educational needs, feedback on in-service training programs, physician satisfaction with laboratory services, the culture for reporting errors, and the list goes on!

Although the individuals who create these informal surveys have good intentions, frequently their surveys suffer from (1) validity issues (i.e., the survey does not actually measure what it purports to measure), (2) reliability issues (i.e., the survey cannot reproduce the same or similar results under different administrations), (3) questions that are generally not constructed properly, (4) inappropriate response formats for the items or questions on the survey, (5) inappropriate sampling and data collection plans, (6) being single point-in-time (one-shot) surveys rather than being done in a continuous manner, (7) inappropriate statistical analysis, and (8) the inability to have the results generalized to broader segments of the population being served. Yet, the survey development and distribution process probably made the people who created the survey feel like they did something worthwhile or they could then tell their manager that they “did a survey.” Too often, these homegrown surveys provide inadequate and at times misleading findings that are then used to make poor management decisions.12

So, how many surveys are being used throughout your organization? Think about this for a minute. How many formal surveys do you have to complete? More important, how many informal or ad hoc surveys exist? When I ask participants in my workshops to answer this question I get comments like, “I have no idea how many surveys are floating around the hospital” or simply “Too many!” Based on the feedback I get from workshop participants it is not unusual for a hospital of 200 beds to be sponsoring 50 or more different surveys throughout the course of a year. Ultimately the critical question is, once a survey is conducted what do you do with the results? Do the results get used by anyone? Repeatedly I have seen surveys conducted on wards or units that end up sitting on a manager’s or supervisor’s shelf.

To help you get a handle on the number of surveys floating around your organization I have created a Survey Inventory Worksheet (EXHIBIT 3-2). The seven columns in the worksheet are as follows:

■■ Column 1: What is the short title of the survey?
■■ Column 2: Who is the target group for the survey (e.g., patients, family members, physicians, staff, and managers)?
■■ Column 3: Is the survey a commercial survey provided by a vendor or an internal (ad hoc) survey developed by staff?
■■ Column 4: What is the frequency of distribution for the survey (e.g., is a one-time
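EXHIBIT 3-2 Survey inventory worksheet

Facility Name: _______________________________________________ Date: ___________

Column 1: Short title of the survey
Column 2: Target group for the survey
Column 3: Is this survey a commercial survey (CS) or an internal (ad hoc) survey (IS)?
Column 4: Frequency of distribution: a one-time survey (OTS) or a repeat survey (RS)?
Column 5: What method is used to collect the survey responses (e.g., mailed, telephone, handout, interview)?
Column 6: How many individual questions are on this survey?
Column 7: How do you intend to use the survey results?

Column 1 | Column 2 | Column 3      | Column 4       | Column 5 | Column 6 | Column 7
________ | ________ | [ ] CS [ ] IS | [ ] OTS [ ] RS | ________ | ________ | ________
________ | ________ | [ ] CS [ ] IS | [ ] OTS [ ] RS | ________ | ________ | ________
________ | ________ | [ ] CS [ ] IS | [ ] OTS [ ] RS | ________ | ________ | ________
________ | ________ | [ ] CS [ ] IS | [ ] OTS [ ] RS | ________ | ________ | ________
________ | ________ | [ ] CS [ ] IS | [ ] OTS [ ] RS | ________ | ________ | ________
________ | ________ | [ ] CS [ ] IS | [ ] OTS [ ] RS | ________ | ________ | ________
________ | ________ | [ ] CS [ ] IS | [ ] OTS [ ] RS | ________ | ________ | ________

Page _____ of _____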
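For readers who keep the inventory electronically, the worksheet in Exhibit 3-2 maps naturally onto one record per survey. The following Python sketch is only an illustration under stated assumptions: the class name `SurveyRecord`, its field names, the helper `summarize_inventory`, and the sample entries are all hypothetical and do not come from the worksheet itself.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SurveyRecord:
    """One row of the Survey Inventory Worksheet (field names are hypothetical)."""
    short_title: str        # Column 1
    target_group: str       # Column 2
    source: str             # Column 3: "CS" (commercial) or "IS" (internal, ad hoc)
    frequency: str          # Column 4: "OTS" (one-time) or "RS" (repeat)
    collection_method: str  # Column 5: mailed, telephone, handout, interview
    question_count: int     # Column 6
    intended_use: str       # Column 7

    def __post_init__(self):
        # Reject codes that the worksheet does not offer.
        if self.source not in {"CS", "IS"}:
            raise ValueError(f"source must be CS or IS, got {self.source!r}")
        if self.frequency not in {"OTS", "RS"}:
            raise ValueError(f"frequency must be OTS or RS, got {self.frequency!r}")


def summarize_inventory(inventory):
    """Count surveys by source and by frequency, and total the questions asked."""
    return {
        "by_source": Counter(r.source for r in inventory),
        "by_frequency": Counter(r.frequency for r in inventory),
        "total_questions": sum(r.question_count for r in inventory),
    }


# Made-up entries, for illustration only.
inventory = [
    SurveyRecord("Inpatient satisfaction", "patients", "CS", "RS", "mailed", 40, "improvement"),
    SurveyRecord("Handoff at shift change", "staff", "IS", "OTS", "handout", 12, "improvement"),
    SurveyRecord("Lab services feedback", "physicians", "IS", "OTS", "handout", 9, "improvement"),
]
summary = summarize_inventory(inventory)
print(summary["by_source"]["IS"], "internal surveys;",
      summary["total_questions"], "questions in total")
```

A tally of this kind can help start the conversation about which surveys are essential; it says nothing about whether any individual survey is valid or useful.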

event or is it repeated at specific periods of time)? If it is repeated, indicate the frequency of administration (e.g., weekly, monthly, quarterly, or biannually).
■■ Column 5: What method is used to collect the survey responses (e.g., mailed, telephone, handout, or interview)?
■■ Column 6: How many individual questions are on this survey?
■■ Column 7: How do you intend to use the survey results (e.g., for improvement, performance management, incentive bonus payouts, or external requirements)?

Once you complete the inventory, use it as a basis for a dialogue about the purpose of each survey and how useful or valuable the results are to the organization. Which surveys are essential and must be done? How many are interesting but not essential to the organization’s strategic objectives? What surveys could you eliminate? Try to get the total number of surveys reduced to the vital few. I think you will be surprised at how many surveys, formal and informal, are being conducted that you did not even know existed.

Writing Survey Questions

It seems like writing survey questions should be a rather straightforward activity. You have in your mind something you want to ask another person and you just write it down. Right? Although this may seem like a simple task, the major issue in writing survey questions is that you are writing the question but someone who does not have your frame of reference and view of the world will be responding to the question. This is the problem with many ad hoc surveys. People with good intentions sit down and write questions for a survey but never get any input from the people expected to respond to the questions. Consider Dillman’s guidance on writing survey questions:

    One must think about many things at once to write a good question and failure to do so can have significant effects on how the question performs. (Dillman, Smyth, & Christian, 2009, p. 67)

As you look at either commercially developed surveys or your local ad hoc ones consider the following issues when assessing the individual survey questions:

■■ Will the aim of the questions be to capture attitudes, opinions, behaviors, or events? Each of these objectives requires different approaches to asking questions.
■■ What type(s) of question(s) will you be using? Will you be developing a survey with open-ended, forced-choice (sometimes called closed-ended) questions or a combination of both types of questions? Each type has advantages and disadvantages that are summarized in TABLE 3-3. Open-ended questions, as the term implies, provide a sentence or two inviting the respondent to provide, in her own words, commentary on the topic of interest. Frequently, open-ended questions are used to explore new ideas or as a provocation to the respondent when there is little existing knowledge on the topic. Rich insights can be gleaned from open-ended questions but there are several obvious issues with including them in a survey: (1) many respondents do not want to take the time to write a response that requires them to formulate their answer and then write or type it out; (2) there is a high potential for nonresponse bias in completing open-ended questions due to age, literacy, or language issues; and (3) tabulation of open-ended questions requires considerable time and effort as well as skill in using qualitative methods to identify the underlying constructs or issues identified by the respondents. A final challenge I have found with the use of open-ended questions in healthcare surveys is that they are generally not very probing. They typically have a lead-in that says something like this: “Please use the following space to provide us with any additional details on your visit” or “Please use the following space to tell us what went well or could be improved.” Most of the time these invitations go unanswered.

TABLE 3-3 Advantages and disadvantages of open- and closed-ended questions

Format type: Open ended
  Advantages
  ■■ Reduces surveyor bias
  ■■ Enables rich and detailed responses (stories)
  ■■ Can provide insights the surveyor never anticipated
  Disadvantages
  ■■ Creates challenges for summarizing the responses
  ■■ Many respondents don’t provide written comments
  ■■ Increases the length of the survey, time to respond, and potentially the cost

Format type: Closed ended (aka forced-choice)
  Advantages
  ■■ Provides standardized responses
  ■■ Reduces ambiguity
  ■■ Enhances comparability
  ■■ Reduces response time
  ■■ Facilitates coding of the responses and tabulation
  Disadvantages
  ■■ Success is closely linked to how well the questions were written
  ■■ Increased variation in interpretation of the question
  ■■ Regression to the mean in response patterns
  ■■ Measurement scale confusion
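
Table 3-3 lists easier coding and tabulation as an advantage of closed-ended questions. As a rough illustration, the Python sketch below tallies the answers to a single forced-choice item. The function name `tabulate`, the response labels, and the sample answers are assumptions made for this example; they are not data from any survey discussed in this chapter.

```python
from collections import Counter

# Hypothetical response choices for one forced-choice item, in rank order.
CHOICES = ["Strongly Agree", "Agree", "Undecided", "Disagree", "Strongly Disagree"]


def tabulate(responses, choices=CHOICES):
    """Return (choice, count, percent) rows; blanks and unlisted answers are skipped."""
    counts = Counter(r for r in responses if r in choices)
    answered = sum(counts.values())
    rows = []
    for choice in choices:
        n = counts[choice]
        pct = round(100 * n / answered, 1) if answered else 0.0
        rows.append((choice, n, pct))
    return rows


# Made-up answers; the empty string stands for a skipped item.
answers = ["Agree", "Agree", "Strongly Agree", "Undecided", "", "Disagree", "Agree"]
for choice, n, pct in tabulate(answers):
    print(f"{choice:<18} {n:>3} {pct:>6}%")
```

Percentages here are based on the people who answered the item, which is one of several reasonable conventions. Open-ended comments offer no comparable shortcut, because they have to be read and coded before they can be counted.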

The majority of surveys in and outside of healthcare settings use the second approach to writing survey questions: forced-choice or closed-ended questions. These types of questions present a statement that is then followed by a limited set of response choices. Some of the response choices are offered in the form of a rank-ordered progression (e.g., SA, Agree, Undecided, Disagree, or SD) whereas others provide more of a multiple-choice format where the choices are not necessarily in any rank order or progression. Response options of the multiple-choice category usually are used to classify the responses but do not have any particular valence or directionality. Demographic questions, gender or age categories, educational attainment, location of residence (e.g., urban, rural, or suburban), or preferences (e.g., from the list below please select the type of car you drive) are examples of this type of question. Another approach to this response format is to use what Dillman et al. (2009, p. 3) call the “nominal closed-ended questions with unordered response categories.” In this case, a question is posed and response choices are offered but they are all independent responses that have no overall directionality. I imagine you have actually responded to a number of surveys that had questions structured this way. For example, consider the following question: Which of these five statements best describes your manager?

___ Supportive of the department
___ Supportive of me and my work
___ A real people person
___ Focused on results
___ Singularly driven by the department budget

In this example, there is no sense of order or a progression of good to bad or favorable to unfavorable. The responses are independent descriptors of how the manager is perceived by an employee. The respondent is merely being forced to select among several alternatives that have equal valence.

Alternatives to this type of approach could be to (1) create response choices that reflect a rank order, progression, or weighting schema; (2) allow the respondent to select more than one of the five response options; or (3) use an open-ended question to let the respondent describe in her own words what she thinks of her manager.

In summary, it is critical that you give some thought to the type(s) of questions you have on your surveys. One of the most important issues at this point is that you give serious consideration to how the types of questions will be received, viewed, and answered by the targeted respondents. As I mentioned earlier, too many survey questions are developed not from the respondent’s perspective but rather from the perspective of the question writer.

■■ What is the question stem? The stem of a question (Dillman et al., 2009, pp. 77–78) consists of a very few words that lie at the core of what the question is attempting to ascertain. Often questions are developed with entirely too many words that end up confusing the respondent. What is the essence of the question you are trying to ask? Dillman et al. (2009, p. 77) write, “Crafting good survey questions requires understanding how each component of the question conveys meaning independently to respondents as well as how all the parts work together to convey meaning.” For example, look at the questions in TABLE 3-4. The stem for each question is shown in the right column of the table. Notice that the original questions contain excessive wording that makes the questions confusing. The question stem, therefore, captures the topic of interest with the fewest words. Anyone developing a survey should conduct an exercise with colleagues to see if they can determine the question stem for each item.
■■ Will the wording and terms you use be clear to the respondents? A majority of the books on designing surveys point out that the level of words used in a survey should be geared to a sixth- or seventh-grade reading level. What happens in healthcare settings, however, is that survey questions frequently use words and terms that make sense to the people writing questions but not to the respondent. Dillman et al. (2009) go so far as to recommend that if a word in a survey question exceeds six or seven letters a shorter and more easily understood word should be substituted. For example, instead of asking “Do you feel exhausted most of the time?” substitute the word “tired” for exhausted. Health care has a lot of medical terms and jargon that make no sense to the patients or their caregivers who are asked

TABLE 3-4 Determining the question stem

Original Question | The Question Stem

1. Not counting the number of times you might have flown through O’Hare Airport or transferred to another airline, how many times have you visited Chicago? | How many times have you visited Chicago?

2. When you think about a recent successful vacation that was not only fun but also added considerable value to your family, how much did you spend? | How much did you spend on your recent vacation?

3. Excluding any recent visits to the ED or the outpatient clinic, how satisfied were you with the information the nurses on the inpatient unit provided on your condition? | How satisfied were you with the information the nurses on the inpatient unit provided on your condition?

to complete surveys. So instead of asking, “Did the phlebotomist draw your CBC specimen without causing you excessive pain?” consider a shorter and less technical question such as “Did the person who drew your blood sample cause you any pain?” A related aspect of choosing your words is to make sure you avoid the double-barreled question. I addressed this issue at the beginning of this section when we looked at a test survey but it bears repeating. This is not a technical issue but a logical one. People can legitimately have different views on different topics so why combine multiple concepts in one question? BOX 3-1 provides a few classic examples of how we often write double-barreled questions into healthcare surveys. You will improve your surveys tremendously if you become aware of double-barreled questions and vigilant in detecting them. If the multiple components referenced in the question are all equally important then break them into separate questions. If after detecting a double-barreled question, however, you discover that the multiple items or actions being referenced are not all equally important then drop one of the items and shorten the question.

BOX 3-1 Double-barreled questions

■■ Did we treat you with courtesy and respect?
■■ Did the nurses and the doctors respond to your needs and answer your questions?
■■ When you came to the clinic for your visit last week, did you think that the way you were treated demonstrated our corporate values of compassion and dignity?
■■ In the past 30 days, have you taken all the medicines we prescribed at time of discharge and had any unusual side effects?
■■ When you were ready to leave the clinic, did you receive full information about your copay and next appointment?
■■ As an employee of XYZ Medical Center, do you feel the benefits are reasonably priced and comprehensive?

Response Formats

What response format options are most appropriate for the questions on your survey? This topic probably has had more books and articles written about it than any of the other items in this section. The easiest way to set the context for this topic is to tell a brief story. I was working with an improvement team at a family practice clinic who wanted to develop a survey that could be handed out to patients upon arrival. They wanted to have them fill it out as they progressed through their visit and then deposit it in a box before leaving. They also had a variety of questions they thought they would like to ask. Before I even got to start asking them about the number and types of questions they planned on asking or their sampling strategy, one of the staff raised her hand and proudly announced, “And you know we are going to use a Likert scale.” Now this was not the first time I have had this happen. I am not quite sure how this statement has become such a popular refrain when talking about the use of surveys in healthcare settings but it happens often enough that I now anticipate it when working with healthcare teams on surveys. I find this reference particularly curious because most healthcare professionals have not had formal courses in survey design yet they seem to know Rensis Likert as if he were one of their regular patients.

Before we get into the details of what Likert contributed to survey methodology we need to back up a little. Let’s start by considering the nature of response formats. A response format determines how individuals can provide feedback to a specific question on a survey. Response formats are used almost exclusively with closed-ended (forced-choice) questions. Over the years there have been many response formats developed by a variety of researchers in the fields of sociology and psychology. Each of these numerous response formats is dependent on the type of measurement scale being used.

Measurement scales are the means by which we assign a number to an object or entity (Hayes, 2008, p. 183). Stevens (1951) defined four basic types of measurement scales: nominal, ordinal, interval, and ratio. Although there has been considerable debate in the literature over Stevens’ typology, these four categories of measurement have remained the primary ways to organize scales of measurement. TABLE 3-5 provides a summary of the basic characteristics of these measurement scales. If you are not familiar with these four categories of measurement you can gain considerable insight into this important topic by typing “measurement scales” into an Internet search engine.

With a little background in measurement scales it is now time to return to our friend Likert (pronounced correctly by the way as LIK-ərt and not LY-kərt). Rensis Likert (August 5,

TABLE 3-5 Measurement Scales and Characteristics

Scale type: Nominal
  Characteristics of the measurement scale
  ■■ Naming or categorizing objects or characteristics
  ■■ The categories are mutually exclusive
  ■■ If numbering is used it is only for identification purposes
  ■■ Cannot use mathematical operations on this type of measurement
  Examples
  ■■ Male = 1, Female = 2
  ■■ Single = S, Married = M, Divorced = D
  ■■ The numbers used to identify players on a sports team

Scale type: Ordinal
  Characteristics of the measurement scale
  ■■ Objects are ordered according to some defined characteristic
  ■■ But we do not know exactly by how much they differ
  Examples
  ■■ Win, place, or show in a horse race
  ■■ Low, medium, or high satisfaction
  ■■ Rank order hospitals in a city by overall percentile on inpatient satisfaction

Scale type: Interval
  Characteristics of the measurement scale
  ■■ Items are ordered and the difference between each pair of items is the same; distance is meaningful
  ■■ Average is meaningful, ratios are not
  ■■ No true zero level
  Examples
  ■■ Temperature in Fahrenheit or Celsius
  ■■ A 1–10 scale where the patient response is Strongly Agree, Agree, Neutral, etc. when the question is properly worded and tested

Scale type: Ratio
  Characteristics of the measurement scale
  ■■ Same characteristics as an interval scale
  ■■ But has a true zero point on the scale (i.e., no numbers exist below zero)
  ■■ Ratios are meaningful
  ■■ All parametric statistics permissible
  Examples
  ■■ Temperature in Kelvin
  ■■ Length, weight, distance, time, etc.

Note: Some writers describe interval and ratio scales as being virtually the same because it is difficult to find a legitimate interval scale that is not also a ratio scale (Blalock, 1960, p. 15).
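
One practical consequence of Table 3-5 is that the scale type limits which summary statistics are sensible. The Python sketch below is a simplified reading of the table rather than a complete rule set. The function `describe_by_scale` and the sample data are hypothetical, and pairing the mode, median, and mean with particular scale types follows common textbook convention, which is an assumption added here and not something the table states.

```python
import statistics

SCALES = {"nominal", "ordinal", "interval", "ratio"}


def describe_by_scale(values, scale):
    """Return summaries commonly considered meaningful for the given scale type."""
    scale = scale.lower()
    if scale not in SCALES:
        raise ValueError(f"unknown scale type: {scale!r}")
    # Counting categories works on every scale, so the mode is always available.
    result = {"mode": statistics.mode(values)}
    if scale in {"ordinal", "interval", "ratio"}:
        # Order is defined, so a middle value can be located.
        result["median"] = statistics.median_low(values)
    if scale in {"interval", "ratio"}:
        # Equal distances between points make the average meaningful.
        result["mean"] = statistics.mean(values)
    if scale == "ratio" and min(values) > 0:
        # A true zero makes ratios meaningful (e.g., "eight times as long").
        result["max_to_min_ratio"] = max(values) / min(values)
    return result


# Made-up data for illustration.
marital_status = ["S", "M", "M", "D", "M"]  # nominal codes
satisfaction = [1, 2, 2, 3, 3, 3]           # ordinal: 1 = low, 2 = medium, 3 = high
wait_minutes = [5, 10, 10, 20, 40]          # ratio: time has a true zero

print(describe_by_scale(marital_status, "nominal"))
print(describe_by_scale(satisfaction, "ordinal"))
print(describe_by_scale(wait_minutes, "ratio"))
```

The point of the sketch is the gating, not the particular statistics: an average of nominal codes such as Male = 1 and Female = 2 can be computed, but it does not mean anything.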

1903–September 3, 1981) was a psychologist having the beginnings of a scale. The individual
who was keenly interested in figuring out how to items would then need to be tested statistically to
measure the extent of a person’s beliefs, attitudes, see if they “hang together” and capture the same
or feelings toward some object or issue. In 1932, concept or issue. This is usually accomplished,
Likert detailed a technique for the measurement as mentioned at the beginning of this section,
of attitudes that was new, empirical, and quite by submitting the individual survey questions
practical. Since then, he and others have made to factor and correlation analysis.
a variety of adjustments and adaptations to his In healthcare settings, therefore, when most
original technique. I do not intend to go into all people say, “And you know we are going to use a
the subtle statistical and methodological aspects Likert scale” what they actually mean is that they
of Likert’s approach. However, I will clarify a key are applying a five-point Likert response format
point related to Likert’s work and its application to a bunch of independent questions that are not
to healthcare survey work. related to the same underlying concept and have
Remember the statement, “And you know little or no hope of ever forming a true Likert
we are going to use a Likert scale”? Well, in most scale. What the person on the family practice
instances when people say this they are confusing improvement team should have said was, “And
two critical aspects of Likert’s work. First, Likert you know that we are going to use a five-point
developed a balanced response format using a Likert rating scale as our response format.”13
five-point rank-ordered rating scale (e.g., SA,
Agree, Undecided, Disagree, and SD) that is The Logistics of Survey
applied to an individual item or question on
a survey. Second, he defined what has become Administration
known as a Likert scale, which is created from Another challenge with conducting surveys,
the aggregation of the individual Likert items especially ad hoc or locally developed surveys,
on a questionnaire that capture different aspects, is that the logistics and administrative aspects
concepts, or phenomena being studied. Therefore, of the survey process are either ignored or for-
the term Likert scale, properly used, is the sum gotten about. When this happens, biases and
of responses to individual Likert items using errors will be inherent not only in the design
the 5five-point Likert response format (scale). and administration of the survey but also in the
As you have probably figured out by now results. TABLE 3-6 provides a summary of some
the term “scale” in survey research can actually of the more prevalent sources of bias and error
be somewhat confusing because it is used in in survey research.
two different ways especially when we are con- There are many things you can do to avoid
sidering the work of Likert. When most people or at least minimize the impact of the biases and
refer to a “Likert scale,” they are referencing the errors listed in Table 3-6. Some of these things
five-point balanced response format (i.e., two relate to the logistics of the survey process whereas
positive responses, a neutral midpoint, and two others relate to the design and formatting of the
negative responses). Likert’s principal concern, survey. TABLE 3-7 provides a checklist and guidance
however, was with creating collections of indi- on how you can address and resolve bias, survey
vidual items (questions) that capture or reflect error, and logistical challenges. These are actually
the underlying phenomenon being studied. rather easy issues to address. All it requires is
A Likert “scale,” therefore, cannot consist of to (1) be aware of the potential impact of each
one or even two items on a questionnaire. He of the various biases and errors on your survey
claimed that you needed to have at least eight and (2) take steps to build knowledge about how
items (questions) on a survey that address the to create the structures and processes needed to
same topic or issue to even begin thinking about address each issue.
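To make the distinction between a Likert response format and a Likert scale concrete, the following is a minimal sketch rather than a definitive method. It codes five-point responses for eight items, sums them into a scale score for each respondent, and checks whether the items "hang together" using Cronbach's alpha. Alpha is a swapped-in, commonly used internal-consistency statistic, not the factor and correlation analysis named in the text. The numeric codes (1 to 5), the function names, and the respondent data are all hypothetical.

```python
from statistics import variance

# Five-point balanced response format applied to each individual item.
# The 1-5 coding is an assumption made for illustration.
RESPONSE_CODES = {"SD": 1, "Disagree": 2, "Undecided": 3, "Agree": 4, "SA": 5}


def code_items(labels):
    """Convert one respondent's response labels into numeric codes."""
    return [RESPONSE_CODES[label] for label in labels]


def likert_scale_score(coded_items):
    """A Likert scale score is the sum of the coded item responses."""
    return sum(coded_items)


def cronbach_alpha(rows):
    """Internal consistency of k items; rows holds one list of codes per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical, deliberately artificial data: four respondents who each give
# the same answer to all eight items, so the items are perfectly consistent.
respondents = [
    code_items(["SA"] * 8),
    code_items(["Agree"] * 8),
    code_items(["Undecided"] * 8),
    code_items(["Disagree"] * 8),
]

scores = [likert_scale_score(r) for r in respondents]
alpha = cronbach_alpha(respondents)
print(scores, round(alpha, 3))
```

With real survey data the items will not agree perfectly, and a low alpha would suggest that the questions are independent items sharing a response format rather than a true scale.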

TABLE 3-6 Potential sources of bias and error in survey research

Potential Source of
Survey Bias and Error Description

Motivation to respond Individuals are motivated for different reasons to respond to a survey. Some individuals are intrinsically motivated whereas
others require some form of extrinsic motivation (e.g., money, a coupon for free coffee or lunch at the hospital cafeteria).
Some individuals respond to a survey because they are loyal customers and feel a sense of “belonging” to the organization.
Others respond if they find the survey to be easy to understand and well designed. The point is that if you do not
consider and discuss the potential motivational factors for respondent participation your survey will probably suffer from
nonrespondent bias.

Coverage error This error occurs when not all the members of a targeted population have an equal and known, nonzero chance of being
included in the sample and when those excluded from the sampling strategy are different from those who are included.
For example, if at your clinic 63% of the patients are females and 37% are males, do you know the historical probabilities of
each group completing the surveys you send to them? If you get 87% completed surveys from females and only 18% of
the surveys completed by males, then you have a high likelihood of a coverage error. This also will occur when you decide
to send only to respondents who you think will be more responsive (e.g., female patients usually are more responsive than
males, so if you send to proportionally more females than males then you will experience a coverage error).

Sampling error Sampling error occurs when the precision of the survey estimates is limited because not every person in the population
is sampled. For example, a random sample of about 100 members of the U.S. general public can produce margins of error
of ± 10% with a 95% confidence level (i.e., 95 out of 100 times the estimate will be within ± 10% of the true value). But if
you surveyed 2,000 people the margin of error is reduced to about ± 2%. Sampling error is a frequently quoted statistic
especially when doing polls related to elections or critical national issues. Sampling issues are discussed in greater detail in
the next chapter.

Nonresponse error When you do not get everyone in the sample to respond to the survey, you will experience nonresponse error. This is
especially a critical error when the survey also suffers from the coverage error problem and those who respond have
different characteristics from those who do not respond.

Measurement error Measurement error occurs when a respondent’s answer to a survey item is inaccurate or imprecise. Measurement error is
usually the result of poor question wording or survey design. This is especially a problem with mailed or Internet surveys
when there is no one from the surveying group around to explain the meaning of a question.

Wording of questions and response formats The details related to these two topics have been addressed earlier in this section. The key issues related to bias and error with
the wording of the questions include (1) using a reading or language level for the questions that is not appropriate for the
respondents, and (2) creating questions that are unclear, double-barreled, or too wordy. Bias and error are also introduced
when the selected response format does not correspond to the logic of the wording of the questions. These problems were
also addressed earlier in this section. They occur most often, however, when the question has a stem that asks the respondent
to assess one level (e.g., the level of importance of some aspect of care) and then provides a response format that is based on
a rank ordering of levels of agreement (e.g., SA, Agree, Undecided, Disagree, SD). In this case, the question sets up a bait-and-switch
situation. The respondent was asked to indicate how important a topic or aspect is to her but is then given a level of
agreement as a response format. This confuses the respondent and leads to response bias and errors in responding.

Reliability errors Most ad hoc or local surveys are not evaluated adequately for reliability and validity. Reliability refers to consistency and
the lack of distortion over multiple applications of the survey. Issues of concern with reliability include (1) test–retest
reliability (i.e., do we get similar responses with multiple applications of the survey?), (2) the level of inter-rater reliability
(i.e., agreement among different reviewers that the concepts or questions do in fact lead the reviewers to the same
conclusions), which is usually assessed with the kappa statistic (Cohen, 1960), and (3) internal consistency among the
individual questions on a survey which is usually assessed with Cronbach’s alpha (Cronbach, 1951).

Validity errors If the survey you have developed has not been assessed for validity, you run a good chance of having the survey instrument
not actually capture the key construct or idea you are trying to study. Validity simply refers to the fact that the concepts,
constructs, or issues being measured are actually captured by the questions in the survey. A classic example, frequently
referenced and debated, is IQ testing (i.e., does an IQ test actually measure IQ?). From a healthcare perspective we can ask
whether the medical record is a valid measure of the quality of care the patient received. The medical record reflects certain
aspects of the care experience, namely clinical details, but it tells only part of the story. In 1966, the American Psychological
Association’s Standards for Educational and Psychological Tests and Manuals listed three major types of validity: content
validity, criterion-related validity, and construct validity. Frequently, face validity is referenced as a fourth type of validity. But
face validity provides only an initial quick and dirty assessment of validity and is the weakest form of validity. Face validity
is basically a common sense approach to determine if the connection between the measurement instrument and the
construct or concept under investigation is apparent or self-evident. But, as Selltiz, Jahoda, Deutsch, and Cook (1959, p. 165)
point out, “Whether such an assumption (of face validity) is justified in any given case is ultimately a matter of judgment.”
Finally, note that reliability and validity are usually considered together. A test or set of questions might be reliable (i.e., the
survey gets roughly the same results upon multiple administrations of the instrument) but it could also have no validity
because the survey instrument does not adequately capture or measure the constructs being investigated. Additional
references on both reliability and validity include Bauer (1996), Bohrnstedt (1970), Hayes (1998), and Selltiz et al. (1959).
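The sampling error figures quoted in Table 3-6 can be reproduced with a short calculation. The sketch below assumes a simple random sample from a large population, the worst-case proportion p = 0.5, and z = 1.96 for a 95% confidence level. None of these assumptions is stated in the table, and the function name is hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from n respondents.

    Assumes simple random sampling from a large population; p=0.5 is the
    worst case and z=1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Sample sizes quoted in the table: about 100 and about 2,000 respondents.
for n in (100, 2000):
    print(f"n={n}: about +/- {100 * margin_of_error(n):.1f} percentage points")
```

Under these assumptions the result is just under 10 percentage points for 100 respondents and a little over 2 for 2,000, which is consistent with the rounded values in the table. A different sampling design or a finite-population correction would change the numbers.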

TABLE 3-7 Addressing survey logistics and formatting issues

Who Is Going to… Why It Matters

…lay out the survey, size the answer spaces, and graphically design the survey?
Frequently not enough time and attention are devoted to the graphic design and layout of a survey, especially ad hoc surveys. If it is a mailed or handed-out survey, will you present the survey in a landscape or portrait format? Will you make full-page single- or double-sided surveys or make a folded booklet? What point size will you use to print the survey? What color paper will you use to print the survey? You may be saying at this point, who cares? Well, research has shown that the layout, spacing, font, bold/not bold, use of italics, and point size all can make a big difference in terms of response rates on a survey. For example, elderly patients respond more completely and accurately to surveys that use simple words and are printed on light yellow paper (rather than white) with a 14 point Arial font. Do you even think about the font and point size used in making a survey? The science of visual perception has provided considerable insights on how the visual and graphic aspects of a survey can play a key role in motivating a respondent to complete the survey (Hoffman, 2004; Palmer, 1999; Tufte, 1983, 1990, 1997; Ware, 2004). Dillman et al. (2009, pp. 89–106) provide very practical guidance on the visual presentation of surveys. It is a topic that most healthcare professionals grossly underestimate when they are developing surveys.

…write the instructions for responding to the questions and what will happen with the final report?
All too often surveys are distributed to patients or staff with no or inadequate instructions for responding to the survey or what they can expect with respect to the results. Every survey, whether it be a mailed, phone, or Internet survey or a focus group structured interview, should provide information on (1) the purpose of the survey, (2) how to complete or respond to the survey, (3) how the results will be used, and (4) whether a copy of the results will be made available to the respondents. Organizations that are focused on transparency will also have a strategy for distributing the survey results to the public, the media, and possibly to local political leaders.

…identify the actual target group(s), service lines, physician groups, etc. for the survey?
There is a need to keep the target groups for receiving the survey clearly defined. Identifiers for each respondent need to be established so that completed surveys can be distinguished from those not returned. Location codes and maps may also be needed in order to determine if there are clusters of nonrespondents. Stratification may also be required to establish target groups that have similar characteristics.

…define the distribution and collection process?
How soon after a patient engages with healthcare services should they receive a survey? Will you conduct a structured interview while they are in your care? Maybe you will give them a survey at the time of discharge. Some service providers favor using phone surveys whereas others prefer mailing the patient a survey within 7–10 days of discharge. Will you consider using an Internet survey? Will your patients respond to one of these approaches more positively than another? Will age, gender, or socioeconomic status influence the distribution and collection process? You also need to make sure that some people don’t get the survey at the beginning of their experience whereas others in the same target group receive it in the middle or at the end of their experience.

…follow up with nonrespondents to obtain a high response rate on the survey?
Without a high percentage of returned and completed surveys, you should analyze the results and draw conclusions with caution. Yet many surveys are processed with insufficient response rates. Although there are no response rate figures that everyone agrees with, there are guidelines that commercial survey groups provide and that you can use as references. Check with your survey vendor if you have not seen the response rates that they use. In conducting staff or employee surveys you should be getting a response rate of 80% or better. Because the employees should have a vested interest in the organization they work for, we expect higher response rates from this group than from patients. For response rates related to hospitals, clinics, home care services, mental health services, outpatient labs, and other allied health services, recommendations on response rates depend in large part on the size of the sample or target population you are surveying. This is why your data collection strategy is so important. For example, if someone says “We got a 50% response rate to our survey,” you should ask them how many surveys they sent out to get this number. If they sent out 20 surveys and got 10 back, the 50% response rate does not carry the same importance as having sent out 100 surveys and receiving 50 back. So, what will your process be for sending out reminders to those who do not respond to the survey? Will you call them, send them a reminder letter with another survey and self-addressed stamped envelope, or send them a text message?

…scan the results and/or enter them into a database for analysis?
Structures and processes need to be established for ongoing data collection, retrieval, and analysis of the surveys. If your survey is being conducted by a commercial vendor, this will be taken care of for you. But how will you process the ad hoc surveys? Most healthcare organizations do not have a survey processing department. So these functions are often left up to the team or group sponsoring the survey. Without a plan for these activities, surveys frequently go unanalyzed or experience long delays in processing the results.

…do the quality checks on the completed surveys?
Even if you do have structures and procedures in place to scan and process completed surveys, do not overlook the need for quality control checks on the returned surveys. Steps should be established to ensure that the responses can be processed (e.g., decide what to do with missing values, wrong responses, weak check marks, or lost surveys).

…own the analysis, reporting, and use of the survey results?
Data without a context for action are useless. How will the results be reported and to whom (e.g., to respondents, management, or staff)? There is a need to close the loop, especially with the survey respondents. Do you share survey results transparently with those who provided responses? Do you publicly post the results? Finally, do you use the results to drive improvement? Or are they used only for incentive and bonus payouts in a PFP system?

The points of bias and error listed in this table have been adapted from Dillman, D., Smyth, J. and Christian, M. Internet, Mail and Mixed Mode Surveys: The Tailored Design Method. John Wiley & Sons, Inc., 2009. The detailed
descriptions of the points, however, were created by this author.
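Two items in Table 3-7 lend themselves to a small illustration: assigning an identifier to each respondent so that returned surveys can be distinguished from outstanding ones, and following up with nonrespondents. The sketch below is one possible approach. The respondent IDs, the group labels, and the function name are hypothetical and are not taken from the text.

```python
from collections import defaultdict


def response_summary(sent, returned_ids):
    """Return (response rate per group, sorted IDs that still need a reminder).

    sent maps a respondent ID to a target group; returned_ids is the set of
    IDs whose completed surveys have come back.
    """
    counts = defaultdict(lambda: {"sent": 0, "returned": 0})
    for respondent_id, group in sent.items():
        counts[group]["sent"] += 1
        if respondent_id in returned_ids:
            counts[group]["returned"] += 1
    rates = {g: c["returned"] / c["sent"] for g, c in counts.items()}
    reminders = sorted(rid for rid in sent if rid not in returned_ids)
    return rates, reminders


# Hypothetical mailing: four surveys sent, three returned so far.
sent = {"P001": "female", "P002": "female", "P003": "male", "P004": "male"}
returned = {"P001", "P002", "P003"}

rates, reminders = response_summary(sent, returned)
print(rates)      # response rate for each target group
print(reminders)  # respondents who should receive a reminder
```

Reporting the rate by group, along with the number of surveys sent, makes it easier to see whether one group is responding far less than another before conclusions are drawn from the results.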

Linking Survey Results to Improvement Strategies

From a QI perspective, listening to the VOC is only a start. Listening to those we have the privilege of serving, including staff and coworkers, provides us with a baseline upon which we can build and improve. Once we have received customer input, however, the next step is to identify the processes that influence the customers’ perspectives. This is referred to as listening to the VOP. Customers can tell us what is wrong with the processes they experience. It is up to the owners of the processes to take action and make things better. When a department or unit combines the VOC with the VOP, the probability of experiencing real improvement is greatly increased (Scherkenbach, 1991).

FIGURE 3-4 shows how listening to the VOC and the VOP creates the foundation for improvement. The improvement journey begins by conducting an enumerative study14 at Time 1 to evaluate how the patients feel about their experiences in the ED. The survey gathered levels of satisfaction with various aspects of patient care, such as waiting time, explanation of diagnosis and treatment, comfort of the waiting area, and adequacy of discharge instructions. Suppose that the results of the survey showed that waiting time was the issue with which patients were least satisfied (i.e., less than 25% were satisfied) and that a majority wanted to be treated and discharged in 2 hours

[Figure 3-4 graphic not reproduced. Panels, left to right: voice of the customer (% satisfied with waiting time, axis 0–100) at Time 1, enumerative study; voice of the process (waiting time in emergency room, in minutes, axis 0–240) at Times 2–4, analytical studies; voice of the customer (% satisfied with waiting time, axis 0–100) at Time 5, enumerative study.]

FIGURE 3-4 Relating the voice of the customer to the voice of the process.
Reprinted with permission of ASQ

or less. As a result of this finding, a QI team was initiated and charged with improving ED wait times. So where do they start? They should begin by realizing that the enumerative study they conducted at Time 1 captured the VOC at a single point in time. It provides a baseline not only of what the patients and their families want, need, or expect but also of the actual experiences of those going through the ED processes.

Figure 3-4 shows that during Time 2, the team collected data on the actual waiting time for 13 consecutive patients and plotted the data on a control chart. Results show that the average wait time was about 160 minutes (a little over 2.5 hours) and that the process showed wide variation, with some waiting times both above and below the control limits. Analyzing the wait time data over time in order to understand the inherent variation in the process is what Gitlow, Gitlow, Oppenheim, and Oppenheim (1989) and Deming (1942) referred to as an analytic study.15 (Control charts are discussed in detail in Chapter 9. The distinction between enumerative and analytic studies is discussed in detail in Chapter 2.)

Next, the QI team developed an improvement strategy (the exact details of which are unimportant for this illustration) and tested one improvement idea. They then collected data on 20 new patients during Time 3 to see if their improvement idea had an impact. As you can see in Figure 3-4 (Time 3), their improvement idea did have the desired impact. The average wait time during Time 3 was reduced to a little over 2 hours and the degree of variation was also reduced. However, this improvement was still not meeting customer expectations (i.e., to wait less than 2 hours). Therefore, another improvement idea was introduced, and data were collected on 20 more patients during Time 4. Results for this period demonstrated that the average wait time was reduced to approximately 90 minutes and the upper control limit (UCL) was now at the 2-hour target voiced by patients in the enumerative study conducted during Time 1.

Finally, during Time 5, the same patient questionnaire used during Time 1 was repeated (i.e., another enumerative study was conducted). Results showed that the percentage of patients satisfied with the waiting time had increased substantially to approximately 70%. The analytic study used to measure the VOP showed that there had been two shifts in the process from Time 2 to Time 3 and from Time 3 to Time 4. The enumerative study (another patient survey) at Time 5 showed the extent to which this improvement had been perceived by patients.

Although this example is only for teaching purposes, actual QI teams use this line of thinking every day. Listening to the VOC without a plan for improving the process they experience demonstrates listening without responding. On the other hand, listening to the VOP without connecting the process’s performance to the expectations of the customers sets the stage for responding without listening. QI requires both the VOC and the VOP, but it all begins with listening.

In this chapter, I have explored the key concepts and methods required to build a VOC listening system. Although a majority of the discussion has focused on obtaining quantitative feedback from customers, I want to stress that you need to have a balance between quantitative and qualitative feedback. Although the numbers do provide a useful foundation for building improvement strategies, some of the most telling responses come from the written and verbal comments of the patients, their caregivers, and staff. One comment that I read years ago from a patient satisfaction survey still sticks out very clearly in my mind today. It was from an elderly female patient. The handwriting was a little unstable, but the words were very clear. When asked what we could do to improve the care and service delivered, she responded, “I wish someone would have just touched me and told me everything would be all right.” We frequently forget that some of the best healing comes from a simple smile, a pat on the hand, or a few words of encouragement. We deliver services that are much more important than installing storm windows, yet we seem to miss the basic point that Pam understood when she called us during dinner: if you ask customers for their opinions they will usually
provide feedback. But it begins with a serious desire to listen. Once you listen (really listen) to those you serve, you will be better positioned to respond appropriately.

Notes

1. It is interesting to note that the U.S. government clearly views health care not as a science or a technology but as a service. Historically, the U.S. government has used Standard Industrial Classification (SIC) codes to organize all industries in the country. The service industries are covered by codes 7200–8900. The healthcare industry is specified by codes 8000–8093. In the United States, Canada, and Mexico, the SIC coding system is being supplanted by the six-digit North American Industry Classification System (NAICS code), which was implemented in 1997. Although this new classification schema is now being used throughout North America, a number of U.S. governmental departments and agencies (e.g., the U.S. Securities and Exchange Commission) still use the traditional SIC codes. So, although we often hear healthcare professionals talk about the “science” of medicine and the technology in our industry, the provision of health care is classified as a service. This should not come as a surprise to any of us who have worked in the healthcare industry for any length of time. Consider the starting point: the Hippocratic Oath. The modern version of the Hippocratic Oath states, “I will remember that there is art to medicine as well as science, and that warmth, sympathy, and understanding may outweigh the surgeon’s knife or the chemist’s drug.” Another classic perspective on the central role of the patient in the care process came from Dr. Francis Peabody. On October 21, 1925, he titled his lecture to his Harvard medical students “The Care of the Patient.” Dr. Peabody closed his Harvard lecture on this day with the following statement, “For the secret of the care of the patient is in caring for the patient.” In the early 1930s, the fictional Dr. James Kildare emerged from the pen of author Frederick Schiller Faust (pen name Max Brand). Dr. Kildare was featured in books, radio, and movies throughout the 1930s, 40s, and 50s. In the 1960s, the actor Richard Chamberlain played Dr. Kildare, portraying him as a caring physician who tended to the emotional as well as physical needs of his patients. This series was followed by the Dr. Ben Casey character (1961–1966) played by Vince Edwards. This program depicted a highly compassionate surgeon who fought the establishment to make sure his patients received the right care at the right time and tailored to their needs. Probably the most well known of the doctor shows of this era, however, was Marcus Welby, MD, played by Robert Young. This show ran from 1969 through 1976. In his portrayal of a family practice doctor, Marcus Welby was the epitome of the caring physician who would do anything to make sure his patients were treated both physically and emotionally. Nothing could stop Dr. Welby from exceeding the expectations of his patients. My point in recapping this bit of history is that the recent push for and interest in patient-centered care are not a recent development. Even before these TV shows the focus on the patient has been an essential part of medical practice for centuries. We just seem to be finding this critical aspect of medicine once again.

2. I knew a physician who was an excellent emergency room doctor. He could handle trauma cases with skill and a calming air that made the rest of the staff less tense in difficult situations. In fact, he was so highly regarded by staff and management that the CEO promoted him to chief
medical officer (CMO). His office was next to mine in the administrative wing of the hospital. Although he did a good job of striking a balance between all the challenging and oftentimes conflicting forces that must be addressed by the CMO, he was internalizing many of the issues. All of this eventually caught up with him and he had a heart attack. After a period of recovery and cardiac rehab he returned to his previous role as head of the emergency room’s trauma services. When I had a chance to sit and talk with him about this experience he was quite honest and reflective. The best comment he made was that he was much more relaxed now that he could deal with trauma cases than trying to manage all the personalities and administrative issues that came with the CMO position. I thought this was quite interesting. He found it more relaxing to treat gunshot wounds, stabbings, car accident victims, and strokes than trying to manage the social and psychological aspects of human behavior. When I asked him if he would ever consider an administrative position again, he quickly responded, “NO!” I knew with that answer that this man would not fall prey to the Peter Principle again.

3. One particular case comes to mind related to a medical group that I consulted with for several years. The management team for this group decided to institute a PFP system for the doctors. They created a fairly complicated formula that included measures such as number of appointments booked, number of patients seen every hour, patient satisfaction survey results, revenue generated, and coworker feedback. They took all of these measures and then weighted them to create a composite index. In order to reduce the extremes they discovered in the variation of the index numbers, they created normalized z-scores and then rank ordered all the physicians in the group. The doctors in the top decile received the highest PFP bonus with lesser amounts given to the remaining ranked doctors down to the 80th percentile. Then, and this is where it became very interesting, they funded the bonus payout pool by reducing the salaries of the doctors who fell into the bottom 20% of the ranking. Talk about ways to demotivate a group! A very unexpected outcome of this procedure was experienced by two of the doctors who were married. The husband fell into the bottom 20% of the ranking whereas his wife was in the top decile. A postscript to this story: after all this calculating and normalizing of the index, the amount of money for the doctors in the top decile was a little less than $5,000. The staff received no bonuses.

4. Today we still struggle with getting people to wash their hands properly before and after seeing a patient. I find it fascinating that regardless of what country I am working in, the issue of hand hygiene is still a major challenge. Although the practice of handwashing as envisioned by Oliver Wendell Holmes may be universally accepted as being efficacious, it certainly is not a part of daily behavior of many healthcare workers. A number of years ago when I was preparing a presentation on hand hygiene for a workshop at the Illinois Hospital Association, my daughter Devon, who was 8 at the time, came into my office and asked what I was doing. I told her I was preparing a talk on why healthcare workers need to wash their hands. She thought for a moment and then said, “da ah, we learned to do that in kindergarten.” I changed the title of my presentation from Building a Successful Hand Hygiene Program to If a Kid Can, Why Can’t We?

5. Another well-known historical example of how long it takes to have a proven idea that provides medical benefit fully embraced, implemented, and spread is the story of lemon juice and scurvy. In
1601, British Captain Sir James Lancaster ran an experiment that helped to confirm that the administration of lemon juice prevented scurvy in his crew. On one of his four ships making a trading voyage to South Africa, he administered daily doses of lemon juice to the crew. The men on the other three ships did not receive the daily lemon juice ration and within 4 months of leaving England were suffering seriously from scurvy. Lancaster had clear evidence of the link between lemon juice and the virtual elimination of scurvy. An interesting sidebar is that eventually in the mid-1800s lime juice was also being used in addition to the lemon juice to prevent scurvy in British sailors. As the story goes, lime juice was added to the sailors’ daily grog ration (water and rum). This eventually led to a British sailor being referred to as a “limey.” Over time the term lost its naval reference and was used to refer to any individual from Britain. When Lancaster presented his findings to the British Admiralty, however, they rejected his findings. In 1795, 194 years after Lancaster’s discovery and after many unnecessary deaths, the admiralty finally mandated lemon juice be distributed to the sailors each day.

6. Lewis Carroll, Alice in Wonderland (New London: Brimax Books, 1990, p. 55), (originally published in 1865 by Macmillan Publishing in London).

7. When I use the term “customer” I am referring to anyone who receives the output of one’s efforts. In a sense, we are all customers and suppliers during most transactions. When I am teaching a class, for example, I am the supplier of information to the participants, but I am also a customer, because I rely on others to register participants, schedule rooms, and prepare handouts. My 9-year-old daughter, Devon, is also my customer. I want her to eat dinner, but I am not comfortable with her preparing a meal from scratch (especially lighting the gas stove). So I ask her what she would like, rice or noodles. I started doing this when she was about 3 years old. Back then she would look intently at the rice and noodle packages. For whatever reason at the time, she would pick one of the rice products and my wife or I would prepare it. Then Devon would proceed to eat it. Ever since we have always involved her in the decisions about what we will eat. She is a customer, and we should listen to her voice.

8. At the IHI’s 2002 National Forum on Quality Improvement in Healthcare, Don Berwick, MD, CEO of the IHI, provided a wonderful keynote address by having a three-way conversation with himself. He cleverly played three roles: he served as the narrator attempting to manage a debate between Dr. Oldway and Dr. Neway. Dr. Oldway was the traditionalist who pined for the way things used to be. He did not understand why the healthcare industry no longer was centered around him, why nurses no longer seemed to respect him, why patients questioned him, and why all these outside groups were sticking their noses into his business. Dr. Neway, on the other hand, argued that allowing patients to become more involved with their care actually made life easier for the doctor. It was a very engaging and entertaining address and is still available on videotape from the IHI for those who are interested.

9. This version of rounding is quite different from “clinical rounding” or “grand rounds.” In fact, most organizations that have initiated rounding programs have had to develop specialized training to help staff realize that patient rounding for service excellence is very different from conducting clinical rounds or grand rounds. Clinical rounding usually involves reasonably small groups of professionals seeking to confirm, refute, or debate clinical findings, processes, and theories. Grand rounds have emerged to become
educational lectures for larger groups on selected clinical and operational topics. Frequently grand rounds will be delivered by outside invited experts in a particular field. Rounding for service excellence, on the other hand, is an interaction designed to establish a dialogue between those delivering care and support and the individual, family, or caregiver experiencing the care or support. Similar terms but very different intents and designs.
10. It should be noted that survey research is actually a fairly young field of endeavor. Dillman et al. (2009) provide a very nice overview of the history of survey research. They trace how the survey process for the first two-thirds of the 20th century relied exclusively on personal interviews. Then they describe how the nascent field of mail and telephone surveys got off to a rather shaky start in the late 1930s. The rather unsuccessful launch of mailed surveys is linked to the classic story of the U.S. presidential election of 1936 between Franklin D. Roosevelt and Alf Landon. The Literary Digest (a popular magazine of that day) conducted a mailed survey of over 10 million people sampled from telephone directories and auto registries. They received over 2 million completed surveys and predicted from the results that Alf Landon would win the presidency by almost 15%. But as any student of U.S. history knows, Roosevelt won the election with over 60% of the vote. So what went wrong? Well, it was a sampling problem. The Literary Digest ended up with a very biased sample because only 36% of households had telephones and even fewer (about 20%) had cars. The people who had cars and/or telephones were typically individuals of wealth and status who were primarily Republicans who voted for Alf Landon, the Republican candidate. Dillman et al. point out that this “botched prediction cast a shadow over the use of both mail and telephone surveys for many decades” (3). Dillman and his team then go on to describe the development of mail and telephone surveys from the 1940s to the present. It is a very interesting and worthwhile bit of history for anyone serious about survey research.
11. The United States clearly leads in the proliferation and use of surveys. Western European countries are becoming more enamored with surveys than are eastern European countries, the Middle East, Africa, and the Far East. The United Kingdom has seen a marked increase in the use of national surveys over the past 5 years and I do not expect to see a decline in this pattern in the near term. A major difference between the use of surveys in the United States and Western Europe, however, is that in European countries surveys are not distributed and collected as frequently as they are in the United States. It is not uncommon, for example, that a healthcare system in the United States will have a continuous process in place to distribute and collect patient satisfaction surveys on a weekly or monthly basis. In the United Kingdom (UK), on the other hand, national surveys are usually done once every year or two. Another difference is that in the UK there are only a few standardized national surveys developed and sponsored by the National Health Service (NHS). In the United States, healthcare providers are able to select one or more survey vendors that suit their needs and budget. As a result there is considerable variation in the types of surveys, the response formats, and the procedures for data collection. This creates huge challenges for making comparisons. The only national standardized healthcare experience survey in the United States is one required by the Centers for Medicare and Medicaid Services (CMS) called the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS). The survey is
designed to measure quality of care as perceived by patients and the results are reported publicly so comparisons can be made among and across providers.
12. Now, I know that this sounds rather negative. I do not mean to be rude or insulting. But based on many years of teaching survey methods and helping healthcare professionals design, develop, and use surveys correctly, I have seen a lot of surveys that are excellent examples of how not to develop a survey. But, this is not surprising. It is not uncommon, for example, for a group of employees to sit down in the break room and develop a survey that is distributed to patients or family members by the end of the day. In this case, good intentions frequently lead to low levels of knowledge and poor decisions. A healthcare analogy might be useful at this point. We would not think of asking a person who has no knowledge of or prior experience with drawing blood to just start sticking patients with needles to obtain samples. Why, therefore, would we expect people who have no formal training in or experience with the design, development, and use of surveys to produce valid, reliable, and useful surveys? We set up the workers for failure. Although we have all filled out many surveys, the mere completion of surveys does not make us experts in survey methodology. I have had people tell me in class, for example, that “I know a lot about surveys. Matter of fact, I filled out another one just last week for the hotel where I spent my vacation.” Filling out surveys is not a substitute for knowledge about survey design, administration, and use.
13. I know, some of you at this point are saying to yourself, “So what? Who cares? You told me all this but it is a very fine point that only statistical geeks care about.” Well, I agree with you . . . to a point. The point being that Likert is only one of numerous individuals who have developed response formats and “scales.” Gene Summers compiled in 1970 one of the best compendiums of classic readings on the field of attitude measurement. In Summers’ book, you will find articles not only on Likert’s approach to developing scales but also other pieces on alternative approaches that have played a dominant role in the literature including Guttman’s scalogram analysis, Coombs’ unfolding technique, Thurstone’s method of equal-appearing intervals, the Bogardus social distance scale, and Heise’s semantic differential. My purpose in referencing these methods is to highlight the fact that there is a great deal more to the methodology behind survey research than just saying, “And you know we are going to use a Likert scale.” So, I would suggest that you, or others within your organization, build on your current level of knowledge about survey research and explore the alternatives to what Likert proposed for both response formats and building scales.
14. An enumerative study (Carey and Lloyd, 2001; Deming, 1942; Gitlow et al., 1989) is done on a static population at a fixed period in time. Its purpose is merely to describe some variable of interest. Enumerative studies can be compared to taking a snapshot with a camera. It provides a limited view of events because it does not focus on how things may vary over time. An enumerative study is based on questions such as, “What was the average turnaround time last month? What was the mortality for coronary artery bypass graft (CABG) patients last year?” Enumerative studies would tell us nothing about the processes of care the patients experienced or why they received the treatment. In the most general sense of the word, enumerative studies are classic examples of data for evaluation or judgment. When you look at the results of an enumerative study, you are left with only a few options: you can ask, “Does this outcome number strike
me as being acceptable or not?” or “Is this outcome number different from the previous number?”
15. Analytic studies are performed to answer questions about a dynamic process. They are not restricted to a single point in time and are used to predict the future rather than describe past outcomes. In this respect, an analytic study can be compared to a video recording rather than a snapshot. Analytic studies are done to determine why the outcomes were observed and how the process that produced the outcomes can be improved. An analytic study answers questions such as, “What can we predict about the length of stay for the coming year?” or “What were the causes of the observed decrease in surgical inpatients for the previous year?” A more extensive overview of the differences between enumerative and analytic studies can be found in Gitlow et al. (1989).

References
Balas, E., and S. Boren. “Managing Clinical Knowledge for Health Care Improvement.” In Yearbook of Medical Informatics 2000: Patient-Centered Systems, edited by J. Bemmel and A. T. McCray, 65–70. Stuttgart, Germany: Schattauer Verlagsgesellschaft mbH, 2000.
Barry, M. J., and S. Edgman-Levitan. “Shared Decision Making: The Pinnacle of Patient-Centered Care.” New England Journal of Medicine 366, no. 9 (2012): 780–782.
Bauer, J. Statistical Analysis for Decision Makers in Healthcare. Chicago: Irwin Professional Publishing, 1996.
Berwick, D. “Toxicity of Pay for Performance” in Measuring Clinical Care – A Guide for Physician Executives. Tampa, Fla.: American College of Physician Executives, 1995 and in Quality Management in Health Care, 1995, 4(1), 27–33.
Blalock, H. Social Statistics. New York: McGraw-Hill, 1960.
Bohm, D. On Dialogue. Transcription of a seminar conducted by Dr. Bohm in Ojai, California, November 6, 1990.
Bohrnstedt, G. “Reliability and Validity Assessment in Attitude Measurement.” In Attitude Measurement, edited by G. F. Summers, 80–99. Chicago: Rand McNally, 1970.
Budrevics, G., and C. O’Neill. “Changing a Culture with Patient Safety Walkrounds.” Healthcare Quarterly Special Issue, October 2003.
Campbell, D., and D. W. Fiske. “Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix.” Psychological Bulletin 56, no. 2 (1959): 81–105.
Carey, R. “How to Choose a Patient Survey System.” Journal on Quality Improvement 25, no. 1 (January 1999): 20–25.
Carey, R., and R. Lloyd. Measuring Quality Improvement in Healthcare: A Guide to Statistical Process Control Applications. Milwaukee: ASQ Press, 2001.
Child, D. The Essentials of Factor Analysis. New York: Holt, Rinehart and Winston, 1973.
Co, J., T. Ferris, B. Marino, C. Homer, and J. Perrin. “Are Hospital Characteristics Associated with Parental Views of Pediatric Inpatient Care Quality?” Pediatrics 111, no. 2 (2003): 308–314.
Cohen, J. “A Coefficient of Agreement for Nominal Scales.” Educational and Psychological Measurement 20, no. 1 (1960): 37–46.
Connellan, T., and R. Zemke. Sustaining Knock Your Socks Off Service. Chicago: American Management Association, 1993.
Crabtree, B., and W. Miller, eds. Doing Qualitative Research. Newbury Park, CA: Sage Publications, 1992.
Cronbach, L. “Coefficient Alpha and the Internal Structure of Tests.” Psychometrika 16, no. 3 (1951): 297–334.
Deming, W. E. “On a Classification of the Problems of Statistical Inference.” Journal of the American Statistical Association 37, no. 218 (1942): 173–185.
Deming, W. E. Out of the Crisis. Cambridge, MA: Massachusetts Institute of Technology, Center for Advanced Engineering Study, 1992.
Deming, W. E. The New Economics. Cambridge, MA: MIT Press, 1994.
Dillman, D. A., J. D. Smyth, and L. M. Christian. Internet, Phone, Mail, and Mixed-Mode Surveys: The Tailored Design Method, 3rd ed. New York: Wiley, 2009.
Freiberg, K., and J. Freiberg. NUTS! Southwest Airlines’ Crazy Recipe for Business and Personal Success. Austin: Bard Press, 1996.
Frankel, A., S. P. Grillo, E. G. Baker, C. N. Huber, S. Abookire, M. Grenham, P. Console, M. O’Quinn, G. Thibault, and T. K. Gandhi. “Patient Safety Leadership WalkRounds™ at Partners HealthCare: Learning from Implementation.” Joint Commission Journal on Quality and Patient Safety 31, no. 8 (2005): 423–437.
Garvin, D. “Building a Learning Organization.” Harvard Business Review 71, no. 4 (1993): 78–91.
Gitlow, H., S. Gitlow, A. Oppenheim, and R. Oppenheim. Tools and Methods for the Improvement of Quality. Homewood, IL: Irwin Press, 1989.
Grenning, T. “B. F. Skinner: 1904–1990.” Journal of Humanistic Psychology 31, no. 2 (1991): 112–113.
Harman, H. Modern Factor Analysis, 3rd ed. Chicago: University of Chicago Press, 1976.
Hayes, B. Measuring Customer Satisfaction: Survey Design, Use and Statistical Analysis Methods. Milwaukee: ASQ Press, 1998.
Hayes, B. E. Measuring Customer Satisfaction and Loyalty, 3rd ed. Milwaukee: ASQ Press, 2008.
Herzberg, F. “One More Time: How Do You Motivate Employees?” Cambridge, MA: Harvard University, Graduate School of Business Administration, 1968.
Hoffman, D. Visual Intelligence. New York: Norton Publishing, 2004.
Institute for Healthcare Improvement. Patient Safety Leadership WalkRounds.™ Institute for Healthcare Improvement Idealized Design Group and Allan Frankel, MD. Cambridge, Massachusetts, n.d. http://www.ihi.org/resources/Pages/Tools/PatientSafetyLeadershipWalkRounds.aspx
Kim, J.-O., and C. Mueller. Factor Analysis: Statistical Methods and Practical Issues. Sage University Paper Number 07-14. Beverly Hills: Sage Publications, 1978a.
Kim, J.-O., and C. Mueller. Introduction to Factor Analysis: What It Is and How To Do It. Sage University Paper Number 07-13. Beverly Hills: Sage Publications, 1978b.
Kohn, A. No Contest. Boston: Houghton Mifflin, 1986.
Kohn, A. Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A’s, Praise, and Other Bribes. Boston: Houghton Mifflin, 1993.
Krippendorff, K. Content Analysis: An Introduction to Its Methodology. Beverly Hills: Sage Publications, 1980.
Landsberger, H. Hawthorne Revisited. Ithaca, NY: New York State School of Industrial and Labor Relations, 1958.
Langley, G., K. Nolan, T. Nolan, C. Norman, and L. Provost. The Improvement Guide. San Francisco: Jossey-Bass, 2009.
Likert, R. “A Technique for the Measurement of Attitudes.” Archives of Psychology, no. 14, 1932.
Lloyd, R. “Improving Ambulatory Care Through Better Listening.” Journal of Ambulatory Care Management 26, no. 2 (April–June 2003): 100–109.
Nelson, B. 1001 Ways to Reward Employees. New York: Workman, 1994.
Oglesby, P. The Caring Physician: The Life of Dr. Francis W. Peabody. Cambridge, MA: Harvard University Press, 1991.
Palmer, S. Vision Science: Photons to Phenomenology. London: Bradford Books, 1999.
Patton, M. How to Use Qualitative Methods in Evaluation. Newbury Park, CA: Sage Publications, 1987.
Peabody, F. “The Care of the Patient.” Journal of the American Medical Association 88 (March 19, 1927): 877–882.
Peter, L., and R. Hull. The Peter Principle: Why Things Always Go Wrong. Toronto: Bantam Books, 1970.
Schacter, D. Psychology, 2nd ed. New York: Worth Publishers, 2011.
Scherkenbach, W. The Deming Route to Quality and Productivity. Washington, DC: CEEPress Books, 1990.
Scherkenbach, W. Deming’s Road to Continual Improvement. Knoxville, TN: SPC Press, 1991.
Schultz, L. Profiles in Quality. New York: Quality Resources, 1994.
Selltiz, C., M. Jahoda, M. Deutsch, and S. Cook. Research Methods in Social Relations. New York: Holt, Rinehart and Winston, 1959.
Senge, P. The Fifth Discipline: The Art and Practice of the Learning Organization. New York: Currency/Doubleday, 1994.
Skinner, B. F. About Behaviorism. New York: Vintage Books, 1974.
Stevens, S. Handbook of Experimental Psychology. New York: John Wiley and Sons, 1951.
Stone, J. In the Country of Hearts: Journeys in the Art of Medicine. Baton Rouge: Louisiana State University Press, 1990.
Summers, G. F., ed. Attitude Measurement. Chicago: Rand McNally, 1970.
Thomas, E., J. B. Sexton, T. B. Neilands, A. Frankel, and R. L. Helmreich. “The Effect of Walk Rounds on Nurse Safety Climate Attitudes: A Randomized Trial of Clinical Units.” BioMed Central Health Services Research (April 11, 2005). doi: 10.1186/1472-6963-5-28
Tseng, M. M., and J. Jiao. “Mass Customization.” In Handbook of Industrial Engineering, Technology and Operation Management, edited by G. Salvendy, 3rd ed. New York: Wiley, 2001.
Tufte, E. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press, 1983.
Tufte, E. Envisioning Information. Cheshire, CT: Graphics Press, 1990.
Tufte, E. Visual Explanations. Cheshire, CT: Graphics Press, 1997.
Ware, C. Information Visualization: Perception for Design. San Francisco, CA: Morgan Kaufmann, 2004.
Watkins, K., and V. Marsick. Sculpting the Learning Organization. San Francisco: Jossey-Bass, 1993.
Webb, E., D. Campbell, R. Schwartz, and L. Sechrest. Unobtrusive Measures: Nonreactive Research in the Social Sciences. Chicago: Rand McNally, 1969.
Webster’s II New Riverside University Dictionary. Boston: Riverside Publishing Company, 1984.
Wick, C., and L. Leaon. The Learning Edge: How Smart Managers and Smart Companies Stay Ahead. New York: McGraw-Hill, 1994.
CHAPTER 4
Milestones in the Quality Measurement Journey

Listening to the voice of the customer (VOC) provides the starting point. Once you understand the wants, needs, and expectations of your internal and external customers, which are usually expressed as concepts (e.g., “I want better health service,” “Why don’t you have shorter waiting time?,” or “Communication between the staff needs to get better”), it is up to you to translate these concepts into indicators that can be measured and tracked to determine whether your processes are capable of meeting the VOC expectations. Unfortunately, in health care, it is often the case that indicators are selected not because the providers of a service actually took time to listen to the VOC, but rather because (1) they made a priori decisions that they know what is best for the customers, (2) they have been given measures by external oversight or regulatory bodies that require certain measures be submitted to them, or (3) they take the measurement journey shortcut by selecting indicators that they have “always collected” and assume that these are good enough for the purposes at hand. This chapter has been designed to provide you with a roadmap for selecting and building indicators that will help you move from concepts to quantifiable measures and connect the VOC with the voice of the process (VOP).

▸▸ Developing a Measurement Philosophy

The search for a few good indicators begins by having a clear understanding of why you are engaged in measuring performance in the first place. Historically, healthcare providers have collected and analyzed data strictly for internal purposes that were directed at improving clinical and operational effectiveness and efficiency. Over the years, however, the growing external demand for data has shifted much of the focus away from an internal need to understand the effectiveness and efficiency of the organization’s processes to one of addressing external demands that lead to judgment. In other words, the business community, regulatory bodies, government officials, the media,
and consumers are all interested in answering a very simple question: “Which provider is the best?”

In an effort to answer this question, many initiatives, projects, and pieces of legislation have been developed over the years. The goal of these efforts has been to develop “report cards” or “score cards” on healthcare providers that can be used by various groups and consumers to make decisions about their healthcare choices. Regardless of the country or the approach to funding health care, however, there seems to be no quick or easy answer to the simple question of “Which provider is the best?”

The typical response to this question has been that external groups ask for (on a voluntary basis) or mandate certain performance indicators from providers. These numbers are then combined with those from other providers, risk adjustments may be applied to the data to account for severity of the patient populations, and finally, reports are released to the public. These releases usually stimulate the following chain of events:

■■ Local and/or national media become interested.
■■ Investigative reporters are sent out to discover why Your Hospital has a higher coronary artery bypass graft (CABG) mortality percentage than My Hospital and why both are higher than the average for the county, region, province, or country in which they are located.
■■ The reporters present their findings in the next day’s newspaper or on the 6 o’clock news, which usually focuses on the providers at the top and bottom of the list.
■■ Your Hospital and My Hospital convene internal meetings to develop strategies (rationales) for countering why their numbers are higher or lower than the average.
■■ Consumers become confused and/or cynical because the data do not necessarily reflect their experiences (e.g., “My father went to Your Hospital for his heart operation and everything was fine” or “My father went to Your Hospital and nearly died”).

Whatever your view on the public release of data, it is quite obvious that the demand for data on provider performance and greater transparency will increase over the coming years. The simple question is “Are you prepared for it?” Healthcare organizations that have a measurement strategy and a proactive plan for investigating their own results will be in a much better position to deal with external scrutiny than those that sit back and hope that the local or national news service does not show up outside their hospital. (Refer to Chapter 2 for more on provider performance and transparency.)

Even though there is a renewed interest in the public release of provider data, the more important reason for knowing your data better than anyone else is that it is the right thing to do and it makes business sense. The complexity of today’s healthcare delivery system requires that leaders have a clear understanding of their processes and the related outcomes. In order to meet operational and financial objectives, patient safety goals, and customer service expectations, healthcare providers should consider developing what Caldwell (1995) refers to as a “strategic measurement deployment matrix.” Such a matrix combines strategic vision with tactical measures. It allows an organization to determine if the things they are working on are really connected to what the organization is supposed to be achieving.

The first step, therefore, on the quality measurement journey (QMJ) is achieved by having some sense of why you are measuring and your approach to measurement. Is measurement a part of the organization’s day-to-day functioning? Or is it something that is done periodically in order to prepare reports for board meetings or external oversight bodies? A good place to start is to develop the organization’s measurement philosophy and share it with staff, patients, and caregivers. This does not need to be a long document. Something as simple as this could serve as a starting point:

    Responsible leadership demands that we know our data better than anyone else. It further requires that we have processes in place to accurately and consistently obtain a balanced set of measures that monitor clinical outcomes, functional
    status, customer satisfaction, process effectiveness, and resource utilization. Finally, we are committed to using these data to develop improvement strategies and then take ACTION to make these strategies a reality.

An organization needs to have a serious dialogue about its measurement philosophy and why it is measuring (i.e., for improvement, for judgment, or for research). Included in this ongoing dialogue should be specific discussions about the role of indicators and how they will (and will not) be used. Without a measurement philosophy, your efforts to identify key indicators and collect and analyze data will be nothing more than a random walk.

Ideally, indicators should be designed to improve quality by:

■■ Moving us away from anecdotes and focusing on objective data
■■ Enhancing our understanding of the variation that exists in a process
■■ Monitoring a process over time
■■ Seeing the effects of changes made to a process
■■ Providing a common frame of reference
■■ Providing a more accurate basis for prediction

Unfortunately, many organizations run into serious roadblocks when they attempt to select indicators and use them to improve quality.

▸▸ Measurement Roadblocks

Many things impede good measurement practice. Based on my 40 plus years of working in the quality measurement arena, I believe there are five major roadblocks that people usually encounter in their QMJ:

Roadblock #1: Measurement Is Threatening

This is probably the largest roadblock we face with healthcare measurement. There are many examples of how data have been used both internally and externally to (figuratively) “beat people up.” We often hear coworkers say that they did not want to take the monthly numbers to the boss because he or she “won’t like these.” Organizations have long memories when it comes to the use of data. Seasoned employees quickly tell new workers what happens when the numbers do not meet management’s expectations. Quickly, the new workers hear the story about how Gwenn, nurse manager of 3 East, did not get her patient satisfaction scores up by the end of the year and now Gwenn is no longer with the organization. What the new workers didn’t know, however, is that Gwenn left because her husband was transferred to another city. But her leaving and the decline of her unit’s patient satisfaction scores do provide the basis for a compelling yet causally incorrect story. As time passes, this story becomes legendary, gets embellished a little, and becomes part of the organization’s folklore. “Remember what happened to Gwenn” becomes the standard response whenever someone’s patient satisfaction scores are below the expected targets.

What I find absolutely fascinating, however, is the fact that people actually like to measure things, including their own performance. There seems to be a natural curiosity in human beings about measurement. When my daughter Devon was 9 years old, for example, she loved to measure things. When I was in the garage one day doing a project she came up to me and said, “Measure me, Daddy.” I took my tape measure and proceeded to measure her height. She acknowledged the measurement and went about her business. Ten minutes later, she returned and stated, “Measure me, Daddy.” I said, “Devon, I don’t think you have grown much in the last 10 minutes.” But she insisted and seemed to find the actual act of measurement not only enlightening but also entertaining. The next time I observed her in the garage, she was using the tape to measure her bike, her doll, and the dog (or at least trying to measure the dog). Even adults love to measure what they do. I have a number of friends who participate in triathlons. They are very meticulous about measuring and monitoring their training
regimens. I have seen similar behavior from people involved with bowling, cycling, and golf.

How do we drive what seems to be an almost natural curiosity about measurement out of people when they get into work situations? The answer to me seems rather simple. Organizations frequently use data to instill a sense of fear in the employees. Once data are used for judgment and fear, then the data are not for learning and improvement but rather for intimidation and control. It is not surprising, therefore, that the workers rapidly conclude, “Why should I participate in a measurement system that will be used against me?” Several years ago I experienced this attitude when I was facilitating a team that was attempting to reduce call button response time. During a meeting that was intended to identify a measurement plan, one team member blurted out, “Why don’t you go measure 4 West? I know they are worse than we are.” When measurement becomes threatening, the workers will conclude that measurement should be for someone else, not for them. The truth of the matter is that the primary audience for measurement is the manager of the department or unit and the workers. These are the people who own the process and who should be responsible for its performance. If the organization does not have a philosophy of measurement and a set of related tactics for deploying measurement throughout the organization, then measurement will generally become a threat. A strategic focus on measurement as described by Caldwell (1995) will do wonders to overcome this roadblock.

Roadblock #2: The Desire for Precision

Health care is not classified as a science. The federal government actually classifies healthcare jobs as service jobs, along with car repair, lawn service, and beauty shops. Sure, we use science and technology to accomplish what we do, but by and large health care is considered a service. It is interesting, therefore, that many people in our profession use the illusion of precision as a convenient excuse for not measuring. I have heard, for example, the following responses many times when I asked a team if they had finished their measurement plan:

■■ “We think it will take a little longer to make sure the survey is right.”
■■ “The log sheet does not seem to capture all the elements we think we need to collect.”
■■ “Why don’t you check with us in a couple of weeks? We might have a better plan in place at that time.”

The key point is that quality measurement does not have to be as precise as many people seem to think. We are not conducting research to win the Nobel Prize in physiology or medicine. We are trying to understand the variation that lives within our processes in order to make things more effective and more efficient for those we serve. Therefore, the concept of measurement that is “good enough” needs to be our guiding principle. The basic purpose of quality measurement is to inform the team or organization about its general direction and whether it is moving toward its goals and objectives. You do not need p-values at the 0.01 or 0.05 level of significance to tell you this. As one chief executive officer (CEO) told me, “If it passes the sniff test, that’s good enough for me.” Furthermore, we are not trying to conduct research that is designed around the randomized control trial (RCT) approach. RCTs are essential to test theories and build new knowledge. This is how medical science has advanced. But when we are engaged in quality improvement (QI) we are designing analytic rather than enumerative studies as was discussed in Chapter 2. Do not make your measurement efforts so precise and pure that you never proceed to the most important question: “Are we making a difference?” In short, if an organization spends its time developing academically or scientifically precise measures, it will probably never get started on its QMJ. The desire for precision will be a convenient detour in your QMJ and an excuse for avoiding the measurement mandate.

This detour was demonstrated very nicely to me by a group of physicians during an evening meeting designed to discuss their hospital’s QI plan. The manager of quality was doing a very good job of presenting the plan and the related indicators. Then she got to the project on deep vein thrombosis (DVT). She described the indicator (percentage of patients evaluated for DVT risk) and then showed the historic baseline and the results for the last 8 months. Instead of discussing why the hospital’s performance on this indicator was declining, the physicians became embroiled in a debate over the number of charts being reviewed and whether the sample of patient charts had sufficient “power” to be statistically significant. The detour they took was based on not understanding sampling methods for QI projects. The sample pulled for the improvement project (20 charts per month through a stratified random process) was good enough for the purposes at hand (i.e., determining how well the hospital was evaluating the risk for a DVT). As I sat and watched this discussion unfold, I realized that it was a perfect example of Roadblock #2. They were questioning the method and the data instead of discussing the processes by which they evaluate a patient’s potential for a DVT. Precision was creating a roadblock for improvement.

Roadblock #3: Using Standards as Performance Objectives

Standards basically set limits on performance. In fact, standards are usually considered minimal acceptable levels of performance. Excellence is a very different concept. For example, when you go to a restaurant you have certain standards you expect to have without paying. You expect to have a table, a chair, eating utensils, a napkin, salt and pepper, and a water glass. What if, however, the waiter showed you to an open area of the restaurant that had none of these expected standard components and told you that the table would cost $20, a chair $15, utensils $10, salt and pepper $2 each, a napkin for $4, and a water glass will cost you $5? You went to the restaurant with expectations that certain minimal standards would be met before you ordered your meal. Not finding them you’d probably leave.

What are the minimal acceptable standards in a hospital or medical setting? In the United States, standards for healthcare organizations are set by a variety of governmental and nongovernmental bodies. The Joint Commission (JC), which accredits hospitals and other healthcare providers, is a dominant player in this field. The JC sets standards and regularly through announced and unannounced visits provides assessments on whether or not the facility “met the standards.” Once a standard is achieved, however, complacency often sets in and people say, “What more do you want from us? We met the standard.” I have heard many healthcare professionals claim that they did not have to get any better because they were already at the JC standard. I guess this means they believe that their performance is acceptable and in need of no further improvement. If standards serve as the goal for the quality journey, then it will be a limited journey.

What worked to satisfy customers or meet the prescribed standards today may not be acceptable tomorrow. For example, assume that you met the JC standards during your last survey review. What are you going to do when the Centers for Medicare and Medicaid Services (CMS) starts releasing hospital data showing that your facility is “significantly” above expected mortality percentages for the treatment of heart attack patients? Your insistence that you met the JC standards will carry little weight at this point. The concepts of baseline, target, and goal provide a much better frame of reference than standards. Compliance with standards and the desire to perform only at this level, therefore, guarantee that an organization is not really committed to QI. Improvement is a never-ending pursuit of excellence. Meeting standards is acceptance of current performance and a willingness to say, “We’re good enough.”
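
Returning to the chart sample described under Roadblock #2 (20 charts per month drawn through a stratified random process), the mechanics of such a pull can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the strata (nursing units), the chart identifiers, and the function name stratified_sample are hypothetical, and allocating the 20 charts in proportion to each unit's volume is one reasonable choice, not a description of how the hospital in the story actually drew its sample.

```python
import random


def stratified_sample(charts_by_stratum, total, seed=None):
    """Draw about `total` charts, allocated proportionally across strata.

    charts_by_stratum maps a stratum label (hypothetical here: a nursing
    unit) to the list of chart IDs eligible for review that month.
    """
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    population = sum(len(charts) for charts in charts_by_stratum.values())
    sample = {}
    for stratum, charts in charts_by_stratum.items():
        # Proportional allocation. Rounding means the grand total can be off
        # by a chart or two when the shares are not whole numbers.
        k = min(len(charts), round(total * len(charts) / population))
        sample[stratum] = rng.sample(charts, k)  # random, without replacement
    return sample


# Hypothetical month: three units with 100, 60, and 40 eligible charts.
charts = {
    "4 West": [f"W{i}" for i in range(100)],
    "5 East": [f"E{i}" for i in range(60)],
    "ICU": [f"I{i}" for i in range(40)],
}
monthly_sample = stratified_sample(charts, total=20, seed=1)
```

With these made-up volumes the 20 charts split 10, 6, and 4 across the three units. The point of the sketch is the same as the point of the story: a small, sensibly drawn sample is enough to show how the process is performing.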

Roadblock #4: Limited Knowledge of Statistical Process Control

This roadblock relates to the use of statistical techniques, such as Shewhart control charts, to (1) understand the variation that lives in a process and (2) determine whether interventions have actually made a difference in the performance of the process. Most healthcare professionals have had at least one course in statistics at some point in their careers. Yet exposure to basic statistics is not sufficient for those who plan to manage, coach, or lead improvement efforts.

Statistical process control (SPC) is a separate and distinct body of knowledge from what many refer to as “traditional” or enumerative statistical methods. Individuals who attempt to apply statistical notions (such as testing the null hypothesis and using p-values to determine statistical significance) to their QI efforts will quickly make the wrong decisions and then become disillusioned.1 The reason for this disillusionment is simple: they are using statistical techniques and methods that are designed to answer questions about efficacy instead of techniques designed to answer questions about effectiveness and efficiency (Brooke, Kamberg, & McGlynn, 1996).

I have been teaching SPC methods to healthcare professionals for more than 30 years. During this time, there has been an increase in not only the level of knowledge that healthcare professionals have about SPC but also its application to healthcare issues. Organizations like the American Society for Quality (ASQ) and the Institute for Healthcare Improvement (IHI) have made major contributions to spreading statistical thinking and the use of SPC methods. But we are still at the beginning stages of this journey when compared to the use of SPC in manufacturing and industry. Use of SPC in these sectors can be traced back to the mid-1920s when Dr. Walter Shewhart first formalized the theories and methods behind the control chart. In 1931, Shewhart published Economic Control of Quality of Manufactured Product, which stands even today as the landmark reference on SPC. The good news is that healthcare professionals are becoming more aware of what SPC can do to assist them in their quality journey. The bad news is that we still have a long way to go before statistical thinking is commonplace throughout the healthcare industry.

Roadblock #5: Numerical Illiteracy

Having skills in the use of SPC is not enough to produce world-class quality. SPC provides a wonderful foundation, but the real test comes in applying SPC knowledge to overcome the fifth and final roadblock: numerical illiteracy. Wheeler (1993, p. vi) describes numerical illiteracy as follows: “Numerical illiteracy is not a failure with arithmetic, but it is instead a failure to know how to use the basic tools of arithmetic to understand data. Numerical illiteracy is not addressed by traditional courses in primary or secondary schools, nor is it addressed by advanced courses in mathematics. This is why even highly educated individuals can be numerically illiterate.”

FIGURE 4-1 Vision of the ASQ Statistics Division. Diagram labels: Systems thinking, Statistical methods; Process, Variation, Data, Improvement; Philosophy, Analysis, Action; Statistical thinking everywhere. Reprinted with permission of ASQ.

What is needed to overcome numerical illiteracy is what the Statistics Division of ASQ calls “statistical thinking.” The vision of the Statistics Division is that statistical thinking will be found in all aspects of organizational behavior and performance. FIGURE 4-1 depicts

this vision. Statistical thinking encompasses five key components:

■■ Systems thinking
■■ Statistical methods
■■ Philosophy (of measurement)
■■ Analysis (and interpretation)
■■ Action

As you can see from Figure 4-1, knowledge of statistical methods is only one aspect of statistical thinking. Statistical thinking is a much broader notion that has the ability not only to overcome the numerical illiteracy roadblock but also to provide a clear roadmap for the entire quality journey.

Deming’s views on the value of statistical thinking are well known and have been clearly detailed in his writings (1950, 1960, 1975, 1992, 1994). Mann (1989) provides an excellent overview of how Deming and his colleagues regarded the role of statistical thinking with examples of how Deming influenced statistical thinking in this country and in Japan. In Chapter 3, titled “Statistical Methods for Tapping into the Information Flow Generated by a Process,” Mann uses the following quotation from Deming to clarify the difference between using common sense and statistical thinking to make decisions: “There are many hazards to the use of common sense. Common sense cannot be measured. You have to be able to define and measure what is significant. Without statistical methods you don’t know what the numbers mean” (Mann, 1989, p. 62).

Along this same line, Mann references the following point made by William Conway, the former CEO of the Nashua Corporation: “He pointed out [during a panel discussion] that one of the greatest handicaps of people who are trying to improve productivity and quality is that they attempt to deal with these matters in generalities. The use of statistics is a way of getting into specifics that will allow managers and workers to make decisions based on facts rather than speculation and hunches” (Mann, 1989, p. 62). In short, statistical thinking is a way to approach all aspects of work. It is a way of thinking about numbers and how they can be used to make improvements. Statistical thinking is the primary way to immunize yourself against numerical illiteracy.

The five roadblocks described in this section are not insurmountable. The first step in overcoming them is merely to be aware that they exist. Once they are acknowledged and understood, then it is time to take steps to immunize yourself against their proliferation. The rest of this text is directed toward this goal.

▸▸ Milestones in the Quality Measurement Journey

Any successful journey begins with a plan, a good roadmap, and a clear understanding of the key milestones along the way. Developing good indicators is not all that different from planning a good road trip. The roadmap we use to guide our QMJ is shown in FIGURE 4-2. A completed QMJ is shown in FIGURE 4-3. The details on each milestone are presented next.

Welcome to Conceptland

There are two major segments of the QMJ. In the first segment of the journey, you will come upon two of the milestones: Aim and Concepts. Specifically, the team will need to develop an aim for the improvement work and then identify the relevant concepts that characterize or capture this aim. But know full well that these milestones take you to “Conceptland,” not “Measurementland.” Many people seem to live permanently in Conceptland. This is not a bad place to visit, but if you never leave this land your QMJ will come to an abrupt and unproductive end.

The first milestone, therefore, is to establish an aim for your improvement work. What does

the improvement team want to accomplish? How good do they want to be? By when do they plan to accomplish this outcome?2 An aim statement is like a compass: it sets the direction for the QMJ and points you toward your destination, Measurementland. But Aims and Concepts are not indicators.

AIM (How good? By when?)
Concept
Measure
Operational Definitions
Data Collection Plan
Data Collection
Analysis
ACTION
FIGURE 4-2 Milestones in the quality measurement journey

AIM - reduce patient falls by 37% by the end of the year
Concept - reduce patient falls
Measures - Inpatient falls rate (falls per 1000 patient days)
Operational Definitions - # falls/inpatient days
Data Collection Plan - monthly; no sampling; all IP units
Data Collection - unit collects the data
Analysis - control chart (u-chart)
ACTION
FIGURE 4-3 A completed quality measurement journey

In Figure 4-3, we see that the team’s aim is to reduce inpatient harm by 37% by the end of the calendar year. Is this an indicator? No. It is a desired end state or a vision of what could be. So, we ask the team to be more specific in order to measure the dimension of “patient harm.” Then they respond, “OK, a key dimension of inpatient harm is falls. So we need to reduce inpatient falls.” Is reducing inpatient falls an indicator? No. It is a concept that captures one dimension of harm. We still do not have specific quantifiable indicators that allow us to measure inpatient falls. Stating an aim or even the concepts that further define the components or aspects of an aim causes teams to live in Conceptland.3 For example, when I ask teams or managers what they plan to measure they say things like, “We need to improve patient satisfaction,” “We need to reduce medication errors,” or “We need to be more efficient.” Again these are visions of what might be. They are noble and good but the statements are visions or desired end states. The problem is that these are not even aim statements because the concepts they are referencing (i.e., patient satisfaction, medication errors, or efficiency, which are concepts, not indicators) do not have a specific reference as to how good they want performance to be and by when they expect to achieve this result. They are visions of what might be.

This does not mean that these two milestones are not important. These types of statements are essential in order to get a team pointed in the right direction for the start of their QMJ but such statements only provide a vague sense that we need to go “that way.”4

In Figures 4-2 and 4-3, therefore, you can actually draw a line after the second milestone (Concept), which serves as a frontier or barrier that separates the two milestones in Conceptland from the remaining five milestones in, you guessed it. . . “Measurementland.”

You Are Now Entering Measurementland

Because many people (e.g., board members, nonexecutives, senior leaders, the press, political leaders, and even patients) either live in or frequently visit Conceptland, an organization needs to have individuals with skills that can move teams and leaders beyond visions, aim statements, and concepts to address the milestones marking the road through Measurementland. These individuals need to have skills in building

quantifiable indicators (e.g., a count, a percentage, a rate, a score, an index, days between events, or cases between events) that are accepted as reasonable ways to capture the concepts of interest, build data collection plans, and have knowledge of applying SPC methods to the collected data.

Indicator Milestones

Before you actually start your measurement journey and reach the individual milestones in the QMJ, however, you need to make two brief stops in order to prepare for your journey. The first stop is where you need to decide upon the types of indicators that appropriately capture the team’s aim and related concepts that need to be tracked. The second stop is where you will need to select specific indicators within the various types you have identified. Let’s start by considering the various types of indicators that could be used to capture a team’s aim and related concepts.

Besides the seminal work of Florence Nightingale and Ernest Codman (see Chapter 1 for details), Avedis Donabedian is another physician leader who contributed significantly to the field of indicator development. Donabedian provided the first contemporary framework for developing what I consider to be a balanced set of indicator types related to the delivery of medical care. In his classic two-volume work Explorations in Quality Assessment and Monitoring (1980, 1982), Donabedian described, in considerable detail, three key points in the delivery of medical services:

■■ Structures (the tools, resources, and organizational components)
■■ Processes (activities that connect patients, physicians, and staff)
■■ Outcomes (results)

He then suggested that measures should be developed to capture these three dimensions of medical service. Even though Donabedian provided a simple model for organizing indicators, like Codman, he, too, was a little ahead of his time. Most healthcare professionals during the early 1980s did not readily embrace Donabedian’s model for evaluating medical quality or his suggestions for building indicators that represent structures, processes, and outcomes.

Kaplan and Norton (1992, 1993, 1996) made major contributions in this area by describing the components of what they call a “balanced scorecard.” Even though their work has been directed more toward for-profit companies, the basic message they present is applicable to the healthcare industry. Specifically, they argue that “no single measure can provide a clear performance target or focus attention on the critical areas of the business” (1992, p. 71). According to Kaplan and Norton, an organization should monitor a set of “balanced” indicators that represent the key strategic areas in the organization’s business plan. A well-selected and organized set of indicators should also place strategy and vision, not control, at the center of the organization (1992, p. 79). The key word for me when I first read the work of Kaplan and Norton was “balanced.” Health care has a long and rich history when it comes to tracking data. What we have not done particularly well, however, is to make sure that the data we do collect is tied to our strategic objectives and represents, therefore, a balanced set of measures that cut across the full range and scope of the clinical, operational, and customer-focused services being delivered.

The Joint Commission, a U.S.-based and international accreditation body for healthcare providers, in 1993 identified nine dimensions of clinical performance that could be used to categorize indicators:

■■ Appropriateness
■■ Availability
■■ Continuity
■■ Effectiveness
■■ Efficacy
■■ Efficiency
■■ Respect and caring
■■ Safety
■■ Timeliness

The Institute of Medicine’s (IOM) report Crossing the Quality Chasm (2001) played a major role in identifying six aims that many organizations use to organize their indicators:

■■ Safe
■■ Effective
■■ Patient-centered
■■ Timely
■■ Efficient
■■ Equitable

Another very useful way in which to think about categories or types of indicators is the value compass (Nelson, Batalden, & Godfrey, 2007). The authors propose two forms of the value compass: one for clinical systems and the other for the patient. The clinical system value compass proposes organizing indicators around functional outcomes, clinical outcomes, customer satisfaction, and costs or resource issues. On the patient side the four dimensions are similar but have slight modifications to accommodate the VOC: functional status, expectations (of the patient), clinical status, and costs or resource issues.

At the IHI we teach teams to consider three types of indicators: outcome, process, and balancing. This is what we refer to as a family of measures that capture three distinct and critical aspects of any improvement effort:

■■ Outcome Indicators: These indicators should reflect and capture the VOC. How is the process or system performing in light of the stated aim? What are the results? How close are the observed outcomes to the specified targets or goals? How satisfied are the individuals who receive the outcome(s) of the process?
■■ Process Indicators: These should reflect the processes and their related indicators that drive the outcomes. How much variation is there in the process? Are the parts or steps in the process or system performing as planned? Are the process indicators you select causally connected to the outcomes?
■■ Balancing Indicators: These indicators help you look at a process or system from different directions or dimensions. Balancing indicators help you think about unanticipated consequences or other factors that might influence the outcome. Indicators of this type will help you determine if you have (1) improved one aspect of the system but made something else worse or (2) witnessed an improvement in the outcome that was not causally related to anything that the team actually did to change the process.

TABLE 4-1 provides examples of outcome, process, and balancing indicators for a family practice clinic.

In Table 4-1, the topic of interest is focused on the patient experience. Two concepts are being addressed: (1) waiting time and (2) patient satisfaction. Two outcome indicators have been identified: (1) the total length of time (in minutes) for a scheduled appointment at the clinic (note that it is only for scheduled appointments); and (2) the percentage of patients marking Strongly Agree to the single question “Would you recommend our clinic to family and friends?”

In terms of process indicators, the team decided to track four dimensions of the care process. Two of the indicators relate to different components of waiting (i.e., check-in time to being seen by the doctor and time spent waiting for ancillary service). The third indicator focuses on the discharge process and whether the patient received appropriate discharge instructions related to the reason for the visit. The fourth and final process indicator is qualitative in nature (patient and staff comments on the flow of the process).

The final column in Table 4-1 addresses balancing indicators, of which there are four. Remember that in specifying balancing indicators we are attempting to understand whether our improvement efforts are creating any unintended consequences. For example, consider the first balancing indicator, volume of patients. What if the volume of patients coming to the clinic or scheduling appointments for a particular month declined? What impact might a declining number

of visits or scheduled appointments have on wait times? We would most likely see a drop in wait times to see the doctor because there are fewer people in the pipeline and thus the backups and delays would be reduced. So, when you show up for your appointment you get to see the doctor in less time than your previous visit simply because the volume of patients coming to the clinic has been reduced. Chances are that patients would also be more satisfied with the process (one of the outcome indicators) because they waited less time to see the doctor.

TABLE 4-1 Outcome, process, and balancing indicators for a family practice clinic

Topic: Improve waiting time and patient satisfaction in the family practice clinic

Outcome Measures
■■ Total length of stay (in minutes) for a scheduled appointment at the clinic
■■ Percentage of patients marking Strongly Agree to the question “Would you recommend our clinic to family and friends?”

Process Measures
■■ Time from check-in until seeing the doctor
■■ Patient/staff comments on the flow
■■ Percentage of patients receiving discharge material
■■ Wait time for ancillary services (lab, x-ray, ultrasound) during a visit

Balancing Measures
■■ Volume of patients
■■ Percentage of patients leaving without being seen by a doctor
■■ Staff satisfaction
■■ Financials

Similarly, if the percentage of patients leaving without being seen by the doctor (the second balancing indicator) goes up, chances are that those who do not leave will be seen faster and therefore also have higher satisfaction levels. In both these situations, the team has done nothing to intentionally improve the process. Other factors (i.e., reduced volume and an increase in the percentage of patients leaving without being seen) created a false impression that things have gotten better.

Now consider staff satisfaction as a balancing indicator. What if the improvement team did make changes in the clinic’s process (the details of which are not important at this point) that actually reduced wait time and improved patient satisfaction? But when you assess staff satisfaction you discover that it is going down. When you talk to the staff you get comments like this: “Sure the changes you made to the process have improved things for the patients but they have made operating conditions for the staff more complicated. If this continues I know of a couple staff members who are considering leaving the clinic.” In this case, what have you gained? You have improved the status of one group (i.e., the patients) and compromised another group (i.e., the staff). Deming referred to this as suboptimizing the system. Balancing indicators help prevent suboptimization and make sure you are considering unintended consequences of your efforts.

Irrespective of the various types of indicators that could be identified, the key point is that a balanced approach to the types of indicators is far superior to a narrow focus. A singular or narrow focus on one or even two types of indicators will lead to shallow knowledge and

ultimately suboptimal performance of improvement teams. A balanced approach to indicator development does not mean, however, that you have to measure 30 or 40 indicators. Focusing on the vital few (with emphasis placed on the word “few”) is preferable to assembling an unmanageable array of indicators that require a small army to collect, analyze, and interpret. More will be said on this point in Chapter 5 when we look at the development of strategic dashboards.

▸▸ Selecting a Specific Indicator

Once you have decided which types of indicators are most appropriate, the next step is to select the specific indicator(s) that will be measured within each type. Although this seems like a straightforward activity, I have found it surprising how many teams struggle with this task. An indicator is a specific quantifiable aspect of an outcome or a process. Yet all too often teams confuse themselves by wandering around in Conceptland and never move on to the detailed markers indicating that they have actually entered Measurementland. For example, in Figure 4-3 the concept of interest is to reduce inpatient falls. There are two critical aspects of moving from this concept of reducing inpatient falls to actually measuring whether falls have been reduced. First, the team needs to decide on the specific quantifiable indicator(s) that will represent inpatient falls. Second, the word “reduce” is not relevant to the selection and development of the indicator(s) that will be used to measure the concept of inpatient falls. Whether the indicator demonstrates a reduction or an increase or stays at the current level is irrelevant to specifying the indicator. We will find out if the indicator is moving in the desired direction once we collect data and move to the analysis milestone in the QMJ. Until then, the targeted direction for the indicator (i.e., an increase in indicator X or a decrease in indicator Y) has no added value to the naming and specification of the indicators. Too often, however, teams focus almost exclusively on the direction of change, the target, the expected goal, or the desired end state and end up developing confusing indicators.

In terms of our inpatient falls example from Figure 4-3, the critical question is what specific indicators do you propose to develop that capture the concept of inpatient falls? The following specific indicators could be used:

■■ The number of inpatient falls (e.g., a simple count of the number of inpatient falls each day or week)
■■ The percentage of inpatients who fell once or more while they were in the hospital
■■ The falls rate, which includes multiple falls by the same patient during their admission and is defined as the number of falls per 1,000 inpatient days5
■■ Days between inpatient falls

Each of these indicators identifies a specific way to look at the inpatient falls concept. Each indicator has value and the team will have to decide from the various ways to measure inpatient falls which one or two indicators will best serve as the outcome indicators. The team will also need to identify a list of indicators for processes related to the falls prevention process and they should consider selecting one or two balancing indicators to provide insights on the issue of suboptimization.

TABLE 4-2 provides examples of concepts and the specific indicators that could be used to measure each concept. The decision as to which indicator is selected (from this list or a new list of indicators that a team might develop) depends on the questions that a QI team is trying to answer, the availability of data, and ultimately the team’s aim. If you phrase the question in terms of the absolute volume of an activity, you might be interested in tracking a simple count of the number of events (e.g., the number of inpatient falls). If, on the other hand, you were interested in a relative measure, then you would be better off measuring falls as a percentage or possibly as a rate. When it comes

TABLE 4-2 Moving from a concept to a specific indicator

Concept / Potential Indicators for This Concept

Patient falls
■■ The number of patient falls
■■ The percentage of patient falls
■■ The patient falls rate
■■ The number of days between inpatient falls

Cesarean sections
■■ The number of cesarean sections
■■ The percentage of cesarean sections
■■ The cesarean section rate

Care of surgical patients
■■ The percentage of post-op deaths (sorted by American Society of Anesthesiologists class)
■■ The number of days between the occurrence of post-op deaths
■■ The percentage of unexpected returns to surgery
■■ The number of successful cases before there was a return to surgery within 24 hours

Care of coronary artery bypass graft (CABG) patients
■■ Intubation time post CABG
■■ The percentage of prolonged post-op CABG intubations
■■ The percentage of CABG patients with a hospital acquired infection
■■ The percentage of CABG patients returning to surgery within 24 hours

Patient scheduling
■■ The average number of days between a call for an appointment and the actual appointment date
■■ The percentage of appointments made within 3 days of the call for an appointment
■■ The number of appointments scheduled each day
■■ The number of days between a call for an appointment and the first available appointment

Employee retention
■■ Total number of full-time equivalents (FTEs)
■■ Percentage of employee turnover
■■ Employee turnover rate
■■ Average number of years employed by the organization
■■ The percentage of new hires who leave during the first year

Employee evaluations
■■ The number of evaluations completed
■■ The percentage of evaluations completed on time
■■ Variance from due date of a completed evaluation

Care of emergency patients
■■ The number of unplanned returns to the emergency department (ED) within 24 hours
■■ The percentage of ED patients admitted as inpatients
■■ The percentage of ED transfers to other facilities
■■ The patient wait time in the ED

(continues)
106 Chapter 4 Milestones in the Quality Measurement Journey

TABLE 4-2 Moving from a concept to a specific indicator (continued)

Concept Potential Indicators for This Concept

Implementation of a ■■ The number of patients who had restraints applied


restraint protocol ■■ The percentage of patients placed in restraints
■■ The restraint usage rate

Documentation of ■■ Transcription turnaround time


histories and physicals ■■ The time from patient admission to the physician-dictated H&P
(H&Ps) ■■ The percentage of incomplete H&Ps

Medication usage ■■ The total number of medication orders placed each day
■■ The number of medication orders that had one or more errors
■■ The time it takes to deliver a med order to the unit once the order is
received in the pharmacy
■■ The medication error rate
■■ The number of wasted IVs

Customer satisfaction ■■ The number of patient complaints


■■ The percentage of patients providing positive responses to a survey
■■ The percentage of patients who indicated that they would recommend
the facility to a family member or friend
■■ The percentile ranking for employee satisfaction in a national database
■■ The percentage of physicians indicating that your hospital is an
“excellent” facility

Home care visits ■■ The number of home care visits


■■ The average time spent during a home care visit
■■ The percentage of time spent traveling during each home care visit
■■ The number of visits each days for each home care nurse
■■ The number of bottles of home oxygen delivered

Pastoral care ■■ The number of patient encounters by the pastoral care staff
■■ The number of minutes spent during a patient encounter
■■ The percentage of inpatient admissions that have properly documented
the patient’s religious preference
■■ The number of requests from nursing units for assistance

Delivery of oncology ■■ The percentage of outpatient oncology patients who have to be admitted
services ■■ An individual patient’s platelet counts
■■ The total inpatient cost to treat a cancer patient
■■ Mood scale index scores for cancer patients

Successful quality ■■ The number of participants attending a QI class


improvement (QI) ■■ The percentage of cancellations
training ■■ The percentage of no-shows
■■ The information recall scores at 30, 60, and 90 days
Selecting a Specific Indicator 107

TABLE 4-2 Moving from a concept to a specific indicator (continued)

Concept Potential Indicators for This Concept

Ventilator ■■ The number of patients on a ventilator


management ■■ The percentage of patients placed on a ventilator
■■ The number of days on a ventilator
■■ The ventilator-associated pneumonia rate

Electronic access to ■■ The percentage of med orders submitted via the computerized
information physician order entry (CPOE) system
■■ The minutes of system downtime
■■ The percentage of physicians who regularly use online protocols
■■ The number of visits (hits) to the organization’s website

Outpatient testing ■■ The total number of outpatient visits and therapy


■■ The wait time to have a blood draw (or any other procedure)
■■ The percentage of outpatient procedures with a complication
■■ The complication rate for outpatient procedures
■■ The time it takes to complete a colonoscopy procedure

Lab production ■■ Lab turnaround time


■■ The total number of lab orders
■■ The percentage of inaccurate lab orders
■■ The percentage of stat lab orders exceeding target
■■ The percentage of stat lab orders

to indicator selection, there are more options than most people realize. It is also important to realize that there are no universally accepted “best” indicators of healthcare performance. A concept (e.g., inpatient falls), or even the types of measures (e.g., outcome, process, or balancing), may be the same across different systems, regions, provinces, or even countries, but the specific indicators and the subsequent milestones that mark the QMJ can be very different.

TABLE 4-3 provides a worksheet to help you move from concepts to indicators. In the left column of this worksheet, list the concepts you are interested in measuring. The next column should then list the specific quantifiable indicator(s) (e.g., count, percentage, rate, score, index, days between, cases between) you think will best capture each concept of interest. Finally, indicate whether each listed measure is an outcome, process, or balancing measure. TABLE 4-4 provides an example of a completed indicator worksheet. A key point related to this worksheet is that you do not need to have a lengthy summary of each of your indicators. You can take this completed worksheet to a management meeting and say, “Here are the indicators for our improvement team.” It is clear, yet it specifies the key components of how you have moved from a concept to specific indicators and identifies the types of indicators you will be tracking.

Summary conclusions about moving from a concept to a quantifiable indicator are provided in BOX 4-1.
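To make the difference between these indicator forms concrete, the following minimal Python sketch computes a count, a percentage, a rate, and a days-between series for the concept of patient falls. Everything in it is an assumption made for illustration: the dates, the patient volumes, the function names, and the choice of expressing the rate per 1,000 patient days do not come from a real dataset or from a required standard.

```python
from datetime import date

# Hypothetical one-month data for a single inpatient unit (illustrative only).
fall_dates = [date(2024, 3, 4), date(2024, 3, 11), date(2024, 3, 25)]
patients_admitted = 120   # assumed number of inpatients during the month
patients_who_fell = 3     # assumed: each fall involved a different patient
patient_days = 900        # assumed total inpatient days during the month


def count_indicator(events):
    """Count: the raw number of events."""
    return len(events)


def percentage_indicator(numerator, denominator):
    """Percentage: the numerator is a subset of the denominator."""
    return round(100 * numerator / denominator, 1)


def rate_indicator(events, exposure, per=1000):
    """Rate: events per unit of exposure (here, per 1,000 patient days)."""
    return round(per * events / exposure, 2)


def days_between(event_dates):
    """Days between: gaps, in days, between consecutive events."""
    ordered = sorted(event_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


print(count_indicator(fall_dates))                                  # 3
print(percentage_indicator(patients_who_fell, patients_admitted))   # 2.5
print(rate_indicator(len(fall_dates), patient_days))                # 3.33
print(days_between(fall_dates))                                     # [7, 14]
```

The same three falls produce four different numbers, which is why the team has to decide on the specific indicator, and not just the concept, before any data are collected.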

TABLE 4-3 Organizing your indicators worksheet


Topic for Improvement: __________________________________________________________________

Concept Potential Indicators Outcome Process Balancing

TABLE 4-4 Example of a completed organizing your indicators worksheet

Topic for Improvement: Inpatient Falls Process

Concept Potential Indicators Outcome Process Balancing

Patient harm Inpatient falls rate ✓

Patient harm Number of falls ✓

Compliance Percentage of inpatients assessed for falls ✓

Staff education Percentage of staff fully trained in falls assessment protocol ✓

Assessment time The additional time it takes to conduct a proper falls assessment ✓

BOX 4-1 Conclusions about moving from a concept to an indicator

1. Moving from a concept to an indicator requires focused work to create agreement about adjectives, such as recovery, major, timely, complete, accurate, or excellent.
2. A concept may need more than one indicator and, therefore, the development of more than one operational definition.
3. The transition from concept to an indicator doesn’t just happen; it requires both technical and clinical decision making to be blended with pragmatism and acceptance of the imperfections of the measures.
4. There is no such thing as a fact! (W. E. Deming)

▸▸ Developing Operational Definitions

The real work of indicator development begins after you have selected and named a specific indicator. Now it is time to develop an operational definition. I find the specification of operational definitions to be one of the more interesting and intriguing aspects of indicator development. Every day we are challenged to think about operational definitions. They are not only essential to good measurement but also critical to successful communication between individuals. For example, if you tell your teenage son or daughter to be “home early” from a party, you will quickly understand the necessity of establishing a clear operational definition.

An operational definition is a description, in quantifiable terms, of what to measure and the specific steps needed to measure it consistently. A good operational definition:
■■ Gives communicable meaning to a concept or idea
■■ Is clear and unambiguous
■■ Specifies the measurement method, procedures, and equipment (when appropriate)
■■ Provides decision-making criteria when necessary
■■ Enables consistency in data collection

Some groups are better at developing operational definitions than others. For example, political leaders typically shy away from clear and unambiguous operational definitions so they can change their positions or follow a different approach. But in this age of instant information, social media, and the ability to record statements easily and quickly, politicians are starting to be more concerned about the terms and definitions they use. For example, consider the following list of terms that are used frequently during political campaigns:
■■ A “fair tax”
■■ A “tax loophole”
■■ We need to “jump start” the economy
■■ The “rich” need to give more to the “poor”
■■ The “middle class” needs tax relief
■■ We need to get this country “moving” again
■■ The “small farmer” needs economic support

All of these terms require clear operational definitions if there is to be a consistent understanding of what they mean and how we would measure them. In the political arena, however, the desire is frequently to have a certain amount of ambiguity surrounding concepts and terms so that the person presenting the idea cannot be held to a single position or definition.6

On a more personal note, I had a great example of an operational definition when my daughter Devon was 9 years old. Devon and her friend Janine called up to me and asked, “When are you going to take us for the ice cream you promised?” My answer would have made any politician proud. I responded confidently, “Soon.” That appeased them for about 15 minutes.

Then they called up again, and this time when I answered, “Soon,” they demanded to know how many minutes made up “soon.” Even a 9-year-old child understands the need for a clear operational definition.

One of the more interesting problems with an operational definition involved the September 23, 1999 loss of NASA’s Mars Climate Orbiter. The navigation team at NASA’s Jet Propulsion Laboratory used metric units (newton-seconds) to guide the spacecraft. The probe was built by Lockheed Martin, and its engineers supplied thruster data in English units (pound-force seconds). When the probe arrived at Mars and was commanded into orbit, it was essentially working from two different sets of operational definitions. It followed the programming commands, but because of the differences in the operational definitions used by the builders of the spacecraft and those maneuvering it, the probe took a trajectory that brought it entirely too close to the planet, where it most likely broke up in the Martian atmosphere. The difference between metric and English units of measurement created an inconsistent operational definition of a unit of thruster impulse. As a result, a $125 million project became a NASA embarrassment.

A more recent example of confusing operational definitions can be found in the ongoing debate over what is a “healthy or natural” food. The U.S. Food and Drug Administration (FDA) has been debating these terms for years. Food scientists maintain that a majority of our food products are not natural and therefore not healthy because they have been processed in one form or another and that they are no longer a “product of the earth.” A similar debate has been occurring for over 15 years on the definition of an “organic fish.” I thought all fish were organic because I have never been served a mechanical fish, but I must be missing something in this debate. It turns out that the debate hinges on what the fish in a fish farm (not wild fish but farm-raised fish) are fed. If they are fed pellets that are made of other fish, they are considered “organic.” But if they are fed pellets that consist of vitamin-enriched corn, wheat, and other non-fish protein, then they are defined as “nonorganic.” Seriously, you cannot make up stuff that is better than what you discover in real life.

One final personal story about operational definitions before returning to healthcare examples. This is absolutely one of my all-time favorites. My wife Gwenn was a nationally ranked official for women’s field hockey in the United States. She was doing one of the final games for the National Collegiate Athletic Association (NCAA) championship in Iowa City, Iowa in the middle of November. It was very cold. So cold, in fact, that they had to place the hockey balls in a small warming device so they would not crack or break when hit. The game was tied when one of the forward players hit a wicked shot that was headed toward the opponent’s goal cage. The telltale sound of a hockey ball hitting the metal backplate of the cage was heard by all. The attacking team believed they had just scored a winning goal. But the defending team quickly pointed out that only half the ball was in the cage. The other half of the ball was still out on the playing pitch. So now what? Is it a goal or not a goal? What is the operational definition of a goal? Time out was called. The officials met at the center of the field along with the timer, the backup umpires, and an NCAA judge. This is a very important game so a decision has to be made. Goal or no goal? Gwenn, who is the lead official, is asked to render an opinion. She honestly says, “I have no idea. This has never happened before.” So, they turn to the official rule book to see if there is an operational definition of “a goal.” After a couple of minutes that seemed to last hours, Gwenn announces that there is an answer in the rule book. It clearly states that a goal occurs “when the entire ball passes the plane of the goal line.” With half the ball still lying on the pitch and the other half in the cage, the answer is easy ... no goal. On the ride home I was mulling over the operational definition of a goal and asked Gwenn a question: “What if upon being hit the ball did break in half but both halves went across the goal line and into

the cage? Would this now be considered a goal? The entire ball was in fact in the cage.” There was silence for a few seconds, then she said, “I’m not even going to address that question.” But I still wonder whether it would be a goal.

Every day healthcare professionals must deal with operational definitions. There are many healthcare terms that beg for more precise operational definitions. How does your organization define the following terms?
■■ A patient fall
■■ A restraint
■■ A good outcome for the patient
■■ A medication error
■■ A complete and thorough physical exam
■■ A good employee performance review
■■ Surgical start time
■■ An accurate patient bill
■■ A readmission
■■ A successful surgical outcome
■■ An organization that supports its workers
■■ A late food tray
■■ A clean patient room
■■ Healthcare disparities
■■ A quick admission
■■ A blameless culture for reporting errors

Consider one of these terms that has intrigued me for years: a patient fall. One of the first definitions I heard for a patient fall was “a sudden and rapid movement from one plane to another.” This sounds like something you try to do at a busy airport rather than the definition of a negative patient outcome. It is not very precise and leaves a lot to the imagination. I have frequently heard nurses talk about two basic types of falls: partial falls and assisted falls. Partial falls usually occur when the patient attempts to get out of bed and discovers that he or she does not have an adequate amount of strength to permit ambulation. In this case, the patient might stagger a little, slump back onto the bed, try to stand again, and attempt to make it to the chair by the window but end up collapsing to the floor. As I have explored this scenario with nurses and asked them if this constituted a partial fall, I get mixed responses. One meeting I especially remember produced two very different views of a partial fall. After describing the conditions of a partial fall, half of the nurses indicated that they would classify the situation as a partial fall because the patient did bounce around a little before ending up on the floor. Their reasoning was that the patient bounced around a little, came in contact with some furniture, and eventually ended up on the floor. The other nurses in the group reserved their opinion until they found out the answer to one question: “Did the patient’s knee hit the floor first?” If the answer was “Yes,” then they agreed it would not be a partial fall. If the answer to this question was “No,” however, this group of nurses believed that this was a partial fall. The knee touching the floor was the primary determinant of a “partial fall.”

Assisted falls are even more interesting than partial falls. When I first heard this term I envisioned nurses getting so fed up with a patient that they gave him a gentle nudge and “assisted” him in falling. As I learned more about this topic, however, I came to realize that an assisted fall fortunately has nothing to do with the nurses causing the fall. It does, however, have a very distinctive operational definition. Here is the scenario. A patient decides to go for a walk tethered to her IV pole. The patient takes a few steps, then announces to the nurse that she does not feel very well and that things are starting to move in circles. As the patient begins to sway, the nurse moves into position, grabs the patient, and assists her to the floor. But is this really a fall? It seems to me it is more like a recline or possibly a lay-down. Most nurses I have worked with agree that being present when a patient is starting to go down and intervening to help break the patient’s fall constitutes an “assisted fall.” But there is certainly not universal agreement on the precise operational definition of an assisted fall.

You can see the problem that all this poses for measurement. If you are part of a multihospital system, a region, province, or a country or plan on comparing hospital outcomes across providers, then you should make sure that each provider being compared is defining the

indicator of interest in the same way. Without such consistency you will end up with apples and oranges at best and more likely apples and carburetors. The pieces will not be comparable, which means that ultimately the conclusions that are derived from the data are not accurate. All good measurement begins and ends with operational definitions.

An example of an operational definition for the percentage of medication errors is summarized as follows.

Indicator Name: Percentage of medication errors

Numerator: Number of outpatient medication orders with one or more errors. An error is defined as wrong med, wrong dose, wrong route, or wrong patient

Denominator: Number of outpatient medication orders received by the family practice clinic pharmacy

Data Collection:
• This indicator applies to all patients seen at the clinic
• The data will be stratified by type of order (new versus refill) and patient age
• The data will be tracked daily and grouped by week
• The data will be pulled from the pharmacy computer and the computerized physician order entry (CPOE) systems
• Initially, all medication orders will be reviewed. A stratified proportional random sample will be considered once the variation in the process is fully understood and the volume of orders is analyzed.

A second example of an operational definition provides the details for a perioperative nasal swabbing indicator.

Indicator: Percentage of patients undergoing hip and knee replacement surgery during the measurement period who have had preoperative nasal swabs to screen for Staphylococcus aureus carriage

Goal: 95%

Frequency of Data Collection: Monthly

Numerator Definition: Number of patients undergoing hip or knee replacement surgery who have had a nasal swab specimen processed to screen for Staphylococcus aureus carriage prior to surgery

Denominator Definition: Number of patients undergoing elective hip or knee replacement surgery

Numerator and Denominator Exclusions:
• Patients who are less than 18 years of age
• Patients who had a principal or admission diagnosis suggestive of preoperative infectious diseases
• Patients with physician-documented infection prior to surgical procedures
• Patients undergoing nonelective hip or knee replacement surgery

Definition of Terms: Hip or knee replacement surgery includes operations involving placement of a nonhuman-derived device into the hip or knee joint space. ICD-9 Codes include 00.70-00.73, 00.85-00.87, 81.51-81.53, 00.80-00.84, 81.54, and 81.55.

Calculate as: (numerator/denominator) * 100, with only 1 decimal place

Summary conclusions about developing operational definitions are provided in BOX 4-2.

BOX 4-2 Conclusions about developing operational definitions

1. Operational definitions are not universal truths!
2. Operational definitions require agreements on terms, measurement methods, and decision criteria.
3. Operational definitions need to be reviewed periodically to make sure everyone is still using the same definitions and that the conditions surrounding each measure have not changed.
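One way to check that an operational definition is precise enough is to see whether it can be turned into a calculation without further discussion. The minimal Python sketch below does this for the nasal swabbing indicator: it applies the exclusions, divides the numerator by the denominator, and reports the result to 1 decimal place. The record layout, the field names, and the six sample cases are hypothetical and exist only for illustration; a real implementation would work from the ICD-9 codes and chart documentation named in the definition.

```python
from dataclasses import dataclass


@dataclass
class SurgeryCase:
    """One hip or knee replacement case (field names are hypothetical)."""
    age: int
    elective: bool
    preop_infection: bool        # stands in for both infection-related exclusions
    nasal_swab_processed: bool


def is_excluded(case):
    """Apply the numerator and denominator exclusions from the definition."""
    return case.age < 18 or case.preop_infection or not case.elective


def percent_screened(cases):
    """Return (numerator / denominator) * 100 to 1 decimal place, or None if no case is eligible."""
    eligible = [c for c in cases if not is_excluded(c)]
    if not eligible:
        return None
    numerator = sum(c.nasal_swab_processed for c in eligible)
    return round(100 * numerator / len(eligible), 1)


# Illustrative cases only; not real patient data.
cases = [
    SurgeryCase(67, True, False, True),
    SurgeryCase(72, True, False, True),
    SurgeryCase(58, True, False, False),
    SurgeryCase(16, True, False, True),     # excluded: under 18
    SurgeryCase(80, False, False, False),   # excluded: nonelective
    SurgeryCase(75, True, True, True),      # excluded: preoperative infection
]

GOAL = 95.0  # goal stated in the operational definition
result = percent_screened(cases)
print(result, "meets goal" if result >= GOAL else "below goal")  # 66.7 below goal
```

Writing the exclusions as a single function makes it easy for two sites to confirm that they are applying the same decision criteria before their results are compared.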

▸▸ Developing Data Collection Plans

I have separated this milestone from the actual collection of data because I do not believe that as an industry we have devoted enough time to thinking about the numerous factors that influence the success or failure of our data collection efforts. Most people want to move directly from “I have an indicator” to “Let’s go get some data” without spending much time thinking about how to actually collect the data. From my perspective, planning for data collection should occupy about 80% of your data collection time and the actual act of collecting the numbers should consume about 20% of your time.

Data collection is not unlike other aspects of life that require planning. Whether the activity is painting a house, planting a garden, or going on a major vacation, preparation is key. If you do not spend enough time preparing a wooden house, the paint will not last as long as you would like it to. Similarly, if you do not take time to properly prepare the soil in your garden, the seeds and young plants will not get off to a very good start. Finally, a major vacation (e.g., a cruise or a bed-and-breakfast tour of Ireland) usually requires more time to plan than the time you actually spend on the holiday itself. The act of data collection is very similar. Inadequately prepared data collection plans will usually produce unacceptable results. The data will be challenged, questioned, and/or seen as being rather useless.

There are several important data collection issues that require some elaboration, most notably stratification and sampling. Stratification is one of the best things a team can discuss when building indicators, yet it is frequently overlooked. Stratification is more of a logical issue than a statistical one. It essentially consists of the separation and classification of data into categories or homogeneous buckets that reflect common characteristics. The objective of stratification is to create strata or categories within the data that are mutually exclusive and allow you to discover patterns that would not otherwise be observed if the data were all aggregated. The overall strategy is to minimize the variation within a stratification category in order to compare the variation between categories. By doing this you can increase your knowledge about the possible influence that the stratification levels might have on the outcome indicator. Frequently used stratification levels include:
■■ Age
■■ Gender
■■ Socioeconomic status
■■ Prior admission for the same diagnosis
■■ Day of the week
■■ Time of day
■■ Month of the year
■■ Shift (day, afternoon, night)
■■ Type of order (stat versus routine)
■■ Type of ambulatory procedure
■■ Type of surgery
■■ Machine (such as ventilators or lab equipment)
■■ Severity of the patients
■■ Tenure of the staff

If you do not think about the factors that might influence the outcome of your data before you collect the data, you run the risk of having to try to tease out the stratification effect manually after the data have been collected. At this point not only is it too late to effectively address the stratification question, but you will also have to engage in rework and wasted time to even attempt to untangle the stratification questions.

FIGURES 4-4 and 4-5 provide examples of stratification problems. In the first example (Figure 4-4), the indicator of interest is turnaround time (TAT) in the lab (the particular test does not matter at this point). The data reveal that the process displays extremes because the team did not separate the TATs for the day and evening shifts. They merely collected data and combined the two shifts, which are obviously different. In this case, the average TAT will fall roughly in the middle of the two extremes of data. The average and even the standard deviation of the TAT are meaningless statistics for data like this. The mean and

standard deviation can be calculated, of course, but they are mathematical artifacts that are the result of two distributions of data, one high and one low. These data should be separated and two charts should be made: one chart for the day shift TAT and another for the evening shift. Stratification is an essential aspect of data collection. If you do not spend some time discussing the implications of stratification, you will end up thinking that your data are worse (or better) than they really are.

FIGURE 4-4 A stratification problem with turnaround time (vertical axis: turnaround time [TAT]; horizontal axis: day of the week; series: day shift TAT, average TAT, P.M. shift TAT)

In Figure 4-5, the indicator of interest is revenue by day of the week. There are several data points in a row that are relatively high and at roughly the same level. Then there is a sharp drop in the data for two data points. This pattern demonstrates a clear problem with stratification. In this case, the revenue generated Monday through Friday (the higher data points) is markedly different from that generated on Saturday and Sunday (the lower data points). This hospital is clearly not a 7-day-a-week hospital. If you were to calculate the average revenue generated per day for this hospital you would get a misleading number. Although the overall mean would be skewed toward the weekday revenue side of the chart (because of the higher volume generated during this time as well as more days), it would not reflect the average generated during the weekdays. For this example, someone should have said before the data were collected, “Because we do not generate the same amount of revenue on the weekend as we do on weekdays, we should stratify the data into two categories, weekday revenue and weekend revenue, and analyze the data separately.”

FIGURE 4-5 A stratification problem with tracking revenue (vertical axis: revenue; horizontal axis: day of the week; series: weekday revenue, weekend revenue)

Sampling is the second key component of a data collection plan. Not every data effort will require sampling. If a process does not generate a lot of data, then you will probably analyze all the occurrences. This happens most often when the indicator is a percentage. For example, when we compute the percentage of primary C-sections for the month we typically do not use a sampling plan. We usually take all the primary C-sections for the month and divide this numerator by the total number of deliveries (the denominator) for the month. When a process generates considerable data, however (e.g., lab TAT for blood tests or all admissions during the month), a sampling plan is usually appropriate. From my perspective, building knowledge of sampling methods is one of the most important things you can do to establish efficient and effective data collection strategies.

Like stratification, sampling deals more with logic than statistics. Individuals trained in the social sciences are typically exposed to extensive training in sampling principles and concepts. Unfortunately, most healthcare professionals are given only a cursory foundation in this subject. The irony with this situation is that sampling is actually quite easy. Healthcare

professionals would grasp sampling principles quickly if they were exposed to them throughout their formal training.

Try this simple test to demonstrate this point. The next time you are with a group of healthcare professionals, ask them, “Have any of you ever drawn a random sample?”7 Rather quickly you will receive a bunch of positive nods. When you ask one of the people who was nodding rather energetically how they actually drew the random sample, they will usually announce rather proudly that they “picked every 10th chart.” Selecting every 10th chart is a form of random sampling known as systematic sampling (described later), but it can introduce considerable bias if the steps involved in drawing a systematic sample are not followed.

The purpose of sampling is to be able to draw a limited number of observations and to be reasonably comfortable that they represent the larger population from which they were drawn. If you had all the time and money in the world you would never draw a sample. You would always do a complete enumeration of all cases. But time and resources are limited, so we draw samples.8 Whenever you draw a sample, however, the key question is, “How much data do I need?” One of my professors in graduate school, Dr. Bob Bealer, had a great answer to this question. When asked by one of my fellow doctoral students how much data we should collect for our dissertation research, he merely answered, “As much as you must and as little as you dare.” At the time I thought this was a clever and rather professorial response. But after spending many years trying to help healthcare professionals develop reasonable sampling strategies, I have come to realize that this was very practical advice. For example, if I wanted to check your weight by weighing you on only one day of the year, would you say this is a representative sample of your true weight? As an aside, I should tell you that the day I have selected to weigh you is Thanksgiving Day (a U.S. holiday celebrated on the fourth Thursday of November) after you eat. Most people would say, “No way do I want you to use my Thanksgiving Day weight.” Their initial reaction would probably be correct. On the average, for example, adults consume upwards of 5,000 calories on Thanksgiving Day. So most people would probably say, “If you are going to weigh me once, check me in the spring when I am trying to get back into my shorts or bathing suit.” To be even more reasonable (reliable), I might weigh you every couple of weeks as they do in many weight control programs. In this way, I would obtain a more representative sample of your weight as it fluctuates over time. Remember, as much as you must and as little as you dare.9

What happens if you draw a sample and it is not representative of the population from which it was drawn? FIGURE 4-6 shows the relationship between three samples and a population. The larger curve represents the total population of interest (e.g., all asthma patients returning to the emergency department [ED] within 24 hours). Curve A identifies a properly pulled sample of patients. The shape and location of this sample are very similar to the population. Curve C, on the other hand, represents a sample that was drawn with a negative bias. In this case, you could get the false impression that your results were much worse than they really were just because you pulled a sample that came from the negative end of the population curve. Similarly, Curve B depicts a positive sampling bias, which leads you to an overly optimistic conclusion. A well-designed sampling plan will not only produce data that are representative of the population but also save time and money for those collecting the data.

There are many ways to draw a sample. The key question you have to ask yourself whenever you want to draw a sample is, “How representative and precise do I need to be with this sample?” For example, if you have received numerous calls and complaint letters about the wait time in outpatient testing and therapy, you basically have two sampling options: (1) develop a statistically based sample that allows you to generalize to your total outpatient population,
116 Chapter 4 Milestones in the Quality Measurement Journey

[Figure: a population distribution plotted along an axis running from negative outcome (left)
to positive outcome (right), with three sample curves drawn against it: C, a negatively biased
sample; A, a representative sample; B, a positively biased sample.]

Ideally the sample will have the same shape and location as the total population but have fewer
observations (curve A). A sample improperly pulled could result in a positive sampling bias
(curve B) or a negative sampling bias (curve C).

FIGURE 4-6 The relationship between a sample and the population
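
The effect pictured in Figure 4-6 can be simulated in a few lines of Python. What follows is only an illustrative sketch and not part of the original figure: the population of 5,000 outcome scores, its mean of 70, the sample size of 200, and the function name draw_samples are all hypothetical choices. The two biased samples are built in the crudest possible way, by taking only the best or only the worst outcomes.

```python
import random
import statistics

def draw_samples(population, n, seed=0):
    """Return a representative sample and two deliberately biased ones."""
    rng = random.Random(seed)
    ordered = sorted(population)
    return {
        "A (representative)": rng.sample(population, n),  # random draw
        "B (positive bias)": ordered[-n:],                # only the best outcomes
        "C (negative bias)": ordered[:n],                 # only the worst outcomes
    }

# Hypothetical population of outcome scores (higher is better).
rng = random.Random(42)
population = [rng.gauss(70, 10) for _ in range(5000)]

samples = draw_samples(population, n=200)
pop_mean = statistics.mean(population)
for label, sample in samples.items():
    print(f"{label}: sample mean {statistics.mean(sample):.1f}, "
          f"population mean {pop_mean:.1f}")
```

Under these assumptions the random sample should land close to the population mean, while samples B and C land well above and well below it, which is the overly optimistic or overly pessimistic conclusion the figure warns about.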

or (2) go out on any given day, grab a convenient handful of willing patients, and ask them how
they like your outpatient testing and therapy services. If the level of precision you need to
answer this question is low, then option 2 is appropriate. If, on the other hand, you need
to be very sure (statistically sure) that there is a problem in outpatient testing and therapy,
then you need to formulate a more scientific approach to sampling.

Ishikawa, in his classic work Guide to Quality Control (1982), identifies four conditions for
developing a sampling plan:

■■ Accuracy
■■ Reliability
■■ Speed
■■ Economy

These four criteria should serve as a fundamental checklist for building sampling
designs. Not every sample will maximize all four criteria. There are times when accuracy will be
the primary objective of sampling (e.g., when designing a randomized clinical trial). At other
times reliability will become more important (e.g., when you are establishing a sampling
plan for patient satisfaction and you want to be able to draw reliable samples each month
or quarter). Speed may be essential when you have to sample a number of blood specimens
to determine whether there is a contamination problem. Finally, the economics of sampling
will usually pose a challenge for everyone. Each time you draw a sample, whether it is
a sample of medical records or a sample of patients, there are economic factors involved
with the pull of data. Complicated sampling plans require more time, effort, and money. In
the end, however, it all comes down to a fairly simple question: how can you pull an accurate,
reliable, fast, and inexpensive sample? Obviously it is difficult to obtain a sample that meets all
four criteria simultaneously. Sampling, therefore, really consists of a series of compromises.
It basically gets us back to Professor Bealer’s words of wisdom, “As much as you must and
as little as you dare.”

Sampling methods are basically divided into two major categories: probability and
nonprobability. Any standard research methods or
statistics book will provide a review of sampling methods. I would encourage you to obtain several
books on this topic to see how different writers classify and describe the various methods. Do
not worry about the age of the book. Most of my books on sampling, for example, are 20 to
30 years old. Even when I pick up a new book on sampling, the terms remain virtually the same
as those I find in my older books.

The terms and approaches to sampling have remained rather constant since the late 1930s.10
I do not intend to replicate in this text what can be found in many good references (Babbie,
1979; Campbell, 1974; Daniel and Terrell, 1989; Duncan, 1986; Gonick and Smith, 1993; Hess,
Riedel, & Fitzpatrick, 1975; Ishikawa, 1982; Miller, 1964; Selltiz, Jahoda, Deutsch, & Cook,
1959; Weiss, 1968; Western Electric Co., 1985). I provide, however, a brief review of the major
sampling methods and let the reader explore the details.

▸▸ Probability Sampling

Probability sampling is designed to provide the highest possible level of predictability and
confidence in the sampled data at the most economical cost to the researcher. Although most
people have some notion of what probability or random sampling entails, many are unclear
on the specific aspects of actually designing and selecting the sample. At the very foundation
of probability sampling is trust: trust in statistical probability and the fact that when
you draw a random sample you do not throw it away merely because it does not conform
to your personal belief about what the data are supposed to tell you. I have drawn many
random samples for people over the years. On numerous occasions I have been questioned
about the “accuracy” of the samples because the individuals who requested the samples did
not like the results. In their minds, they thought they should be allowed to pick and choose
what should be included (and excluded) in the sample. If they were to do this it would
not make the sample a probability sample. Whenever judgment, purposeful intent, or
convenience enters into the sampling plan, you have moved from probability sampling to
nonprobability sampling, which is addressed in the next section.

Campbell (1974, p. 143) identifies three characteristics of probability sampling:

1. A specific statistical design is followed.
2. The selection of items from the population is determined solely according to known
   probabilities by means of a random mechanism, usually using a table of random digits.
3. The sampling error (that is, the difference between results obtained from a sample survey
   and that which would have been obtained from a census of the entire population conducted
   using the same procedures as in the sample) can be estimated and, as a result, the
   precision of the sample result can be evaluated.

There are numerous ways to draw a probability sample. They are all essentially variations
on the simple random sample.

Simple Random Sampling

A random sample is one that is drawn in such a way that it gives every element in the
population an equal and independent chance of being included in the sample. This is usually
accomplished by using a random number table (usually found in the back of any good statistics
book) or a computer-based random number generator (found in all statistical software
programs and in many spreadsheet packages). Step-by-step procedures for drawing a random
sample can be found in Probability Sampling of Hospitals and Patients (Hess et al., 1975) and
in Flaws and Fallacies in Statistical Thinking (Campbell, 1974). Even though this method is
referred to as a “simple” random sample, the term “simple” can be a little misleading. The
mechanics of drawing a random sample may
not feel simple to those who have to number all the elements in the population and learn
how to apply a random number table or a computerized random number generator.

As an alternative, you can simply write the names or numbers of the population elements
on separate pieces of paper, place them in a bowl, and draw out the sample. I did this to develop a
sampling plan for a medical group. They wanted to sample the wait times of patients in one of
their clinics, but they did not have the resources to sample every day of the week. They initially
said they would pull a sample of patients every Monday. I advised them that this could produce
biased results, because Mondays are typically busier than other days of the week. So I wrote the
days of the week (excluding the weekend) on five pieces of paper (all of the same size), placed
them in a bowl, and then drew out a day of the week. The first day I pulled was a Wednesday.
This meant that during the first week, a sample of patients would be pulled on Wednesday and
their wait times to see the physician would be recorded. I placed the slip of paper with Wednesday
written on it back into the bowl and drew another piece of paper. The second piece of paper had
Friday written on it, which would be the sampling day during week 2. I replaced the piece of paper
and repeated this process 23 more times to obtain a total of 25 sample days. To pull the sample of
patients on a given day, I first asked the staff to run a report showing the actual volume of
patient visits by day for the last 6 months. From this report we determined the minimum and
maximum number of visits as well as the mean, median, and standard deviation (to see if the data
approximate a normal distribution). We determined that the average number of visits each day
was 74 with a minimum of 63 and a maximum of 86. I advised them to place the numbers 1 through 86
on pieces of paper (again all the same size) and place them all in the bowl. We then proceeded to
draw a random sample of eight patients from the bowl for each sampling day. The first day to be
sampled, for example, was a Wednesday. On this day the following eight patients would have their
wait times tracked: 43rd, 15th, 63rd, 2nd, 47th, 23rd, 18th, and 4th. Because the clinic basically
knew how many patients they had scheduled for that day, they could identify those patients from
the charts that had been pulled ahead of time. This allowed the staff to be prepared to track the
various steps in the process for these patients (i.e., the time from check-in to being called by
the nurse, time with the nurse, wait time to see the doctor, time with the doctor, and finally
checkout time). In this example, two random samples were selected, one for the day of the
week and another for the patients to be tracked within a selected day. By using the pieces of paper
and a bowl, we were able to apply the principles and precision of probability sampling and avoid
some of the complexity associated with using random number tables or computer-generated
random samples.

Stratified Random Sampling

This method of sampling is not an alternative to simple random sampling but rather a variation
on a theme. Simple random sampling assumes that the composition of the total population is
unknown. A random selection process is seen, therefore, as the best way to obtain a
“representative” sample. The problem is that the very nature of a random selection process could
produce a sample that is not truly representative of the characteristics of the total population.
This is where stratification comes into the picture. By stratifying the population into relatively
homogeneous strata or categories before the sample is selected, you increase the representativeness
of the sample and decrease the sampling error. Once the stratification levels have been identified,
a random selection process is applied within each stratum. For example, you might stratify
a hospital’s patients into medical and surgical strata and then sample randomly within each
group. This would help to ensure that one group was not over- or underrepresented in the
sample. A key point to remember when setting up a stratified random sample, however, is that it
requires the knowledge of people who actually work in the process. As subject matter experts,
they can tell you what key stratification categories are relevant. An external statistician who is
very skilled in sampling methods, for example, will not have knowledge of the local characteristics
that affect the decisions about proper stratification. Bring the subject matter experts together
with a skilled statistician and you will be able to set up a good sampling strategy.

Stratified Proportional Random Sampling

In this case, we are going to use the approach outlined for stratified random sampling, but we
are going to add another twist. We are going to determine the proportion that each stratum
represents in the population and then replicate this proportion in the sample. For example, if we
knew that medical services represented 50% of the hospital’s business, surgical services represented
30%, and emergency services represented 20%, then we would draw 50% of the sample from
medical units, 30% from surgery, and 20% from the ED. This would produce a sample that not
only was representative but also proportionally representative. This would further increase the
precision of the sample and reduce the sampling error. The stratified proportional random sample
is one of the more sophisticated sampling designs. It does require knowledge of the population being
sampled, however, as well as a population large enough that, as you stratify it into a variety of
categories, you will have sufficient numbers in each category to be eligible for sampling. Note that
a stratified proportional random sample can be more costly in terms of both money and time.

Systematic Sampling

Systematic sampling offers one of the easiest ways to draw a sample. It consists of numbering
or ordering each element in the population and then selecting every kth observation after you
have selected a random place to start, which should be equal to or less than k but greater than
zero. For example, if you had a list of 500 medical records and you wanted to pull a sample of 50,
you would pull every 10th record. To determine the starting place for the sample, you would pick a
random number between 1 and 10. For argument’s sake, imagine that when we do this we select the
number 6. So to start our systematic sample we would go to the 6th medical record on our list,
pick it, and then proceed to select every 10th record after this starting point. Technically, this is
known as a systematic sample with a random start (Babbie, 1979, p. 178). The most frequent ways to
organize the elements are either alphabetically or chronologically. There are two major advantages
of systematic sampling: it is simple and you have to generate only the first random number. This
sampling method is what many healthcare professionals think of as a random sample. Although it
is a form of random sampling, it does have certain limitations. The major problem with systematic
sampling is that you are eliminating chunks of data that could provide knowledge about the
process. If, for example, you are selecting every 10th record, you have automatically eliminated
from further consideration records 1 through 9. You pick the 10th record, then skip 11 through 19
and pick number 20. The records in between the ones you select will never have a chance of being
included in your analysis. If there is something that occurs regularly in the data or something that
causes your data to be organized into bunches of, say, seven or eight, then these records would be
automatically eliminated from consideration. The other problem I have observed with this form of
sampling in healthcare settings is that the people drawing the sample do not base the start on a
random process. They merely pick a convenient place to start and then start applying the sampling
interval they have selected. This introduces bias and greatly increases the sampling error.

Cluster Sampling

In cluster sampling, the population is divided into mutually exclusive and exhaustive clusters,
then a simple random sample is drawn within each cluster. On the surface this approach does
not seem very different from stratification. Cluster sampling differs from stratified random
sampling in that cluster sampling seeks to create “bunches” within the population. Sampling in this
way is almost always less expensive than simple random sampling (which is not as focused). The
other key distinctions between stratification and cluster sampling include the following: (1) with
stratified sampling, a sample of elements is selected from within each stratum or category; and
(2) with cluster sampling, a sample of strata is selected. Because the cluster sample is selecting
a sample of strata or categories, it is desirable to have each cluster be a small collection of the
population. Cluster samples, therefore, should establish groupings that are as heterogeneous as
possible. Stratified samples, on the other hand, attempt to create homogeneous categories (e.g.,
all medical and all surgical patients).

Another distinction with cluster sampling is that it is typically done with fairly large
populations. This method could be applied, for example, to a large system that has 15 to
20 hospitals. Each hospital could be considered a cluster, or they could be grouped into regional
clusters. A cluster sample also could be drawn in a large metropolitan area. Instead of looking
at individual hospitals or hospital systems, you could divide the metropolitan area into
neighborhoods or regions (the clusters) and then sample patients within these regions. In
Chicago, where I live, for example, it would be possible to divide the metropolitan area into
north, south, west, and urban core clusters (east would not work because Lake Michigan
is located to the east of Chicago). If we did this we would not be so concerned with the
individual hospitals and their organizational affiliations but rather with bundling people
together into common geographic areas. With large populations, therefore, cluster sampling
can be a very economical approach to sampling. Cluster sampling would not apply to the unit
or department level because the population of interest (e.g., all hip or knee replacement
patients) is not large enough to permit clusters to be created.

▸▸ Nonprobability Sampling

Nonprobability sampling is typically used when the researcher is not worried about estimating
the reliability and precision of the sample or about generalizing the results to a larger population.
This is not to say, however, that nonprobability samples do not serve a useful purpose. More
specifically, nonprobability approaches to sampling can be used when:

■■ Probability samples are either too expensive to collect or too complicated for the question
   being asked
■■ There is no need to draw inferences or generalize to larger populations
■■ There is no need to estimate the probability that each element has of being included in the
   sample
■■ There is no need to have assurance that every element (e.g., patient) had an equal
   opportunity to be included in the sample
■■ The objective is to conduct an exploratory or descriptive study on an issue or process
   that has not been studied in detail
■■ You are testing a potential improvement strategy and want to run a quick pilot study
   (i.e., sending up a trial balloon to see if it has any hope of succeeding)
■■ Mechanical selection of the sample is not required; personal judgment and subjective
   choice are sufficient

The major forms of nonprobability sampling are convenience sampling, quota sampling, and
judgment sampling. The basic objective with all of these methods is to select a sample that
the researchers believe is “typical” of the larger population. The problem is that there is no way
to actually measure how typical or representative a nonprobability sample is with respect to the
population it supposedly is representing. In short, nonprobability samples can be considered “good
enough samples” (i.e., they are good enough for the people pulling the sample).
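
By way of contrast with the nonprobability methods that follow, the mechanical selection behind the probability designs can be sketched in a few lines of Python. This is an illustrative sketch only, not a procedure taken from the references cited in this chapter. The function names, the seed, and the record and patient identifiers are hypothetical; the numbers (8 of 86 patients, 50 of 500 records, a 50/30/20 split) echo the examples used earlier. The cluster function assumes one common two-stage variant, in which clusters are chosen at random and a simple random sample is then drawn inside each chosen cluster.

```python
import random

def simple_random_sample(elements, n, rng):
    """Draw n elements without replacement; each has an equal chance."""
    return rng.sample(elements, n)

def systematic_sample(elements, n, rng):
    """Pick every kth element after a random start between 1 and k."""
    k = len(elements) // n                 # sampling interval
    start = rng.randint(1, k)              # 1-based position of the first pick
    return elements[start - 1::k][:n]

def proportional_stratified_sample(strata, n, rng):
    """Sample each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding can make the pieces sum to slightly more or less than n.
        share = round(n * len(members) / total)
        sample[name] = rng.sample(members, share)
    return sample

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Two-stage variant (an assumption): choose clusters at random,
    then draw a simple random sample inside each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

rng = random.Random(2019)                  # fixed seed so the sketch is repeatable

# Eight patients out of slips numbered 1 through 86, as in the bowl example.
print(simple_random_sample(list(range(1, 87)), 8, rng))

# 50 of 500 medical records: every 10th record after a random start.
records = list(range(1, 501))
print(systematic_sample(records, 50, rng)[:5])

# Hypothetical hospital with a 50/30/20 split across three services.
strata = {
    "medical": [f"M{i}" for i in range(500)],
    "surgical": [f"S{i}" for i in range(300)],
    "emergency": [f"E{i}" for i in range(200)],
}
picked = proportional_stratified_sample(strata, 50, rng)
print({name: len(members) for name, members in picked.items()})

# Hypothetical regional clusters; two of the four are selected.
clusters = {region: [f"{region}-{i}" for i in range(100)]
            for region in ("north", "south", "west", "urban core")}
print(sorted(cluster_sample(clusters, 2, 5, rng)))
```

In every function the selection is left to the random number generator rather than to the person pulling the sample, which is the property that separates these designs from the convenience, quota, and judgment samples described next.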
Convenience Sampling

As the name implies, convenience sampling is designed to obtain a handful of observations
that are readily available and convenient to gather. Convenience sampling is also referred
to as “chunk” sampling (Hess et al., 1975, p. 8) or accidental sampling (Maddox, 1981, p. 3;
Selltiz et al., 1959, p. 516). A classic example of convenience sampling is found in the “man on
the street” interview conducted by TV stations. The local TV channel parks its action-cam van
along a busy downtown street at lunchtime. The investigative reporter positions herself
strategically and begins to scan the people who walk by. She knows that she needs to get at
least four good comments from local citizens (her quota sample), so she eliminates anyone
from consideration who looks like they would be (1) uncooperative, (2) argumentative, or
(3) too chatty without any substantive sound bites. Then she sees a likely candidate and strikes:
“Hi, I’m from Channel 5 News and I’d like to know how you feel about . . . (fill in the blank).”
Okay, one down and three more to go (to meet the quota). So the search continues. There is no
science behind this type of sampling. It produces a biased sample that is essentially a collection of
anecdotes that cannot be generalized to larger populations. In technical terms, this is what is
referred to as a convenient quota sample (i.e., I need a quota of four people and I’m willing
to take anyone who is convenient and agrees to talk). In the healthcare setting, convenience
sampling is used frequently, possibly too often. I have seen it used to pull a convenient sample of
medical records, obtain patient satisfaction input (go grab a few people waiting in the ED and ask
them how they feel about their wait time), or select a “typical” day to study call button response
time. The primary question that someone should ask when a convenience sample is drawn is,
“How important is it to know whether the sample of elements we just selected is representative
of the larger population?” If the consequences of being wrong do not matter, then the convenience
sample might be good enough.

Quota Sampling

Quota sampling was developed in the late 1930s and used extensively by the Gallup organization,
which gained great recognition as well as ridicule from it. (See Chapter 3 for additional details
on how Gallup benefited from quota sampling in 1936 and then was criticized in 1948 for its
failure to predict accurately.) If you ask healthcare professionals to describe quota sampling,
they will probably tell you that it is merely a simple way to determine the total minimum number
of elements needed in a sample (e.g., we need a quota of 5% of the medical records) or the total
minimum amount of data that the team can afford to gather. These two factors, although part of
quota sampling, are only part of the picture. Babbie (1979, p. 196) nicely describes the steps
involved in developing a quota sample:

1. Develop a matrix describing the characteristics of the target population. This may entail
   knowing the proportion of males and females; various age, racial, and ethnic proportions;
   as well as the educational and income levels of the population.
2. Once the matrix has been created and a relative proportion assigned to each cell in the
   matrix, you collect data from persons having all the characteristics of a given cell.
3. All persons in a given cell are then assigned a weight appropriate to their proportion of
   the total.
4. When all the sample elements are so weighted, the overall data should provide a reasonable
   representation of the total population.

Theoretically, an accurate quota sampling design should produce results that are reasonably
representative of the larger population. Quota sampling has several inherent problems, however, that
are related primarily to how the cells in the quota matrix are actually populated. If, for example, the
individuals collecting the quota samples are not particularly vigilant and honest about filling their
quotas, the results will be biased. Remember, the actual selection of the elements to fill the quota
is left up to the individual gathering the data, not to random chance. If the data collectors are not
diligent and/or honest about their work, they will end up obtaining their quotas in a manner that is
more like a convenience sample than a true quota sample. This happens frequently when quota samples
are being collected in neighborhoods. The 2000 census in Chicago provided a good example of
this type of bias. The census workers were given quotas to fill on the North Shore of the city. This
is a rather wealthy area where it is not uncommon to find homes that are gated and monitored by
security. Many of the census workers were not given access to these homes, even though they
were technically in the cell they were supposed to obtain. Apparently pressured by the requirement
to meet their quotas, the census workers creatively began to substitute other residents for
the ones defined by the quota sample. As a result, the cells in question (i.e., neighborhoods) were
underreported and not properly representative of the area (Chicago Tribune, July 5, 2000, “Census
Shortcuts Alleged”). Another threat to the validity of the quota sample is that the patient population
characteristics might be outdated and not reflect the current patient population. The final threat
involves the process by which the data collectors actually gather the data. For example, if a quota
sample was established to gather data in the ED but only during the day shift, you would run
the risk of missing key data points during the afternoon and evening shifts.

Judgment Sampling

I saved the discussion of judgment sampling until the end because it can be viewed in two very
different lights. If you approach sampling from an academic research perspective, then judgment
sampling is regarded as having a low level of precision and statistical rigor. If, on the other hand,
your objective is not academic research but rather QI research, then judgment sampling provides a
useful approach to sampling. The academic view that judgment sampling (also referred to as purposive
sampling) has a low level of precision is based on the fact that the sample is drawn on the basis of
the knowledge of the person(s) drawing the sample. No objective mechanical means are used to select
the sample. The assumption is that experience, good judgment, and appropriate strategy can select a
sample that is acceptable for the objectives of the researcher. An example of judgment sampling is
seen every 4 years when a handful of states and communities are selected to be “pulse checks” for the
U.S. presidential election. In this case, the assumption is that the people in Iowa and New Hampshire
are “typical” of the rest of the nation and that the responses of these citizens provide a snapshot of
how the average American views the presidential candidates. Obviously the major challenge to
judgment sampling is related to the knowledge and wisdom of the person making the judgment
call. If everyone believes that this person exhibits good wisdom, then they will have confidence in the
sample that the person selects. If, on the other hand, people doubt the person’s wisdom and knowledge,
then the sample will be discredited.

Now, consider the nonacademic use of judgment sampling. Deming considered judgment
sampling to be the method of choice for QI research. Langley, Nolan, Nolan, Norman,
and Provost (1996, p. 111) maintain that “A random selection of units is rarely preferred to
a selection made by a subject matter expert.” In QI circles, this type of sampling is also known
as expert sampling or rational sampling. It essentially consists of having those who have
expert knowledge of the process decide on how to arrange the data into subgroups and pull the
sample. The subgroups can be selected either by random or nonrandom procedures, which is a
major distinction between the QI perspective and the academic view of judgment sampling. The
other important distinction about Deming’s view of judgment sampling is that the samples should
be selected at regular intervals over time, not at a single point in time. Most sampling designs,
whether they are probability or nonprobability, are static in nature. The researcher decides on a
time frame and then picks as much data as possible. In contrast, Deming’s view of sampling was that
it should be done in small doses (rather than A review of the probability and nonproba-
large quantities) and pulled as a continuous bility sampling methods is provided in TABLE 4-5.
stream of data (Deming, 1950, 1960, 1975). The Developing a working knowledge of these sam-
primary criticism of judgment sampling is that pling techniques will be one of the best ways to
the “expert” may not fully understand all facets reduce the time spent on collecting data. Done
of the population under investigation and may correctly, sampling will also be a way to ensure
therefore select a biased sample. The second that the data you do collect are directly related
criticism is that the sampling error cannot be to your QI efforts.
measured. The final challenge is that the results Now that you are familiar with the princi-
of a judgment sample cannot be generalized to pal approaches to stratification and sampling,
the larger population because the sample was it is time to start applying these techniques to
not selected by random methods. your own set of indicators. TABLE 4-6 provides

TABLE 4-5 Advantages and disadvantages of various sampling methods

Sampling
Method Description Advantages Disadvantages

Probability Sampling

Simple A sample that is ■■ Requires minimum ■■ Does not take advantage


random drawn in such a way knowledge of the of the knowledge the
sample that every member population in advance researcher might have
of a population has ■■ Free of possible about the population
an equal chance classification errors ■■ There could be over- or
of being included. ■■ Easy to analyze the underrepresentation of
A random number data and compute subgroups within the
table or a random errors population
number generator ■■ Fairly inexpensive ■■ Typically produces larger
is typically used to sampling errors for the
actually pull the same sample than a
sample. stratified sample

Stratified The population ■■ Helps to reduce the ■■ Requires knowledge of


random is divided into chances of over/ the presence of various
sample relevant strata underrepresenting characteristics within the
before random subgroups within the population
sampling is applied population ■■ Sampling costs can
to each stratum. ■■ Allows you to segment increase if knowledge
the data into “buckets” of the population is
during the analysis shallow
phase ■■ If the strata are not highly
■■ Create more efficient homogeneous then
samples sampling error goes up
■■ Reduces sampling error and efficiency goes down

(continues)
124 Chapter 4 Milestones in the Quality Measurement Journey

TABLE 4-5 Advantages and disadvantages of various sampling methods (continued)

Sampling
Method Description Advantages Disadvantages

Proportional The proportion (or ■■ Adds even more ■■ Requires more


stratified percentage) of a precision than the human and financial
random particular stratum stratified random resources than other
sample is determined in sample methods
the population and ■■ Increases sample ■■ Requires even more
then applied to the representativeness information about the
random sample. ■■ Creates very efficient population than stratified
samples random methods
■■ Reduces sampling error

Systematic Select every kth ■■ Very easy to conduct ■■ Can produce bias due
sample observation from ■■ Has “intuitive” appeal to periodic ordering
the population after ■■ Inexpensive to conduct of observation, which
a random starting produces exclusion
point has been of segments of the
selected. population
■■ Increased probability of
sampling bias

Cluster Clusters or ■■ Can be low cost, ■■ Clusters need to be


sample “bunches” of the especially if geographic as heterogeneous as
population are clusters are used possible
identified, and then ■■ If properly done, each ■■ Typically has lower
random sampling cluster is a small model statistical efficiency
is applied to each of the population ■■ Large samples are
cluster. ■■ High level of often needed to ensure
practicality precision

Nonprobability Sampling

Convenience Observations are ■■ Ease of obtaining a ■■ Extremely low


sample selected based sample generalizability
on availability ■■ Relatively low cost ■■ No way to determine
and convenience. sampling bias or
Also known as sampling error
“accidental”
samples.

Quota sample
Description: A population is divided into relevant strata. The desired proportion of samples to be obtained from each stratum is determined, and then a fixed quota within each stratum is set.
Advantages:
■■ Stratification effect is achieved if the strata are appropriately structured
■■ In theory, the quota sample should be reasonably representative of the population
■■ Human and financial costs can be kept to a minimum if the strata from which the quotas are to be drawn are grouped close together (reducing the amount of travel the data collectors have to perform in order to gather the data)
Disadvantages:
■■ The people assigned to collect the quotas need to be scrupulous and free from selection bias and follow the prescribed sampling design (otherwise this method becomes a convenience sample)
■■ It is difficult to guarantee that the quotas were filled accurately
■■ In-depth knowledge of the population is required
■■ Nonrandom selection of the quotas can also introduce bias

Judgment sample
Description: Subgroups are drawn from a process over time based on expert knowledge. The subgroup samples can be drawn either by random or nonrandom procedures.
Advantages:
■■ Samples in a subgroup can be small (3–5) because many subgroups will be selected
■■ Data collection costs can be reduced
■■ Provides a dynamic picture of the data and serves as the basis for process improvement
■■ Minimum stratification effect is achieved
Disadvantages:
■■ Sampling bias and sampling error cannot be calculated
■■ Expert knowledge of the process or population is required
■■ Generalization of the judgment sample to larger populations cannot be done
■■ Personal bias enters into the selection of the sample
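
Two of the probability methods in Table 4-5 are easy to illustrate in a few lines of code. The following is a minimal sketch, not a prescribed procedure: the patient IDs, the surgical/nonsurgical strata, and the function names are all hypothetical and chosen only for illustration.

```python
import random

def systematic_sample(population, k, rng):
    """Select every kth observation after a random starting point."""
    start = rng.randrange(k)  # random start within the first k observations
    return population[start::k]

def proportional_stratified_sample(strata, sample_size, rng):
    """Draw from each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n = round(sample_size * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, n)          # random draw within the stratum
    return sample

rng = random.Random(42)  # fixed seed so the sketch is repeatable

patients = list(range(1, 101))  # hypothetical patient IDs 1..100
every_tenth = systematic_sample(patients, k=10, rng=rng)

strata = {
    "surgical": list(range(1, 21)),       # 20% of the hypothetical population
    "nonsurgical": list(range(21, 101)),  # 80% of the hypothetical population
}
stratified = proportional_stratified_sample(strata, sample_size=20, rng=rng)
```

With these assumed inputs, the systematic sample contains 10 observations spaced 10 apart, and the stratified sample of 20 contains 4 surgical and 16 nonsurgical observations, mirroring the 20/80 split in the population.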

a Data Collection Plan Worksheet designed to help you clarify the data collection plan for your improvement team. For each indicator that you identified earlier in this section (see Table 4-3, the Organizing Your Indicators Worksheet), outline the decisions your team has made related to stratification of the data, sampling (if appropriate), and frequency and duration of data collection. The frequency of data collection addresses how often you plan to collect the data. Will you, for example, collect the wait time of every patient or develop a sampling strategy?
126 Chapter 4 Milestones in the Quality Measurement Journey

TABLE 4-6 Data collection plan worksheet


Team Name and Improvement Topic: __________________________________________________________

Columns (one row per indicator):
Indicator Name | Is stratification appropriate? If Yes, list the levels of stratification | Will you use sampling? If Yes, describe the sampling method you will use | Frequency of data collection (e.g., hourly, daily, weekly?) | Duration of data collection (i.e., how long do you plan to collect the data?)

BOX 4-3 Conclusions about data collection

1. Sampling should produce representative and workable numbers for the unit of interest.
2. Customers providing feedback about the service or care they receive can be very susceptible to
sampling bias (sampling and recall biases).
3. Sampling bias can be introduced if you always use the same place or time and this is not
representative of the whole. This is a major problem when single point in time audits are relied
on as the sampling method.
4. When conducting surveys, recall bias occurs if the questions are reliant on the individual’s
memory.
5. The worst case scenario occurs when you have no idea where the sample came from or how
representative it is of the population or organization overall.
6. Clear guidance on data collection methods, in particular sampling and stratification, is required
whether you are collecting data for improvement, judgment, or research.
7. Data on “why did this happen” are critical to improvement efforts!

Will you collect the wait time of all patients but only on Mondays? These questions relate to the frequency of data collection (i.e., how often do you need to dip into the ongoing stream of data to gain an adequate understanding of the variation in the indicator?). The duration issue deals with how long you plan to collect the data. Will you do it for a week, a month, or several months? If you do not spend time discussing the frequency and duration questions, you will inevitably come to a point when someone says, “How long do I have to collect this stuff?”
Summary conclusions about key data collection issues are provided in BOX 4-3.
The Indicator Development Worksheet 127

▸▸ The Indicator Development Worksheet

Now that we have reviewed the individual milestones in the QMJ, it is time for you to organize your indicators into a coherent roadmap. The Indicator Development Worksheet is shown in EXHIBIT 4-1. It provides a practical and convenient way for a team to organize the details for one specific indicator. If you can provide responses to all of the items on this worksheet you will have an indicator that, at least for the short term, will enable you to proceed with data collection and eventually analysis. The details related to each section on the Indicator Development Worksheet are provided next.

1. What is the overall AIM of this improvement initiative?
We addressed the particulars of building an aim statement earlier in this chapter. It is important, however, to make sure that the overall aim is stated when developing specific indicators so that the team is able to clearly see how the indicator is linked to what the team is trying to accomplish. How good do you want to be? And by when do you expect to achieve the outcome? These two simple questions are essential to a team’s journey. Again, it does not have to be a long and detailed statement. One or two sentences should suffice.

2. What is the NAME of this SPECIFIC INDICATOR?
Naming indicators is an important component of indicator development that is frequently taken for granted. Some might ask, “What’s the big deal? Just give it a name.” Indicator names should be objective, and they should reflect quantifiable nouns. Often, however, teams feel the need to include adjectives and adverbs as well as targets and goals in the indicator name (e.g., history and physical transcription TAT will be 12 hours or less). This produces what I call “thou shalts” (i.e., thou shalt perform this task in 12 hours or less or there will be consequences). Indicators named in this fashion identify the desired outcome. When you include the desired level of performance in the indicator name, you have basically built in a barrier or, worse yet, a threat. It sends a message to the workers that you had better do this or else. If the desired outcome is an unrealistic goal, the workers quickly figure this out. The indicator then becomes an unrealistic metric or a joke. Consider this example: I was working with a medical group on their indicators and asked a team what they intended to measure. A member of the team said that the indicator was, “No one should have to wait more than 30 minutes to see the doctor.” My comment to the team was that this was not the name of an indicator but rather a threat. The indicator name should have been “wait time to see the doctor.” Although it seems like a minor aspect of performance measurement, I believe that the naming of indicators sets the tone for the rest of the measurement journey. It is the point at which you leave Conceptland and actually enter into Measurementland.

3. What TYPE of INDICATOR is this?
It is important to be clear about the types of indicators being developed. Teams need to be careful to not develop so many indicators that they become overburdened with collecting data. Most improvement teams will be tracking one to three outcome measures, four or

EXHIBIT 4-1 Indicator Development Worksheet


Team name and topic of interest: 
Date: __________________ Contact person: ________________________ Email 
1. What is the overall AIM of this improvement initiative?
(How good do you want to be? By when do you expect to achieve the outcome?)
2. What is the NAME of this SPECIFIC INDICATOR? (e.g., the number of x-ray exams performed,
the percentage of x-ray reports that could not be found, the medication error rate, or the days
between patient falls).
3. What TYPE OF INDICATOR is this?
____ Outcome ____ Process ____ Balancing
4. What is the OPERATIONAL DEFINITION for this indicator?
Define the specific components of this indicator. Specify the numerator and denominator if it is
a percentage or a rate. If it is an average, identify the calculation for deriving the average. Include
any special equipment needed to capture the data. If it is a score (such as a patient satisfaction
score) describe how the score is derived. When an indicator reflects concepts such as accuracy,
completeness, timeliness, or error, describe the criteria to be used to determine each (e.g., what counts as “accurate”).
5. What is your DATA COLLECTION PLAN?
•• How frequently will the data be collected?
____ Every Patient____ Hourly____ Daily ____Weekly ____Monthly ____Other (please
specify)
•• What are the data sources to be used for this indicator (be specific)?
•• What is to be included or excluded (e.g., only inpatients are to be included in this measure or
only stat lab requests should be tracked).
•• How will these data be collected?
____Manually ____From a logbook ____From an automated system
•• Who will be responsible for the actual collection of the data?
_________________________________________________
•• Will you use stratification? If “Yes” specify the stratification levels you will use.
•• Will you employ sampling? If “Yes” specify the sampling strategy you plan to use.
6. Do you have BASELINE DATA for this indicator? ____Yes ____No ____Unknown
•• What is the actual baseline number? ________________________________
•• What time period was used to collect the baseline? ____________________
7. Are there TARGETS or GOALS for this indicator?
Internal target(s) or goal(s)? ____Yes ____No ____Unknown
If “yes” please list the actual internal target or goal (e.g., the number, rate, or volume, etc., as well
as the source of the target/goal)
______________________________________________________________________
External target(s) or goal(s)? ____Yes ____No ____Unknown
If “yes” please list the actual external target or goal (e.g., the number, rate, or volume, etc., as well
as the source of the target/goal)
______________________________________________________________________

five process measures, and one or two balancing measures (if appropriate). As mentioned earlier in the Types of Indicators section of this chapter, the objective should be to identify a reasonably small set of indicators (six to eight total) that capture the critical aspects or dimensions of the team’s aim (i.e., build a balanced set of indicators).

4. What is the OPERATIONAL DEFINITION for this indicator?
This is essentially the heart and soul of the Indicator Development Worksheet. In this section, you should provide detail on the components of the operational definition in very specific terms. If it involves a percentage, then the numerator and denominator should be described. Similarly, if the measure is a rate, then the rate-based statistic should be defined (e.g., falls per 1,000 patient days). The easiest way to do this is to take the indicator name (e.g., inpatient fall rate) and then say, “Inpatient fall rate is defined as . . . (fill in the blank).” If it is an average, identify the calculation for deriving the average. Include any special equipment needed to capture the data. If it is a score (such as a patient satisfaction score) describe how the score is derived. When an indicator reflects concepts such as accuracy, completeness, timeliness, or error, describe the criteria to be used to determine each (e.g., what counts as “accurate”). Remember to describe what is to be included (e.g., all inpatients, including pediatrics and geriatrics, and falls in the ED) and what is to be excluded (e.g., visitor falls in and out of the hospital, staff falls, and falls in the rehabilitation unit). The litmus test for a good operational definition is really quite simple. Just ask yourself, “How could someone get confused with this definition and collect wrong data?” If you have written a clear and unambiguous operational definition, then you will be able to avoid confusion during the data collection stage.

5. What is your DATA COLLECTION PLAN?
The items listed in this section require knowledge, skill, and experience with data collection practices for QI, which differ from the practices used in randomized clinical trials (RCTs) or traditional academic research. A key difference between data for QI and data for RCTs or more traditional academic research is that QI data will typically be collected more frequently and in smaller doses than traditional research. The questions related to data collection are listed in the worksheet but a few suggestions on data collection are listed here.

• Collection Frequency and Duration. Remember that frequency deals with how often you plan on collecting the data, whereas duration addresses the question of how long you will continue to collect the data. Will you collect the wait time of every patient or develop a sampling strategy? Will you collect the wait time of all patients but only on Mondays? Next focus on the duration of data collection question. How long do you plan on collecting the data? Will you do it for a week, a month, or several months? Frequency and duration are critical issues for the team to address and they need to be clarified before you actually start to collect the data.

• Data Sources. Where do you plan on getting the data? Will they be manually collected, or will they come from an automated system? If it is to be a manual process, will they come from existing log sheets or the medical record, or do you have to create a new data collection tool? If they come from an automated system, what segment of the automated system will be used (e.g., is it the registration system, the billing system, or the patient satisfaction system)? A related issue is by what method do you propose to actually gather the data? For example, if you are tracking lab TAT, will the recorded time come from the watch of the individual who is recording the log-in time, the clock on the wall by the door, or the automated time stamp produced by the computer system? If the data are to be collected manually, do you have a procedure outlining how the person recording the data is supposed to identify the particular piece of data (e.g., the log-in time to the lab of a specimen) and then enter it into the logbook? For some of you this probably seems like a very left-brained, compulsive set of questions. If someone does not attend to the details, however, they will be ignored. This is why you need both left- and right-brained people on QI teams. A team with all left- or all right-brained people usually does not achieve as much as teams that have a mix of perspectives. Sometimes you need vision and creativity, and sometimes you need structure and attention to details.

• Person(s) Responsible for Collecting the Data. It is not uncommon to arrive at this step in the process and realize that you have not made any provisions for actually collecting the data. Everyone seems to assume that someone else will do the work of recording wait times or extracting documentation history from the medical records. Someone has to do the data collection. Frequently, however, there is a wonderful chain reaction when it comes to this task. Physicians assume that the nurses will collect the data. The nurses assume that the unit secretaries will complete this task. The unit secretaries hope that several administrative interns will be assigned to the department for the summer and this job can be pawned off on them. If all else fails, the volunteers can be asked to do the data collection. Now think about this. If you have spent considerable time building the indicator’s operational definition and data collection plan, why would you assume that some undetermined person will magically appear and solve all your data collection problems? In Greek dramas, this solution was referred to as the deus ex machina (the god from the machine) or the unexpected solution to a difficult problem.11 If this aspect of the measurement journey is not determined prior to the collection of data, I can guarantee you that (1) it will not go smoothly and (2) the data gathered will be questionable.

• Stratification and Sampling. Considerable detail on stratification and sampling has already been provided in this chapter. It is essential, however, that the team devote time to discussing the relevance of these two extremely important data methods to their data collection plan. Both are extremely important from a practical as well as resource perspective. As was mentioned, they are less about statistical issues than logical issues and they can be properly decided upon only by those who have subject matter experience.

6. Do you have BASELINE DATA for this indicator?
Remember that the baseline is what the current process is producing. It is not what you want it to be or expect it to be. Baseline is a fundamental concept in medicine. We get a baseline on a patient before we start to prescribe medications or treatments. In a similar fashion, we basically want to know how the indicator we are tracking is performing and what it is capable of producing under current operating conditions. Also note that targets and goals should not be established without having a clear understanding of what the current baseline is. Frequently, however, teams set targets and goals without knowing the current performance, which typically leads to the establishment of arbitrary and capricious targets or goals that have little chance of being reached.

7. Are there TARGETS or GOALS for this indicator?
This is where you identify what you want or expect the indicator’s performance to be. Targets are usually seen as short-term objectives (several months to a year). Goals, on the other hand, are usually designed for a little longer period of time, say 2–3 years. I have seen numerous examples, however, of how organizations have confused staff by not being clear on whether the new number was a target or a goal. Frequently, people use the terms as if they were synonymous, but note that targets are typically more short term (e.g., weeks or months) whereas goals are more long term (e.g., a year or longer). The critical points are that you (1) develop targets and/or goals that are reasonable and achievable and (2) have a plan for how the targets and/or goals are to be achieved. As Deming pointed out frequently in his seminars and writings, “Goals are necessary for you and me, but numerical goals for other people, without a road map to reach the goal, have effects opposite to the effects sought” (Deming, 1992, p. 69). The simpler version of Deming’s basic challenge was “By what method?” That is, by what method do you propose to achieve this target or goal?

Another key aspect of targets and goals is whether they are established internally or externally. Increasingly, healthcare targets and goals are being established by external bodies and given to providers. In this case, however, the external bodies may not fully understand the capability of the existing processes to achieve the targets or goals. This can lead and has led to considerable stress and challenge for healthcare providers.

The final aspect of setting targets and goals is one that has fascinated me for years. My experience has

convinced me that nearly all targets and goals are established as whole numbers that are divisible by 5. The next time you are in a meeting where targets and goals are being discussed, test my theory. You will generally find that people start a discussion by saying “I think we need a 10% reduction in X.” This will be followed by someone else saying, “No, I think we should shoot for a reduction of 15%.” If 15% is not accepted as the target someone else will then raise the ante to 20% or even 25% . . . and so on. If you want to introduce a disruptive innovation into a meeting, propose that the next target or goal be 11.58%, not 15%, and see what happens. I did this once in a management meeting and a majority of the attendees looked at me in a very strange way. One of them even said, “Are you trying to be a wise guy? Why would we have a target of 11.58%?” I smiled at the fellow and asked him why he thought whole numbers divisible by 5 provided a realistic foundation for the target. He hesitated for a moment and then responded, “Well, that is how we always set targets.” QED!

CASE STUDY #1: Transcription Turnaround Time


The following case study is designed to demonstrate how the milestones addressed in this chapter can
be applied to an improvement challenge: reducing transcription turnaround time (TAT) for histories
and physicals (H&Ps).

Situation
Imagine that you are the director of QI at a medium-sized hospital. One morning you receive
a call from your friend Becky, the manager of medical transcription, who asks if she can meet with
you ASAP. You sense that she is bothered by something and tell her that you will come to her office in
an hour. You have known Becky for more than 8 years and are a little surprised when she dispenses
with the usual pleasantries and jumps right into her concern. She tells you that several physicians
have been complaining recently about transcription TAT for H&Ps. She even confides in you that she
actually had a rather “energetic” exchange with the head of surgery about this issue earlier in
the morning. Now Becky is asking for your help, as she explains, “to get the doctors to realize that TAT is
actually very good.” To “prove” her point she shows you FIGURE 4-7. She points out quickly that the goal
is to get the transcriptions completed in 12 hours or less. Becky feels that the graph demonstrates that
over the last 10 months the transcription team has been able to meet this goal 92–99% of the time.
She asks, “So what is the big deal with the physicians? Why are they complaining?”

Diagnosis of the Problem


[Figure 4-7 chart: y-axis, Percentage (88% to 100%); x-axis, Month (Jan-00 through Oct)]
FIGURE 4-7 Percentage of history and physical (H&P) transcription turnaround times completed within 12 hours

As you look at the title of Figure 4-7, the first thing that strikes you as confusing is that the indicator of interest is transcription TAT, yet Becky has presented the data as the percentage of H&P transcriptions completed within 12 hours. You also point out that she has actually violated one of the basic principles of naming an indicator: she created a “thou shalt” (H&P transcription TAT will be 12 hours or less). Her response is, “We have to have a goal; otherwise, the doctors will not take us seriously.” You respond by saying that goals are critical to improvement initiatives
but that they should not be built into the name of the indicator. Indicator names should be
objective statements about what is to be measured, not what you want it to be. You also point
out that by measuring TAT as a percentage, you are losing information about the true variation
in the process. For example, consider the month of August in Figure 4-7. During this month, 92%
of the transcriptions were completed in 12 hours or less. What was the variation in the TATs for
this month? What was the shortest TAT and what was the longest TAT in this month? You cannot
answer this question because the measure is whether the transcriptions were completed in more
than 12 hours or less than 12 hours. This is a yes or no type of indicator. The longest TAT might
have been 13 hours or 40 hours. But because of the way in which the indicator was developed,
you will never know the answer to this question about variation. You suggest that it would be
useful, therefore, to look at the actual time (in hours) to transcribe H&Ps. Becky is so desperate for
help that she agrees to do this even though she is not sure why.
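
The point about losing information can be shown with a small sketch. The turnaround times below are invented for illustration only; they are not taken from Becky’s data. Two hypothetical months produce exactly the same percentage within 12 hours even though their longest TATs differ greatly.

```python
def summarize(tats_hours, goal_hours=12):
    """Return the percent meeting the goal plus the shortest and longest TAT."""
    within = sum(1 for t in tats_hours if t <= goal_hours)
    percent_within = 100 * within / len(tats_hours)
    return percent_within, min(tats_hours), max(tats_hours)

# Hypothetical TATs (hours) for two months; only the longest case differs
month_a = [4, 6, 7, 9, 10, 11, 11, 12, 13, 13]
month_b = [4, 6, 7, 9, 10, 11, 11, 12, 13, 40]

summary_a = summarize(month_a)
summary_b = summarize(month_b)

print("Month A: %.0f%% within goal, range %d-%d hours" % summary_a)
print("Month B: %.0f%% within goal, range %d-%d hours" % summary_b)
```

Both months report 80% within the goal, but only the actual hours reveal that one month included a 40-hour turnaround.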
You also point out that there is no necessary reason why this indicator is structured around
monthly data. Becky has shown you only 10 months of data. This is the minimum amount of data for
a run chart (i.e., 10 data points) but you are thinking that there is probably a sufficient amount of data
that it could be broken down into smaller subgroups that would allow the construction of a Shewhart
chart.12 So, you ask, “Why can’t you display this indicator by week or every 2 weeks?” Becky’s response
is one you have heard many times before: “Well, there are several reasons. This is how we have always collected this data, it is how the tracking system is set up and how management expects to see the data reported.” You explain that by moving to more frequent subgroups such as by week or every 2 weeks, a more detailed picture of the true variation in the process will be obtained. Becky is not quite sure what you are talking about but she is willing to bring you the detailed data.

TABLE 4-7 Summary data by month for H&P turnaround times

Month        Number of H&P Reports    Total Hours TAT*    Average TAT Hours
January      500                      5,140               10.3
February     487                      5,734               11.8
March        498                      4,948               9.9
April        521                      6,024               11.6
May          517                      5,882               11.4
June         508                      5,913               11.6
July         489                      5,756               11.8
August       501                      6,031               12.0
September    520                      5,850               11.3

Goal = 12 hours
*Hours are rounded up to the nearest hour

The next day Becky comes to your office with TABLE 4-7 and FIGURE 4-8. Table 4-7 shows
the number of H&P reports completed each month, the total hours required to complete the
transcriptions, and the average hours. Figure 4-8 presents these data as a line graph with the 12-hour
goal as a reference line. Becky is quick to point out that this chart clearly “proves” her point. All of the
average TATs, she says, are below the goal of 12 hours. So, again, why are the doctors upset? What
would be your next set of recommendations for these data?
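
As a quick check on Table 4-7, the average TAT for each month is simply the total hours divided by the number of reports. The sketch below reproduces that arithmetic from the table values; the use of half-up rounding to one decimal place is an assumption made to match the printed averages, not something the table states.

```python
from decimal import Decimal, ROUND_HALF_UP

# (number of H&P reports, total TAT hours) by month, from Table 4-7
table_4_7 = {
    "January": (500, 5140), "February": (487, 5734), "March": (498, 4948),
    "April": (521, 6024), "May": (517, 5882), "June": (508, 5913),
    "July": (489, 5756), "August": (501, 6031), "September": (520, 5850),
}

def average_tat(reports, total_hours):
    """Average hours per report, rounded half-up to one decimal (assumed convention)."""
    exact = Decimal(total_hours) / Decimal(reports)
    return exact.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

averages = {month: average_tat(n, hours) for month, (n, hours) in table_4_7.items()}

for month, avg in averages.items():
    print(f"{month:<10} {avg}")
```

One detail worth noticing is that the unrounded August average (6,031 / 501, about 12.04 hours) sits slightly above 12 even though it prints as 12.0. Monthly averages also say nothing about how widely the individual TATs vary around them.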

Recommendations
As you look at Figure 4-8 you notice what you believe might be causing the physicians to complain
about TAT. You see from the title of this chart that the process starts with dictation and ends
when transcription is completed. You ask Becky a simple question: “What do the physicians care
most about?” She looks at you in a quizzical manner and says, “I don’t understand your question.”
You proceed to explain that the physicians probably do not care very much about the time
from dictation to transcription but rather the time from dictation to posting the results on the computer or finding the results in the chart. Becky is very quick to point out that “now you’re being unreasonable.”

[Figure 4-8 chart: y-axis, Hours (0 to 20), with a reference line at Goal < 12 hours; x-axis, Month (Jan-00 through Oct)]
FIGURE 4-8 Average H&P transcription turnaround times from dictation to transcription

She continues by explaining that they have never looked at the end piece of the process (i.e.,
transcription to posting of the results) because that part of the process has never been very good,
and it is something they do not feel that they have much control over. She concludes her comments
by stating, “We look at what we do and expect someone else to take care of the rest of the process
by posting the results in a timely manner. It is not my problem that the people who are supposed
to post the results don’t do it according to the doctors’ demands.” You politely point out that the
customers apparently do believe that the transcription department is responsible for making the
posting of results happen in a timely manner. With reluctance Becky agrees to look at the process
from dictation to charting, but she points out that there is no way that they can look at all the cases
for these new starting and ending points. At this point you are glad she has raised this issue because
it provides you with an opportunity to return to the topic of the monthly data collection. Your
guidance is as follows:
■■ Upon looking at Table 4-7 you explain that having roughly 455 transcriptions each month
provides ample reason to make smaller subgroups.
■■ With this many transcriptions each month it would be possible to make a subgroup for every
2 weeks of data (with roughly 226 transcriptions being received every 2 weeks). Or if week was selected as a subgroup then there would be about 114 transcriptions received each week.
Finally with this amount of data, day could be used as the subgroup, which would provide
approximately 15 transcriptions being received each day. The problem with looking at the data
by day, however, would be that this would produce over 300 individual data points on a chart
for the 10-month period, which is clearly not necessary.
■■ In discussing these options with Becky, you reach agreement that looking at the data by week
makes the most sense. This will produce 40 data points on a chart, which is not excessive. You
go a step further to point out to Becky that analyzing all the data in a week (approximately 114
transcriptions each week) is also not necessary. A smaller subset of transcriptions (approximately
15–20 each week) can be selected by developing a sampling strategy. Becky agrees to help you
in implementing a sampling strategy on future data but asks that you use the existing data for 10
months to help her think through her current challenge of addressing her customers’ dissatisfaction
with the TAT.
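
The subgroup arithmetic in this guidance can be sketched as follows. The calculation uses the monthly volume of roughly 455 quoted above together with rough calendar assumptions (2 two-week periods, 4 weeks, and 30 days per month), so the results land close to, not exactly on, the rounded figures in the text. The weekly draw of 15 uses simple random sampling purely as an illustration; the text does not specify which sampling strategy was chosen, and the transcription IDs are hypothetical.

```python
import random

def subgroup_plan(monthly_volume, months, subgroups_per_month):
    """Return (approximate subgroup size, number of chart points) for each option."""
    return {
        label: (monthly_volume / per_month, per_month * months)
        for label, per_month in subgroups_per_month.items()
    }

# Rough calendar assumptions, not exact calendar math
options = {"every 2 weeks": 2, "week": 4, "day": 30}
plan = subgroup_plan(monthly_volume=455, months=10, subgroups_per_month=options)

for label, (size, points) in plan.items():
    print(f"{label:<14} about {size:.0f} per subgroup, {points} points over 10 months")

# Hypothetical week of 114 transcription IDs; draw a simple random sample of 15
rng = random.Random(0)  # fixed seed so the sketch is repeatable
week_ids = list(range(1, 115))
weekly_sample = rng.sample(week_ids, 15)
```

Under these assumptions the weekly option gives about 114 transcriptions per subgroup and 40 points, while the daily option gives about 15 per subgroup and 300 points.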
When Becky returns to your office on Monday, she seems to be somewhere between shock
and embarrassment. When you look at FIGURE 4-9 you start to understand why she is in such a state
of mind. It shows that the average TAT for the transcription process, dictation to posting on the
patient’s chart, is somewhere around 18 hours—much higher than the goal of 12 hours discussed
earlier. When you ask Becky about these results, she says that they are very surprising. Because you
know her well, however, you quickly figure out that she is not being totally honest with you. With a
little probing she admits that she knew this all along and that this is why she decided to focus only on the dictation-to-transcription part of the process (this is where the embarrassment starts to settle in). So, you offer another quality measurement insight. Specifically, you ask if this process needs to be stratified. Becky is not quite sure where you are going with this question. You explain that it might be possible that the transcription process varies by day of the week, time of day when the dictation is received, or possibly by type of procedure. As you explore this idea with her, she indicates that it is possible that the TAT could vary by type of patient, namely, nonsurgical versus surgical H&Ps. When she says this you notice a slight hesitancy in her voice, but you are not sure why.

[Figure 4-9 chart: y-axis, Hours (0 to 20), with a reference line at Goal < 12 hours; x-axis, Month (Jan-00 through Oct)]
FIGURE 4-9 Average H&P transcription turnaround times from dictation to chart

To your surprise, Becky returns to your office later that same day. She now has two new charts.
FIGURE 4-10 shows the TAT for nonsurgical patients and FIGURE 4-11 shows the results for surgical
patients. As soon as you look at the charts you realize why Becky was looking a little embarrassed
and sounded hesitant earlier in the day. The answer to her problem is obvious. There is not one
transcription TAT process occurring here but two. There is one process for nonsurgical patients and
another for surgical patients. Nonsurgical H&Ps are turned around in roughly 21 hours,
whereas surgical H&Ps are turned around in about 5 hours. This spread in the two processes
helps to explain why the average for all patients is about 18 hours (Figure 4-9): it reflects the average of
two very different processes.
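The arithmetic behind this blended average can be sketched in a few lines of Python. The values of roughly 21, 5, and 18 hours are the approximate figures quoted above; the case study does not report the actual mix of nonsurgical and surgical H&Ps, so the share computed here is only what those three rounded numbers would imply, and the function name is illustrative.

```python
def implied_share(overall_mean, mean_a, mean_b):
    """Return the share of stratum A implied by a blend of two stratum means.

    Solves overall_mean = w * mean_a + (1 - w) * mean_b for w.
    """
    if mean_a == mean_b:
        raise ValueError("the two stratum means must differ")
    return (overall_mean - mean_b) / (mean_a - mean_b)


# Approximate values quoted in the text for Figures 4-9, 4-10, and 4-11.
nonsurgical_tat = 21.0  # hours
surgical_tat = 5.0      # hours
overall_tat = 18.0      # hours

share_nonsurgical = implied_share(overall_tat, nonsurgical_tat, surgical_tat)

# Recombine the two strata to confirm that the weighted average matches.
blended_tat = (share_nonsurgical * nonsurgical_tat
               + (1 - share_nonsurgical) * surgical_tat)

print(f"Implied nonsurgical share of H&Ps: {share_nonsurgical:.0%}")
print(f"Blended average TAT: {blended_tat:.1f} hours")
```

With these rounded inputs, the nonsurgical share works out to 13/16, or about 81 percent of all H&Ps. An aggregate average near 18 hours therefore says very little about the surgical process, which is the point of stratifying.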
So why the big discrepancy? It turns out that two of the surgeons were very upset with
Becky and her department about 8 months ago. They complained not only to the president of
the medical staff but also to the CEO about the TAT. Becky said she still remembers vividly the
discussion she had with the CEO when she was called to her office. In order to avoid that situation
again, Becky made sure her staff gave preferential treatment to any H&P related to surgical
procedures. As a result, the nonsurgical H&Ps received less attention than the surgical H&Ps, as
demonstrated in Figures 4-10 and 4-11.

FIGURE 4-10 Average H&P transcription turnaround times for nonsurgical patients from dictation to
chart (x-axis: Month, Jan-00 to Oct; y-axis: Hours; goal line: < 12 hours)

FIGURE 4-11 Average H&P transcription turnaround times for surgical patients from dictation to chart
(x-axis: Month, Jan-00 to Oct; y-axis: Hours; goal line: < 12 hours)

The Improvement Plan


Once Becky was over her cathartic release of pent-up angst, she was able to acknowledge readily
that she and her department set themselves up for this outcome. Obviously, one objective will be
to maintain the performance of the surgical H&P TAT process. The challenge will be to work on the
nonsurgical process and take steps to reduce this time to meet the goal of 12 hours or less. You agree
to assist Becky by (1) helping her create an improvement team to work on the TAT process, (2) serving
as the improvement advisor for the team, (3) guiding the team through the next round of data collection
and analysis, (4) developing a stratified proportional random sample for a weekly subgroup of 15–20
transcription times, and (5) creating a Shewhart chart that is appropriate for understanding the variation
in TAT (i.e., an X bar and S chart, described in Chapter 9). Becky leaves your office not only feeling better
about revealing the entire story behind this issue but also motivated to work with her team to improve
the nonsurgical process.
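A stratified proportional random sample of the kind listed in step 4 can be sketched as follows. This is a minimal illustration and not the procedure the team actually used: the function name, the largest-remainder rounding rule, and the hypothetical week of 80 nonsurgical and 20 surgical H&Ps are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_proportional_sample(records, stratum_of, sample_size, seed=None):
    """Draw a random sample in which each stratum appears in proportion to its size."""
    if sample_size > len(records):
        raise ValueError("sample_size cannot exceed the number of records")
    rng = random.Random(seed)

    # Group the records by stratum (for example, nonsurgical versus surgical).
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    # Proportional allocation, rounded with the largest-remainder rule so that
    # the stratum quotas add up exactly to sample_size.
    exact = {name: sample_size * len(members) / len(records)
             for name, members in strata.items()}
    quota = {name: int(value) for name, value in exact.items()}
    leftover = sample_size - sum(quota.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - quota[name], reverse=True)
    for name in by_remainder[:leftover]:
        quota[name] += 1

    # Simple random sample, without replacement, inside each stratum.
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, quota[name]))
    return sample


# Hypothetical week: 80 nonsurgical and 20 surgical H&Ps, identified by type and number.
week = [("nonsurgical", i) for i in range(80)] + [("surgical", i) for i in range(20)]
subgroup = stratified_proportional_sample(week, stratum_of=lambda r: r[0],
                                          sample_size=20, seed=1)
counts = {kind: sum(1 for r in subgroup if r[0] == kind)
          for kind in ("nonsurgical", "surgical")}
print(counts)
```

For this hypothetical week, a subgroup of 20 contains 16 nonsurgical and 4 surgical H&Ps, mirroring the 80/20 split in the population. Which records are drawn within each stratum depends on the seed.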
Notes

1. An excellent book on this topic is The Cult of Statistical Significance: How the Standard
Error Costs Us Jobs, Justice and Lives by Stephen Ziliak and Deirdre McCloskey (The University
of Michigan Press, 2011).
2. This is actually the first question in IHI’s approach to improvement. It is part of a larger
framework called the Model for Improvement (MFI), which was developed by Associates in Process
Improvement (API). The MFI is detailed in The Improvement Guide (Langley et al., 2009). IHI has
adopted the MFI as its overarching approach to improvement. More will be said about the MFI in
Chapter 9.
3. The higher up individuals go in the food chain the more they seem to live in Conceptland.
Board members, senior leaders, political leaders, and even the media frequently live in or have
time-shares in Conceptland. This is not necessarily bad. These folks are supposed to set direction
and provide visions of what can be achieved. The problem is that if an organization does not have
people skilled in the science of improvement who can move the organization out of Conceptland
and into Measurementland the visions will never be achieved.
4. When teams get stuck in Conceptland and have vague notions of what they want to improve it
reminds me of the famous statement often credited to Horace Greeley (or as the story goes possibly
to John Babson Lane Soule): “Go West young man.” A general direction is offered but no specific
aim or milestones are given as to where in the West you should end up. So the journey stays
essentially in Conceptland.
5. Knowing the difference between a proportion, a percentage, and a rate is critical. Often in
healthcare settings people refer to a rate when they are really referencing a percentage.
Additional detail on these distinctions is provided in Chapter 6.
6. Increasingly, politicians are being asked to clarify what they say and what they mean. It is
not uncommon, for example, for a political candidate or an incumbent to have a reporter repeat
what the politician said yesterday or last week about an issue and then ask them why they are now
saying something different. The evening news is notorious for playing a clip of what the president
said last month and then showing that he said just the opposite today. Think how easy our nation’s
founders had it in this regard. It was probably very easy for them to take a position one day and
another position the next. No one recorded their comments verbatim, and there were no cameras,
videotapes, or recording devices. Operational definitions in George Washington’s day could be very
loose. Today, however, there is increasing scrutiny on the part of the public and a desire to have
the politicians be more precise in their definitions of terms.
7. You will need to be clever in how you actually fit this into a conversation or a meeting. One
possible opening is to couch it in light of some vague reference to the public release of
healthcare data and indicate that you read in the paper that there would be a “random sample of
patients pulled from all admissions.” Now you can innocently ask, “Have any of you ever drawn a
random sample?” and it will seem to be consistent with your setup.
8. Although it seems counterintuitive, it is possible to have too much data. There is frequently a
belief (although it is generally a false belief) that if a little data is good, then a lot of data
must be better. This is not always the case. For example, when a national news service conducts
one of its “man on the street” surveys to test the political climate or a national public polling
agency conducts a survey of American opinions, how many people do you think they include in their
sample? Typically they shoot for 1,000 to no more
than 2,000 people. We have more than 280 million people in this country and they get only 1,500
respondents. Why don’t they get more? After a certain point the additional data do not add
anything to the statistical precision of a study. It merely wastes resources and time. A general
rule of thumb is that 30–50 observations (data points, survey respondents, or numbers) will start
to produce a distribution. If you stratify your respondents by age, gender, race, region of the
country, urban/rural status, education, income, and religious preference, which is what the
national news polls do, then you need more than 30–50 observations in order to ensure that each
level of stratification has sufficient data to enable the appropriate statistical analysis.
Telemarketers are experts at sampling. They can pinpoint down to the neighborhood area or census
tract level how many people represent the categories they need for their marketing study. A
stratified proportional random sampling plan is put into place, the computer automatically dials
the numbers, and your dinner is interrupted because you fit the sampling profile they need. But
remember they do not need much data to complete their sampling plan.
9. While I was a doctoral student at Penn State University in the Department of Agricultural
Economics and Rural Sociology, Dr. Bob Bealer used this phrase frequently. He used it when a
student would ask, “How many pages do you want for this paper?” or when one of us would want to
know how much data we needed to produce a “significant” result. Professor Bealer challenged us to
think by using few words. He knew that the answers were rattling around somewhere within our
developing brains. His skill was in providing a light for us to find the path. As much as you must
and as little as you dare: it is a wonderfully simple phrase that relates to many aspects of life.
10. There are two classic stories about sampling that both the critics of sampling and its
proponents have referenced for years. The first is the 1936 Literary Digest poll that predicted
the landslide victory of Alf Landon over incumbent president Franklin D. Roosevelt. Using a mailed
sample of more than 2 million voters, the Literary Digest predicted that Landon would win by almost
15%. The mistake they made was in selecting the list of individuals for the sample (this is
referred to as the sampling frame). The sample was drawn from telephone directories and automobile
registration lists. These methods had worked in the past elections quite nicely. What the Literary
Digest pollsters forgot was that in 1936 the nation was still feeling the negative effects of the
Depression and the more positive impacts of Roosevelt’s New Deal program. The 1936 election
witnessed an unprecedented turnout of poor voters. These people were not proportionately
represented in the telephone and car registration lists because they could not afford such
luxuries. The other key issue that the Literary Digest missed was that the poor voters were
primarily Democrats whereas the more wealthy voters, who could afford cars and telephones, were
primarily Republicans. In this same year, however, George Gallup correctly predicted that
Roosevelt would be the winner. Gallup’s approach was based on using quota sampling, which ensured
that samples were drawn from various segments of society (e.g., urban, rural, rich, poor,
Republicans, and Democrats). As a result of this event, Gallup’s credibility increased
dramatically while that of the Literary Digest plummeted. The next major sampling fiasco occurred
in 1948 when Gallup, and most other public opinion polling organizations, predicted that Thomas
Dewey would be victorious over Harry Truman. What they all missed
in this case was (1) that nearly all the pollsters finished their polling too soon and missed the
late surge for Truman and (2) the people who in earlier polls said they were not sure who they
would vote for decided to vote predominantly for Truman. The success that Gallup had in 1936 with
quota sampling proved to be disastrous 12 years later. It was after the 1948 election that academic
statisticians began a serious push for using probability theory as a basis for drawing samples.
Today the use of probability sampling methods remains the accepted standard for drawing the least
amount of data with the highest level of predictability and confidence.
11. The other alternative to the deus ex machina is found in the following story:

    The Facts of Life

    The story that follows is about four people named Everybody, Somebody, Anybody, and Nobody.
    There was an important job to be done and Everybody was asked to do it. Anybody could have
    done it, but Nobody did it. Somebody got angry about that because it was Everybody’s job.
    Everybody thought Anybody could do it, but Nobody realized that Everybody blamed Somebody
    when Nobody accused Anybody.

I am not sure of the origin of this story. My mother gave me a copy of it when I first started
college. At the time I accepted it graciously and tucked it away, thinking that it was one of those
things mothers give their children as they go off to college and hope that it makes them think
about how their actions affect others. That was back in 1966. Today, I still have the original
piece of paper she gave me with this story typed on it. Over the years it is funny how many times I
have pulled out this little piece of paper or run across it in a cluttered desk drawer and realized
how relevant the lines are to so many aspects of life. Many of the challenges we face with data and
measurement stem from the fact that the people who own the process do not take ownership of their
data and the results produced by their processes. Inevitably, when I am involved with assisting
people in developing their indicators, there will be a moment when their discussion about data
collection makes me think of this story.
12. The subgroup is basically how you have organized your data (e.g., daily, weekly or monthly) and
appears on a chart as the label on the x or horizontal axis. More detail on selecting and using
subgroups will be provided in Chapters 8 and 9.

References

Babbie, E. R. The Practice of Social Research. Belmont, CA: Wadsworth, 1979.
Brooke, R., C. Kamberg, and E. McGlynn. “Health System Reform and Quality.” Journal of the American
Medical Association 276, no. 6 (1996): 476–480.
Caldwell, C. Mentoring Strategic Change in Health Care. Milwaukee: Quality Press, 1995.
Campbell, S. Flaws and Fallacies in Statistical Thinking. Englewood Cliffs, NJ: Prentice-Hall, 1974.
Daniel, W., and J. Terrell. Business Statistics. Dallas: Houghton Mifflin, 1989.
Deming, W. E. Some Theory of Sampling. New York: John Wiley & Sons, 1950.
Deming, W. E. Sample Design in Business Research. New York: John Wiley & Sons, 1960.
Deming, W. E. “On Probability as a Basis for Action.” American Statistician 29, no. 4 (1975): 146–152.
Deming, W. E. Out of the Crisis. Cambridge, MA: Massachusetts Institute of Technology, Center for
Advanced Engineering Study, 1992.
Deming, W. E. The New Economics for Industry, Government, Education. Cambridge, MA: MIT Press, 1994.
Donabedian, A. Explorations in Quality Assessment and Monitoring. Vol. 1: The Definition of Quality
and Approaches to Its Assessment. Ann Arbor, MI: Health Administration Press, 1980.
Donabedian, A. Explorations in Quality Assessment and Monitoring. Vol. 2: The Criteria and Standards
of Quality. Ann Arbor, MI: Health Administration Press, 1982.
Duncan, A. Quality Control and Industrial Statistics, 5th ed. Homewood, IL: Irwin Press, 1986.
Gonick, L., and W. Smith. The Cartoon Guide to Statistics. New York: Harper Perennial, 1993.
Hess, I., D. Riedel, and T. Fitzpatrick. Probability Sampling of Hospitals and Patients. Ann Arbor,
MI: Health Administration Press, 1975.
Institute of Medicine. Crossing the Quality Chasm. Washington, DC: National Academy Press, 2001.
Ishikawa, K. Guide to Quality Control. White Plains, NY: Quality Resources, 1982.
Joint Commission on Accreditation of Healthcare Organizations. The Measurement Mandate: On the Road
to Performance Measurement in Health Care. Oak Brook, IL: JCAHO, 1993.
Kaplan, R., and P. Norton. “The Balanced Scorecard: Measures That Drive Performance.” Harvard
Business Review (January–February 1992): 71–79.
Kaplan, R., and P. Norton. “Putting the Balanced Scorecard to Work.” Harvard Business Review
(September–October 1993): 134–147.
Kaplan, R., and P. Norton. “Using the Balanced Scorecard as a Strategic Management System.” Harvard
Business Review (January–February 1996): 75–85.
Langley, G., K. Nolan, T. Nolan, C. Norman, and L. Provost. The Improvement Guide. San Francisco:
Jossey-Bass, 1996.
Maddox, B. Sampling Concepts, Strategy and Techniques (Technical Report 81-1). Harrisburg:
Pennsylvania Department of Health, State Health Data Center, July 1, 1981.
Mann, N. R. The Keys to Excellence: The Story of the Deming Philosophy. London: Mercury Books, 1989.
Miller, D. Handbook of Research Design and Social Measurement. New York: David McKay Company, 1964.
Nelson, E., P. Batalden, and M. Godfrey. Quality by Design: A Clinical Microsystem Approach. San
Francisco: Jossey-Bass, 2007.
Selltiz, C., M. Jahoda, M. Deutsch, and S. Cook. Research Methods in Social Relations. New York:
Holt, Rinehart and Winston, 1959.
Shewhart, W. Economic Control of Quality of Manufactured Product. New York: D. Van Nostrand, 1931.
Reprint, Milwaukee: Quality Press, 1980.
Weiss, R. Statistics in Social Research. New York: John Wiley & Sons, 1968.
Western Electric Co. Statistical Quality Control Handbook. Indianapolis, IN: AT&T Technologies, 1985.
Wheeler, D. Understanding Variation: The Key to Managing Chaos. Knoxville, TN: SPC Press, 1993.

CHAPTER 5
Organizing Indicators into a
Strategic Dashboard

Most quality improvement (QI) teams will create more than one indicator. If your organization has
10 to 15 improvement teams, this means that you could be looking at managing somewhere between 50
and 120 indicators (assuming that each team creates five to eight indicators, which is usually at
the low end of what teams want to track). So, how do you go about organizing multiple indicators?
First of all, you need a simple way to summarize and organize the various indicators you have
produced so that they can be presented in meetings and shared with others. TABLE 5-1 provides a
template for organizing your indicators. The details for each indicator would be documented in the
Indicator Development Worksheet presented in Chapter 4. You do not need (nor want) to give all the
details of the indicators to the Quality Council or to senior management. They need a 40,000-foot
view. As owners of the process, you and your team members need to have the details on each
indicator. Groups listening to you explain your measurement plan, however, want to know the big
picture, not the details.

TABLE 5-1 Dashboard summary worksheet

Concept | Indicator Name (be sure to indicate whether it is a count, percentage, rate, days between, etc.) | Operational Definition (define the indicator in very specific terms; provide the numerator and the denominator if a percentage or rate) | Data Collection Plan (frequency, location, stratification, sampling) | Short-Term Target | Long-Term Goal
Concept #1 | Indicator #1 | | | |
Concept #2 | Indicator #2 | | | |
Concept #3 | Indicator #3 | | | |

Each row in Table 5-1 represents a single indicator. The columns identify the major pieces of
information that need to be summarized about each indicator. TABLE 5-2 provides an example of a
completed measurement plan for a team working to improve the medicines management process. Three
key concepts provide the focus for the team (i.e., volume, patient safety, and efficiency). Each
concept will be captured by:
■ The count of the number of med orders each day (volume)
■ Percentage of medication orders with one or more errors (patient safety)
■ Turnaround time (TAT) to process a med order (efficiency)
The details for each measure would be specified in the Indicator Development Worksheet, which would
not be presented to others outside the improvement team. When it comes time to present to the
Quality Council or to the management team, all the improvement team will need to do is present the
summary table (Table 5-2) to the group and briefly describe
key indicators the team is measuring. Once you have started to organize your indicators into a
format that can be shared easily with others, you are well on your way to building what is known as
a dashboard or instrument panel of strategic indicators.

▸▸ Evolution of the Strategic Dashboard

In the United States, the growing demand for healthcare data can be traced back to the early 1980s,
when a variety of external groups began pushing for the development of healthcare report cards.
Just like the report cards we received in school, these summaries were intended to evaluate
providers on a number of key clinical and operational indicators. Report card sponsors (usually
external oversight or regulatory bodies) felt that these assessments were needed in order to make
prudent decisions about which providers offered the highest quality at the lowest cost. The use of
some sort of grading system to evaluate providers of care was also motivated by the fact that
(1) most providers were not being totally transparent about releasing their own results, and
(2) patients and their caregivers started complaining that there was more information available on
evaluating car purchases than there was on where they should go for their hip or knee replacement
surgery.
As Nelson, Batalden, Plume, Mihevc, and Swartz (1995) point out, three primary groups in the United
States have led the development of healthcare report cards: (1) purchasers of care, (2) health
service researchers, and (3) providers of care who were anxious to show that they were concerned
about cost, service, and quality. Today I would add two more groups to these initial three:
politicians and the media. Even in national health systems, such as those found in most countries
outside the United States, where purchasers of care (e.g., manufacturing companies or insurance
groups) play little or no role in the healthcare system, the push

TABLE 5-2 Completed dashboard summary worksheet for the medicines management team

Concept: Volume
Indicator Name (be sure to indicate whether it is a count, percentage, rate, days between, etc.):
The number of med orders received in the inpatient pharmacy.
Operational Definition (define the indicator in very specific terms; provide the numerator and the
denominator if a percentage or rate):
■ This is a simple count of the individual med orders.
■ A med order is defined as a written or verbal order received in the pharmacy from a qualified
clinician.
■ A med order will be included only if it will be administered to an inpatient within the hospital.
Data Collection Plan (frequency, location, stratification and sampling, data source):
■ The data will be tracked daily.
■ This measure applies to all inpatient units.
■ The data will be stratified by shift and by type of order (stat versus routine).
■ Initially all medication orders will be reviewed. A stratified proportional random sample will be
considered once the variation in the process is fully understood and the volume of orders is
analyzed.
■ The data will be pulled from the pharmacy computer and the computerized physician order entry
(CPOE) systems.
Short-Term Target: Not applicable. Volumes will be analyzed to determine the ordering patterns by
hour.
Long-Term Goal: Not applicable. Volume analysis will be used to determine the maximum order loads
the pharmacy can handle.

Concept: Patient Safety
Indicator Name: Percentage of inpatient medication orders with an error.
Operational Definition:
Numerator: Number of inpatient medication orders with one or more errors. An error is defined as:
wrong med, wrong dose, wrong route, or wrong patient.
Denominator: Number of inpatient medication orders received by the pharmacy.
Data Collection Plan:
■ The data will be tracked daily and grouped by week.
■ This measure applies to all inpatient units.
■ The data will be stratified by shift and by type of order (stat versus routine).
■ Initially all medication orders will be reviewed. A stratified proportional random sample will be
considered once the variation in the process is fully understood and the volume of orders is
analyzed.
■ The data will be pulled from the pharmacy computer and the CPOE systems.
Short-Term Target: Once the baseline data are analyzed a target will be set.
Long-Term Goal: The long-term goal is to have 0 inpatient med errors. The probability of reaching 0
will be better understood once the baseline is established.

Concept: Efficiency
Indicator Name: Inpatient med order turnaround time (TAT).
Operational Definition: TAT is defined as the time (in whole minutes) from when the med order is
received in the pharmacy until it is posted in the computerized reporting system.
Data Collection Plan:
■ TAT will be tracked for each inpatient med order and analyzed daily.
■ The orders will be stratified by shift and type of order (stat versus routine).
■ A stratified proportional random sample of the orders will be pulled daily.
■ The TAT will be pulled from the pharmacy order entry system.
Short-Term Target:
■ Historically the targets were 60 minutes or less for routine orders and 30 minutes or less for
stat orders.
■ These will be reviewed after the baseline is obtained.
Long-Term Goal: The long-term goals will be set once the baseline and current performance in TAT
are better understood.

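The operational definitions in Table 5-2 translate directly into calculations. The sketch below is only an illustration under stated assumptions: the `MedOrder` record, its field names, the function names, and the four sample orders are hypothetical, and no real pharmacy or CPOE data format is implied.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MedOrder:
    """Hypothetical record for one inpatient med order."""
    received: datetime   # when the order was received in the pharmacy
    posted: datetime     # when it was posted in the reporting system
    order_type: str      # "stat" or "routine"
    has_error: bool      # wrong med, wrong dose, wrong route, or wrong patient


def order_volume(orders):
    """Volume: a simple count of med orders."""
    return len(orders)


def percent_with_error(orders):
    """Patient safety: orders with one or more errors / orders received, as a percentage."""
    if not orders:
        raise ValueError("no orders in the denominator")
    return 100.0 * sum(order.has_error for order in orders) / len(orders)


def tat_minutes(order):
    """Efficiency: whole minutes from receipt in the pharmacy to posting."""
    return int((order.posted - order.received).total_seconds() // 60)


def mean_tat_by_type(orders):
    """Average TAT in minutes, stratified by type of order."""
    groups = defaultdict(list)
    for order in orders:
        groups[order.order_type].append(tat_minutes(order))
    return {kind: sum(values) / len(values) for kind, values in groups.items()}


day = datetime(2024, 1, 15)  # arbitrary illustrative date
orders = [
    MedOrder(day.replace(hour=8, minute=0), day.replace(hour=8, minute=45), "routine", False),
    MedOrder(day.replace(hour=8, minute=10), day.replace(hour=9, minute=25), "routine", True),
    MedOrder(day.replace(hour=8, minute=15), day.replace(hour=8, minute=35), "stat", False),
    MedOrder(day.replace(hour=9, minute=0), day.replace(hour=9, minute=40), "stat", False),
]

print(order_volume(orders))
print(percent_with_error(orders))
print(mean_tat_by_type(orders))
```

For these four made-up orders the volume is 4, the percentage with an error is 25, and the average TAT is 60 minutes for routine orders and 30 minutes for stat orders. Stratifying by shift would follow the same grouping pattern used for order type.
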
from researchers, consumers, commissioners, political leaders, and the media for more and more data
on provider performance has also grown in geometric proportions. This push for more and more data
on providers that can be accessed openly and transparently provides clear motivation for having a
measurement philosophy and knowing your own data better than anyone else, especially external
groups.
Irrespective of the country where care is provided, the rapid growth of the Internet and the
related ability of anyone to obtain almost instant access to information on a wide variety of
issues and topics have also been primary drivers for the increasing demand to release and publish
healthcare process and outcome indicators. The primary problem with thinking that scorecards or
report cards provide the solution to the growing demand to evaluate and rate providers, however, is
that these approaches are basically designed to produce data for judgment, not improvement. These
forms of reporting on healthcare provider performance are usually derived from historical data that
are aggregated, static in nature, and lagged by as much as a year or more.
Nelson et al. provide the following portrait of the report card orientation (1995, p. 157):

    The Report Card Image. . .
    Who gets report cards? Students
    Who gives them? Teachers
    What is the focus? Past performance
    Who wins? The As
    Who loses? Everybody else
    What’s learned? I’m above average, average, or below average

Placed within the healthcare context, the report card or scorecard approach is used by both
internal and external groups and organizations. Those producing the report card may involve the
providers in the development and design of the final report or they may totally exclude the
providers from the process. The primary objective is to rate and rank (or rank and yank as some
refer to it) providers for their performance on selected indicators.
The most common display format for ratings has become the assignment of red, yellow (amber), or
green assessments based on performance against a target or goal. Star ratings come in a close
second as a popular format. Providers rated or ranked at or near the top (i.e., those receiving
green ratings or four or five stars) are judged to be better performers than those further down
(i.e., those receiving yellow or red ratings or one or two stars). It is also not uncommon for
percentiles to be used as the ranking mechanism to determine performance. Percentile rankings are
frequently used in assessing patient or staff satisfaction results.
All of these approaches and reporting formats have the uncanny ability to produce a wide variety of
emotional reactions. If the ranking process places your organization in the best category (i.e., a
green rating, the maximum number of stars, or in the top decile), two reactions typically occur:
(1) unbridled joy emerges and the marketing department includes the results in the most recent
board report or puts advertisements in the local paper about how high they ranked, or (2) there is
complacency, which is derived from the satisfaction that you were not in the middle or bottom part
of the ranking schema. On the other hand, when the rankings place your organization in the middle
of the distribution (e.g., between the 25th and 75th percentiles), some form of rationalization is
usually offered (e.g., the data are old, we have sicker patients, we are a large academic medical
center that treats the most difficult cases, or the demographics of the population we serve create
extreme challenges for us). Finally, when your organization is in the bottom of the ranking with a
majority of red ratings, one star for a majority of the indicators, or below the 25th percentile,
then fear, finger pointing, and comments such as “someone needs to be held accountable” are typical
reactions. Deming’s cycle of fear (Scherkenbach, 1990, p. 71) frequently can be found when the
rating approach being used to
produce report cards or scorecards does not place your organization in a particularly good light.
Deming’s cycle of fear is shown in FIGURE 5-1. This cycle starts when there is growing fear that
the numbers given to management may not reinforce what management wants to hear. This then leads to
a concern that the individual delivering bad news (i.e., numbers that do not fit someone’s view of
reality) will be “killed.” This fear then leads to filtering the data or the interpretation of the
data in order to report only “good news.” The final step in this cycle of fear is that there is
increased micromanagement of the data by the workers and of the workers by management, which leads
to increased fear and then the cycle begins again.

FIGURE 5-1 Deming’s cycle of fear (a repeating cycle of four stages: Increased Fear, Kill the
Messenger, Filtered Information, Micromanagement)

Another major challenge with the report card approach is that it does not provide any understanding
of the variation in the indicators that produce the results. Without understanding the variation in
the processes and related outcomes it will be virtually impossible to make improvements. More is
said on this point in Chapter 6. The final problem with the report card approach, especially when
done to providers of care by external groups, is that the groups providing the judgments do not
say, “We realize that you are in the bottom decile of our most recent ranking but we will be
sending 12 improvement specialists to your organization to help you move up in the rankings.” Or,
as Nelson et al. put it (1995, p. 157), those passing judgment on providers do not “help providers
know how to make improvements.” The ranking occurs and people are merely told to “improve your
performance.” True improvement requires not only a different approach to the collection and
analysis of data but also a different way of thinking about what to do with data once you have it.
The alternative to the static report card is the development of a more dynamic performance
measurement system that enhances decision making and encourages improvement strategies. Two terms
have emerged to describe this alternative approach: the dashboard and the instrument panel. Some
writers have favored the image of a car dashboard for this approach whereas others have used the
analogy of the instrument panel of an airplane. Regardless of the term, the concept that they are
trying to promote is the same: you need to look at how you are performing right now and make
predictions about where you are headed in the future. Report cards and scorecards are focused on
the past, not the future. Don Wheeler (1993, p. 4), a well-known writer on statistical process
control (SPC) methods and understanding variation, puts it this way: “Managing a process on the
basis of monthly (or quarterly) averages is like trying to drive a car by looking in the rear view
mirror.” I think most of us would agree that driving a car this way is a very unsafe approach. It
looks at where we have been, not where we are going. Similarly, trying to look at last year’s or
last quarter’s results, especially with aggregated numbers or summary statistics, provides no basis
for predicting where the organization will go in the future. The dashboard of your car, for
example, is providing real-time feedback on indicators important to the safe
performance of your car and your interaction with the system. As you drive down the road, you don’t say, “Gee, I wonder what my speed was on this stretch of road when I drove on it last month?” Or, “I wonder what my average miles per gallon was on this highway when I drove this way back in December?” These questions are looking to the past and have no relevance to your current performance. Knowledge of your current speed or the amount of gasoline remaining in your tank can be used to predict if you run the risk of getting a speeding ticket or if you will make it to your destination before running out of gas. Current knowledge and prediction are hallmarks for the quality measurement journey (QMJ).
According to Bader (1993), Henry Berman, chief executive officer (CEO) of Group Health Northwest (Spokane, WA), was the first to popularize the dashboard analogy. Nelson et al. favor the instrument panel analogy over the dashboard: “Just like the cockpit crew of a jet airplane need instrument panels to fly safely, health care delivery system leaders need instrument panels to manage wisely” (1995, 157). The characteristics of an instrument panel are as follows:

Instrument Panel Image...
Who uses them?               The cockpit crew (pilot, copilot, navigator)
Who interprets the results?  The cockpit crew
What is the focus?           Present and future performance
What is the utility?         Real-time monitoring, predicting the future, and taking action

Nelson et al. (1995, p. 158) conclude, “The instrument panel or dashboard metaphor has an entirely different aura from that of the report card. It has vitality, timeliness, and a clear-cut utility that is absent from report card thinking. A key feature is providing critical, real-time information to the user to prompt wise decisions and, if need be, make rapid midcourse corrections.” The decision to create an instrument panel or dashboard rather than a report card or scorecard is based on much more than semantics. Although there are obvious differences in the intent of each approach (judgment versus improvement), there are also major differences in the ways in which data are collected, tabulated, and displayed.
Robert Kaplan and David Norton have developed a third concept to describe the organization of key strategic indicators. They have coined the term the “balanced scorecard” as their organizing rubric. In 1992, they published the first of a series of articles defining the balanced scorecard, its components, and how they envision its use. Subsequent articles provided examples of companies applying their ideas (1993) and how their framework can be used to create a strategic management system (1996). Although the terminology is different from that used by Nelson et al., Kaplan and Norton also see the balanced scorecard as “the dials and indicators in an airplane cockpit.” They continue their analogy by stating, “Reliance on one instrument can be fatal. Similarly the complexity of managing an organization today requires that managers be able to view performance in several areas simultaneously” (Kaplan and Norton, 1992, p. 72).1
In their first article (1992), Kaplan and Norton identified the following benefits of developing a balanced scorecard:
■■ It brings together, in a single management report, many of the seemingly disparate elements of an organization’s strategic agenda.
■■ It helps to reduce information overload by focusing on the “vital few” indicators.
■■ It helps to guard against suboptimization by forcing senior managers to consider all the important measures together and lets them see whether improvement in
one area may be achieved at the expense of another.
■■ It puts strategy and vision, rather than control, at the center of an organization’s effort.
■■ It is based on an understanding of interrelationships between functions, not on the performance of individual functions or units.
■■ It provides an opportunity for organizational learning at the executive level.
Nugent (Nugent et al., 1994) was one of the first clinicians to actually apply the instrument panel notion to a healthcare situation. He and his colleagues describe the creation of an instrument panel to monitor and improve coronary artery bypass graft (CABG) surgery. In this classic article, they walk the reader through the selection and definition of key indicators for CABG surgery and then show how control charts can be used to understand the variation that lives within the process. Although the authors focus on the details of how to create a dashboard, one of the side benefits of the article is to show how the instrument panel concept can be applied to a very specific clinical procedure and its related outcomes.
In 2000, the Health Care Advisory Board published one of the first comprehensive reports on healthcare dashboards. This document, CEO Dashboards: Performance Metrics for the New Health Care Economy, has emerged as a general guide for the development of healthcare dashboards. It addresses the following issues:
■■ The problem of inadequate performance measurement
■■ Elements of an effective dashboard
■■ Recommendations for rapid dashboard development
■■ Dashboards of leading hospitals and health systems
Based on their research, the Health Care Advisory Board identified four key elements of an effective dashboard:
■■ Building a dashboard around a balanced set of performance measures
■■ Selecting a fairly austere set of measures (i.e., keeping it simple by selecting the vital few measures, usually 15–30)
■■ Presenting data in graphic displays (rather than tabular formats)
■■ Developing action triggers (i.e., setting targets and goals that trigger the need for action)
They then proceeded to summarize the leading categories for organizing healthcare dashboards:
■■ Financial performance
■■ Operational effectiveness/efficiency
■■ Quality (clinical and service quality)
■■ Satisfaction (patient and family as well as employee and physician satisfaction)
Examples of specific indicators that characterize each of these four categories are also provided in the Health Care Advisory Board document. The report concludes with actual examples of dashboards from health systems, stand-alone hospitals, and academic medical centers.
Designed correctly, the dashboard (or instrument panel) can be used to meet not only internal management needs but also the increasing demands made on providers by external groups interested in building report cards. If an organization builds what Caldwell (1995) refers to as a “strategic measurement deployment matrix,” it will be able to measure key indicators at various levels within the organization and then have the capability to roll these measures up into overall summary measures that also satisfy external requirements. Dashboards and instrument panels can be used to navigate correctly at the board and senior management level
(macrolevel), the departmental or meso level, and finally at the point where care is delivered (microlevel). The choice in how you structure your dashboard(s) is up to you.

▸▸ Focusing on the Vital Few

One of the major challenges with creating a dashboard is parsimony (what the Health Care Advisory Board calls austerity). I have seen many organizations create a dashboard that contains 50 to 80 indicators or more. A dashboard with this many indicators is counter to what the concept of the dashboard is trying to achieve. Again, consider the dashboard of your car. There are a few vital indicators that you rely on regularly, most notably the speedometer and the fuel gauge. After these two indicators, the rest of the dashboard gauges may (or may not) be of importance to you. You might pay some attention to the temperature gauge but you probably do not give much consideration to your electrical system status indicators or the accumulating mileage indicator (a.k.a., the odometer). Then there is the tachometer. This is probably the most useless indicator on your dashboard, especially if you are driving an automatic transmission vehicle. The point to be made here is that regardless of the context, cars or healthcare, there are indicators that we have access to or even review regularly that frequently add little value to our decision-making processes or knowledge.
The Pareto principle provides a good conceptual and organizational framework for selecting dashboard indicators. The Pareto principle (named after the Italian economist Vilfredo Pareto, 1848–1923) basically states that for any event (or problem) a small number of factors will account for a majority of the reasons why the event (or problem) occurred. Out of the Pareto principle emerged the 80/20 concept (i.e., 80% of a problem can be attributable to 20% of the causes). The Pareto diagram (Graham and Cleary, 2000; Plsek and Onnias, 1989) is used most often to prioritize opportunities for QI teams (Carey and Lloyd, 2001). Its primary goal is to identify the vital few and separate these causal factors from the common or trivial many (the things that do not matter).
The Pareto principle applies very well, however, to the dashboard concept. Dashboards should be populated with the vital few, not the trivial many. Yet all too often organizations decide that nearly everything they measure needs to be placed on the dashboard. In this case, their dashboard looks more like the dashboard (instrument panel) of a space shuttle or a 747 jumbo jet. Although you may not be able to achieve the simplicity of a moped’s dashboard that has about three indicators, you certainly do not need the complexity of the space shuttle. From my perspective, a parsimonious dashboard should consist of a maximum of 10 to 15 indicators. Note, however, that depending on whether this is a macro-, meso-, or microlevel dashboard, the 10 to 15 indicators being tracked may not be the same indicators at each level.
If you are interested in obtaining examples of how many indicators can or should be on a dashboard, I would suggest that you obtain a copy of the Health Care Advisory Board’s CEO Dashboard report (2000) and then contact several of the organizations listed in the report. These organizations can be used as a basis for a healthy dialogue on both the contents of your dashboard and the data display options for you to consider. But any external example you use as a reference should not be accepted automatically as the preferred template for a dashboard. Each organization needs to develop and design dashboards that not only meet its short- and long-term strategic objectives but also, and most important, fit with the organization’s culture.
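As a rough illustration of the Pareto principle described in this section, the following Python sketch ranks a set of tallied causes and returns the smallest group of top-ranked causes that reaches a chosen share of the total. The category names, the counts, the function names (`pareto_table`, `vital_few`), and the 80% default threshold are all illustrative assumptions; they are not data or methods taken from the sources cited here.

```python
def pareto_table(counts):
    """Rank categories by count and attach the cumulative share of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one positive tally")
    # Largest contributors first, as on a Pareto diagram.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for category, count in ranked:
        running += count
        rows.append((category, count, running / total))
    return rows


def vital_few(counts, threshold=0.80):
    """Return the top-ranked categories needed to reach the cumulative threshold."""
    selected = []
    for category, _count, cumulative_share in pareto_table(counts):
        selected.append(category)
        if cumulative_share >= threshold:
            break
    return selected


# Hypothetical tallies of reasons for delayed discharge (illustrative only).
delay_reasons = {
    "Awaiting transport": 42,
    "Pending medication reconciliation": 31,
    "Late physician sign-off": 12,
    "Family not available": 7,
    "Bed cleaning backlog": 5,
    "Other": 3,
}

for category, count, share in pareto_table(delay_reasons):
    print(f"{category:<36}{count:>4}{share:>8.0%}")
print("Vital few:", vital_few(delay_reasons))
```

With these made-up tallies, the first three of the six reasons account for 85% of the delays, so they form the "vital few" at an 80% threshold. Real data will rarely split exactly 80/20; the threshold is a judgment call, not a rule.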
CASE STUDY #1: East London National Health Service (NHS) Foundation Trust’s Strategic Dashboard 2
The East London NHS Foundation Trust (ELFT) is one of the largest providers of mental health and
social care services in England (https://www.elft.nhs.uk/). Originally formed in 2000, ELFT has long
been recognized as a center of excellence for mental health care, innovation, and improvement. The
Trust was first established as a mental health trust to cover East London but it has since broadened its
service area to include its core area of City of London, Hackney, Newham, and Tower Hamlets. In April
2015, the ELFT also encompassed the Bedfordshire and Luton service area.
East London’s quality journey began in earnest in 2014 when they became a strategic partner of
the Institute for Healthcare Improvement (IHI). We collaborated with the Trust to help them identify
strategic objectives for quality and safety, outline tactical plans for building capacity and capability
throughout the Trust, and develop key indicators to track progress toward its goals. One of the
objectives during the first year of our partnership was to develop a strategic dashboard that could be
shared with the board, the staff, and the public. Two overarching organizational aims were established:
(1) reducing harm and (2) right care, right place, right time. Their high-level causal models and the
factors that drive these two key outcomes are shown in FIGURES 5-2 and 5-3. Building upon these two
central aims, the ELFT quality team worked with senior leaders and the board to build their strategic
dashboard. Four types of indicators were identified as the major dimensions of the dashboard:
■■ Safety
■■ Clinical Effectiveness
■■ Patient Experience
■■ Staff Experience
The specific indicators for each of these dimensions are shown in FIGURES 5-4 through 5-7. At
this point, it is not important to delve into the details of each dimension and its related indicators. The
key point is that the ELFT leaders established aims and then developed a measurement strategy and
related indicators that captured and operationalized these aims. They track the performance of these
indicators each month, and everyone from the board to the frontline staff gets to review and comment
on the progress that is being made. I highly recommend that your organization engage in a
dialogue that creates a similar environment and allows you to embark on your own QMJ.
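One way to keep a dashboard like this honest about parsimony is to hold its dimensions and indicators in a small data structure and count them. The sketch below is only an assumption about how that might look: the `Dashboard` class and its methods are hypothetical, the indicator names are borrowed from chart titles in the ELFT figures, and their assignment to the four dimensions is my own guess for illustration. The default ceiling of 15 reflects the 10 to 15 indicator guideline suggested earlier in this chapter.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dashboard:
    """A dashboard at one organizational level, grouped by dimension."""
    level: str  # "macro", "meso", or "micro"
    dimensions: Dict[str, List[str]] = field(default_factory=dict)

    def indicator_count(self) -> int:
        """Total number of indicators across all dimensions."""
        return sum(len(names) for names in self.dimensions.values())

    def is_parsimonious(self, max_indicators: int = 15) -> bool:
        """True when the dashboard stays at or under the chosen ceiling."""
        return self.indicator_count() <= max_indicators


# Illustrative macrolevel dashboard; the grouping below is an assumption.
trust_dashboard = Dashboard(
    level="macro",
    dimensions={
        "Safety": [
            "Falls reported per 1,000 occupied bed days",
            "Episodes of restraint per 1,000 occupied bed days",
        ],
        "Clinical Effectiveness": [
            "Adult acute mental health length of stay",
            "DNA rates",
        ],
        "Patient Experience": ["Complaints", "Compliments"],
        "Staff Experience": ["Vacancy rates", "Sickness and absence levels"],
    },
)

print(trust_dashboard.indicator_count(), trust_dashboard.is_parsimonious())
```

A check like this does not decide which indicators matter; it only makes the count visible so that the conversation about the vital few can happen.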

[Diagram: factors driving Reducing Harm: Falls; Medication errors; Pressure ulcers; Physical violence; Restraints.]
FIGURE 5-2 East London NHS Foundation Trust reducing harm diagram
East London NHS Foundation Trust

[Diagram: factors driving Right care, right place, right time: Reliable delivery of evidence-based care; Reducing delays and inefficiencies in the system; Improving patient and carer experience; Improved access to services at the right location.]
FIGURE 5-3 East London NHS Foundation Trust right care, right place, right time diagram
East London NHS Foundation Trust

[Figures 5-4 and 5-5: grids of control charts with monthly x-axes spanning roughly April 2011 to March 2016; the plotted values are not reproduced here. Panel titles recovered across the two figures: Incidents Reported per 1,000 Bed Days u-Chart; Falls Reported per 1,000 Occupied Bed Days u-Chart; Falls Resulting in Harm / 1,000 Bed Days u-Chart; Episodes of Restraint per 1,000 Occupied Bed Days u-Chart; Restraints in prone position per 1,000 Occupied Bed Days u-Chart; Serious Incident per 1,000 Occupied Bed Days u-Chart; Reported incidents of Physical Violence per 1,000 Occupied Bed Days u-Chart; Grade 3 & 4 Pressure Ulcer Originating at ELFT per 1,000 Occupied Bed Days u-Chart; Unexpected Deaths per 1,000 Occupied Bed Days u-Chart; Instances of Variation between Planned and Actual Staffing Levels (Registered and Unregistered Staff) c-Chart; CPA Caseload Contacted In Month Percentage i-Chart; Adult Acute Mental Health Occupancy i-Chart; Adult Acute Mental Health Length of Stay i-Chart; Adult CMHTs Days Waited until First Face to Face Contact i-Chart; CAMHS Days Waited until First Face to Face Contact i-Chart; DNA Rates p-Chart. Y-axis labels: % Caseload Contacted In Month; % Bed Occupancy; Days.]
FIGURE 5-4 East London NHS Foundation Trust safety dashboard
East London NHS Foundation Trust

FIGURE 5-5 East London NHS Foundation Trust clinical effectiveness dashboard
East London NHS Foundation Trust
[Figures 5-6 and 5-7: grids of control charts with monthly x-axes (labeled Period or Month) spanning roughly April 2011 to March 2016; the plotted values are not reproduced here. Panel titles recovered across the two figures: Complaints c-Chart; Compliments c-Chart; Complaints Closed Within 25 Days p-Chart (complaints still open, withdrawn, and those with agreed extension excluded); PALs Enquiries Per 1000 Occupied Bed Days u-Chart; Friends and Family: Proportion of Extremely Likely Responses p-Chart; Community Health Newham PREMs (an average across 5 responses: patients giving a “Yes - definitely” answer to questions covering confidence in the consultant, respect and dignity, understandable information, understandable answer, involvement in care); Sickness and Absence Levels i-Chart; Vacancy Rates p-Chart; Staff Leaving Employment per 1,000 Occupied Bed Days u-Chart; Staff Leaving Employment Within 12 months per 1,000 Occupied Bed Days u-Chart; Physical Attacks on Staff per 1,000 Occupied Bed Days u-Chart; Health and Safety Incidents Involving Staff per 1,000 Occupied Bed Days u-Chart; Health and Safety Incidents Involving Staff Resulting in Harm/Injury per 1,000 Occupied Bed Days u-Chart.]
FIGURE 5-6 East London NHS Foundation Trust patient experience dashboard
East London NHS Foundation Trust

FIGURE 5-7 East London NHS Foundation Trust staff experience dashboard
East London NHS Foundation Trust
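Many of the panels in the ELFT dashboards are u-charts of events per 1,000 occupied bed days. As a minimal sketch of how the center line and limits of such a chart are commonly computed, the Python below uses the standard 3-sigma u-chart formulas: the center line is total events divided by total exposure, and the limits for each month are the center line plus or minus three times the square root of the center line divided by that month's exposure. The function name `u_chart`, the monthly counts, and the bed-day figures are hypothetical; this is not ELFT's code or data, and it checks only the single rule of a point falling outside the limits.

```python
import math


def u_chart(event_counts, exposures):
    """Compute u-chart rates, center line, and 3-sigma limits per subgroup.

    event_counts[i] is the number of events in subgroup i (e.g., falls in a month).
    exposures[i] is the area of opportunity in the chosen unit
    (e.g., occupied bed days divided by 1,000).
    """
    if not event_counts or len(event_counts) != len(exposures):
        raise ValueError("event_counts and exposures must be non-empty and equal in length")
    if any(n <= 0 for n in exposures):
        raise ValueError("every exposure must be positive")

    # Center line: overall rate across all subgroups.
    u_bar = sum(event_counts) / sum(exposures)
    rows = []
    for count, n in zip(event_counts, exposures):
        sigma = math.sqrt(u_bar / n)  # limits widen when exposure is small
        ucl = u_bar + 3 * sigma
        lcl = max(0.0, u_bar - 3 * sigma)  # a rate cannot fall below zero
        rate = count / n
        rows.append({
            "rate": rate,
            "center": u_bar,
            "lcl": lcl,
            "ucl": ucl,
            "outside": rate > ucl or rate < lcl,
        })
    return rows


# Hypothetical monthly falls and occupied bed days (illustrative only).
falls = [12, 9, 15, 11, 30, 10]
bed_days = [3100, 2900, 3200, 3000, 3050, 2950]
exposure_in_thousands = [d / 1000 for d in bed_days]

chart = u_chart(falls, exposure_in_thousands)
for month, row in enumerate(chart, start=1):
    flag = "outside limits" if row["outside"] else ""
    print(f"month {month}: rate={row['rate']:.2f} "
          f"lcl={row['lcl']:.2f} ucl={row['ucl']:.2f} {flag}")
```

In this made-up series only the fifth month falls outside its limits. Chapters 6 to 9 discuss variation and its analysis in more depth, including signals beyond a single point outside the limits.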
▸▸ The Role of Benchmarking

Once you have defined a specific indicator, collected data, and analyzed the variation in the data, you are ready to find out just how good your performance really is. (See Chapters 6 to 9 for discussions on variation and how to analyze it.) This can be accomplished by using comparative reference data (e.g., a national or regional norm) or by benchmarking. Unfortunately there seems to be considerable misunderstanding within the healthcare industry on the nature and intent of benchmarking. The simple message about benchmarking is that it is not about numbers! It is about the search for excellence. It is about the transformation of the organization’s culture and the way it does business. Benchmarking is about understanding the relationships between structures, processes, culture, and outcomes. Unfortunately, I have seen too many healthcare professionals think that benchmarking is strictly about meeting a numeric target or goal (i.e., a benchmark). Before the month is over, I bet that you will hear someone say, “Here is the benchmark we need to hit by the end of the next quarter.” When you hear a statement like this you are basically hearing confusion between the concepts of a benchmark (a noun) and benchmarking (a verb). In cases like this, the confusion goes even deeper because it usually involves confusion over the relationship of targets and goals to benchmarks and the benchmarking process. This confusion was highlighted back in 2003 in a newsletter published by the Mihalik Group (2003). In a brief article titled “When Is a Performance Goal a Benchmarking?” they make the following conclusions:

    Occasionally there is some confusion over the correct use of the term “benchmark.” A benchmark is a measure of best performance against which an organization’s performance is compared. A benchmark, however, is never derived from average or aggregate performance. Because it represents the best, a benchmark must refer to the performance of only one organization. (p. 2)

Too often I have seen organizations use an average or aggregated number as THE benchmark. It not only flies in the face of what good benchmarking is supposed to achieve but it also leads to confusion within the organization.
I believe there is a real need within the healthcare industry to clarify these concepts. Let us start our discussion by addressing the terms target and goal. In a general sense, targets are short-term markers of performance that are usually designed to be achieved over the span of several months to a year. Goals, on the other hand, are more long term in nature, usually in the range of 1 to 2 years.3 A target or a goal can be based on a benchmark (the noun) if it is derived from an organization that is considered the “best of the best.” Otherwise targets and goals should be established against the current performance and capability of the system and its related processes. Targets and goals can certainly be part of a benchmarking process. But if you stop at the targets or goals that may emerge from a benchmarking initiative and then get fixated on them, you will never achieve the potential that benchmarking offers.
Benchmarking as a verb, which is my focus in this section, is a way to identify and understand best practices that enable organizations to realize new levels of performance. It is a journey, not a destination, designed to establish highly reliable structures and processes, create a new culture that is focused on continuous improvement and excellence, and create the conditions that enable a learning organization (Senge, 1990) to emerge. Confusion over these concepts leads an organization to accept a number, either from an internal source or an external consultant, as “THE benchmark.” This orientation typically leads to a fairly singular focus on the numbers (outcomes) without giving due consideration to the interplay of the structures and processes that produce the numbers. Although you will
hear organizations claim that, “We are benchmarking,” this statement usually means that they are hoping that they hit the benchmark metric someone gave them but have not developed a strategy for achieving or sustaining this ethereal number. The result is usually confusion within the ranks and unrealistic expectations on the part of management and the board (especially if the organization has paid a large fee to an external consulting group that produced the “benchmark numbers”).
The most widely referenced work on benchmarking is Robert Camp’s classic book Benchmarking: The Search for Industry Best Practices that Lead to Superior Performance (1989). Camp traces the history of benchmarking, clearly defines what it is (and is not), outlines the steps in the benchmarking process, and then offers a series of examples of how successful companies have engaged in benchmarking. In the preface of his book, Camp introduces the Japanese term dantotsu, which loosely translates as striving to be the “best of the best.” Camp (1989, p. 3) points out that “we in America have no such word, perhaps because we always assumed we were the best.”
Essentially when an organization is classified as a “benchmark” it is being recognized for its superior performance against which all other performances can be judged. Although organizations as benchmarks have been identified in other industries (e.g., the Ritz Carlton Hotels, Motorola, L.L. Bean, Ben & Jerry’s Ice Cream, or Toyota), there is little consensus on which organization is consistently THE best in health care and the standard to follow. How an organization becomes a “benchmark” is what benchmarking is all about. Camp (1989, p. 10) provides the following formal definition of benchmarking: “Benchmarking is the continuous process of measuring products, services, and practices against the toughest competitors or those companies recognized as industry leaders.” He continues with a more direct working

understanding practices that lead to performance than selecting a number to achieve. He puts it this way, “Benchmarking metrics are seen as a result of understanding best practices, not something that can be quantified first and understood later” (13). Frequently in health care, however, we do exactly what Camp cautions against. We look at current performance as a single number and then quickly pick a new target or goal that is a desired end state. In many instances, the decision process goes something like this: “We are currently at 80% on completed histories and physicals for outpatient surgery so next quarter we expect it to be at 85%. No, maybe we should pick 90%? Oh what the heck, let’s use 100% as the target.”
A key question related to the selection of any target or goal is, how were these numbers decided upon in the first place? I am constantly amazed at how we decide on new levels of performance. Most often we seem to set targets and goals as whole numbers divisible by 5. If the performance last year was X, then X plus 5%, 10%, or 15% should do for next year. We do this with budget cuts as well but in this case it is usually reductions of 5%, 10%, or 15%. The next time you are involved with setting new levels of performance take a moment and listen to the numbers that are bandied about. A majority of the time they will be whole numbers divisible by 5. If you want to offer a provocation in a management meeting, however, instead of using the traditional approach of whole numbers divisible by 5, propose the use of unconventional increments for targets or goals (e.g., 4.75% or 13.837%). You will most likely get some strange looks when you offer numbers that are not whole and divisible by 5. I did this in a management meeting once when there was a debate on the new targets for patient satisfaction. The members of the team were approaching the number as if it were an item in an auction. The exchange goes something like this:
definition of the concept: “Benchmarking is the ■■ Person #1: “Let’s set the new target for the
search for industry best practices that lead to ‘Would you recommend this hospital to
superior performance” (1989, p. 12). Accord- your family and friends?’ question on the
ing to Camp, benchmarking is more about inpatient survey at 80% Strongly Agree.”
■■ Person #2: “No, that seems too low to me. Let’s go with 85%.”
■■ Person #3: “No, you are both too low. Let’s set a stretch target of 90%.”
As you can imagine, this little bidding war continued until we had a target of, you guessed it, 100% of the patients marking the Strongly Agree response to the question. When the 100% bid was hit I proposed that an alternative would be to have a target of 73.4%. There was actually silence for a moment then one of the members of the senior management team asked if I was joking. Another one said, “What are you, some sort of wise guy?” At least I got their attention and it got the management team thinking about why they pick the numbers they do and if they were achievable in the proposed time frame. Note that the average percentage of respondents selecting Strongly Agree for the previous 12 months was only 27%. I then proposed that an alternative way to approach setting the target would be to understand current performance by placing the results from the past year on a Shewhart (control) chart, understanding the variation that lives in the data, evaluating the spread of the control limits and process capability, and finally establishing new targets through objective statistical means. Whereas some members of the team thought this violated some sacred rule, such as the whole numbers divisible by 5 rule, others thought it sounded like a reasonable approach.
The basic problem with how many teams and organizations approach setting targets or goals is that little thought is generally given to how the new level of performance will be achieved. In most instances, this leads to what Deming referred to as the creation of arbitrary numeric targets and goals, which demoralize the workers and mislead the organization. On this point, Deming wrote, “Goals are necessary for you and for me, but numerical goals set for other people, without a road map to reach the goal, have effects opposite to the effects sought” (1992, p. 69). Benchmarking is one of the best ways to develop a roadmap for developing appropriate targets and goals as well as a strategy for achieving new levels of performance.
Camp outlines five phases of benchmarking and 10 related specific steps. The five phases of benchmarking include: planning, analysis, integration, action and maturity (Camp, 1989, p. 17). The key to understanding these phases and their related steps is that they have to be seen as an ongoing process, not as something that the organization does once and then moves on to the next popular management solution. The other important aspect of applying Camp’s benchmarking framework is the recognition that although benchmarking is a structured process it is first and foremost a search for knowledge and learning. Organizations looking for a silver bullet or a desire to hit a numerical target or goal will quickly abandon benchmarking as being too detailed and involved. They will find it much easier to merely issue a memo that says, “Next year’s performance targets will be 5% higher than this year’s! Good luck! Be productive! Work hard!”
Notes
1. Although Kaplan and Norton continue to make major contributions to strategic planning and performance measurement, the majority of their work has been directed toward for-profit companies with the objective of returning profits to stockholders. Their ideas are certainly relevant to healthcare organizations (especially their notion about creating a balanced set of measures) but the term “scorecard” is so similar to the concept of the report card that it often leads to confusion about the meaning and intent of the scorecard concept. Is it designed for judgment or improvement? Nelson et al. (1995) classify the balanced scorecard as a “hybrid” of report card and instrument panel thinking. Personally, I believe the concept of the dashboard or instrument panel is much more appropriate for healthcare organizations. As stated earlier in this chapter, there
are plenty of groups that want to develop scorecards or report cards on healthcare providers in order to pass judgment on our performance. It is my belief, therefore, that the provider community needs a more robust set of terms and tools to guide our improvement efforts. At the IHI, for example, we have made a conscious decision to use the term dashboard as the organizing rubric for our various indicator sets. It is much more dynamic in nature, supports our commitment to QI, and helps management and staff feel less threatened by performance measurement.
2. I want to express my sincere gratitude and appreciation to Dr. Kevin Cleary, medical director, and Dr. Amar Shah, associate medical director, at the East London NHS Foundation Trust for granting permission to tell their story and share their strategic dashboard data.
3. It is interesting to note that 10 to 15 years ago both targets and goals in healthcare settings were of longer durations. Targets back then might be set for 6 months to a couple years. Goals were set for 3–5 years. My theory is that this has happened primarily as a result of external groups placing greater demands on healthcare providers to produce better outcomes in shorter periods of time. Consumers as well as political leaders want to see healthcare change more quickly. So, instead of addressing the complexity of changing healthcare systems and processes, many external groups have started to reduce the expected time periods to achieve targets and goals. Unfortunately the complexity of the issues facing healthcare providers has not reduced. In many ways, the complexity of health care has increased. All that has been achieved by reducing the mandated time frames for achieving targets and goals is to frustrate both the external reviewers and the providers. Targets and goals need to be realistic, not demoralizing or arbitrary.
References
Bader, B. “CQI Progress Reports: The Dashboard Approach Provides a Better Way to Keep Board Informed about Quality.” Healthcare Executive (September-October 1993): 8–11.
Caldwell, C. Mentoring Strategic Change in Health Care. Milwaukee, WI: ASQ Press, 1995.
Camp, R. Benchmarking: The Search for Industry Best Practices that Lead to Superior Performance. Milwaukee, WI: ASQ Press, 1989.
Carey, R., and R. Lloyd. Measuring Quality Improvement in Healthcare: A Guide to Statistical Process Control. Milwaukee, WI: ASQ Press, 2001.
Deming, E. Out of the Crisis. Cambridge, MA: Massachusetts Institute of Technology, 1992.
Graham, J., and M. Cleary, eds. Practical Tools for Continuous Improvement. Miamisburg, OH: PQ Systems, Inc., 2000.
Health Care Advisory Board. CEO Dashboards: Performance Metrics for the New Health Care Economy. Washington, DC: Advisory Board Company, 2000.
Kaplan, R., and D. Norton. “The Balanced Scorecard-Measures that Drive Performance.” Harvard Business Review (January-February 1992): 71–79.
Kaplan, R., and D. Norton. “Putting the Balanced Scorecard to Work.” Harvard Business Review (September-October 1993): 134–147.
Kaplan, R., and D. Norton. “Using the Balanced Scorecard as a Strategic Management System.” Harvard Business Review (January-February 1996): 75–85.
Mihalik Group, LLC. “When Is a Performance Goal a Benchmark?” The Mihalik Globe 7, no. 2 (Summer 2003).
Nelson, E., P. Batalden, S. Plume, N. Mihevc, and W. Swartz. “Report Cards or Instrument Panels: Who Needs What?” Journal on Quality Improvement 21, no. 4 (1995): 155–166.
Nugent, W., W. Schults, S. Plume, P. Batalden, and E. Nelson. “Designing an Instrument Panel to Monitor and Improve Coronary Artery Bypass Grafting.” Journal of Clinical Outcomes Management 1 (1994): 57–64.
Plsek, P., and A. Onnias. Quality Improvement Tools. Wilton, CT: Juran Institute, 1989.
Scherkenbach, W. The Deming Route to Quality and Productivity. Washington, DC: Ceep Press, 1990.
Senge, P. The Fifth Discipline: The Art and Practice of the Learning Organization. New York: Doubleday Currency, 1990.
Wheeler, D. Understanding Variation. Knoxville, TN: SPC Press, 1993.
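As a closing illustration of this chapter's discussion of target setting, the alternative proposed in the patient satisfaction example (place the past 12 months of results on a Shewhart chart and examine the spread of the control limits before choosing a target) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method used in that meeting: it assumes an individuals (XmR) chart with the conventional 2.66 multiplier on the average moving range, and the 12 monthly values are hypothetical numbers chosen only so that they average 27%, the figure quoted in the text. The names xmr_limits, is_within, and monthly_pct are illustrative.

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lower, upper) limits for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    center = mean(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)
    return center, center - spread, center + spread


def is_within(value, lower, upper):
    """True if a proposed target falls inside the current process limits."""
    return lower <= value <= upper


# Hypothetical monthly % Strongly Agree for the previous 12 months.
monthly_pct = [25, 29, 27, 24, 30, 28, 26, 27, 29, 25, 28, 26]

center, lcl, ucl = xmr_limits(monthly_pct)
print(f"center={center:.1f}  limits=({lcl:.1f}, {ucl:.1f})")

# Compare the targets that were bid in the meeting against the limits.
for target in (80, 85, 90, 100, 73.4):
    status = "inside" if is_within(target, lcl, ucl) else "outside"
    print(f"target {target}% is {status} the current process limits")
```

Under these assumed values, every target proposed in the meeting falls far outside the limits, which suggests the current process cannot be expected to reach it without a change to the process itself.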
CHAPTER 6
Tapping the Knowledge
that Hides in Data
In the previous chapters, we focused on the milestones in the quality measurement journey (QMJ) that guided the identification of indicators, the operational definitions for these indicators, and the data collection plan(s) needed to tap the knowledge that hides in the data. Now we need to address the question of, “What do you do with the data once you have it?” The answer is really quite simple: if you are genuinely interested in quality improvement (QI), then you need to understand the variation that lives within your data. This is best achieved by building a knowledge and skill base around the following topics:
■■ Data versus information
■■ Static versus dynamic approaches to data analysis
■■ Understanding variation conceptually (common and special causes of variation)
■■ Making the appropriate management decisions when presented with common and special causes of variation
■■ Statistical process control (SPC) methods (run charts and Shewhart charts)
■■ Linking your measurement efforts to improvement
This chapter addresses the first two topics. Chapter 7 focuses on understanding variation conceptually and making the appropriate management decision when presented with common and special causes of variation. Chapter 8 provides the details on run chart construction, analysis, and interpretation, and Chapter 9 does the same for Shewhart charts. Chapter 10 (Case Studies) and Chapter 11 (Connecting the Dots) provide guidance on how to link your measurement efforts to your improvement strategies.
▸▸ Data Versus Information
Although we live in an information age, it is interesting to observe how many people really do not clearly understand the difference between data and information. I have been in many meetings where someone says, “We need more data,” and I am thinking, “No, you have enough data; what you really need is more information.” Similarly, I have heard people say, “We need more detailed information.” What they really should have said,
however, is “We need more detailed data.” This issue was central to a discussion I remember vividly on National Public Radio (NPR) back in 2003. The topic then was the creation of the new Cabinet-level Department of Homeland Security. The NPR announcer was interviewing various representatives and senators on the merits of such a department. Quickly the discussion turned to the ability of the United States to obtain useful “data and information” on terrorist groups. One congressman stated that he thought we needed more data on individuals thought to be threats to this country. He proposed keeping all of these data in a massive database the new department would manage. One of the senators interviewed, however, thought that the problem was not having enough information on various terrorist groups. Finally, an official from one of the bureaus that would fall under the new department said that the central problem was that they did not have enough people who knew how to turn the massive amounts of data they already have into useful information. The lack of consensus on this data/information issue was readily apparent even in the summary of the NPR reporter. He kept using the two words as if they were synonyms, indicating to me that he really did not understand the difference between the two concepts even though this was the primary theme of the report. Recent stories and examples of this data/information confusion can be found almost daily by listening to the evening news or reading stories in the daily newspaper.
The distinction between data and information is more than mere semantics. The two concepts are quite different from each other. I have always found Austin’s (1983, p. 24) distinction between data and information to be extremely useful:
Data refers to the raw facts and figures which are collected as part of the normal functioning of the hospital. Information, on the other hand, is defined as data, which have been processed and analyzed in a formal, intelligent way, so that the results are directly useful to those involved in the operation and management of the hospital.
Data are the bits and bytes that we collect in an effort to measure the performance of a process. Data are not information. Information, on the other hand, can only be produced by submitting data to an intelligent inquiry process that is grounded in deductive (general to the specific) and inductive (specific to the general) thinking. This classic approach to data-based inquiry is referred to as the scientific method (Lastrucci, 1967) and forms the foundation for appreciative inquiry and improvement.
FIGURE 6-1 depicts the six key steps needed to gather data and turn it into useful information for decision making. The details related to each step are highlighted here:
■■ Step 1: Theoretical Concepts. All scientific inquiry begins with theoretical concepts (ideas and hypotheses). These concepts are either derived from abstract thinking about the world and the way it works (e.g., theoretical physics) or they are generated from data that enable the researchers to propose theories based on the observed patterns in the accumulated data. Sometimes the theories people pose are merely variations on existing theories. For example, the basic theories about germs and their impacts on human populations have been around for thousands of years, yet each year there are new and subtle modifications to fundamental theories about germs and their impacts on the human body (Block, 2001).1 Existing theories become modified or refined as additional research is performed, and data are placed into the context of the theoretical principles. At other times, however, the theoretical concepts present new thinking, which then creates the foundation for debate. The human genome project, for example, has spawned new theories about human genetics that were not considered 5 or 10 years ago. Whether a theory has been around for a while or is relatively new, the real test of any theory or hypothesis lies with the empirical evidence that can be assembled to test the validity and reliability of the idea.
[Figure 6-1: a cycle of Steps 1 to 6 arranged around a central block labeled Theory and Prediction. The blocks are labeled Theoretical Concepts, Operational Definitions, Measurement and Data Collection, Data Analysis and Output, Interpretation, and Information for Decision Making. One side of the cycle is labeled Deductive and the other Inductive.]
FIGURE 6-1 Turning data into information
■■ Step 2: Select and Define Indicators. The next step in turning data into information is to select a key set of indicators that purport to measure the theoretical concepts under investigation and then reach consensus on an operational definition of each indicator.2 Several key points are worth noting. First, I believe that this is one of the major challenges healthcare professionals face when attempting to measure what they do. Second, not devoting sufficient time to clarifying operational definitions usually leads to (1) confusion over what the numbers really mean, (2) challenges to the results, and/or (3) a desire to “kill the messenger” because the data do not fit certain individuals’ views of reality (e.g., this is seen frequently when a politician fires a public opinion pollster because the results of the poll do not match the politician’s view of himself or herself). Finally, improving operational definitions is not that difficult. It is not a statistical issue but rather one of logic and consensus. The healthcare industry has some of the best educated professionals in this country. Reaching consensus on operational definitions, therefore, should be relatively easy. I believe the reasons we perform poorly at this step include:
• The lack of formal training in the selection and specification of indicators
• Time constraints (i.e., we have to get the data now so why waste time talking about what the indicators mean)
• A fatal assumption that “Everyone knows what a (fill in your favorite indicator name) is, so why do we need to discuss it further and waste time?”
If we do nothing more than improve our efforts in this area, we will be making a significant improvement in our QMJ.
■■ Step 3: Measurement and Data Collection. The basic principles of measurement and data collection were reviewed in Chapter 4. Issues such as stratification, sampling, the role of pilot tests, the duration and frequency of data collection, respondent and data collector bias, and data collection methods are all critical to the success of this step. These issues are often overlooked when conducting both quantitative and qualitative studies.3 I have seen teams develop well-defined operational definitions for their indicators and then come to a screeching
halt because they did not think through all the details related to data collection. Data do not magically collect themselves and then conveniently populate a database on their own. Physicians frequently assume that the nurses will collect the data. The nurses hope the unit secretaries or clerks can collect it. The unit secretary hopes that students will be brought on board to collect the data. Finally, when there are no obvious candidates available to complete this task, someone will offer the following suggestion, “Let’s have the hospital volunteers collect the data.” Teams frequently underestimate the importance of actually designating a properly prepared person to collect the data. Do not take this step too lightly.
■■ Step 4: Data Analysis and Output. Failure to develop a well-thought-out data analysis plan will lead a team to sit on their data and hope that it will eventually hatch something they can use. What often happens, however, is that when the data are not subjected to analysis, they become antiquated and eventually obsolete. A well-designed and executed analysis plan allows a team to move to the next important step: interpretation. You should start to think about this step early on in the QMJ and make decisions about what will be done to analyze the data and who will be responsible for doing it. For example, you should have a dialogue about whether you have access to a statistical package to tabulate and analyze data and to produce graphical displays of the results. Also, determine which type of statistical analysis will be conducted. Will you merely calculate the average, minimums and maximums, and standard deviations for the data (static approach), or will you analyze the variation in the data using run or control charts (dynamic approach)? If you are focusing on data for improvement (not judgment or research) then you will need to use the dynamic approach. But who on the team has knowledge of SPC methods?
■■ Step 5: Interpretation of the Results. This is what the previous four steps were designed to achieve. To use the analogy of a fine dinner, the first three steps constitute the appetizer, soup, and salad. Interpretation is the main course. The first three steps relied on a combination of machine power and brainpower. Interpretation relies entirely on the machine that sits on top of your shoulders. Interpretation basically seeks the answer to one simple question: why? This is the point at which the data and the theory are compared to each other. Do the analytic results support the theories (hypotheses) you initially developed? If not, are the data correct and the theory wrong or vice versa? This is also the point at which previous research and data play key roles. Are the results consistent with what others have found? Are they consistent with what was found when similar studies were conducted? Interpretation discussions should involve issues of reliability and validity (Blalock, 1971; Campbell and Stanley, 1963; Forcese and Richer, 1970; Selltiz et al., 1959). This is the point at which all your hard work should pay off because it sets the stage for the final piece of the puzzle.
■■ Step 6: Information for Decision Making. If interpretation is the main course, then this step represents dessert. This is what you have been working toward (which interestingly enough is also how many people regard dessert). This final step is crucial, and unfortunately many improvement initiatives never reach this step. They often end up with considerable data but no information for decision making. I have heard this referred to as being a DRIP (i.e., being data rich and information poor!). The key to success in this final step is building a dialogue about the data and what decisions you will make with the results. Questions such as those identified in Step 5 provide the basis for a healthy dialogue designed to build information. The dialogue should
center on the data collected, the merit of the proposed theories and concepts, the variation found in the data, and the interpretation of what the data mean. This dialogue should prepare you to then develop action plans for improvement. The Plan–Do–Study–Act (PDSA) cycle (Deming, 1992; Gaucher and Coffey, 1993; Langley, Nolan, Nolan, Norman, & Provost, 1996) as well as Deming’s notion of profound knowledge (Deming, 1992; Schultz, 1994, p. 18–27) all identify taking action (decision making) as the primary goal of QI. The previous four steps are designed to bring you to this point. QI has been labeled by some as taking too long to demonstrate an improvement. My experience has convinced me that when this occurs it is usually due to the team’s lack of knowledge of how to turn data into information or due to a management structure that merely wanted to have data to confirm their view of reality. Just as we would not proceed with knee replacement surgery without a properly skilled and trained surgical team, we should not proceed with QI initiatives without a properly skilled and trained measurement team. Making appropriate decisions in the 21st century requires data and information. The steps are clear but sometimes the path is obstructed with roadblocks.
Two final comments about Figure 6-1. First, notice that the diagram circles around a center block labeled theory and prediction. This block lies at the very core of improvement science. When I am working with a team I am constantly asking them two simple questions: (1) What is your theory? and (2) What is your prediction? When a team proposes an improvement idea these two questions should be posed. For example, if you propose putting up posters around the hospital that promote hand hygiene, my first question would be: “What is your theory as to why you think a poster telling people it is important to properly wash their hands will make a difference? Of all the ideas for improvement you could come up with for improving the hand hygiene process, what theory do you have that a poster will achieve this objective?” Then I’d ask the follow-up question: “What is your prediction about the success of putting up posters? Do you predict that the poster campaign will increase the percentage of staff who properly wash their hands?” Now we can have a dialogue.
The second thing to note about Figure 6-1 is that it is divided into two main sections. The lower right portion of the figure is essentially an inductive learning process. Inductive reasoning is the process of coming up with a conclusion based on a series of events or data points that yield similar findings or repeat a particular pattern. Not to oversimplify this approach but it can be thought of as moving from the specific to the general (i.e., deriving general principles from specific observed instances, facts, or data). The other side of Figure 6-1 is deductive in nature. Deductive learning arises from testing a theory against a prediction (Langley et al., 1996, p. 82). In a general sense, deductive learning goes from general suppositions, theories, or conclusions and seeks specific observations or data to confirm the theory or initial premise. You can decide how much you care to dive into the inductive-deductive area. My point is that the scientific method (Lastrucci, 1967), which is essentially what Figure 6-1 is depicting, is an inductive and deductive process of thinking and learning. It not only provides the foundation for QI and the science of improvement (Perla, Provost, & Perry, 2013) but also is essentially how the human brain approaches problem solving situations.
▸▸ Static Versus Dynamic Approaches to Data Analysis
A basic data challenge that we face in health care is that we have historically relied on aggregated data and summary statistics to understand the
a difference? Of all the ideas for improvement data and summary statistics to understand the
164 Chapter 6 Tapping the Knowledge that Hides in Data

quality of what we do on a day-to day-basis.4 Such compare the average for this year (e.g., length of
an approach leads the researcher to a static way of stay for a particular diagnosis) with the average
thinking about data analysis and interpretation. from last year. Although this may have some
For decades, we have been using methods and utility for making determinations on incentive
tools that are best suited for efficacy research, or bonus payout programs it has very little to
judgment, or accountability and trying to use do with quality. Quality is determined by the
these tools to answer questions about quality and moment-to-moment variation in a process. It
safety. From my perspective, too many healthcare is determined by the workers who deliver care
professionals are using the wrong statistical tools at the bedside, at the registration desk, and at
and thinking to analyze and interpret the data the blood draw station. Administrators and
they have collected. managers who rely on aggregated data to make
Static approaches to data analysis focus on decisions, therefore, are not really focused on
using aggregated data and tend to compare the quality. They are focused more on making judg-
most recent data point with a previous data point ments about the difference between two numbers
(usually the last month or quarter). If the current rather than on how the numbers perform over
data point is regarded as being better than the time. Because variation exists in all that we do,
last data point, then all is right with the world. chances are quite high that two data points will
If, on the other hand, the current data point is differ. To paraphrase Deming, if you have two
determined to be worse than the previous result, data points, there is a very high probability that
then management concludes that all is not right one will be different from the other.5
with the world. If both data points are the same If aggregated data and summary of statis-
number, what do you conclude? tics are your primary frames of reference, then
The essential point to remember is that two data points are all you will need to make a
aggregated data, presented in tabular formats conclusion about performance. If, for example,
or with summary statistics, will not help you the result at Time 2 is better than the result at
measure the impact of process improvement or Time 1 then management will probably conclude
redesign efforts. It is just that simple. Deming was that things must be getting better. What people
very clear on this point. He wrote (1992, p. 312): fail to recognize when they do this, however, is
that two numbers do not determine quality, a
Students are not warned in classes nor
trend, or a “significant difference.” Again, if you
in the books that for analytic purposes
have two numbers it is very likely that one will
(such as to improve a process – author’s
be different from the other. So, when healthcare
emphasis), distributions and calculations
professionals look at the most recent data point
of mean, mode, standard deviation,
and determine success (or failure) in light of
chi-square, t-test, etc. serve no useful
the data point’s relative position to the previous
purpose for improvement of a process
data point, they will celebrate to excess, punish
unless the data were produced in a state
themselves beyond necessity, or take comfort in
of statistical control. Aggregated data,
the fact the numbers have not changed. Such an
therefore, can only lead to judgment
approach to data leads organizations to reward
not to improvement.
and punish workers for things over which they
The healthcare industry has a long and rich may have no control. It leads an organization
history of using data for judgment. For example, to suffer from what I call the “good dog/bad
it is not uncommon for a hospital or a health dog” syndrome. Caught up in this syndrome
system to compare the current month’s results an organization will establish incentives and
on a particular indicator to those for the same rewards that are not unlike Pavlov’s approach to
month a year ago. There is also a tendency to training his dogs. If the numbers are good and

High Now look at the lower portion of Figure 6-2.


In this chart, we observe the CABG surgery
mortality percentages as they occurred over the
12 months of each year. What do we observe? We
Which time period is better? note that last year the CABG surgery mortality
Low was on a constantly increasing trend upward
Last year This year and that this year reveals that the percentage of
mortality has been on a downward trend. How
High
is it that you can have two averages that are
essentially the same and two monthly patterns
that are fundamentally different? The answer is
Now which time period simple. The top part of Figure 6-2 presents data
looks better? in a static fashion, whereas the data shown in the
Low
lower part of the figure are depicted in a dynamic
Last year This year
fashion. Because quality is about efficiency and
FIGURE 6-2 A comparison of CABG surgery effectiveness, we need to look at data in dynamic
mortality percentages rather than static fashion. This means, therefore,
that we need to plot data over time rather than
where I want to see them, I’ll reward you with a aggregate the data into summary statistics.
bonus or an incentive (good dog, you salivated FIGURE 6-3 provides another example of why
on command, here have a biscuit). On the other static approaches to variation should not be used
hand, if the numbers are not good (by whatever for QI. Imagine that Figure 6-3 represents two
criteria I use to judge good), then you do not parts of the turnaround time (TAT) process for
receive the reward (bad dog). This syndrome can a particular lab test. Process A (the top chart)
drive individuals crazy and lead organizations is the TAT from when the physician writes the
down a path to ruin. It is a complex syndrome lab order until the specimen is received in the
that encompasses an organization’s views on lab and logged into the computer. Process B, on
intrinsic and extrinsic motivation (Deming, the other hand, is the in-lab processing time. If
1992; Kohn, 1986, 1993), incentive programs, you showed only the aggregated distributions,
performance reviews, compensation practices, represented by the bell-shaped curves on the far
and data analysis.6 right side of the graphic, you would conclude
FIGURE 6-2 demonstrates the major differ- that the two processes are exactly the same. Both
ence between static and dynamic displays of curves have the same width and height and are
data. Imagine that your hospital is interested in centered at the same point. In fact, these two
gaining a better understanding of its coronary distributions could have the same mean, me-
artery bypass graft (CABG) surgery outcomes. dian, mode, and standard deviation. Applying
Historically, you have evaluated your performance aggregated or static thinking to these data would
in this area by comparing the annual percent lead the researcher to conclude that Process A
mortality. The top portion of Figure 6-2 depicts and Process B are the same.
how you have typically looked at these data. Note If we look at these data from a dynamic point
that for the last 2 years you have had essentially of view, however, we would arrange the lab test
the same percent mortality. So, what do you in chronological order and observe the TATs as
conclude from this comparison? The answer is they laid themselves out over time. These data
easy—you are the same this year as you were are shown in the main body of the chart as the
last year. The overall average leads you to the solid line that fluctuates up and down around
conclusion that nothing has changed. the centerline (average). When we look at the
data in this manner we see that Process A has shifted downward over time whereas Process B has been migrating gradually upward. The dashed lines imposed on each chart show these directional patterns.

[FIGURE 6-3 Static versus dynamic data displays. Process A (top chart) and Process B (bottom chart) are each plotted over time on a Low-to-High scale around an Average centerline. Annotations: “Dynamic displays of data that show different underlying distributions” (main body of each chart) and “Static displays of data that have the same mean and standard deviation” (bell-shaped curves on the far right).]

How is it that the two summary curves on the far right side of the graphic are exactly the same yet the data in the main body of the graphic, showing the TAT for each individual lab test, present two very different patterns? The answer is rather simple. The aggregated distributions (the bell-shaped curves on the far right of the charts) are static displays of data that merely summarize where the data are centered (the average) and the spread of the data (the spread of the data would typically be reported as the
minimum and maximum values, the range, and/or the standard deviation). The main body of each chart, however, represents dynamic displays of data that essentially show how Processes A and B vary over time. The static displays, represented by the bell-shaped curves, have basically aggregated time out of each process.

A medical example will help to further clarify these distinctions. Using dynamic displays of data is equivalent to hooking patients up to telemetry monitors and tracking their vital signs moment by moment and minute by minute. Consider, however, what patient monitoring would look like from a static point of view. First, we add up all of the patient’s heartbeats during his/her length of stay. Next we would totally ignore the telemetry strips and calculate the average number of heartbeats and the standard deviation for the patient. Finally, we would want to know if the average number of heartbeats for the current patient was statistically different, using a p-value of 0.05 to determine significance, from the average heartbeats of a patient who was in the same bed last month or last quarter. This aggregate or static approach to patient care might save the nurses some charting time and could be seen as a way to save money, but it would be irresponsible. Medical practice is more concerned with dynamic than static displays of data. Yet, in healthcare management the determination of quality has often been based on using static displays of data to draw conclusions.

The objective in QI research, therefore, is to understand how the data present themselves over time. This is how effectiveness and efficiency are determined and how QIs are made. In short, the use of aggregated data will not allow the researcher to:

■■ Understand the variation that lives within the data
■■ Understand the underlying distribution that produced the observed average (i.e., you can obtain the exact same average from two entirely different distributions as shown in Figure 6-3)
■■ Evaluate the impact of process improvement/redesign efforts

Although these issues might not be critical to researchers conducting efficacy studies, they are essential information to those interested in the efficiency and effectiveness of a process.

Another example will help to further highlight this distinction. Suppose you were interested in calculating the average height of a group of people. You measure everyone as they come into the room and determine that the average height is 5 feet 11 inches. You then have everyone stand up and discover that half the people are professional basketball players and the other half are first-grade students. We quickly conclude that there are no people in the room who are 5 feet 11 inches tall. What we have is a distribution that has two extremes with an average that does not match any of the individuals in the room. The average height is a mathematically correct statistic but it does not provide a clear picture of the underlying nature of the distribution that produced the average. This is the basic problem of relying on aggregated data.

The up and down fluctuation in data may be due to random variation, which is an inherent part of any process. It may be due to periodic special causes or to a true shift in process performance. It may be up one month and down the next. Unfortunately, organizations spend a considerable amount of energy rewarding (and punishing) employees because the numbers are up then down. Quality is not about judging whether one data point is different from another. It is about analyzing data patterns as they occur over time and then determining whether an intervention has made a difference. Dynamic, not static, approaches to data provide the road to knowledge about the quality of healthcare outcomes, processes, and services. Healthcare organizations can enhance their QI efforts dramatically if they do nothing more than make a deliberate effort to increase their knowledge and use of dynamic displays and analysis of data.
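The contrast between static and dynamic views can be sketched in a few lines of Python. The turnaround times below are hypothetical values chosen only for illustration (they are not data from Figure 6-3), and the function names are assumptions made for this sketch. The two processes contain exactly the same values in opposite time order, so every summary statistic matches while the pattern over time does not.

```python
from statistics import mean, pstdev

# Hypothetical monthly turnaround times in minutes (illustrative values only).
process_a = [60, 58, 56, 54, 52, 50, 48, 46, 44, 42, 40, 38]  # drifting downward
process_b = list(reversed(process_a))  # same values, drifting upward


def static_summary(values):
    """Static view: summary statistics that discard the time order."""
    return {
        "mean": mean(values),
        "sd": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def slope_over_time(values):
    """Dynamic view: least-squares slope of the values against their time order."""
    n = len(values)
    t_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(values))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den


# The static summaries are identical for the two processes.
print(static_summary(process_a) == static_summary(process_b))
# The slopes over time have opposite signs.
print(slope_over_time(process_a), slope_over_time(process_b))
```

A fitted slope is used here only as a compact stand-in for looking at the plotted line; in practice the data would be plotted over time, as the text recommends.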

CASE STUDY #1: The Monday Morning Dilemma


It’s Friday afternoon, and you have already started thinking about the relaxing weekend that lies
ahead. Then the phone rings. Within seconds of picking it up, you realize that this is the call you had
anticipated but wished would not come on a Friday afternoon. It’s your boss reminding you about the
management meeting on Monday. Typically, this would not be a major event. But this time, for you, it
will be. It seems that the numbers for your facility are “below expectations” for the past month, and you
have to explain why your facility is not meeting its targets. You thank your boss for the reminder, and
as you hang up, you remember what happened to your friend Amelia when she had to go through
this ordeal the previous month. At that time, your numbers looked fine. In retrospect, all you can really
remember about that day was how pale Amelia looked as she tried to explain why her numbers were
below average. You also remember thinking how this whole affair reminded you of an inquisition.
Now as you sit in a numb stupor looking out your window, you realize that your time has come.
Suddenly your assistant sticks his head in your office and brings you back to reality by saying, “I’m out
of here, have a nice weekend.” “Oh sure,” you think, “I’ll have a nice weekend. I’ll spend all my time trying
to develop a list of reasons why my numbers are not acceptable. I’ll spend all weekend worrying about
the inquisition on Monday!”
This is obviously a situation you do not want to be in. Yet, stories like this are legendary in
organizations that do not understand variation. So, what would you do? You basically have two
choices. First, you could develop a list of reasons why your numbers do not live up to expectations. This
usually consists of one or more of the following tactics: (1) developing a series of complex sentences
that try to divert attention from the numbers, (2) pointing fingers at other factors (some real and some
imaginary), (3) blaming individuals (who are usually not attending the meeting) for poor work habits
and low motivation, and (4) throwing yourself on the mercy of the inquisition court.
The second, and preferable, choice is to not waste your time trying to explain why this
month’s number is different from last month’s number. The reason the number is down this month is that
it was up last month. Variation occurs in all that we do. Therefore, you will drive yourself absolutely
crazy and mislead those you are presenting to if you try to explain (rationalize) why the two data points
are different. If the performance of the process over time is unacceptable, then change the process.
Focusing on aggregated numbers without understanding how they vary over time is a futile exercise.
This type of thinking typically fosters competition between units and departments, builds barriers
between individuals, and, most important, undermines the delivery of quality care. It is reflective of
what Wheeler (1993, p. vi) refers to as “numerical illiteracy.” According to Wheeler:
Numerical illiteracy is not a failure with arithmetic, but it is instead a failure to know how to
use the basic tools of arithmetic to understand data. Numerical illiteracy is not addressed by
traditional courses in primary or secondary schools, nor is it addressed by advanced courses
in mathematics. This is why even highly educated individuals can be numerically illiterate. (vi)
Fortunately, there is a cure for numerical illiteracy. It involves the following remedies:
■■ Understanding variation conceptually
■■ Understanding variation statistically
■■ Linking your measurement to improvement
If you inoculate yourself with these simple remedies (as opposed to spending your time trying to
justify/rationalize why this month’s number is lower than last month’s number) you will be in a much
better position to really explain what is occurring with the numbers on Monday morning. You will
also be able to enjoy your weekend! The details related to these three remedies are presented in the
remaining chapters.
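The point that a month-to-month difference needs no special explanation can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the process is hypothetical, perfectly stable, and modeled as independent draws from a normal distribution whose mean and standard deviation are arbitrary illustrative values. Nothing about the process ever changes, yet the number still goes "down" in roughly half of all months.

```python
import random


def share_of_months_down(n_months=10_000, mean_value=90.0, sd=2.0, seed=1):
    """Simulate a stable process and return the share of months below the prior month."""
    rng = random.Random(seed)
    # Every month comes from the same distribution: common cause variation only.
    scores = [rng.gauss(mean_value, sd) for _ in range(n_months)]
    down = sum(1 for prev, cur in zip(scores, scores[1:]) if cur < prev)
    return down / (n_months - 1)


print(f"Share of months lower than the month before: {share_of_months_down():.2f}")
```

If each of those declines triggered a search for a cause, about half of all management meetings would be spent explaining noise.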
Notes

1. There are documented accounts in the Bible, for example, of directives to be clean in body as well as in the preparation and storage of food. Disinfectants are referenced by Aristotle, Homer, and Hindu medical teachings. The “modern” era in infection control can be traced to the work of Swiss physician Paracelsus (1493–1541) and most notably to Girolamo Fracastoro (1478–1553) who wrote a three-volume set on the nature of infection and its transference from person to person. Block (2001) provides a wonderful review of the key steps in the development of this branch of medicine. Not only is this account extremely interesting, but it demonstrates the fundamental connection between theory and experience.
2. I say, “purport to measure” because all measurement is subject to bias and error. Deming claimed that there is no such thing as a “fact.” He wrote (1992, p. 292): “There can be no operational definition of the true value of anything. An observed numerical value of anything depends on the definitions and operations used. The definitions and operations will be constructed differently by different experts in the subject matter.” (See Chapter 4 of this book for a more detailed discussion of operational definitions.) It is reasonable to conclude, therefore, that all measurement is merely a proxy for what it intends to measure.
3. Although this text deals primarily with topics related to conducting quantitative studies, the field of qualitative analysis should not be overlooked. Qualitative methods should be an integral part of any QI initiative. This can include videography and a variety of methods used in social, psychological, and anthropological research efforts. Methods discussed in Chapter 3 can also be used.
4. The traditional summary statistics are basically divided into two categories. The first consists of measures of central tendency: the mean, the median, and the mode. The second category consists of measures of dispersion, which usually include the minimum value, the maximum value, the range (the difference between the maximum and the minimum values), and the standard deviation. Any basic statistics book will provide the details on how to calculate these measures.
5. This statement was taken from the author’s personal notes while attending a Deming 4-day seminar (Quality, Productivity, and Competitive Position) in Indianapolis, August 11–14, 1992.
6. In this chapter, I do not intend to elaborate on the issues of intrinsic and extrinsic motivation. I do believe that if an organization does not have a healthy debate on how it intends to approach motivation and rewards, it will be haunted by this indecision and its culture will suffer. The dominant writer on this topic is Kohn (1986, 1993).

References

Austin, C. Information Systems for Hospital Administration. Chicago: Health Administration Press, 1983.
Benneyan, J., R. Lloyd, and P. Plsek. “Statistical Process Control as a Tool for Research and Health Care Improvement.” Quality and Safety in Healthcare 12, no. 6 (2003): 458–464.
Blalock, H., ed. Causal Models in the Social Sciences. Chicago, IL: Aldine Publishing Company, 1971.
Block, S. Disinfection, Sterilization, and Preservation. Philadelphia, PA: Lippincott Williams & Wilkins, 2001.
Campbell, D., and J. Stanley. Experimental and Quasi-Experimental Designs for Research. Boston, MA: Houghton Mifflin Company, 1963.
Deming, W. E. Out of the Crisis. Cambridge, MA: Massachusetts Institute of Technology, 1992.
Forcese, D., and S. Richer, eds. Stages of Social Research: Contemporary Perspectives. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1970.
Gaucher, E., and R. Coffey. Total Quality in Healthcare. San Francisco, CA: Jossey-Bass, 1993.
Kohn, A. No Contest. Boston, MA: Houghton Mifflin, 1986.
Kohn, A. Punished by Rewards. Boston, MA: Houghton Mifflin, 1993.
Langley, G., K. Nolan, T. Nolan, C. Norman, and L. Provost. The Improvement Guide. San Francisco, CA: Jossey-Bass, 1996.
Lastrucci, C. The Scientific Approach: Basic Principles of the Scientific Method. Cambridge, MA: Schenkman Publishing, 1967.
Namboodiri, N., L. Carter, and H. Blalock. Applied Multivariate Analysis and Experimental Designs. New York: McGraw-Hill, 1975.
Perla, R., L. Provost, and G. Parry. “Seven Propositions of the Science of Improvement: Exploring Foundations.” Quality Management in Health Care 22, no. 3 (2013): 170–186.
Schultz, L. Profiles in Quality. New York: Quality Resources, 1994.
Selltiz, C., M. Jahoda, M. Deutsch, and S. Cook. Research Methods in Social Relations. New York: Holt, Rinehart & Winston, 1959.
Shewhart, W. Statistical Method from the Viewpoint of Quality. Washington, DC: Graduate School, Department of Agriculture, 1939. (Reprinted by Dover, New York, 1986).
Van de Geer, J. Introduction to Multivariate Analysis for the Social Sciences. San Francisco: W. H. Freeman and Company, 1971.
Wheeler, D. Understanding Variation. Knoxville, TN: SPC Press, 1993.
Wheeler, D. Advanced Topics in Statistical Process Control. Knoxville, TN: SPC Press, 1995.

CHAPTER 7
Overcoming Numerical Illiteracy
Variation (var’-i-a’shun), n 1. Act or instance of varying; change in the form, position, state, or qualities
of a thing; modification, deviation, or an instance or example of such. 2. Extent to which a thing varies;
amount or rate of change.
—Webster’s Collegiate Dictionary, fifth edition, 1946
By permission. From Merriam-Webster’s Collegiate® Dictionary, 11th Edition ©2017 by Merriam-Webster, Inc. (www.Merriam-Webster.com).

In this chapter, three specific activities are discussed that can help to immunize you against numerical illiteracy. These activities include:

■■ Understanding variation conceptually
■■ Distinguishing common cause variation from special cause variation
■■ Making the appropriate responses to common and special causes of variation

▸▸ Understanding Variation Conceptually

Variation exists in all that we do, even in the simplest of activities. For example, consider writing your name. This is a simple activity that you probably do each day. Imagine that your annual performance review, however, was based on being able to write the first letter of your first name three times with no variation in the form, structure, or overall appearance of the letter. If you are able to perform this simple task, you will receive a 50% increase in your salary. Remember that there can be no variation in the letters. Give it a try. Now here is the second part of your performance evaluation. Place your pen or pencil in your opposite hand and write the same letter three times. To receive the 50% increase in salary all six letters must be exactly the same with no variation. How many of you passed the performance evaluation test? If you are like me, your results look something like this:

[Handwriting samples: Bob’s left hand (left), Bob’s right hand (right)]

The three letters on the right were done with my dominant (right) hand. Although these letters look similar, they are not identical. There are subtle variations in the curves, where the loops intersect another part of the letter, and in the size of the letters (the middle letter B is a little smaller than the other two). Now compare these three letters with the three I wrote with my left hand. The differences are obvious. There is not only obvious variation between the two sets of letters but also the variation within the left-handed letters is greater than the variation within the right-handed set. You might even think that two different people wrote these two sets of letters.

So it is very likely that none of us would receive the 50% pay raise. Why? It is because variation exists in all that we do. Sometimes we are quick to recognize this fact, but at other times we act as though there should be no variation. Our society is full of great examples of how we understand (and frequently do not understand) variation. Consider the following list. How much variation do you expect to see in each of these items? A lot? A little? None?

■■ Snowflakes
■■ Twins
■■ Siblings (not twins)
■■ Your commute to work
■■ The stock market
■■ Your household expenses each month
■■ A patient’s blood pressure readings
■■ Bowling or golf scores
■■ A patient’s heartbeat
■■ Your weight throughout the year
■■ Patient satisfaction results
■■ A department’s monthly expenses compared to budget

As you scan this list, you will probably think that all of the items listed will exhibit some form of variation. The snowflakes usually receive the most definitive response: “You know, no two snowflakes are alike.” This popular expression presents a view that there is clearly considerable variation among snowflakes.1 We have all heard and even said this snowflake statement many times. But how does this clear position on variation play out across the other items? When I have asked these questions in class, most participants tend to say that even with identical twins there is some variation, a subtle variation not unlike the variation found when writing three letters with your dominant hand. More variation usually comes into play as people describe siblings. One of the classic lines related to siblings is: “I can’t believe they came from the same gene pool!” To me, this type of statement is a clear signal that the respondent has a view that there is more variation when it comes to siblings than to twins.

The responses to variation associated with the remaining items on the list typically depend, to a large extent, on personal experiences (e.g., How long is your commute? What stocks have you purchased recently? How skilled are you at golf?). The historical experiences you have had with these items will typically guide your estimates of the amount of anticipated variation in the process in the future. This is exactly what Deming meant when he said, “The past is helpful to us only if it helps in the future, if it predicts” (quoted in Schultz, 1994, p. 23).

The last two items in the list (i.e., patient satisfaction results and monthly expenses compared to budget) deserve a few additional comments. Even though patient satisfaction scores are typically derived from sample data, which requires an understanding of sampling error, many people seem to think that patient satisfaction scores will (or should) keep going up. The essential nature of variation in patient satisfaction scores, however, reveals little fluctuation in the numbers. Sometimes the numbers go up, sometimes they go down, but most of the time patient satisfaction scores reflect random variation. The point is that assuming, wishing, or hoping that the patient satisfaction scores constantly go up is a clear demonstration of the lack of understanding of variation. In this case, the desired state (i.e., improved scores) is being confused with the objective evaluation of the variation that lives within the data.
When people do not understand variation they will demonstrate a number of rather unique behaviors. Most of these behaviors can be grouped into one or more of the following categories:

1. Deny the data (It doesn’t fit my view of reality!)
2. Distort the process that produced the data
3. See trends where there are no trends
4. Try to explain natural variation as special events
5. Blame and give credit to people for things over which they have little or no control
6. Find it very hard to understand past performance, make predictions, and/or make improvements.

Each of these behaviors is explored next.

Deny the Data (They Don’t Fit My View of Reality)

Unfortunately, this reaction occurs much more frequently than we would like. Individuals at all levels of the organization frequently get into a state of denial when the data do not fit their view of reality. The major drivers of this orientation are internal and external targets and goals that are not linked to the current capability of the processes being observed. Individuals expect their arbitrary targets and goals to be achieved without having any clear understanding of the current performance of the process and what the processes are capable of producing. Another reaction when the data do not fit one’s view of reality is that “This is not our data. It must be a mistake and our data have been confused with some other provider’s.” The final reaction to data that do not fit one’s view of reality is “These are old data and do not reflect our current performance.” Irrespective of the response, the central issue is what factors are causing the denial orientation. Does the organization not really understand its actual performance and results? Are the leaders of the organization not familiar with their current levels of performance because they look only at quarterly comparisons? Do they use aggregated numbers and summary statistics to discuss their results? If these are the behaviors of the organization’s leaders then it is not surprising that denial is one of their key strategies.

Distort the Process that Produced the Data

A follow-up behavior to denying that the data are relevant or yours is to be critical of the process (and people) that produced the data in the first place. We do this frequently in health care, especially when the data are assembled on us by outside groups and then released to the media. The criticisms are leveled at the dates of data collection (e.g., these are old data) and the procedures used (e.g., the sampling design, whether observers were used to collect data in a qualitative manner, poor stratification levels, or no risk adjustments applied to the data). Politicians and those running for elected office probably engage in this practice even more than those of us in the healthcare industry. If, for example, the results of a political poll on a candidate’s performance or the degree of support the candidate is receiving from the public are not acceptable to the candidate, then there is usually a fairly strong declaration that the results of the poll were biased or not representative of the candidate’s true constituency. Many of these reactions are key components of Deming’s cycle of fear referenced in Chapter 5.

People Will See Trends Where There Are No Trends

There are many popular views on what constitutes a trend so that people end up confused about the true nature of a trend. For example, it is not uncommon for the media to describe a “trend” in fashion, dining, the weather, or cars. Most of these references have little to do with statistical trends. At best they reflect opinions about fads, marketing plans, or buying patterns
of consumers. Every day the media reports about an upward or downward trend in something. More often than not, this simply refers to the fact that the most current number is higher (an upward trend) or lower (a downward trend) than the previous number. The other perspective on a trend was documented nicely in the Chicago Tribune Magazine (August 16, 2015). The headline on the front page of the magazine read, “Fall Fashion: The new trend is no trends!” The cover picture accompanying this headline showed a very fashionable young lady in a rather strange mixture of clothing styles that did not look like they actually went together. I suspect that the picture was trying to capture the headline or vice versa. The key point is that trends in fashion, food, cars, or music are not the same as a statistical trend. So what constitutes a statistical trend?

A true statistical trend is defined by an ever increasing or decreasing series of consecutive numbers (Carey, 2003; Carey and Lloyd, 2001; Pyzdek, 1990; Western Electric Company, 1985). Because many people do not engage in statistical thinking they will start to see trends where there are no trends. They look at two or three numbers that appear to be going up or down and call it a trend.2 I bet you will, in the near future, hear or see in the paper or on TV someone proclaiming that there is a trend in a particular aspect of financial or social performance when in fact it is merely (1) the comparison of two numbers or (2) an individual’s opinion about a vague tendency that something is moving in a new direction as in a fashion trend. The specific statistical rules for determining a trend are discussed later in this chapter.

People Will Try to Explain Natural Variation as Special Events

At one time I thought that improperly defining a trend was the most prevalent problem that we needed to address. Over the years, however, I have changed my mind. I now think that trying to explain natural variation as special events is a much more pervasive problem. This behavior can be found everywhere: in staff meetings, board meetings, casual conversations, and the media. We pride ourselves on being “action oriented.” So when a group is presented with data, group members will look for anomalies or special events so that they feel justified in taking action or arriving at a conclusion about the data’s performance.

I had a physician in a class one day who described this behavior very nicely. As I finished explaining the differences between common (random) variation and special cause or nonrandom variation, he blurted out, “I explain natural variation as special events.” When I asked what he meant, he explained that when his diabetic patients come to see him for checkups, he is prone to change their medications when he notices the slightest up or down movement in their blood glucose readings. He also noted that his patients seem to have rather wide swings in their readings, wider than the ranges his partners reported for their patients. As we processed these insights, he came to the conclusion that his patients were probably experiencing wider variation because he was, in fact, trying to explain natural variation as special events. He continued his own self-assessment by concluding that he was overreacting to both high and low blood glucose readings. In both cases, he was attributing special cause status to readings that were essentially common cause (random) variation. In doing so, he was essentially tampering with a stable process, which actually ends up increasing variation. More on this point in a minute. The result of treating natural variation as if it were special cause is that you increase variation. When I asked him why he thought he overreacted to natural variation, he made two comments. First, he said that he was basically taught to react to each data point. Second, he thought his patients would think he was not a “good” doctor unless he took action when they came to see him. Over the years, this physician and I have become good friends. He is now a staunch supporter of what Shewhart and Deming were trying to teach people: that (1) variation exists and (2) you need to be able to determine if it is common or
Understanding Variation Conceptually 175

special cause variation before deciding upon a course of action.

Every day you will find people who try to explain natural variation as special events. The inevitable outcome of such behavior is overreacting to the numbers, which leads to tampering and ultimately to increased variation in the process. This principle has been best demonstrated by Deming’s experiments with the funnel (Deming, 1992). The objective of the funnel demonstration is to drop a series of marbles through a funnel and see where they land on a target placed beneath the funnel. The setup for this demonstration is shown in FIGURE 7-1. Deming identified four rules to guide the dropping of the marbles, all of which produced very different patterns of variation. The rule of most interest to us right now is Rule 4 of the funnel. In this case, the funnel is adjusted to drop the next marble over the spot where the previous marble came to rest. Rule 4 basically states that the variation in the system will “explode” (Deming’s term) if you keep changing the position of the funnel on every subsequent drop of a marble. This essentially amounts to tampering by overreacting to individual data points without understanding how they all fit together to form a system of variation. Deming described Rule 4 as follows: “Rule 4 will yield a random walk. The successive drops of the marble resemble a drunk man, trying to reach home, who falls after each step and has no idea which way is north. He steps in any direction, with no memory. His efforts eventually send him by faltering steps further and further from his target” (1992, 329).

FIGURE 7-1 The funnel demonstration

[Diagram of a funnel above a target. Marble #1 hits the bull’s-eye, then bounces and rolls to its final resting spot on the target (M1). The funnel is repositioned and marble #2 is dropped; marble #2 does not go to the bull’s-eye and lands at M2.]

Measure the distance from marble #1 (M1) to the bull’s-eye. Then swing the funnel an equal distance in the opposite direction and drop marble #2 at this point. The assumption is that the 2nd marble will follow the same course as the first one and end up in the bull’s-eye. But marble #2 does not go where marble #1 went. So measure the distance that marble #2 is from the bull’s-eye and swing the funnel an equal distance in the opposite direction. Continue this procedure until you have dropped at least 50 marbles.
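
The contrast between leaving the funnel alone and applying Rule 4 can be sketched with a short simulation. This is a minimal illustration, not Deming's own procedure: it assumes each marble comes to rest at the funnel position plus Gaussian scatter of size `sigma`, and the function names (`simulate_funnel`, `rms_distance`) are hypothetical. For Rule 4 it follows the description in the text, repositioning the funnel over the spot where the previous marble came to rest.

```python
import math
import random


def simulate_funnel(n_drops, rule, sigma=1.0, seed=0):
    """Simulate marble drops on a target whose bull's-eye is at (0, 0).

    rule=1: never move the funnel (no tampering).
    rule=4: move the funnel over the spot where the previous marble rested.
    Assumption: each marble rests at the funnel position plus Gaussian scatter.
    """
    if rule not in (1, 4):
        raise ValueError("this sketch only models Rule 1 and Rule 4")
    rng = random.Random(seed)
    funnel_x, funnel_y = 0.0, 0.0  # funnel starts over the bull's-eye
    landings = []
    for _ in range(n_drops):
        x = funnel_x + rng.gauss(0.0, sigma)
        y = funnel_y + rng.gauss(0.0, sigma)
        landings.append((x, y))
        if rule == 4:
            funnel_x, funnel_y = x, y  # chase the last resting spot
    return landings


def rms_distance(landings):
    """Root-mean-square distance of the resting spots from the bull's-eye."""
    return math.sqrt(sum(x * x + y * y for x, y in landings) / len(landings))


if __name__ == "__main__":
    for rule in (1, 4):
        spots = simulate_funnel(n_drops=50, rule=rule, seed=42)
        print(f"Rule {rule}: RMS distance from bull's-eye = {rms_distance(spots):.2f}")
```

With the fixed funnel the marbles should stay clustered around the bull's-eye, whereas under Rule 4 each adjustment carries the previous marble's scatter forward, so the resting spots drift away as a random walk and the spread grows with the number of drops.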



If you dropped all the marbles without ever adjusting the position of the funnel, you would end up with the minimum amount of variation and a stable process. The resultant pattern on the target would essentially resemble a flattened normal (bell-shaped) curve (i.e., the majority of the marbles would be clustered around the center of the target and as you proceeded out from the bull’s-eye you’d find a gradual reduction in the number of marbles similar to the tails of a normal curve). In this case, the process variation is at a minimum because natural variation was not interpreted and reacted to as special events. In a word, there was no tampering. The reader is encouraged to spend time reading the details of this demonstration and becoming familiar with all four rules of the funnel. They will provide critical knowledge about the subject of variation.

An example from health care will help to anchor Rule 4 of the funnel. Imagine that you are the manager of an outpatient clinic. You have been tracking the average wait time by day for several weeks. As you review the figures, you notice that the average wait time yesterday was higher than it was the day before. So you call the staff together and tell them that this higher wait time is unacceptable. Your response to this increased wait time is to adjust the staffing patterns to improve efficiency. The next time you look at the data you observe that the average wait time was lower than it had ever been during the past month. So, you change the staffing patterns to “lighten up” on the number of workers and their respective workloads. What have you accomplished? By trying to force every day’s average wait time to be like the previous day, you have been demonstrating Rule 4 of the funnel. You have overreacted to both the high and low average wait times and adjusted the process without knowledge of the system’s performance. The prudent manager would first plot the data on a control chart and then determine if the process exhibited common or special causes of variation.

This tendency to try to explain natural variation as special events is demonstrated in nearly all walks of life. For example, consider the following illustrations from the newspaper industry. The examples are a little dated but they provide such classic examples of not understanding variation that they need to be highlighted. The first story comes from the front pages of two newspapers in the same city on the same day. On September 1, 1998, the morning edition of the San Francisco Chronicle had the following headline: “Panic hammers market.” It was accompanied by a picture of a stock market analyst holding his head in dismay at the plunging stock quotes. This was accompanied by multicolored graphs of rapidly dropping stock prices. The evening edition of the San Francisco Examiner, however, told a very different story. Its headline read, “Stocks bounce back.” The article went on to explain how the stock market “soared” late in the closing hours of the day to overcome the panic that had occurred that morning. This, the San Francisco Examiner concluded, was an upward trend in the stock market and a very positive sign for investors. The morning drop and the afternoon rise were merely natural fluctuations in the data. Neither one was a special event and the movement of two data points is definitely not a statistical trend.

The second example comes from the Wall Street Journal (September 19, 1997).3 The headline read, “Trade deficit surges 25% to 10.3 billion.” The article went into great depth describing how the trade imbalance with Japan was the “largest in two years” and that the “Chinese gap widens to 9%.” There was very colorful language in the article about how the trade deficit had “ballooned” to $10.3 billion. The two-and-a-half-page article did not paint a very positive picture for the U.S. economy. It even showed a little line graph depicting the deficit over time. The article based its 25% “surge” conclusion on a comparison of the most recent deficit with the previous time period’s deficit. If one computed a percentage change between these two time periods, it was in fact about a 25% increase (from roughly $8.2 billion to $10.3 billion). If they had placed the data on a control chart, however (which they did not do), they would have realized that the deficit over the 25-month period (July 1995 through July 1997) was nothing more than common cause variation. FIGURE 7-2 shows

FIGURE 7-2 Deficit spending from July 1995 through July 1997

[Control chart. Vertical axis: billions of dollars (4.0 to 14.0). Horizontal axis: month (July 1995 through July 1997). UCL = 13.378; CL = 8.936; LCL = 4.494. Chart annotations: “Was this the pinnacle of despair?”, “Was this the abyss of happiness?”, and “No, but this was a ‘surge’!”]

that the average deficit over the study period was $8.9 billion with an upper control limit (UCL, to be described later in this chapter) of $13.4 billion and a lower control limit (LCL, also to be described later) of $4.5 billion. The average, along with the UCL and LCL, basically describes what is known as the capability of the process. Stated differently, the deficit process, on the average, would continue to be about $8.9 billion per month. It could go as high as $13.4 billion or as low as $4.5 billion, which represented the variation in the deficit process.

The question is not “What are you going to do about a 25% surge in the last two data points?” but rather, “Are you satisfied with an average monthly deficit of $8.9 billion with a potential swing of anywhere between $4.5 and $13.4 billion?” If the government is willing to accept this process capability, then they should do nothing because the process will continue to perform in this manner into the future. If, on the other hand, they do not like the fact that this nation will continue to have a monthly deficit of roughly $9 billion, then they should take steps to (1) shift the average deficit to a lower level of performance, and/or (2) reduce the variation in the entire process and thereby bring the UCL and LCL closer together.

Regardless of the action plan, the key learning point in both of these examples is that the newspapers clearly saw natural variation as a special event. In the second example, I am actually a little surprised that they did not pick up on earlier data patterns and give them top billing in this article. For example, the month of January 1997 had the highest deficit during this entire study period. Why did they not label this as the “pinnacle of despair”? They could have said that prior to this point there was a “sudden and rapid upward trend followed by a sudden and rapid precipitous fall in the deficit.” If the month of January 1997 was the highest deficit, why did they not also highlight the period from August through December 1995, when the deficit

was below the average for 5 straight months? If January 1997 was the pinnacle of despair, then this 5-month period of relatively low deficit spending must surely have been the “abyss of happiness.” In short, the deficit will never improve by seeing events as special when in fact they represent nothing more than random or common cause variation.4

People Will Blame and Give Credit to Others for Things over Which They Have Little or No Control

This is probably one of the most damaging aspects of not understanding variation, because it has a direct and immediate impact on the workers. Consider Bill, the manager of the outpatient testing and therapy department. You see him in the cafeteria and he looks rather depressed. You ask him why he seems down, and he tells you that his boss just chewed him out because the monthly patient satisfaction scores were lower this month than they were last month. The next month you also happen to run into Bill, who is in a very good mood. You say, “Bill, good to see that you are happier than the last time I saw you. What’s going on to make you this happy?” Bill responds, “Well, the boss likes the patient satisfaction scores this month and he just told me I could buy pizza for everyone in the department to celebrate.” With an inquisitive look on your face you ask, “Wow, what did you do to increase the scores?” Bill responds with a puzzled look, “Nothing. I guess we just lucked out this month.” Unfortunately, I have been the recipient of this type of behavior and have observed many other individuals who have tried to cope with this form of numerical illiteracy.

Behaviors surrounding budget variances provide another classic example of this behavior. Organizations will typically take a budget and divide it into 12 equal segments. The average monthly expected expense figure is then presented to the manager as the “target” for each month’s actual expenses. Everyone knows that the actual expenses will not be exactly the same every month (i.e., no variation from target). Organizations that understand variation recognize this fact. But some organizations act as though there should be no variation from the expected average monthly budget figure. When a given month’s expenses are above the average expected expenses for the month, the manager of that area is usually blamed and required to write a lengthy memo explaining why there is variance above the expected average. For the manager, this behavior creates confusion, because there will frequently be praise and potentially rewards when the department’s budget is below the target. It gets even more confusing, however, when the manager is blamed when the budget is above the expected average but receives no credit when it is below the target for each month. Why is there no requirement for a memo when the budget is below the target?5

The final and most negative aspect of this last behavior is the “kill the messenger” perspective highlighted in Deming’s cycle of fear. This happens when the boss wants to hear only good news. If the numbers are down this month, the staff draw lots to see who will have to deliver the bad news. Fortunately, today people do not literally get killed for delivering bad news to the boss, but organizations have come up with rather clever ways to deal with receiving bad news and then blaming the worker for being incompetent. Dr. Laurence Peter’s classic book The Peter Principle: Why Things Always Go Wrong (1970) is one of the best reads on what happens when processes fail to deliver the desired results. One of Peter’s principles is that the messenger bringing poor results to the boss can easily be “laterally arabesqued.” In this case, the individual is not outright fired but moved to a new role with a new title and stuck in a remote part of the building. Such a “pseudo-promotion,” as Peter calls it, will generally demoralize the worker and, from the boss’s perspective, be enough to cause the person to leave on their own. Actions that organizations and individuals take in regard to blaming and giving credit for things over which the workers have little control create what Deming (1994) referred to as the “heavy losses.”

People Will Find It Very Hard to Understand Past Performance, Make Predictions, and/or Make Improvements

Quality improvement is based on understanding these three activities. Yet if an organization approaches the analysis of data from a static rather than a dynamic point of view, then there is a very high probability that they will not be able to understand past performance, make predictions, and/or make improvements. Demonstrating skill in these behaviors requires an understanding of where measures have been, where they are now, and where they will go in the future. The answers to these questions cannot be found in summary statistics and tests of significance. They can be found only in knowledge of variation and statistical process control (SPC) methods.

▸▸ Distinguishing Common from Special Causes of Variation

The previous examples should help to highlight some of the inconsistencies we have when it comes to understanding and interpreting variation. Once we begin to think about the nature of variation and how it plays a major role in our daily lives, however, we will be in a much better position to actually start distinguishing between two very distinctive types of variation, common and special cause variation.

The origins of common and special causes of variation can be traced back to the early 1920s, a time when this country was struggling with a fairly basic question: How can you increase production and maintain quality? The man who helped answer this question was Dr. Walter Shewhart. Shewhart argued that all work could be viewed as a series of interrelated processes. Because customers are the recipients of process output, it stands to reason that organizations that focus on maintaining efficient and effective processes will meet and exceed customer expectations. Shewhart’s recommendation for creating efficient and effective processes was very simple. He maintained that if you understand the variation that occurs within a process, you will be able to make appropriate management decisions that will produce high-quality products and services. Shewhart distinguished two types of variation: assignable and unassignable. These terms were later revised by Deming to the more popular terms common and special causes of variation (Deming 1992; Schultz 1994; Shewhart 1931). Deming (1992, 314) classified common cause variation as “faults of the system” and special causes as “faults from fleeting events.”

The genius of Dr. Shewhart was that he had a unique ability to turn academic principles and concepts into practical and easy-to-understand concepts that the workers on the shop floor of the Western Electric Company back in the 1920s could easily grasp. The essence of what Dr. Shewhart was conveying about variation is shown in FIGURE 7-3. His basic point was that if you look at a distribution of data as a normal bell curve you will obtain a static view of the data in which time has been removed. When you summarize an array of data as a distribution you have no idea which data point came first, in the middle, or at the end. But, if you turn this distribution on its side and then extract the data from the static distribution you see how the data array themselves in time order. This will give you a very different view of the data. In the static distribution, time is not relevant. When you add the dimension of time back into the distribution you will see the variation in the indicator as it lays itself out in chronological order (i.e., hour by hour, day by day, week by week, or patient by patient). When you look at data this way, Shewhart maintained that you could then understand whether the data displayed characteristics of controlled (common) or uncontrolled (special) causes of variation.

Common cause variation is random variation that results from regular, natural, or ordinary

FIGURE 7-3 Shewhart’s view of variation

[Diagram contrasting a static view (the data summarized as a distribution) with a dynamic view (the same data plotted over time between the UCL and LCL).]

Every process displays variation:
• Controlled variation (random variation): stable, consistent pattern of variation; “chance,” constant causes
• Special cause variation (non-random variation): “assignable”; pattern changes over time

forces. Common cause variation affects all outcomes of the process and results from the regular random rhythm of the process. It produces processes that are stable or “in control.” One can make predictions, within statistical limits, about a process that has only common cause variation (this is often referred to as process capability). In a common cause state, there are no indications of special cause, because the variation results only from chance fluctuations in the data. Your morning commute to work provides a good practical example of understanding variation. If you ask people, “How long does it take you to get to work in the morning?” most respond, “Oh, about (fill in the blank) minutes.” Each day’s commute is not exactly the same number of minutes (unless you live two blocks from your place of work, in which case we would not include you in this analysis). Let’s say that someone responds, “Oh, about 55 minutes.” The word “about” is an indication that there is some degree of variation around 55 minutes. Then she might add, “It could take me as long as about 85 minutes or as little as 40 minutes.” What has she done here? She has described the variation in her morning commute in a way that would have pleased Dr. Shewhart. If all she said was “Oh, about 55 minutes,” all we would have is some rough estimate of her average commute time. But when she added the other two numbers (i.e., from about 40 to 85), we now know something about the variation in her morning commute.

The key thing to remember with common cause variation, however, is that it does not mean that the performance of the process is good or even acceptable. It means only that the process is stable and therefore predictable. A process can be predictably bad. For example, a patient may have blood pressure readings that are stable and very predictable but at an unacceptably high level (e.g., a systolic pressure that averages 175 with a minimum at 165 and a maximum at 185). It is stable, and therefore predictable, but unacceptable clinically. The same could be true for cholesterol, white blood cell counts, or blood glucose levels. In all these cases, we

would need to shift the various process outputs to more acceptable levels of performance. This is what quality improvement (QI) is designed to do. Remember, though, common cause means stable and predictable, not necessarily acceptable.

Special cause variation, on the other hand, results from irregular or unnatural causes that are not inherent in the process. Special cause variation affects some but not necessarily all outcomes of a process. When special causes are present, a process will be classified as “out of control” and, therefore, unstable. Wheeler (1993) refers to special causes as “signals” that a process is in a state of chaos. The future performance of a process that exhibits special causes will be unpredictable. This is why improvement strategies should not be applied to processes exhibiting special cause variation. Because they are unstable and unpredictable, attempts to improve them will only lead to wasted time, effort, and money as well as increased variation. The response to special causes of variation should be to investigate the origin of the special causes, determine why they occurred (conduct a root cause analysis), and then take steps to eliminate the causes from the process. If you do not extricate the factors that led to a special cause they will rear their ugly heads again in the future. No one can predict exactly when they will occur but, as James Reason points out in his classic work Human Error (2003), the latent causes of error (i.e., special causes) are inherent in the system and under the right set of conditions they will occur again if you do not take steps to remove these causes from the system. This is classically referred to as the Swiss cheese theory of errors (Reason, 2000).

Several healthcare examples should help to clarify these two types of variation. When a patient is connected to telemetry, variation in his or her vital signs will be observed. In fact, there is great concern if variation is not observed and the patient is considered to be a “flat line.” If the patient’s heartbeat is observed at 59, then 61, then 60, and so on, there is no need for immediate concern. Why? Because the patient is exhibiting essentially common cause variation with respect to a normal heart rhythm. A little up, a little down but within expected performance capability parameters for assessing heartbeats. If, on the other hand, the patient’s heart rate starts to climb and goes from 65 to 70 to 75, continues past 100 beats per minute, and settles at around 140 beats per minute, this would be seen as a signal that the patient is exceeding normal expectations and showing too much variation. At some point the patient would move from a common cause state to one that reflects special cause variation, and action would be taken to investigate why the heartbeats had progressed to such high levels and then corrective strategies would be put in place to correct the heart’s rhythm and bring it back into a state of common cause variation. Action would not be taken, however, while the patient was demonstrating acceptable common cause variation.

Now consider a trauma patient who is brought into the emergency department (ED). The patient has a severe head injury, is hardly breathing, and has lost a considerable amount of blood. As you are examining him you also notice that he is wearing a medical bracelet indicating that he is diabetic. What is the first thing on the minds of the ED team? What would they do for this patient? They would first and foremost focus their efforts on “stabilizing the patient.” They would work to get the patient breathing under his own power, stop the bleeding, and deal with the head trauma. What they would not do is gather around a flipchart and brainstorm ideas about how this patient might improve his diet in order to have better control of his diabetes (maybe the driver passed out because of poor management of his diabetes). The patient came in with several of the key indicators of life reflecting special cause variation (unstable and unpredictable). The trauma team would work to get the patient into a common cause state of affairs. Once the patient was stable and predictable relative to breathing, bleeding, and head trauma, the trauma team might then discuss improvement strategies related to diabetes management.

▸▸ Making the Appropriate Responses to Common and Special Causes of Variation

Being able to describe the differences between common and special causes of variation starts you on the road to statistical thinking. But, do you and those around you use this type of thinking to make management decisions? The attributes of a leader who understands variation include the following:
■■ Leaders understand the different ways that variation can be viewed.
■■ They explain changes in terms of common causes and special causes.
■■ They use time series graphical methods to learn from data and expect others to consider variation in their decisions and actions.
■■ They understand the concept of stable and unstable processes and the potential losses due to tampering.
■■ They understand the capability of a process or system before establishing targets or goals and before changes are attempted.

The rows of TABLE 7-1 identify the type of variation, the right choice, the wrong choice, and the consequences of making the wrong choice. The columns are split in two based on whether you observe a stable or unstable process. If you have only common cause variation the right choice is to change the process by either working to reduce variation or moving the entire process through redesign to a totally new level of performance. The wrong choice when you have common cause variation is to treat normal (random) variation as if somehow it was special. This is what was described earlier as tampering (i.e., over- or underreacting to individual data points when they are part of a common cause process) and demonstrated by Rule 4 of Deming’s funnel demonstration. The right-hand column starts off with a mix of special and common cause variation. In this case, the correct decision would be to investigate the reason(s) why the special cause is present and take steps to extricate these causes from the process. The wrong choice when you have special cause variation is to change the process or system in reaction to that one special cause without investigating why it occurred. When this happens you end up wasting resources including time, effort, morale, and money.

TABLE 7-1 Appropriate Responses to Common and Special Causes of Variation

                                         Is the process stable?
                                         YES                                                    NO
Type of variation                        Only common                                            Special + Common
Right choice                             Change the process                                     Investigate the origin of the special cause(s)
Wrong choice                             Treat normal variation as a special cause (tampering)  Change the process
Consequences of making the wrong choice  Increased variation!                                   Wasted resources! (time, effort, morale, money)
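
The decision logic in Table 7-1 can be sketched in code. The example below is an illustration under stated assumptions rather than the method used in this book: it computes limits for an individuals (XmR) chart, using the conventional constant of 2.66 times the average moving range, and it treats only a point outside the limits as a special cause signal (the fuller set of rules is left to the later statistical discussion). The function names and the wait-time figures are hypothetical.

```python
def xmr_limits(values):
    """Return (center, lcl, ucl) for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_moving_range = sum(moving_ranges) / len(moving_ranges)
    half_width = 2.66 * average_moving_range  # conventional XmR constant
    return center, center - half_width, center + half_width


def special_cause_points(values):
    """Indexes of points outside the limits (the only rule applied here)."""
    _, lcl, ucl = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


def recommended_response(values):
    """Map the chart reading onto the 'right choice' row of Table 7-1."""
    if special_cause_points(values):
        return "Investigate the origin of the special cause(s)"
    return ("Change the process if its capability is unacceptable; "
            "otherwise leave it alone")


if __name__ == "__main__":
    # Hypothetical average daily wait times in minutes (made-up numbers).
    wait_times = [30, 32, 29, 31, 33, 30, 28, 31, 32, 30]
    print(xmr_limits(wait_times))
    print(recommended_response(wait_times))
    print(recommended_response(wait_times + [45]))
```

A day that is merely higher or lower than the day before, but still inside the limits, returns the stable-process recommendation; reacting to it individually would be the tampering described in the wrong-choice row.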

Almost every day you will read a story in the newspaper or hear a TV news report demonstrating how people regularly overreact to a special cause, don’t investigate the origin of the special cause, and proceed to change the system. This happened in Chicago a number of years ago. An elderly driver entering O’Hare Airport to pick up a family member became confused in the heavy traffic congestion. At some point he either panicked or became confused and instead of hitting the brake put his foot on the gas. The car proceeded to jump the curb and drive down the departure area sidewalk. Unfortunately, there was a group of elementary school students on a field trip on the sidewalk when the gentleman lost control of his car. He drove into the children, killing three of them and injuring many more. This tragedy immediately was brought to the attention of the mayor of Chicago and the city council. “Something must be done” was the public response. My question is: was this a special cause or a common cause? It had never happened before and has not happened since. But the initial reaction of the Chicago City Council was that they should erect three-foot-high barriers along the curbside check-in area so cars could not jump the curb. Another idea was to raise the height of the curbs along the departure driveway to a height of about 15 inches. Now think of this for a minute. Placing large construction barriers all along the curb or increasing the height of the curb at one of the busiest airports in the world would prove to be a traveler’s nightmare. Dragging suitcases and other baggage up over a 15-inch-high curb or between construction barriers would delay and irritate probably everyone. So I ask again, do you think this was a special cause or a common cause? When I tell this story in class, participants agree that it was a special cause. It was a singular event that, although tragic, required investigation, not changing the system. It turns out the gentleman, who was 85 years old, claimed his gas pedal stuck. The car turned out to be in proper working order. He had never driven into O’Hare Airport before, became confused, and then unfortunately panicked and made the wrong decision. So, the city council did nothing. But their decision was based not on the findings of why this special cause occurred but because they did not want to spend the money to totally redesign the departure area at the airport. What someone should have said was, “This is a special cause. It has not happened in the past but we do have a pretty good understanding as to why it happened.” The conclusion and follow-up questions should not be about the curb height at O’Hare Airport but rather about the process the State of Illinois has in place for reviewing the driving skills of elderly drivers. They almost made the wrong decision.

Once you have developed a comfort level with the concepts of common and special cause variation and how to make the appropriate decision, it is time to apply this knowledge to daily work. Engage in a dialogue with your colleagues around these issues. Do this by:
■■ Selecting several measures your organization tracks on a regular basis.
■■ Determining whether you and the leaders of your organization evaluate these measures according to the criteria for common and special causes of variation.
■■ Asking, if you do not use common and special causes of variation to interpret your indicators, what criteria you use to determine whether data are improving or getting worse.

In summary, the basic points that Shewhart (1931) taught about these two types of variation were as follows:
■■ Variation exists in all that we do.
■■ Processes that exhibit common or chance causes of variation are predictable within statistical limits.
■■ Special causes of variation can be identified and eliminated.
■■ Only processes that exhibit common cause variation can be improved.
■■ Attempting to improve processes that contain special causes will increase variation and waste resources.

With this as a backdrop it is now time to dive into understanding variation statistically.

Notes

1. The science of snowflakes is actually quite interesting. The first person to actually photograph a snowflake was Wilson Bentley on January 5, 1885 in Jericho, Vermont. He spent the majority of his life photographing snowflakes and guess what? He never found two that were alike. Subsequent research has essentially confirmed Mr. Bentley’s supposition. If you would like to learn more about this topic watch this very interesting You Tube video and find out more about snowflake science: https://youtu.be/fUot7XSX8uA.
2. I have experienced this reaction with my financial advisor. I got a call from him recently telling me that my investments have experienced “an upward trend.” I then looked at my portfolio and see that the most recent quarter’s results are a little higher than the previous quarter’s results. My financial advisor was quite excited about this “trend” in one of my investments. I am more sanguine. So, I took the data on this particular investment over the past 2 years and made a control chart to see if there were any statistical shifts or trends in the investment results. There were none. The control chart revealed merely common cause variation indicating that nothing had really changed over the past year for this particular investment. I had not lost money but on the other hand, there was certainly not an upward trend in the data. Many professions including financial, legal, the media, political, and even health care love to see trends where there are none. In most cases, people who voice such conclusions are looking for data that somehow confirm their view of reality. It is up to all of us to help these folks see that they lack statistical thinking. By the way, the rest of my financial story is that I have reoriented my financial advisor and now twice a year we review the control charts on my investments to see if there is really a trend or shift in any of my investments. He no longer calls to tell me that this quarter’s returns are higher than the previous quarter’s.
3. I want to acknowledge and thank my good friend and colleague Dr. Ray Carey for finding this example. We used this story in our public seminars for many years but never placed it in print. It is a wonderful demonstration of the tendency to attribute special cause characteristics to common cause variation. I appreciate Dr. Carey’s keen eye for observing examples of variation in everyday life.
4. It is interesting to note that the deficit did demonstrate a special cause in the months following the Wall Street Journal article of September 19, 1997. Dr. Carey continued to track the deficit and discovered that the next 4 months after July 1997 continued upward. When these four data points were combined with the last two data points shown in Figure 7-2, an upward trend was identified (i.e., six data points constantly going up). If the newspaper had taken the time to understand variation and track it properly, they would have had a real story.
5. I worked for an organization a number of years ago that perfected this behavior. The project I was managing would be above the expected budget one month and below it the next. The individual responsible for “leading” our area was probably one of the most numerically illiterate individuals I have ever met. Not only did I have to write a detailed justification for why the project was over the expected monthly average but I also had to sit through a ceremonial tongue-lashing for allowing the expected budget threshold to be breached. I would be told I was a poor manager for allowing this to happen and that it would have a negative impact on my performance review if it continued. I found it curious that I was never praised for being a good manager when the expenses were below the expected monthly average. When this did occur, which was about 6 months out of 12, I was merely told that I was doing what was expected of me. Well, I got very tired of this numerical illiteracy and decided to prepare six memos that provided different (and very compelling) reasons why the expenses were greater than expected. Every time the breach occurred, which again was about half the year, I would select a memo that had not been used in a while, date it, and send it on to you-know-who. This cut down significantly on the writing process but did not help the tongue-lashing aspect. That component, unfortunately, continued and quickly motivated me to find an organization that had a better understanding of variation.

References

Carey, R. Improving Healthcare with Control Charts. Milwaukee: Quality Press, 2003.
Carey, R., and R. Lloyd. Measuring Quality Improvement in Healthcare: A Guide to Statistical Process Control Applications. Milwaukee: Quality Press, 2001.
Deming, W. E. Out of the Crisis. Cambridge, MA: Massachusetts Institute of Technology, Center for Advanced Engineering Study, 1992.
Deming, W. E. The New Economics, 2nd ed. Cambridge, MA: Massachusetts Institute of Technology, Center for Advanced Engineering Study, 1994.
Peter, L. The Peter Principle: Why Things Always Go Wrong. New York: Bantam Books, 1970.
Pyzdek, T. Pyzdek’s Guide to SPC. Vol. 1: Fundamentals. Milwaukee: Quality Press, 1990.
Reason, J. “Human Error: Models and Management.” British Medical Journal 18, no. 320 (2000): 768–770.
Reason, J. Human Error. Cambridge, England: Cambridge University Press, 2003.
Schultz, L. Profiles in Quality. New York: Quality Resources, 1994.
Shewhart, W. Economic Control of Quality of Manufactured Product. New York: D. Van Nostrand, 1931. Reprint, Milwaukee: Quality Press, 1980.
Western Electric Company. Statistical Quality Control Handbook. Indianapolis: AT&T Technologies, Inc., 1985.
Wheeler, D. Understanding Variation: The Key to Managing Chaos. Knoxville, TN: SPC Press, 1993.
CHAPTER 8
Understanding Variation with Run Charts

Understanding variation from a conceptual point of view provides a solid foundation but it is only a start. If you are truly interested in immunizing yourself and those around you from numerical illiteracy, then you have to understand variation statistically as well as conceptually. This is where statistical process control (SPC) comes into the discussion. Although there are a number of tools and methods that can be grouped under the SPC term, two primary tools are essential for anyone interested in applying statistical thinking to daily work: the run chart and the Shewhart chart (a.k.a. control chart). These charts are used in order to:

■■ Make the process performance visible
■■ Determine if a change statistically demonstrates an improvement
■■ Determine whether an improvement, once observed, has been sustained (Provost and Murray, 2011, pp. 85–86)

The remainder of this chapter addresses the construction, analysis, and interpretation of run charts. Shewhart charts are discussed in the next chapter.

▸▸ What Is a Run Chart?

The run chart is a practical and easy-to-use statistical tool that is used to understand the inherent variation that lives in your data. Although there are many different types of Shewhart charts, there is basically only one way to make a proper run chart. It is a plot of data over time with the unit of time (e.g., day, week, or month) always laid out along the horizontal or x axis and the indicator values (i.e., the data points) always plotted against the vertical or y axis. Any type of data (e.g., a count, a percentage, a rate, money, time, a score, an index, or days between events) can be plotted on a run chart. The data must be arranged in chronological order and the median is placed through the field of data points as a reference line.1 FIGURE 8-1 shows the basic elements of a run chart.


[Figure: run chart of Measure (y axis, 3.25 to 6.00) plotted against Time Point Number (x axis, 1 to 29), with the median drawn through the data as the centerline (CL).]

FIGURE 8-1 Elements of a run chart

▸▸ How Do I Construct a Run Chart?

Constructing a run chart is a very simple activity. It can be constructed by hand or with software. I have made many run charts on flipcharts during team meetings. I have even made them on a napkin or a piece of paper at a nursing station. The following steps should be followed if you wish to make a run chart by hand:

■■ Select the indicator of interest and make sure the data have been organized in time order. In terms of the number of data points you should have before constructing a run chart, there are mixed views. Purists would probably tell you that you should have 12–15 data points before making a run chart. This may pose a problem for a team, however, if they are tracking a measure that has no historical data. For healthcare applications I support the perspectives of my colleagues Lloyd Provost and Sandy Murray who write that “a run chart should begin when data for the first point to be plotted is available” (2011, 86). The median can be placed on the run chart after four or five data points have been collected. You have to realize, however, that as you obtain more data the median will move a little. Typically it will not move as much as the mean when there is a small amount of data but it will fluctuate some. When you apply the run chart rules (described in the next section) you should have a minimum of 10 data points. This issue of how much data is needed to make a chart relates back to the discussion in the previous chapter on developing a data collection plan. Remember that run and Shewhart charts are designed to help you understand the variation in the current operation of a process. So if you are collecting data at the patient level (e.g., wait time for a patient to see the doctor) you will most likely have more than enough data to make a run chart in a day. But as you extend the time frame for your data collection frequency and move to days, weeks, or months you will have to wait a longer period of time to obtain sufficient data to make a run or Shewhart chart. Again, I would suggest steering away from even thinking about using quarterly data to make any of the charts.
■■ Lay out the horizontal axis (x axis) and vertical axis (y axis) and label them clearly. It is recommended that you extend the horizontal axis beyond the last data point you have collected so that future data can be plotted. On the y axis, which is the reference axis for the actual data values, extend the axis about 20% beyond the maximum and minimum data values so that the chart has ample space for future data values that may be greater than any that you have already collected. This will also provide enough extra space on the chart to accommodate a target or goal line. Also do not forget to make a clear title for the run chart that includes the name of the indicator, the unit or facility involved, and the dates for the period of data collection.
■■ Plot the data in chronological order and make sure each data point is marked by a symbol of some sort (usually this would be a dot).
■■ It is customary to connect the data points (i.e., the dots) with a line but note that there are some situations in which you would not connect the data points with a line.2
■■ The centerline on a run chart is usually the median.3 Determining the median on a run chart is quite simple. Here is what you do:
• First, you need to find the median position. This is the location of the median in the distribution of data. It is analogous to the median’s house number or where it lives in the distribution of data points. To find the median position use the formula (n + 1)/2. If you had 25 data points, for example, this would work out to be (25 + 1)/2 = 13. In other words, the median lives at the 13th data position. Note that if you have an even number of data points, say 24, you will not get a whole number. If we put 24 into the median position equation we get (24 + 1)/2 = 12.5. In this case, the median will reside between the 12th and 13th data points, and you will need to average the two values found at the 12th and 13th positions to determine the median value.
• Next, you need to find the median value (i.e., the number that actually resides at the median position). To do this you take a piece of paper or a ruler and place it at the top of the chart. Slide the paper or ruler down the chart and as you do you will reveal the data points in descending order. Because the y axis is calibrated from low to high, as you slide the paper down the page you are viewing the data in descending rank order, which is a requirement for calculating the median. As you slide the paper down the chart and reveal each data point, place a checkmark next to it. Continue to reveal the dots until you reach the 13th data point (using our example where the median position = 13).
• Now draw a horizontal line through the 13th data point and see where it intersects the y axis. The median value is the point on the y axis where the horizontal line ends. If we use the second example where the median position was 12.5, all you would need to do is draw the horizontal line equidistant between the 12th and 13th data points and see where that line intersects the y axis to find the median value.
• To verify that you have done this correctly you can reverse the process by sliding the paper up the chart. You should end up at the same position as you did when you slid the paper down the chart, and with the same median value.
• A final point. The median position and the median value will usually not be the same. It is possible that this could occur but it is extremely rare. The median position is merely the location or place where you will find the median; the median value is the actual numerical value of the median.

Sometimes explaining in words some of these statistical procedures can be more confusing than showing you how it is done. FIGURE 8-2 provides a picture of how these steps are completed to discover the median value of 4.6. Note that this is the same chart as was shown in Figure 8-1 displaying the elements of a run chart. I keep referring to this same graph throughout this section so you can see the progression of constructing and interpreting a run chart. Once you are comfortable with this method for finding the median (shown symbolically as X with a ~ above it) you will actually be able to find the median faster by hand than by entering all the data into the computer and using software to produce the median value.

▸▸ How Do I Analyze a Run Chart?

Once you have laid out a run chart, the first question to be answered is, “What is a run?” A run is defined as one or more consecutive data points on the same side of the median (Lloyd, 2010). FIGURE 8-3 shows how runs are determined. The first thing to do is to identify the data points that fall exactly on the median. These data points need to be identified but they are ignored when determining the number of runs. You can count the number of runs by (1) drawing a circle around each run and counting the number of circles you have drawn or (2) counting the number of times the sequence of data points (the line on the run chart) crosses the median and adding 1 to this count. If you have done this correctly, the number of circles you have drawn around the clusters of data points (i.e., the runs) should match the number you get when you count the number of times the data line crosses the median plus 1.4

In Figure 8-3, there are two data points that fall exactly on the median value of 4.6. These two data points are highlighted by placing a square around each one to remind us to

[Figure: the run chart from Figure 8-1 (Measure, 3.25 to 6.00, against Time Point Number, 1 to 29) annotated as follows. Formula to find the median position: (n + 1)/2, so (29 + 1)/2 = 30/2 = 15 = median position. When you slide a piece of paper down, you reveal the dots in descending order. When you have revealed the 15th data point you have found where the median lives. The median lives at the 15th data point, but the median value = 4.6.]

FIGURE 8-2 Locating the median on a run chart



[Figure: the same run chart (Measure, 3.25 to 6.00, against Point Number, 1 to 29) with the median at 4.610 and 14 runs circled. The points on the median are boxed, with the note: don’t count these when counting the number of runs.]

FIGURE 8-3 Determining the number of runs on run chart

exclude them from identifying the number of runs. Circles have been drawn around the runs in Figure 8-3. Fourteen runs are present on this chart. If you count the number of times the data line crosses the median you will end up with 13 crossings. If you add 1 to this number you end up with 14, which is the same as the number of circles highlighting the runs. The first run contains two data points, then the next run of two data points is below the median line. Then the next run of one data point goes back above the median. Remember that a run is defined as one or more consecutive points on the same side of the median, so you can have a run with one data point in it. The other technical point to remember when counting runs is that because you are ignoring data points that fall on the median, these data points do not get included in the determination of a run. You can see this in Figure 8-3. Notice that the circles do not include the data points on the median (the median data points are identified by the two boxes).

When people are just starting out with run charts, they have a tendency to (1) include the data point(s) on the median in the run, or (2) count the data before the point on the median as one run and then count the data after the point on the median as a second run. For example, look at FIGURE 8-4. The chart has six runs shown by the circles. If you count the number of times the data line crosses the median you get 5, plus 1 equals 6 runs. So either way you get the same result. The first run in Figure 8-4 contains six data points. It does not contain seven data points, which would include the median data point, and it does not reflect two separate runs broken by the second data point, which falls on the median. The same logic applies to the fourth run, which contains five data points. Notice that there is a data point on the median but it is not part of the run. This point of ignoring points on the median when determining a run is probably the only real challenge when determining the number of runs but with a little practice you will master this step quickly.
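The two counting methods described above (circling the runs, or counting median crossings and adding 1) can be sketched in Python as a cross-check. The function names and the small data set are hypothetical and serve only to illustrate the logic; they are not the data behind Figure 8-3 or Figure 8-4.

```python
from itertools import groupby


def find_runs(values, median):
    """Method 1: 'circle' each run, skipping points that fall on the median."""
    sides = ["above" if v > median else "below" for v in values if v != median]
    # Each group of identical consecutive sides is one run.
    return [(side, len(list(group))) for side, group in groupby(sides)]


def count_crossings(values, median):
    """Method 2: count how often the data line crosses the median."""
    sides = [v > median for v in values if v != median]
    return sum(1 for a, b in zip(sides, sides[1:]) if a != b)


# Hypothetical data; three points fall exactly on the median of 10.
data = [12, 14, 13, 10, 9, 10, 8, 9, 7, 10, 15, 16, 9, 14]
runs = find_runs(data, median=10)
print(runs)
print(len(runs), count_crossings(data, median=10) + 1)
```

Notice that the points equal to 10 neither join a run nor split one: the values 9, 8, 9, 7 form a single run of four even though a median point sits between them. Both methods return the same number of runs.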

[Figure: run chart of Number of Patient Complaints (y axis, 0 to 50) by Week (x axis, 1 to 21). Six runs are circled and labelled with their lengths (runs of 6, 5, 4, 1, 1, and 1 data points). The points on the median carry the note: don’t count data points on the median.]

FIGURE 8-4 Determining the number of runs on a patient complaints run chart (6 runs)

Once we have identified the number of runs on the chart, the next question that most people ask is “So what? I’ve made this nice little line graph and circled some clusters of data, but why did I do it?” What can we learn from the number of runs? The number of runs allows you to apply statistical rules that will help you identify the type of variation that lives within your data. A technical caveat needs to be noted at this point. Earlier we discussed the types of variation and pointed out that variation can be classified as either common cause or special cause. These terms technically should be reserved for use with Shewhart charts. Because run charts are not as precise as Shewhart charts (think x-ray versus an MRI) the proper terms for describing variation on a run chart are random and nonrandom patterns rather than common and special causes of variation. Now, I know some of you are thinking, “Picky, picky, picky.” You will hear many people using the terms common and special to describe the variation patterns on a run chart. You will have to decide for yourself if it is worth pointing out to these folks that the terms common and special should be applied only to variation analysis on Shewhart charts and not on run charts. But, at least you are now aware of this technical point.

It is important to realize that there are a variety of run chart rules that different writers have offered over the years to identify nonrandom signals on a run chart. Some of these rules, although having the same name, for example a “trend,” will use different decision criteria to decide if a trend exists or not. This is another important aspect of run and Shewhart chart analysis. So, do not get too concerned if, when you present a chart and start to explain how you have analyzed it, someone says, “But that is not what I was taught was a trend.” Life is full of options and even among statisticians there can be considerable variation.5

At the Institute for Healthcare Improvement (IHI), we have reached agreement on four run chart rules that we have found to be most relevant to healthcare applications:

■■ Rule 1: A shift in the data
■■ Rule 2: A trend in the data
■■ Rule 3: Too many or too few runs in the data
■■ Rule 4: An astronomical data point

Rule 1: A Shift in the Data

When do you know if a process has really moved to a new level of performance? If you suffer from numerical illiteracy, you will probably think that one or two data points that are “better” (however

defined) constitute a shift to an improved level. When looking at data on a run chart, however, the key to defining a shift in the process is when you have a run that contains six or more consecutive data points on one side of the median or the other (Lloyd, 2010). According to Provost and Murray (2011, 77) the probability of this occurring by chance alone when there has been no real change made to the process is less than 5%. In Figure 8-3, a shift is observed in the fourth run where six data points are in a run that is below the median. The question is what were the conditions affecting the process that caused the data to stay below the median for six points in a row? The run chart rule does not provide an answer but merely points you to a period when a nonrandom pattern was detected. In Figure 8-4, the process starts out with a shift of six data points in a row below the median. Then a random pattern emerges for the rest of the chart. Often when a run chart starts out with a shift below the median it can be because the team was still learning how to collect the data and properly apply the operational definitions for the indicators. After a period of learning how to apply the operational definition and data collection plan, however, the true performance of the process is observed. Again, the detection of a nonrandom pattern does not provide the answers but should lead the team to raise questions.

Rule 2: A Trend in the Data

This is probably one of the more difficult rules to apply because of all the popular notions about what constitutes a “trend.” People love to find a trend. This issue was discussed earlier in this chapter and is not elaborated further in this section. The key point to remember, however, is that at the IHI we define a trend statistically as five or more consecutive data points constantly going up or constantly going down (Lloyd, 2010; Provost and Murray, 2011). This choice, which is at odds with other decision rules on trends (see for example Pyzdek, 1990; Western Electric Company, 1985), is based on our practical experience of applying run chart rules to healthcare data and the fact that more stringent criteria for a trend (e.g., six or seven constantly increasing data points) require more data than many healthcare teams have the luxury of collecting, especially when they are just starting their improvement efforts. So, the decision to use five constantly increasing or decreasing data points as a trend is grounded not only in statistical theory (i.e., the probability of getting five data points that constantly increase or decrease is the same as the probability of getting five heads in a row on a coin flip, or .031) but also in practical experience. Two final points to remember when deciding whether a trend exists: (1) the median does not come into consideration when determining whether a trend exists, so a data point that falls on the median can be considered as part of a trend, and (2) ignore points that repeat the previous value. FIGURE 8-5A shows a run chart with 29 data points with 1 on the median (marked by a square) and 13 runs. Do you see a trend in this chart? Actually there are two downward trends, highlighted in FIGURE 8-5B. Notice how both of the trends cross over the median. Remember that a trend is not dependent on the median. Rules 1 and 3, however, do require the median to make a determination as to whether nonrandom variation is present in the run chart.

Rule 3: Too Many or Too Few Runs in the Data

Another nonrandom pattern is defined as too much or too little variation in a dataset. Like Rule 1, Rule 3 is based on probability theory. When applying this rule, the first thing that you need to do is to determine the number of “useful observations” in your dataset. As stated earlier, this is determined by subtracting the number of data points on the median from the total number of data points. Once you have determined the number of useful observations in your dataset, you refer to a table (TABLE 8-1) to identify the minimum and maximum number of runs that should be observed statistically for that number of useful observations.

[Figure: run chart of Visits (y axis, 0 to 90) over 29 time points, with the median shown as a reference line.]

FIGURE 8-5A Run chart with 13 runs

[Figure: the same run chart of Visits (y axis, 0 to 90) over 29 time points, with the median shown as a reference line and the two downward trends highlighted.]

FIGURE 8-5B Run chart with 13 runs and two downward trends
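Rules 1 and 2 can also be checked in software. The Python sketch below follows the definitions given in the text (a shift is six or more consecutive points on one side of the median; a trend is five or more points constantly rising or falling, ignoring repeated values). The function names and default thresholds are illustrative assumptions; other authors use different criteria, as noted above.

```python
def has_shift(values, median, min_length=6):
    """Rule 1: a run of min_length or more points on one side of the median."""
    longest = current = 0
    previous_side = None
    for v in values:
        if v == median:
            continue  # points on the median neither break nor extend a run
        side = v > median
        current = current + 1 if side == previous_side else 1
        previous_side = side
        longest = max(longest, current)
    return longest >= min_length


def has_trend(values, min_length=5):
    """Rule 2: min_length or more points constantly going up or going down."""
    # Ignore points that repeat the previous value; the median plays no role.
    distinct = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    longest = current = 1
    direction = 0
    for a, b in zip(distinct, distinct[1:]):
        step = 1 if b > a else -1
        # Extend the trend if the direction continues, otherwise restart at 2 points.
        current = current + 1 if step == direction else 2
        direction = step
        longest = max(longest, current)
    return longest >= min_length


print(has_shift([9, 8, 10, 7, 6, 9, 8, 12], median=10))
print(has_trend([5, 4, 4, 3, 2, 1]))
```

Both example calls use made-up values. The first contains six points below a median of 10 (the point equal to 10 is skipped), and the second contains five distinct, steadily falling values once the repeated 4 is ignored.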

The table most commonly used is one developed in 1943 by Swed and Eisenhart. If the number of runs on your chart falls between this minimum and maximum number of runs then you do not have a nonrandom pattern. For example, if you had 23 useful observations Table 8-1 indicates that the minimum number of runs should be 7 and the maximum number of runs should be 17. If your run chart had fewer than 7 runs (i.e., too few runs) or more than 17 runs (i.e., too many runs) this would be a statistical signal that your data exhibit a nonrandom pattern.

To use Table 8-1 properly:

■■ First, calculate the number of “useful observations” in your dataset. This is done by subtracting the number of data points on the median from the total number of data points.
■■ Then, find this number in the first column. The lower number of expected runs for this

TABLE 8-1 Table to determine whether there are too many or too few runs on a run chart

Number of Useful    Lower Number of    Upper Number of
Observations        Expected Runs      Expected Runs
10                  3                  9
11                  3                  10
12                  3                  11
13                  4                  11
14                  4                  12
15                  5                  12
16                  5                  13
17                  5                  13
18                  6                  14
19                  6                  15
20                  6                  16
21                  7                  16
22                  7                  17
23                  7                  17
24                  8                  18
25                  8                  18
26                  9                  19
27                  10                 19
28                  10                 20
29                  10                 20
30                  11                 21
31                  11                 22
32                  11                 23
33                  12                 23
34                  12                 24
35                  12                 24
36                  13                 25
37                  13                 25
38                  14                 26
39                  14                 26
40                  15                 27

Excerpted from: Swed, F. and Eisenhart, C. (1943) “Tables for Testing Randomness of Grouping in a Sequence of Alternatives.” Annals of Mathematical Statistics. Vol. XIV, pp. 66–87, Tables II and III. This is a segment of a longer table that goes from 10 to 60 useful observations.

number of useful observations is found in the second column.
■■ The upper number of expected runs can be found in the third column. If the number of runs in your data falls below the lower number of expected runs or above the upper number of expected runs then this is a statistical signal of a nonrandom pattern.

FIGURE 8-6 shows a run chart with too many runs. The chart has 25 data points with 1 data point on the median so there are 24 useful data points. Table 8-1 indicates that for 24 useful data points we should find 8 to 18 runs if the chart displays only random variation. Figure 8-6 has 21 runs (20 crossings of the median plus 1 = 21 runs), which exceeds the upper number of expected runs of 18, thus signaling the presence of a nonrandom pattern. This is a signal that the number of home care visits reflects a nonrandom pattern that is unstable and therefore unpredictable. The team needs to investigate why the number of home care visits keeps fluctuating so much.
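The table lookup for Rule 3 is easy to automate. The Python sketch below transcribes the rows of Table 8-1 into a dictionary and applies the steps listed above; the function name and the wording of the returned messages are illustrative assumptions.

```python
# Table 8-1: useful observations -> (lower, upper) number of expected runs.
RUN_LIMITS = {
    10: (3, 9), 11: (3, 10), 12: (3, 11), 13: (4, 11), 14: (4, 12),
    15: (5, 12), 16: (5, 13), 17: (5, 13), 18: (6, 14), 19: (6, 15),
    20: (6, 16), 21: (7, 16), 22: (7, 17), 23: (7, 17), 24: (8, 18),
    25: (8, 18), 26: (9, 19), 27: (10, 19), 28: (10, 20), 29: (10, 20),
    30: (11, 21), 31: (11, 22), 32: (11, 23), 33: (12, 23), 34: (12, 24),
    35: (12, 24), 36: (13, 25), 37: (13, 25), 38: (14, 26), 39: (14, 26),
    40: (15, 27),
}


def check_number_of_runs(total_points, points_on_median, runs):
    """Rule 3: compare the observed number of runs with the Table 8-1 limits."""
    useful = total_points - points_on_median  # "useful observations"
    if useful not in RUN_LIMITS:
        raise ValueError("Table 8-1 covers 10 to 40 useful observations")
    lower, upper = RUN_LIMITS[useful]
    if runs < lower:
        return "too few runs"
    if runs > upper:
        return "too many runs"
    return "random"


# The chart described in the text: 25 points, 1 on the median, 21 runs.
print(check_number_of_runs(25, 1, 21))
```

The example call uses the counts reported for Figure 8-6 (24 useful observations, for which the table gives limits of 8 and 18 runs).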
point on the median so there are 24 useful data

[Figure: run chart titled “Number of Home Care Visits.” Number (y axis, 0 to 25) is plotted by day of the week (x axis, Monday through Friday for five weeks), with the median at 11 and the one data point on the median marked.]

FIGURE 8-6 An example of too many runs on a run chart

[Figure: run chart titled “Run Chart for Inpatient Falls by Month.” Number of Falls (y axis, 0 to 70) is plotted by Month (x axis, 1 to 24), with the median at 39.5.]

FIGURE 8-7 An example of too few runs on a run chart

The other aspect of Rule 3 (i.e., too few runs) is shown in FIGURE 8-7. This run chart contains 24 data points with no data points on the median, which gives us 24 useful observations. If we look up 24 useful observations in Table 8-1 we see that if the data exhibited just random variation we should observe between 8 and 18 runs but we detect only 2 runs. Did you also notice that Rule 1 is present in Figure 8-7? There are two shifts in the data. Both contain runs with 12 data points. Rule 1 and Rule 3 will often be observed together. The reason is that if you have several runs on a chart that contain six or more data points indicating shifts in the data, then this will reduce the opportunity to have more runs on the chart and increase the probability of having too few runs. In the case of Figure 8-7, half the data are hanging on one side of the chart and the other half of the data have shifted (for some unknown reason at this point) above the median. The challenge for the team is to determine why the number of inpatient falls has shifted upward to a new (and probably less desirable) level of performance. Did they

institute a change between month 12 and 13? As an improvement advisor to this team I would ask them to annotate the run chart and show where they made changes in the process so we could see if in fact their intervention(s) have caused the shift. If their actions do not coincide with the shift then we would need to explore other possible causes for the shift.

Rule 4: An Astronomical Data Point

This is not a statistical rule but an observational rule that serves as a filter to guide further investigation and dialogue. I refer to this as the interocular test of significance. An astronomical data point is one data point that is dramatically different from the rest. Look at FIGURE 8-8A for an example of an astronomical data point. Data point 24 on the far right of the run chart clearly looks like it is very different from the field of data points lower on the chart. All you have to do is look at the chart and consensus would most likely be that data point 24 is different. The key to applying this rule is that it relates to only one data point. In every dataset, you will have a high data point and a low data point or possibly several high and several low points. These are not astronomical data points. In fact, when you have a high and a low data point they basically balance each other out.6 We are looking for one point that is egregiously different from the rest. Now what do we do with this data point? You have several options:

1. First, I would investigate whether this data point is based on the same operational definition as the other data points. Someone new to the team who was collecting the data might have included (or excluded) observations that were not included (or excluded) in the rest of the data. It is also possible that the data collection process was different for data point 24. So when you see an extreme data point, before jumping to conclusions check the operational definition and data collection plan that was used during the week in question.
2. If you are satisfied that the operational definition and the data collection plan used during week 24 were not different from those used for the rest of the data, then it is time to place all 25 weeks of data on a Shewhart chart and see whether the data for week 24 exceeded the upper control limit (UCL). FIGURE 8-8B

[Figure: run chart of Number of Cancelled Appointments (y axis, 0 to 25) by Week (x axis, 1 to 25). Data point 24 sits far above the others and is annotated: What do you think about this data point? Is it astronomical?]

FIGURE 8-8A An astronomical data point



[Shewhart chart: Number of Cancelled Appointments by Week, weeks 1–25, with the UCL and LCL drawn. Annotation: "The Shewhart chart confirms that this is an astronomical data point since it exceeds the upper control limit."]

FIGURE 8-8B Verifying an astronomical data point on a Shewhart chart

shows how this would be done. Placing the data on a Shewhart chart will provide clear confirmation as to whether this data point was astronomical or not. More will be said on the use of control limits in the next chapter.
3. Finally, the team needs to conduct a root cause analysis and determine what the conditions were during week 24 that led to a much higher number of cancelled appointments. If they do not investigate this data point and take steps to extricate the factors that caused this astronomical data point to occur then they can expect to see the point pop up at some point in the future. No one can predict exactly when this will happen again. But because the conditions that caused it initially would still be inherent in the process or system (Reason, 2003) you can count on it happening at some point in the future.
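
The check in step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses an individuals (XmR) chart with the conventional 2.66 constant, which is one common Shewhart chart and not necessarily the one used for Figure 8-8B; the weekly counts are hypothetical rather than the values plotted in Figure 8-8A; and the function name and the choice to floor the lower limit at zero are mine.

```python
from statistics import mean

def individuals_chart_limits(values):
    """Return (center line, LCL, UCL) for an individuals (XmR) chart."""
    center = mean(values)
    # Moving ranges: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = mean(moving_ranges)
    # 2.66 is the conventional constant (3 / d2, with d2 = 1.128 for n = 2).
    ucl = center + 2.66 * avg_mr
    # Assumption: counts cannot be negative, so the LCL is floored at zero.
    lcl = max(0.0, center - 2.66 * avg_mr)
    return center, lcl, ucl

# Hypothetical weekly counts of cancelled appointments (not the book's data);
# week 24 is deliberately extreme.
cancelled = [5, 6, 4, 7, 5, 6, 5, 4, 6, 7, 5, 6,
             4, 5, 6, 7, 5, 4, 6, 5, 6, 5, 4, 22, 5]

center, lcl, ucl = individuals_chart_limits(cancelled)
above_ucl = [week for week, v in enumerate(cancelled, start=1) if v > ucl]
print(f"CL={center:.2f}  LCL={lcl:.2f}  UCL={ucl:.2f}")
print("Weeks above the UCL:", above_ucl)
```

With these made-up counts only week 24 falls above the upper control limit, which is the confirmation step 2 asks for before the team moves on to root cause analysis.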

CASE STUDY #1: Hand Hygiene Compliance


As improvement advisor to a team trying to improve hand hygiene compliance you told them that
at the next meeting you would show them how to make a run chart and determine if their recent
interventions have made a difference in the key indicator, percentage of compliance with proper hand
hygiene. The indicator is defined as the percentage of compliance with proper hand hygiene by week
where the numerator is the total number of properly completed hand washings and the denominator
is the total number of hand washing observations performed in the week.
The data the team collected each week were obtained through observations of a stratified
random sample of staff on three different units. The data along with the initial run chart you created on
your laptop in the meeting are shown in FIGURE 8-9.
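
As a small worked example of the indicator definition above, the following Python sketch computes one week's percentage. The unit names and counts are hypothetical, and pooling the three units' counts into a single numerator and denominator is an assumption; a team using a stratified sample may decide to weight the units differently.

```python
def percent_compliance(compliant, observed):
    """Percentage of observed hand hygiene opportunities done properly."""
    if observed == 0:
        raise ValueError("no observations recorded for this week")
    return 100 * compliant / observed

# Hypothetical (compliant, observed) counts for one week on three units.
week_counts = {"Unit A": (31, 40), "Unit B": (27, 35), "Unit C": (21, 25)}

numerator = sum(compliant for compliant, _ in week_counts.values())
denominator = sum(observed for _, observed in week_counts.values())
pct = percent_compliance(numerator, denominator)
print(f"{numerator}/{denominator} = {pct:.0f}%")
```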


[Run chart: Percent Compliance by Week, weeks 1–27. Median = 81. Annotation: "How many runs on this chart?"]

Week:                1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Percent Compliance: 79 82 86 84 85 79 77 86 82 74 85 74 78 83 81 81 74 84 78 75 74 68 81 84 70 85 77

FIGURE 8-9 Hand hygiene compliance data for 27 weeks and initial run chart

Even though you made the run chart on your laptop using SPC software, you provide all the team
members with a paper copy of the chart so you can:
■■ Walk the team through the calculation of the median and make sure they understand how it
was obtained using the formula to find the median position [(n + 1)/2]. In this case, 27 data
points plus 1 = 28, divided by 2 = 14, which is the median position or the location of the median
in the array of data.
■■ Then you have them all slide a piece of paper down the chart to reveal the data points in
descending order.
■■ Because the formula indicated that the median position is 14 you have them stop revealing the
data points when they arrive at the 14th point. They then discover that there are actually three
data points at this location.
■■ You then have them draw a line through the points at the 14th position and have it intersect the
vertical (y) axis. This line points to the median value of 81. You have the team members confirm
this by referring to the data in Figure 8-9 where they find that there are in fact three data points
at the value of 81% compliance.
■■ The next question for the team is, “How many runs are there on this chart?”
■■ You ask them all to determine the number of runs and when they have a number to shout it
out. Someone says “14” then another says “15” and after a period of contemplation another team
member says “16.” So now the learning begins. You describe the definition of a run, which you
had covered at the last meeting, then show them the two methods for determining the number
of runs (i.e., draw circles around the clusters of dots on either side of the median or count the
number of times the data line crosses the median line and add 1). Both methods should produce
the same number of runs. After a little reflection the team arrives at consensus: 15 runs as shown
in FIGURE 8-10.

[Run chart: Percent Compliance by Week, weeks 1–27. Median = 81. Annotations: "3 data points on the median"; "15 runs."]

FIGURE 8-10 Number of runs for the hand hygiene compliance run chart

■■ Now that the team members know the number of runs on the chart the next step is to have the
team analyze the run chart and determine whether any of the run chart rules are detected. Note
that as the improvement advisor to the team your job is not to analyze the chart for them. The
data are their data not yours. Your job is to provide guidance not answers.
■■ You review the four run chart rules with the team (that you printed on the back of the chart
you gave them) and ask them to interpret the chart. FIGURE 8-11 shows the result of the
analysis. A downward trend was detected from weeks 18–22, then random variation was
observed after week 22. Even though the team had put in place a hand washing
campaign during weeks 16 and 17 they realized that this was prior to the start of summer
vacations and the rotation of the resident doctors. So although the results were not
what the team expected they did gain great insights about factors affecting their
improvement efforts. Now, the next question is, “What are you going to do to make sure
the process works reliably during the next vacation period and when residents come on
the scene?”

[Run chart: Percent Compliance by Week, weeks 1–27. Annotations: "A downward trend"; "Apply the rules and interpret the chart."
Note: 27 data points with 3 on the median gives you 24 “useful observations.” For 24 useful observations you expect between 8 and 18 runs.]

FIGURE 8-11 Interpreting the hand hygiene run chart
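
The arithmetic in this case study can be checked with a short Python sketch that uses the 27 weekly values from Figure 8-9. The function names are mine, and the trend helper makes one assumption the case study does not spell out: a value that repeats the previous one is skipped, so it neither extends nor breaks a trend. The expected range of 8 to 18 runs is taken from the note in Figure 8-11.

```python
from statistics import median

# Weekly percent compliance from Figure 8-9 (weeks 1-27).
compliance = [79, 82, 86, 84, 85, 79, 77, 86, 82, 74, 85, 74, 78, 83,
              81, 81, 74, 84, 78, 75, 74, 68, 81, 84, 70, 85, 77]

def count_runs(values, center):
    """Return (runs, useful observations); points on the center line are skipped."""
    sides = [v > center for v in values if v != center]  # True means above
    crossings = sum(1 for a, b in zip(sides, sides[1:]) if a != b)
    return crossings + 1, len(sides)  # crossings plus 1 gives the run count

def longest_trend(values):
    """Length of the longest stretch of points all rising or all falling."""
    best = current = 1
    direction = 0
    prev = values[0]
    for v in values[1:]:
        if v == prev:
            continue  # assumption: repeated values are ignored
        step = 1 if v > prev else -1
        current = current + 1 if step == direction else 2
        direction = step
        best = max(best, current)
        prev = v
    return best

n = len(compliance)
median_position = (n + 1) / 2
med = median(compliance)
runs, useful = count_runs(compliance, med)
trend = longest_trend(compliance)

print(f"n={n}, median position={median_position:.0f}, median={med}")
print(f"useful observations={useful}, runs={runs}")
print(f"runs within the expected 8 to 18: {8 <= runs <= 18}")
print(f"longest trend={trend} points")
```

Run as written, this reports a median position of 14, a median of 81, 24 useful observations, and 15 runs, matching Figures 8-10 and 8-11, and it finds a longest trend of five points, which corresponds to the decline across weeks 18 to 22.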

▸▸ A Few Closing Thoughts on Using Run Charts

As you gain more knowledge of and experience with run charts there are a few additional things to think about. Specifically the following questions are the ones I get asked most often:
■■ Is there a minimum and maximum amount of data I can put on a run chart?
■■ When do I change the median on a run chart?
■■ Why do I need to annotate the run chart?

Is There a Minimum and Maximum Amount of Data I Can Put on a Run Chart?
At the beginning of this section I mentioned that as soon as you start collecting data you should start to plot it. At first this can just be a simple line graph. When you get four or five data points you can place an initial median on these data to see how the data are centering around a common point but I find it useful to refer to these early medians as “trial medians” because the median may change until you obtain sufficient data that you have a reasonably stable distribution. This stability will start to occur with a median when 10 or more data points are on the chart (Provost and Murray, 2011). At the other end of the spectrum is the question of the maximum amount of data that should be placed on a run chart. You can easily place a lot of data on a run chart especially if you use a computer. In general, however, once you get upwards of 40–50 data points you have exceeded the practical design of what a run chart is trying to convey.
Two issues will influence the amount of data you should consider placing on the chart.

First is the issue of current versus old data. Remember that the primary use of a run chart (or Shewhart chart) is to help you understand current performance. So when you place data that go back 2, 3, 4, or more years on a chart these are historical data not real-time data. Wheeler and Chambers refer to charts like this as report card charts. They point out that although the chart may be valid and constructed properly it is a “weak use of the charts” (1992, p. 18). The second issue that will influence the amount of data you place on a chart depends on how you have organized the data (e.g., by patient, day, week, or month). How you have organized your data, referred to as the subgroup, will provide different amounts of data for the chart. For example, if you are tracking patient wait time to see the doctor in a family practice clinic that sees approximately 50 patients a day you will end up with over 200 data points in a week if you recorded every patient’s wait time to see the doctor. This is way too much data. If you make the subgroup day instead of every patient, however, and select a sample of patients each day (e.g., 5–10) then you will have a more manageable amount of data to place on a chart.

When Do I Change the Median on a Run Chart?
The answer to this question is essential in order to decide whether the change a team has made to a process or the system actually made a difference. Here are the steps to follow in order to answer this question:
■■ Establish a baseline for the indicator of interest. Ideally it would be nice to have a solid baseline that established the performance of the indicator prior to any intervention. Again, the number of data points in this baseline will depend on how you have organized your subgroups (i.e., the labels on the horizontal axis of your run chart) and your data collection plan. The following are not strict recommendations based on hard and fast statistical rules but rather guidelines I have used with teams that have proven to be practical yet useful. You will have to discuss with your team what seems to be reasonable for your work.
• If you have organized your data into monthly subgroups it would be desirable to have approximately 10–12 months as the baseline. However, you may need to make compromises. For example, when I was setting up the measurement system for our work with the Scottish Patient Safety Program (SPSP), I made a practical decision because many of the indicators had not been collected before at the 42 hospitals. So, we settled on collecting the first 6 months of data for newly developed indicators and calling this the baseline. Not ideal but practical.
• If you have sufficient data to organize your data into weeks as the subgroup, then having a baseline of 10–12 weeks would provide an acceptable baseline (e.g., the percentage of newly admitted inpatients properly assessed for pressure ulcers each week).
• If you have organized your data into daily subgroups then 12–15 days of data could serve as a baseline (e.g., the number of food trays produced each day).
• If you have organized your data by individual patients, tests, surgeries, or some other discrete unit that is essentially an n of 1, then I would build a baseline with roughly 25–30 individual occurrences (e.g., the wait time for each patient to see the doctor at the family practice clinic or the turnaround time in minutes for a stat medication order).
■■ Once you have a baseline, calculate the median for this block of data. Then you “freeze” the median for the baseline and extend it into the future as a reference marker. Ideally the baseline period is the time when the process was functioning without any improvement work occurring. At the end of the baseline

period you would draw a vertical line on the run chart to show that this is the point at which the team began to test new ideas that they believed would improve the performance of the indicator being tracked. The vertical line on the chart provides a visual demarcation point between the old way and the improvement journey. The question is, what will happen to the indicator after the team has tested new ideas for improvement? The new data will be plotted against the baseline median. The run chart rules will then be applied to the new data using the baseline median as the reference point. This is shown in FIGURE 8-12A.
■■ FIGURE 8-12B shows how the new data after the intervention performed against the baseline median that was extended. Not only did six data points stay below the baseline median indicating a shift in the data but the remainder of the new data also stayed below the baseline median. This not only indicates that there has been a downward shift in the wait times but also that this shift has been sustained.
■■ FIGURE 8-12C shows the final step in calculating a second median. Once the nonrandom pattern (Rule 1: a shift) has been detected it is now appropriate to create a new median for the new data. In Figure 8-12C, we see that not only has the median shifted downward but also the variation in the wait time to see the doctors has been reduced (i.e., the dots are all closer together). This actually indicates that two improvements in the process have occurred: (1) a shift in the median in the desired direction and (2) a reduction in the variation.
■■ If the team planned further improvements, this sequence of freezing and extending the second median, making the next intervention, and then plotting the new data against the second frozen median would be the strategy. If after making the next improvement intervention the data did not demonstrate a shift or trend then we would conclude that the intervention did not have the expected impact on the process. The team would then need to rethink its strategy. Was the idea the wrong idea? Did we have the wrong theory about how this idea would affect the process? Did we collect the data in the same way we did previously? All these questions would be part of the team’s learning based on the results they observed.
■■ In summary, the sequence for creating a new median is to establish a baseline, create the initial baseline median, freeze it, and

[Run chart: Wait Time to See the Doctor. Wait Time (in minutes) by Individual Patients, patients 1–31, with the Baseline and Intervention periods marked. Annotation: "Freeze the median, extend it, and compare the new process performance to this reference line to determine if a run chart rule has been detected."]

FIGURE 8-12A Freezing the median on a run chart

[Run chart: Wait Time to See the Doctor. Wait Time (in minutes) by Individual Patients, patients 1–31, with the Baseline and Intervention periods marked. Annotation: "A nonrandom pattern is detected. A run of 6 or more data points below the median is a signal of a shift in the process. Note that the rest of the data also is below the median, which shows the shift has been sustained."]

FIGURE 8-12B Plotting the new data against the baseline median

[Run chart: Wait Time to See the Doctor. Wait Time (in minutes) by Individual Patients, patients 1–31, with the Baseline and Intervention periods marked. Annotation: "Calculate a new median for the process to show the improvement."]

FIGURE 8-12C Creating a second median on new level of performance

then extend it into the future. Next, put in place the new idea that the team believes will make an improvement in performance. Plot new data against the frozen median to see if a run chart rule is detected. If it is, then start the sequence all over. On the other hand, if a run chart rule is not evident then the team needs to evaluate why the improvement idea it introduced did not have the expected impact.
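
The freeze-and-compare sequence summarized above can be sketched in Python as follows. This is a rough illustration, not the procedure used to produce Figures 8-12A through 8-12C: the wait times are hypothetical, the function name is mine, only the shift rule is checked (a run of 6 or more points on one side of the frozen median, as in Figure 8-12B), and points that land exactly on the median are skipped.

```python
from statistics import median

def first_shift(new_points, frozen_median, run_length=6):
    """Return the 1-based position in new_points where a shift is first signalled.

    A shift is run_length or more consecutive points on the same side of the
    frozen median. Points exactly on the median are skipped. Returns None if
    no shift is found.
    """
    count, side = 0, None
    for position, value in enumerate(new_points, start=1):
        if value == frozen_median:
            continue  # neither extends nor breaks the run
        above = value > frozen_median
        count = count + 1 if above == side else 1
        side = above
        if count >= run_length:
            return position
    return None

# Hypothetical wait times in minutes (illustrative only).
baseline = [22, 18, 25, 20, 27, 19, 23, 21, 26, 17, 24, 20, 22, 19, 25]
after_change = [16, 14, 15, 12, 13, 15, 11, 14, 12, 13, 14, 12]

frozen = median(baseline)  # freeze the baseline median and extend it
signal_at = first_shift(after_change, frozen)
if signal_at is not None:
    new_median = median(after_change)  # second median for the new level
    print(f"Baseline median {frozen}; shift signalled at new point {signal_at}; "
          f"new median {new_median}")
else:
    print(f"Baseline median {frozen}; no shift detected, revisit the change idea")
```

If the team runs a further intervention, the same function can be called again with the second median as the frozen reference.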

Why Do I Need to Annotate the Run Chart?
A challenge with any form of data analysis in this day and age is that it is very easy to push a button on a computer and get a variety of numbers, charts, and graphs. Some of these displays of data are very intricate and colorful. Some are three dimensional and some you can even rotate. Yet, I think we frequently rely on the machine at our fingertips and not the one that rests upon our shoulders. Pushing the buttons on the computer to get statistical and graphical results is in many ways the easy answer. Computers produce data not information. The goal is to turn data into information (Austin, 1983), which requires the machine on the top of your shoulders to be functioning fully and engaged.
Annotating your charts is a perfect way to start moving from data to information. Items that should be annotated on a chart include:
■■ The baseline period.
■■ The point at which the improvement team began testing new ideas. For example, annotating every Plan–Do–Study–Act (PDSA) cycle the team ran provides a well-documented journey.
■■ Annotating the detection of a run chart rule (e.g., a shift or trend) on the chart is very beneficial especially when presenting your work to a quality council or in a management meeting where there might be individuals who are not familiar with run chart analysis and interpretation. Moments like this provide a good opportunity for you to help educate others in your organization about run charts. But without annotations the key learning points might be missed.
■■ Besides annotating the actual run chart other useful annotations include:
• Making sure you have a full and complete title on the chart. This should include a primary title identifying the facility or unit where the improvement work is occurring. This is usually in a little larger font (e.g., a size 14 or 16).
• The secondary title for a chart should indicate the indicator by name and the dates for the data (e.g., Percentage of Compliance with the Ventilator Bundle, January 2017–March 2018).
• Try to leave space to insert the operational definition for the data. If the chart takes up most of the page of your handout you can always place the operational definition and data collection plan on the reverse side of the page. If you are using slides, try reducing the size of the chart a little so you can place a small footnote (in a text box) at the bottom of the slide summarizing the operational definition and data collection plan.
• If the indicator is a percentage or a rate it is always a good idea to at least provide a note (usually at the bottom of the chart under the horizontal axis) that reports the minimum and maximum size of the denominators. You do not necessarily have to include a full data table, which often is entirely too small and hard to read, but indicating the size of the denominators (min to max) will usually be enough to keep people from asking about them.
• Finally, I find it useful to put the name of the individual who made the chart and contact information (in about a size 10 font) somewhere on the chart in case there are follow-up questions about the chart or its interpretation. This also comes in handy when materials handed out at a meeting are sent around to people who did not attend the meeting and they would like to talk to someone about the chart.
FIGURE 8-13 provides an example of applying these annotation suggestions to a run chart.

[Annotated run chart. Primary title: Stay Healthy Family Practice Clinic. Secondary title: Wait Time to See the Doctor, June 6th through June 17th. Wait Time (in minutes) by Individual Patients, patients 1–31. Annotations: Baseline (week of June 6th); "Team began PDSA testing here, June 13th"; PDSA#1 June 13th; PDSA#2 June 15th; PDSA#3 June 16th.

Operational Definition: Wait Time to See the Doctor is defined as the amount of time (measured in whole minutes) from when the patient checks in at the reception desk until he or she is seen by the doctor in the exam room.

Data Collection Plan: A systematic sample of 5 patients is pulled on 3 randomly selected days each week. Data are pulled during regular office hours. Weekends and scheduled holidays are excluded.

This chart was prepared by R. Lloyd who can be reached at Extension 1234.]
FIGURE 8-13 Example of an annotated run chart

Notes

1. There is frequent confusion over the difference between the mean and the median. They are both measures of central tendency (i.e., they identify where a distribution of data is centered), but they are different. The mean is the arithmetic average (you add up all the numbers and divide by the total number of data points). The median, on the other hand, is the midpoint of a distribution of data points. This is the numerical value that divides the dataset exactly in half, the 50th percentile. The run chart uses the median as a centerline, but the control chart uses the mean as the centerline.
2. The data points on a run or Shewhart chart are usually not connected when you are interested in comparing the variation among members of a group (e.g., individual hospitals, clinics, or doctors) at a fixed point in time. For example, imagine that you are part of a nine-hospital system and want to see how all nine hospitals compare on the percentage of patients being properly assessed for the risk of a fall. Are all our hospitals performing as a system or do we have some doing very well and others not so well? In this case, you would collect data on each of the nine hospitals for 15 to 20 months and then make a run chart where each hospital

would be arranged along the horizontal axis of the run chart. The hospitals could be in alphabetical order, for example, or by bed size. The order of the units being compared would not matter for the purposes of comparison if the indicator is a percentage. Each hospital’s data would be plotted and because this is not data over time but in the aggregate we would not connect the dots with a line. Typically, connecting the dots with a line indicates that this is a process that is flowing over time. The objective is to understand the variation amongst the dots irrespective of whether you connect them or not. An example of not connecting the dots is shown in Chapter 10, Case Study 1.
3. Although it is standard practice to place the median on a run chart as centerline (Carey and Lloyd, 2001; Provost and Murray, 2011; Pyzdek, 1990), some writers (e.g., Torki, 1992) advocate the use of the mean as the centerline on a run chart. The median is preferred by most writers because it is a measure of central tendency (along with the mean and mode) but more important it is not sensitive to extreme values as is the mean. All that matters with using the median is whether a data point is on this side of the median or the other side. If a data point is a small amount above the median or a very large distance above the median the median value will not be affected. This is because the median is positional (i.e., the 50th percentile point where half the data are above this point and half the data are below it). For the same reason, the median is not affected by data that are not normally distributed (i.e., suffering from skewness or kurtosis).
4. I frequently get asked in class, “Why do we add 1 to the number of crossings?” The answer is simple. Think of cutting a loaf of bread. If you cut the loaf of bread 10 times you actually end up with 11 slices of bread. The first cut actually produces two pieces of bread; one is the end heel cut and the rest of the loaf is actually the second slice. Similarly when the data line crosses the median the first time you have two segments of data. Adding 1 to the final count of the number of crossings accounts for the two segments that were created with the first crossing. You can verify this the next time you cut a loaf of bread. Count the number of times you have sliced the loaf and you will discover that you have one more slice of bread than the number of cuts you made.
5. I think a lot of people who don’t have formal training in statistics think that it is a rather clear and definitive discipline. It uses numbers, formulae, Greek letters, and a variety of symbols that certainly make statistics and math look precise. We even refer to the precision of the p-value and talk about the .05 or even .01 level of significance. But there is a lot of gray area and debate in the field of statistics. A classic reference on this topic appeared in 1970. It is called The Test of Significance Controversy edited by Denton Morrison and Ramon Henkel (1970). If you are a student of statistical thinking this collection of readings is a must. A more contemporary and critical exploration of this topic is The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice and Lives by Stephen Ziliak and Deirdre McCloskey (2011). I have been fascinated by this debate for years. My family knows this and at Christmas they bought me a sweatshirt that captures the ongoing debate nicely. It states in simple letters on the front of the sweatshirt “Statistics mean never having to be certain.”
6. When you have a high and a low data point on a chart or even a couple high and several low points they essentially form the tails of a distribution. Think of the normally distributed bell-shaped curve. As you go out the tails of the normal distribution you expect to find less and less data, but

you will find a high and low data point residing in the tails of the distribution. But if you had only one extreme data point in a distribution it would not take the shape of the normal bell curve. In this case, you would have a skewed distribution with one rather long tail. The astronomical data point rule is similar to finding a nonnormal curve with one very long tail.

References
Austin, C. Information Systems for Hospital Administration. Chicago: Health Administration Press, 1983.
Lloyd, R. “Navigating in the Turbulent Sea of Data: The Quality Measurement Journey.” Special edition on Quality Improvement in Neonatology and Perinatal Medicine, Clinics in Perinatology 37, no. 1 (March 2010): 101–122.
Morrison, D., and R. Henkel, eds. The Test of Significance Controversy. Chicago: Aldine, 1970.
Provost, L., and S. Murray. The Health Care Data Guide. San Francisco: Jossey-Bass, 2011.
Pyzdek, T. Pyzdek’s Guide to SPC. Vol. 1: Fundamentals. Milwaukee: Quality Press, 1990.
Reason, J. Human Error. Cambridge: Cambridge University Press, 2003.
Schultz, L. Profiles in Quality. New York: Quality Resources, 1994.
Swed, F., and C. Eisenhart. “Tables for Testing Randomness of Grouping in a Sequence of Alternatives.” Annals of Mathematical Statistics 14 (1943): 66–87.
Western Electric Company. Statistical Quality Control Handbook. Indianapolis: AT&T Technologies, Inc., 1985.
Wheeler, D., and D. Chambers. Understanding Statistical Process Control. Knoxville, TN: SPC Press, 1992.
Ziliak, S., and D. McCloskey. The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice and Lives. Ann Arbor: University of Michigan Press, 2011.

CHAPTER 9
Understanding Variation
with Shewhart Charts
▸▸ Run Charts versus Shewhart Charts
For many teams just beginning their quality measurement journey (QMJ) the run chart provides an excellent starting point. It is easy to construct with paper and pencil, it does not require a software package in order to make one, and it can be used with any type of data (i.e., time, money, counts of errors, percentages, rates, scores, or days between adverse events). Also, the four run chart rules are easy to understand and apply. So, why would I want to use a Shewhart chart instead of a run chart?¹ There are basically three reasons why Shewhart charts are preferable over run charts:
1. Shewhart charts are more sensitive than run charts.
2. Shewhart charts have the added feature of control limits and zones, which run charts do not have.
3. Shewhart charts allow us to more accurately predict process behavior, future performance, and process capability than do run charts.
The details related to these differences are discussed in the remaining sections of this chapter.

▸▸ What Is a Shewhart Chart?
Like run charts, Shewhart charts are graphic displays of process variation as it lays itself out over time. FIGURE 9-1 shows the basic elements of a Shewhart chart and one of the tests to identify a special cause (i.e., a data point exceeded the upper control limit [UCL], signaling too much variation in the data, which, by the way, you should recognize as an astronomical data point on the run chart). A run chart and a Shewhart chart look similar in that the indicator of interest and its values are plotted on the vertical or y axis and the chronological order of the data are organized by what are called subgroups (e.g., by individual patients, by day, week, or month) along the horizontal or x axis. The data points are then connected by a line and the mean of the data points is then plotted as the centerline (CL) on the Shewhart chart. The presence of control


[Shewhart chart: Number of Patient Complaints by Week, weeks 1–21. UCL = 46.910, CL = 23.381, LCL = 0.148. Callouts: "Upper Control Limit"; "Signal of a special cause"; "Data are plotted in time order"; "Centerline (the mean)"; "Lower Control Limit"; "The unit of time is plotted along the horizontal axis."]

FIGURE 9-1 Elements of a control

limits on Shewhart charts are major points that separate it from a run chart.
Shewhart charts are more sensitive than run charts because the run chart cannot detect special causes that result from point-to-point variation. This is because the CL on the run chart is the median (i.e., the 50th percentile). The run chart basically allows you to classify the data points as being only above or below the median. The actual distance a data point is from the CL is not an issue on a run chart. Therefore, if one data point is 2 units above the median and another point is 22 units above the median, they will both be treated the same because they are both on the same side of the median. The logic for this decision is related to the definition of the median and of a run (i.e., one or more data points on the same side of the median). If these same two data points (i.e., 2 and 22) were placed on a Shewhart chart, however, you would notice a discernable difference because the CL on the control chart is the mean or average of all the data point actual values. By using the mean we are ensuring that the absolute value and the distance of each data point from the CL will be considered in determining the variation in the indicator and if special cause variation exists.
Another reason why Shewhart charts are more sensitive than a run chart is that Shewhart charts have the added feature of control limits, which run charts do not have. The control limits are properly referred to as the UCL and the lower control limit (LCL). They are also referred to as sigma limits. You will probably hear someone refer to control limits, however, as confidence intervals, confidence limits, or even standard deviation (SD) limits, which they are not (Blalock, 1960; Carey, 2003; Daniel & Terrell, 1989; Provost & Murray, 2011).
The UCL and LCL basically define the boundaries of process variation around the mean. The developer of the chart does not set or define the UCL and LCL. These are determined by mathematical formulae and the width of these limits is dependent on the inherent variation

that lives within the data. The only thing the developer of the chart can place on the Shewhart chart is a target or goal and annotations as to when improvements were introduced.
The control limits enable the Shewhart charts to have increased precision over the run chart. A run chart will miss certain nonrandom patterns that would be detected on a Shewhart chart as special causes. According to Perla, Provost, and Murray (2011, p. 47), “The three probability-based (run chart) rules are used to objectively analyze a run chart for evidence of non-random patterns in the data based on an α error of p < 0.05.” This means that the run chart rules accept roughly a 5% risk of signaling a nonrandom pattern when the variation is in fact random. Shewhart chart rules, on the other hand, take into account how far each point lies from the CL and are therefore less likely to miss a special cause. This is why it is recommended that the terms special and common cause as well as stable or unstable should be reserved for use only with Shewhart charts and that the terms random and nonrandom patterns be applied to run charts.
Shewhart charts also allow us to more accurately predict process behavior and future performance than do the run charts. On a run chart, if the variation is random the best prediction of the future performance of an indicator is the median value. For example, if a team is trying to improve the wait time to see a doctor and has plotted the data on a run chart, the median is the best estimate of future performance. Let’s say that the median wait time is 27 minutes. If you were presenting this data to a team or a committee all you could say would be, “Ladies and gentlemen, the median wait time is 27 minutes. The process reflects only random variation. Therefore, if we do nothing to change the current process we can expect to have patients wait about 27 minutes to see the doctor.” On a Shewhart chart, however, because we have the UCL, LCL, and the mean as the CL, we have more precision. In this case, when you present the data to the team or a committee you would be able to say, “Ladies and gentlemen, the average wait time to see a doctor is 35 minutes, the UCL is 47 minutes, the lower control limit is 23 minutes and there are no special causes detected. This means that the process is stable and predictable. Therefore, if we do nothing to change how this process works we can predict that patients will wait on the average 35 minutes with the possibility that the wait time could go up as high as 47 minutes or as low as 23 minutes. In light of the target of having all patients seen by their doctor within 20 minutes or less, however, you can see that we have our work cut out for us!”
This scenario provides a summary of how process capability for the wait time in a clinic can be based on the parameters calculated for a Shewhart chart (i.e., the UCL, LCL, and mean). Classically, process capability is defined as, “The calculated inherent variability of a characteristic (indicator) of a product or service. It represents the best performance of the process over a period of stable operation” (ASQ, 2005, p. 78). Process capability is essentially aimed at determining whether under current operating conditions the process can meet the predetermined specifications or achieve the target or goal we have established (Blank, 1998; Carey, 2003; Kume, 1985; Provost & Murray, 2011; Western Electric, 1985; Wheeler & Chambers, 1992).
Besides a verbal summary of the Shewhart chart parameters using the UCL, LCL, and mean as described previously, process capability can also be defined statistically by “a single number assessment of the ability of the process to meet specification limits on the quality characteristic(s) of interest” (ASQ, 2005, p. 78). When you move to the statistical indices that capture process capability it is necessary to have an upper specification limit (USL) and a lower specification limit (LSL), which are then compared to the performance of the process as defined by the UCL, LCL, and the mean.2 Although these indices have not been used extensively in healthcare settings I believe that they have great utility. We have many physiological tests that have upper and lower preferred levels of performance (i.e., specification limits). These include such indicators as temperature, blood
214 Chapter 9 Understanding Variation with Shewhart Charts

pressure, hematocrits, neutrophils, white and red blood cell counts, platelets, and clotting factors.3
There are many useful books and articles on the statistical theory behind Shewhart charts, how to construct them, and how to interpret the results. I have provided only a brief introduction to the key principles behind the Shewhart charts. This is a very rich field of study that has been developed over the past 100 years. Readers interested in the detailed aspects of statistical process control (SPC) and in particular Shewhart charts should consult the rich variety of books and articles on this topic. Ones I have found particularly useful include Benneyan (2001); Benneyan, Lloyd, and Plsek (2003); Blank (1998); Carey (2003); Carey and Lloyd (2001); Duncan (1986); Ishikawa (1989); Mohammed, Worthington, and Woodall (2008); Montgomery (1991); Provost and Murray (2011); Pyzdek (1990); Western Electric (1985); Wheeler (1993, 1995); Wheeler and Chambers (1992); and Woodall (2006).
If you wish to build a firm foundation in Shewhart charts and SPC in general, I would recommend that you read widely on this topic and read what different authors have written. If one item you read seems too academic or mathematical, read another author’s description and use of SPC. As you read more of the literature and different authors at some point there will be a moment when you say, “Okay, I get it.” Furthermore, if you do not have a reasonably solid working knowledge of the theory and mechanics of Shewhart charts and how they are constructed, it will be rather difficult to successfully apply them to your improvement work. This becomes even more problematic when people say “No problem with the charts. We have software that makes the charts for us.” This orientation creates several problems. Although it is easy to push a few buttons on your computer and “make a chart” this does not necessarily mean it is the most appropriate chart for the indicator you are tracking. More important, the SPC software does not help at all with interpreting what the chart is trying to tell you. The chart can come from the machine in front of you with a keyboard but the interpretation of the chart and what can be learned from it must come from the dialogue that emerges when people with subject matter knowledge interpret the chart. This requires knowledge, not a computer.

▸▸ Key Questions about Shewhart Charts

There are three basic questions people typically ask as we start them on the road to using Shewhart charts:
1. How many data points do I need to make a Shewhart chart?
2. What is a sigma limit? And, why do I need three of them?
3. Do I apply the run chart rules to Shewhart charts?
Each of these is discussed next.

How Many Data Points Do I Need to Make a Shewhart Chart?

As soon as the team begins to gather data they should start plotting the data points (dots) on a chart. At first this will simply be a line graph. A run chart requires less data because the median as the CL is not as sensitive to point-to-point variation as is a Shewhart chart. Also the run chart rules start to come into play with different amounts of data. The trend rule can be detected when you have five or six data points. As Provost and Murray point out (2011, p. 87) “a trend will remain a trend no matter the amount of additional data added to the graph.” The run chart rules related to a shift and too many or too few runs, however, require more data to be detected. The general rule is that a minimum of 10 data points is necessary to properly determine whether a shift has occurred on the run chart or whether too many or too few runs are present. When we move to using Shewhart charts more data are usually required because (1) the mean is now used as the CL and the absolute value of

any data point enters into the calculations of the UCL and LCL, and (2) the rules for detecting special causes of variation are more rigorous and precise than the four rules for run charts. But, the simple answer to the question of how much data I need to make a Shewhart chart is . . . it depends. I know some readers will be thinking, “What kind of a lame answer is this? Just tell me how many data points I need to make a Shewhart chart!” Because there are many types of Shewhart charts, which are discussed in the next section, it must be realized that the different charts can be produced with differing amounts of data. The subgroup, that is, how you have organized your data along the x or horizontal axis of the chart, is key to determining how much data you need to make a particular type of Shewhart chart. For example, if you want to track the wait time of each patient at a family practice clinic to see the doctor then the subgroup is one patient and the one bit of data for this patient will be her wait time to see the doctor. If, on the other hand, you decide that you want to track wait time by day then the horizontal axis of your chart will have Monday, Tuesday, Wednesday, etc. rather than patient 1, patient 2, patient 3, etc. as the subgroup. Selecting day as the subgroup for a clinic could now provide upwards of 30–40 patients’ wait times as possible observations (bits of data) within a single day. Having multiple data points in a subgroup or only one will play a major role in deciding which Shewhart chart you can make. This is why it is very important to make sure you have a well-thought-out data collection plan. Again, more will be said about these issues in the next section when I discuss the types of Shewhart charts.
All this being said, I do know that many people still want to have at least some general guidelines for organizing their data, so here are a few that I offer to improvement teams as we begin to work on developing Shewhart charts:4

■■ It is usually recommended that you have 20–30 subgroups of data before constructing a Shewhart chart. These data should be pulled from the process when it is stable and predictable.
■■ If you have fewer than 20–30 subgroups of data you can still create a Shewhart chart but the UCL and LCL should be referred to as “trial” control limits (Carey, 2003; Carey & Lloyd, 2001; Provost & Murray, 2011). The trial limits can be used for learning but the use of the word “trial” is to remind those using the chart that these limits may change as more complete data are obtained and make the limits more reliable and stable. The issue here is that when you have less than the recommended amount of data (e.g., 12–15 data points) the control limits and CL (i.e., the mean) can change rather quickly and dramatically with the addition of each new data point. With a smaller number of subgroups you also run the risk of committing a type II error (i.e., concluding that the chart indicates no special causes when in fact one or more special causes do exist). When you start to have more than the recommended 20–30 subgroups of data, say 40–50, you run the risk of committing a type I error (i.e., finding special causes by chance alone). Additional detail on the theory and use of the type I and type II error concepts can be found in Carey and Lloyd (2001), Carey (2003), and Provost and Murray (2011). In summary, the underlying question here is how much data do you need to create a reasonably stable distribution? Different disciplines recommend different amounts of data needed to form a distribution (e.g., from only a few data points to over 500) but generally speaking a reasonably stable distribution of data for improvement purposes occurs when you have 20–30 subgroups of data.
■■ As a general rule I also recommend not using quarterly data for your improvement efforts. There is just too much variation being aggregated in quarterly data to be useful for improvement efforts. A quarter consists of 3 months, approximately 90 days, and over 2100 hours. During this time a

great deal of variation can occur. So when someone begins making conclusions about the quarterly average or SD you should ask them to provide the actual variation that produced these summary statistics by day, week or at a minimum by month.

What Is a Sigma? And Why Do I Need Three of Them?

These two questions probably pose the most challenging technical aspects of Shewhart chart construction. Some of you will really enjoy this issue and want to learn more whereas others of you will say, “I really don’t care about this statistical distinction just make the chart that is most appropriate for my indicator and tell me what it means.” Either position is fine. I do not intend to go into great detail on this topic but I do want to frame it properly so that you can decide if you want to learn more or accept the fact that these statistical principles have been discussed, debated, and written about extensively for many decades.
Let’s start with the basics. Whenever you have an array of data you need to consider three characteristics of the distribution these data create: the central tendency of the distribution, the dispersion or spread of the distribution, and the shape of the distribution. You were acquainted with these characteristics when you took your first statistics class, which was probably a number of years ago. So, this should all sound rather familiar even if you have not used the concepts in a while. Measures of central tendency include the mean (i.e., the arithmetic average), the median (i.e., the midpoint of the distribution or 50th percentile), and the mode (i.e., the most frequently occurring number). Measures of dispersion include the minimum and maximum values, the range (i.e., the absolute difference between the min and max values), the sum of the deviations, the mean deviation, the sample variance, and the SD. The shape of a distribution is determined by skewness and kurtosis. A distribution can be symmetrical (e.g., the normal bell-shaped curve), skewed to the left, or skewed to the right, in which case the distribution will have a tail that is longer on one side than the other. A distribution with a long tail skewed to the right will have a mean that is greater than the median whereas a left-skewed distribution will have a mean that is less than the median. Kurtosis, on the other hand, refers to how spread out or peaked the distribution is. For additional details on measures of central tendency, dispersion, and distribution shape you can consult any basic stat book. Some of the books I have on this topic come from my undergraduate days and are just as relevant as a statistics book published last year (Blalock, 1960; Daniel & Terrell, 1989; Gonick & Smith, 1993; Levine & Stephan, 2005). These principles are fundamental and have not changed over the years.
FIGURE 9-2 provides examples of distributions with different characteristics of center, dispersion, and shape. Note that the normal (or Gaussian) distribution, which is popularly referred to as the bell curve, is typically not found in the real world of data collection and analysis. In a theoretical normal distribution, the mean, median, and mode are all at the same position and the data are distributed randomly and symmetrically about the mean. But it needs to be pointed out that not all symmetrical bell-shaped curves are normal (Blalock, 1960, p. 80). You can have, for example, three normal curves that have the same SDs but different means. Similarly, you could have several curves that have the identical means but very different SDs that in turn create different shapes for the distributions.
It is important that you have a comfort level with the characteristics of distributions so that you can more fully understand the characteristics of the data you have collected and their potential limitations. Although the Shewhart charts can accommodate both normally and nonnormally distributed data (Wheeler, 1995) having knowledge of the data you have gathered is the first step toward creating and interpreting Shewhart charts.
With a few of the basics about distributions in hand it is now time to address in a little more

FIGURE 9-2 Examples of distributions with different centers, spreads and shapes
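The measures of center and spread named in this section can be computed directly. The following is a minimal Python sketch that uses only the standard library; the wait-time values are made-up illustrative data rather than data from the text, and the function name `describe` is hypothetical.

```python
import statistics


def describe(values):
    """Return basic measures of central tendency and dispersion."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "sample_sd": statistics.stdev(values),  # uses the n - 1 denominator
    }


# Hypothetical wait times in minutes, with a long tail to the right
wait_times = [20, 22, 22, 24, 25, 27, 30, 41, 68]
summary = describe(wait_times)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")

# A right-skewed distribution has a mean greater than its median.
print("mean greater than median:", summary["mean"] > summary["median"])
```

The two long waits pull the mean above the median, which is the signature of a right-skewed distribution described in the text.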

detail the topic of sigma limits. As was mentioned earlier in this chapter, the UCL and LCL are properly referred to as sigma limits or alternatively, estimates of the SD (Provost & Murray, 2011, p. 115; Wheeler, 1993, 1995). Although some writers will call the UCL and LCL SD limits and not even reference the term sigma, I think it is important to use the term sigma to refer to the estimates of variation in a Shewhart chart rather than SDs for several reasons:
■■ This is how Dr. Shewhart (1980, 1986) originally described the limits on the charts he developed.
■■ The SD of a distribution is calculated differently than a sigma. The SD is a single number that represents the average distance any individual data point in a distribution is from the mean. It cannot be a negative number and it will go, theoretically, from zero to a rather large positive value (but generally speaking the SD usually does not go much beyond double digits). A sigma unit, on the other hand, is a “measure of scale for the data” (Wheeler and Chambers, 1992, p. 60). Wheeler also points out that the common dispersion statistic for a distribution (i.e., the SD) needs to be converted into sigma units by the use of specific formulas. He concludes: “By shifting from measurement units (i.e., SD of a distribution) to sigma units, it is possible to characterize how much of the data will be within a given distance on either side of the average. Thus, the sigma units express the number of measurement units which correspond to one standard unit of dispersion” (p. 61).
A final point to acknowledge is that if you calculate the SD of a distribution using the traditional formula that you will find in many software packages, multiply this number by 3, and then add this value (i.e., 3 SDs) to the mean and subtract it from the mean, you will get the incorrect UCL and LCL for a Shewhart chart. This becomes even more important when you realize that each type of Shewhart chart has its own formula to compute sigma values and that none of these formulae use

the traditional SD formula for a sample based on the following formula:

$$S_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

$n$ = The number of data points
$\bar{x}$ = The mean of the $x_i$
$x_i$ = Each of the values of the data

Note that if you were calculating the SD for a defined population the formula would use N as the denominator (i.e., the total number of observations in the population being observed) rather than n − 1, which is typically used when calculating a SD for a sample.
If you do not use the appropriate formula for computing a particular chart’s limits, you will produce limits that are too wide or too narrow. This will then lead you to make the wrong decision about the variation in your data (i.e., you will see special causes when they do not exist and miss them when they are actually present).
■■ The third reason I prefer using the terms sigma units and sigma limits with Shewhart charts rather than SD units or SD limits is to avoid confusion. A majority of healthcare professionals have been exposed to the concept of a SD but not to a sigma. We have all been in meetings where someone presenting data has proudly said, “We have analyzed the data from last month for procedure X and discovered that the average length of stay is 4.3 days and the SD is 2.6 days.” When this occurs most participants in the meeting either nod their heads and say nothing or mumble a few words to the person sitting next to them about this is just what we heard last month. The SD is a very popular statistic presented in healthcare management meetings. People hear the number but most could not explain what it is, how it was calculated, or how to interpret it. But, it is a regular part of healthcare management meetings. So, when the word standard deviation is used to describe the units of variation on a Shewhart chart or the UCL and LCL in healthcare settings confusion occurs conceptually and statistically. I would strongly encourage you, therefore, to use the proper terms when constructing and explaining Shewhart charts.5
Now that we know a little more about what a sigma is, the next question is, “Why do we need three of them?” The answer to this question is found partially in statistical theory and partially in practicality. According to Wheeler (1995, p. 14) Shewhart’s use of 3 sigma limits (i.e., three above the mean and three below the mean for a total of 6 sigma units) as opposed to any other multiple of sigma did not stem from any specific mathematical computation. Rather Shewhart said that three “seems to be an acceptable economic value,” and that the choice of 3.0 was justified by “empirical evidence that it works.” Provost and Murray (2011) provide a succinct summary of the rationale for using Shewhart’s 3 sigma limits:

■■ The limits have a basis in statistical theory.
■■ The limits have proven in practice to distinguish between special and common causes of variation.
■■ In most cases, use of the limits will approximately minimize the total cost due to overreaction and underreaction to variation in the process.
■■ The limits protect the morale of the workers in the process by defining the magnitude of the variation that has been built into the process.
Provost and Murray’s point about overreaction and underreaction to variation deserves a few more comments. There are basically two mistakes or errors that you need to avoid when interpreting data. The first mistake (a type I error) is a risk of concluding that a data point requires special action when it is actually reflecting common cause (random) variation. This leads to tampering with a process that is in fact stable and predictable. Tampering (i.e., reacting one way to a data point then reacting another way to the next data point when they are part of a process that is stable and predictable) leads to increased variation in a

[Figure: Risk (Low to High) plotted against the width of the limits, from +/- 1 SL to +/- 6 SL. Annotation: The combined total risk of a Type I and a Type II error is minimized when 3 sigma limits (SLs) are used.]

FIGURE 9-3 Balancing the risk of a Type I and Type II error
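The trade-off pictured in Figure 9-3 can be explored numerically. The following Python sketch assumes a stable process whose plotted points are normally distributed (an assumption made here only for illustration; the text notes that Shewhart charts also accommodate nonnormal data) and a hypothetical special cause that moves the process by 3 sigma. The function names and the shift size are illustrative choices, not taken from the text. For a single plotted point, it computes the chance of a false signal and the chance of a missed signal at each limit width.

```python
import math


def normal_cdf(z):
    """Cumulative probability of a standard normal variable."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def type_i_risk(k):
    """Chance that a point from a stable process falls outside +/- k sigma."""
    return math.erfc(k / math.sqrt(2))


def type_ii_risk(k, shift):
    """Chance that a point from a process shifted by `shift` sigma stays inside +/- k sigma."""
    return normal_cdf(k - shift) - normal_cdf(-k - shift)


SHIFT = 3.0  # hypothetical size of the special cause, in sigma units
for k in range(1, 7):
    print(f"+/- {k} SL: type I = {type_i_risk(k):.5f}, "
          f"type II = {type_ii_risk(k, SHIFT):.5f}")
```

As the limits widen, the type I risk falls while the type II risk rises. How the two combine into a total risk depends on the cost attached to each mistake, which this sketch does not model.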

nonlinear manner making things worse. (See the reference to Rule 4 of the funnel demonstration in Chapter 7 for more detail on tampering.) Type I errors happen most often when you decide to use sigma limits on a Shewhart chart that are less than three. On the other hand, the second mistake (a type II error) occurs when you basically do the opposite of a type I error. In this case, you would conclude that a data point indicates no need for action when in fact it reflects a special cause. Type II errors lead to under controlling or what Provost and Murray call underreacting. This happens most often when you decide to use sigma limits that are wider than plus and minus 3 around the mean. As Carey and Lloyd (2001, pp. 67–68) point out, “The challenge, therefore, is to balance the risk of tampering against the risk of under controlling. In the first case, you will see special causes when they do not exist, and in the second case, you will miss special causes when they are present. The combined total risk of type I and type II errors is minimized when 3 sigma limits are used.” FIGURE 9-3 provides a visual of how the total combined risk of two types of error is minimized when the limits are set at +/− 3 sigma. Those of you interested in exploring these issues further should refer to the works of Blumenthal (1993), Deming (1994), Grant and Leavenworth (1988), Montgomery (1991), Shewhart (1931), Wheeler (1995), and Wheeler and Chambers (1992).

Do I Apply the Run Chart Rules to Shewhart Charts?

The simple answer to this question is no. The four run chart rules (a shift in the data, a trend in the data, too many or too few runs in the data, an astronomical data point) should be applied only to the run charts. Shewhart charts have their own rules to determine whether special causes are present. These rules are explored next.

▸▸ Deciding Whether a Special Cause Is Present

Much of the beauty of the Shewhart charts lies in their simplicity. They require just enough data (about 20 data points) to construct a reliable chart, are easy to read, and allow you to determine very quickly whether special cause variation is present in your data. Shewhart charts, according

to Pyzdek (1990, p. 90), “are an operational definition of a special cause,” which I think is a very appropriate way to summarize the purpose of the charts. Shewhart (1931, p. 6) also captured the purpose of the charts nicely when he wrote, “A phenomenon will be said to be controlled when, through the use of past experience, we can predict, at least within limits, how the phenomenon may be expected to vary in the future.” This acknowledges the fact that no one can predict the exact value of the next data point. But, if you understand the differences between a process being in control (i.e., merely random variation) and out of control (i.e., detecting special causes in the data) then you will be well on your way to understanding Shewhart’s notion of prediction within limits. He basically argued that in order to understand the variation in a process you needed to move away from static and aggregated displays of data and look at the process from a more dynamic view by plotting the data over time and understanding the inherent variability in the process. Figure 7-3 (Chapter 7) depicts what Shewhart was recommending.
For decades the Western Electric Statistical Quality Control Handbook (1985) has served as the standard reference for the special cause rules. In fact, in many circles and even in several SPC software packages the rules are frequently referred to as the “Western Electric tests for detecting special cause.” Although there are dozens of tests or rules to detect special causes, most experts in the field of SPC maintain that only a few of the tests are essential for a basic understanding of what the charts are trying to tell you. Wheeler (1995) and Wheeler and Chambers (1992) provide excellent summaries and critique of the Western Electric rules and the variations that have been proposed by leading SPC experts. Consider the following passage from Wheeler (1995, p. 139) on this issue:
■■ Shewhart used Detection Rule One.
■■ David Chambers remarked that “No data set could stand up to the scrutiny of all of the detection rules in the Western Electric Handbook.”
■■ Irving Burr recommended using no more than Detection Rules One and Four.
■■ Ellis Ott recommended the use of Detection Rules One, Two, and Four.
■■ Lloyd Nelson recommends the routine use of Detection Rules One and Four, along with Test 3 (trends) and Test 4 (sawtooth).

The selection of the most appropriate rules, however, should be linked to the subject matter being analyzed, the types of data being collected, and the ability of those who own the processes that produce the outcomes to actually move the relevant indicators in the desired direction.
The application of the rules for special causes to a Shewhart chart begins by dividing the chart into zones. The area between the centerline (CL, otherwise known as the mean or average) and the UCL is divided into three equal areas or zones. Because the control limits are referred to as sigma limits, each zone is the equivalent of 1 sigma. The area from the CL to the LCL is divided in a similar manner. These zones are labeled C, B, and A, respectively and emanate outward from the CL. FIGURE 9-4 provides an example of how a Shewhart chart is divided into six zones. The creation of zones is a very simple process that can be achieved easily with any reputable SPC software program.

[Figure: Shewhart chart with Measure on the vertical axis and Time on the horizontal axis. From top to bottom: UCL, Zone A (+3 SL), Zone B (+2 SL), Zone C (+1 SL), X (CL), Zone C (-1 SL), Zone B (-2 SL), Zone A (-3 SL), LCL. Note: Each zone is equal to 1 sigma.]
FIGURE 9-4 Dividing the Shewhart chart into zones

A natural or random pattern of data will bounce around across the zones, between the UCL and LCL, and include the following three characteristics:
■■ Most of the data points are near the CL
■■ A few of the data points spread out and approach the UCL and LCL
■■ None of the data points (or at least only a very rare and occasional point) exceeds the control limits (Western Electric, 1985, p. 24)
A natural pattern or random distribution of data will exhibit these three characteristics simultaneously. One of the first signals that a process has special causes, therefore, is the absence of any one of these characteristics.
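The zone division described above can be sketched in a few lines of Python. The sketch assumes an individuals (XmR) chart, which is only one of several Shewhart chart types, and estimates sigma as the average moving range divided by 1.128, a commonly used formula for that chart type that is not given in this section. The wait times and the function names are hypothetical.

```python
import statistics

D2 = 1.128  # bias-correction constant for moving ranges of two points (assumed XmR chart)


def xmr_limits(values):
    """Center line and estimated sigma for an individuals (XmR) chart."""
    center = statistics.mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = statistics.mean(moving_ranges) / D2
    return center, sigma


def zone(value, center, sigma):
    """Label a point C, B, or A by its distance from the CL, or 'beyond' the limits."""
    distance = abs(value - center) / sigma
    if distance <= 1:
        return "C"
    if distance <= 2:
        return "B"
    if distance <= 3:
        return "A"
    return "beyond"


# Hypothetical wait times in minutes, one per patient
data = [35, 33, 38, 36, 31, 34, 39, 35, 32, 37,
        36, 34, 33, 38, 35, 58, 34, 36, 33, 35]
center, sigma = xmr_limits(data)
print(f"CL = {center:.1f}, UCL = {center + 3 * sigma:.1f}, LCL = {center - 3 * sigma:.1f}")
print(f"sample SD = {statistics.stdev(data):.1f}, sigma estimate = {sigma:.1f}")
print([zone(x, center, sigma) for x in data])
```

Printing the sample SD next to the sigma estimate shows that the two numbers differ for the same data, which is why limits built from 3 times the sample SD do not match the chart's sigma limits.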
Because these rules for detecting special causes have grown primarily out of industrial and manufacturing applications, however, we need to evaluate them in light of which rules are most appropriate in health, education, and social services settings. We have done this at the Institute for Healthcare Improvement (IHI) with our colleagues from Associates in Process Improvement (API)6 and decided that five of the rules for detecting special causes on a Shewhart chart are most appropriate for these disciplines. The five IHI/API rules for detecting special causes on a Shewhart chart are:

Rule 1: 1 point outside the +/− 3 sigma limits
Rule 2: 8 successive consecutive points above (or below) the CL
Rule 3: 6 or more consecutive points steadily increasing or decreasing
Rule 4: 2 out of 3 successive points in Zone A or beyond
Rule 5: 15 consecutive points in Zone C on either side of the CL

Each of these rules is discussed next.

Rule 1: 1 Point Outside the +/− 3 Sigma Limits (FIGURE 9-5)

This is usually referred to as a 3-sigma violation and is classified as a signal of instability in a process. It is also one of the most easily recognized of all the tests because it is based on a single data point. On the run chart this was a visual determination of the “astronomical data point.” Now we have a statistical way to determine whether, in fact, it is astronomical. This is the only test that Shewhart used to identify special causes and the reason why Wheeler (1995) stated that “Shewhart used Detection Rule One.” Some texts refer to a single data point that exceeds 3-sigma as a “freak” point (Pyzdek, 1990). Irrespective of the term being used, a 3-sigma violation is a clear signal that the variation of this single point is very different from the variation demonstrated by the rest of the data points on the chart.

[Figure: Shewhart chart showing the UCL, CL, and LCL with zones A, B, and C on each side of the CL; the label “A 3 sigma violation” appears above the UCL and below the LCL.]
FIGURE 9-5 Rule #1: A single data point that exceeds the upper or lower control limit

When you detect a 3-sigma special cause do not overreact. The first thing you should do is check the data to make sure that the data point is legitimate. For example, if someone used a different operational definition for this data point it may in fact be a false positive. This data point might also be due to a data collection procedure that sampled the population differently than the other data points. Finally, it could be due to a stratification problem. In this case, data may have been pulled from the afternoon shift when the rest of the chart was based on data sampled from the day shift. My point is that before you see a 3-sigma violation as a true special cause, investigate the methods used to gather that data point. If the data point was based on the same operational definition as the rest of the data and there were no sampling or stratification issues then you do in fact have a special cause that

requires investigation. Why is this data point statistically different from the rest of the data? The presence of a true special cause provides the opportunity for learning.

Rule 2: 8 Successive Consecutive Points Above (or Below) the Centerline (FIGURE 9-6)

are observed as a gradual movement of the data over time, which is demonstrated as a shift in the process. Ideally this shift would be in the desired direction but the shift could also be in the opposite direction. The data are neutral. They do not know if they are in the direction of goodness or away from it. This is why it is important that you apply the statistical decision rules that allow you to know when there is a true signal in the data of a special cause and when it is just random variation.
People generally find it easy to detect a 3-sigma Although the rule of 8 is a classic Western
violation (Rule 1). But, as the Western Electric Electric rule you will see other alternatives offered
Statistical Quality Control Handbook points out (e.g., 7 in a row, 9 in a row or even one approach
(1985, p. 26) the data can reflect instability even that favors a spread of 8 to 10 in a row). Wheeler
when all the data points fall between the UCL and (1995) lists all the various options defining a
LCL. A shift in the process is one such indication of shift that have appeared over the years and offers
instability. Most writers refer to this rule as “eight commentary on which ones he has seen used most
consecutive data points on the same side of the often. My point in even mentioning these alterna-
centerline.” When such a pattern is observed, it tives is that you will hear a variety of opinions on
signals that there has been a shift in the process. the number of data points used to define a shift
Another way to think of this rule is that it reflects and also on what constitutes a trend (rule 3). The
a run of data that has lingered too long above or challenge is if you define a shift with say seven data
below the mean, which indicates a nonrandom points you may see special causes when they do
pattern. This test is a variation on the run chart not exist (i.e., a type I error). If, on the other hand,
shift rule but you will notice that it requires eight you choose to use 9 or 10 data points as a shift you
data points in a run whereas the run chart rule may fall prone to a type II error, which is missing
required six data points to determine a shift. a special cause when it is present. The rule of 8 has
This test is one of the original Western Electric been regarded as a solid practical rule and it is the
four primary tests and it is a frequent signal on one I and my colleagues at IHI have decided to
healthcare charts. As teams work continuously use. It is neither too lenient nor too conservative
on improvement strategies, their work typically for health and social service application. Unless
produces results that are not immediate and dra- you like to get into rather heavy statistical theory
matic in nature (i.e., the 3-sigma rule) but rather debates about which rule is “the best” I’d suggest
that you accept a set of rules that are practical and
UCL appropriate for your work.
A
B
C
CL
Rule 3: Six or More Consecutive
C
B
Points Steadily Increasing or
A Decreasing (FIGURE 9-7)
LCL
This rule detects a trend in the data that Provost and
Murray (2011, p. 117) define as “a small, consistent
Too many data points in a row below the
centerline signals a downward shift in the process. drift in a process.” When deciding if a trend exists,
duplicate points (i.e., repeating values) should be
FIGURE 9-6 Rule #2: 8 successive consecutive points ignored. This rule engenders considerable debate.
above (or below) the centerline First, there is the popular definition of a trend. We
see a trend in fashion, a trend in food, a trend in the stock market, which is usually referring to the fact that the stock market closed higher than it started the day. I regularly hear the weather reporter on the Chicago TV stations referring to a “trend in the temperature.” In this case, the trend is usually a comparison of today’s temperature to the average temperature for the past week or month or the comparison of today’s temperature to the temperature on the same day a year ago. The point is that there are very popular usages of the word trend and then there are statistical definitions. As we analyze Shewhart charts we definitely should be using a statistical definition. But, I have been in many meetings where people interpreting either static or dynamic displays of data have devised their own definitions of a trend. Over the next week make a mental note of how often you hear your coworkers or people in the media look at data and declare a trend is present. People will conclude that there is a “trend” in the data when in fact they are merely comparing two data points. If the second data point is higher than the first and in the direction of goodness then this gets labeled as an upward trend. Deming (1992) had a very good bit of guidance: when you have two data points, “it is very likely that one will be different from the other.”

The Western Electric handbook does not specify how many data points are needed in order to identify a trend. They merely indicate that a trend is detected when there is “a series of consecutive data points without a change in direction.” At the IHI we have decided to use the rule of 6 as initially defined by Nelson (1985) and then by Pyzdek (1990) as a common practical basis for detecting an upward or downward trend in the data. The final point I will make about this particular rule is that like Rule 2 this rule engenders considerable debate. Wheeler (1995, p. 137), for example, states that “all of these tests (for a trend) are problematic.” He offers a number of reasonable statistical principles as to why he maintains this perspective. Others will argue with you about a trend because they (1) want to see a trend, (2) are using a popular definition of a trend, or (3) have some other statistical reference that says their trend is preferred over the one you propose. So, once again, unless you are ready for these debates I would suggest that you accept the rule of 6 as a trend and see how well it fits with your analysis of the data.

FIGURE 9-7 Rule #3: 6 or more consecutive points steadily increasing or decreasing
[Two examples plotted against zones A through C on either side of the CL, between the UCL and LCL: a downward trend and an upward trend.]

Rule 4: Two out of Three Successive Points in Zone A or Beyond (FIGURE 9-8)
Another of the classic Western Electric rules for instability is when two out of three consecutive data points are more than 2 sigmas away from the CL. In this particular case, the single data point not in Zone A or beyond can be anywhere on the
chart. The deciding criterion is whether two out of the three successive data points are in Zone A or beyond on the same side of the CL. This is one of the rules that is more difficult to explain in words than pictures. Observing Figure 9-8 will help in understanding this rule. The primary question I get with this rule, however, is “so what?” “Why is it that two out of three data points in Zone A constitutes a special cause?” First, envision the static normal curve. Slightly over two thirds of the data (68.27%) will fall within ±1 SD of the mean. When you go out to ±2 SDs of the mean you will find 95.45% of the data. By the time you are out to ±3 SDs from the mean you should be observing 99.73% of all the data in the distribution. But, because the normal curve theoretically extends infinitely in either direction you do not account for 100% of the data. Now let’s get back to the two out of three data points in Zone A of a Shewhart chart. As you go out the tails of the normal distribution you should expect to see less data. The two out of three rule, therefore, is signaling that you are observing too much data bunching in the tail(s) of the distribution when you should in fact be observing less and less the further you go out. There certainly are more complex statistical explanations of why this rule detects a special cause. But as Wheeler points out (1995, p. 135) “this rule provides a reasonable increase in sensitivity without an undue increase in the false alarm rate.”

FIGURE 9-8 Rule #4: 2 out of 3 successive points in Zone A or beyond
[Chart with zones A through C on either side of the CL, between the UCL and LCL. Annotation, shown once above and once below the CL: Two out of three consecutive data points that fall in Zone A or beyond.]

Rule 5: 15 Consecutive Points in Zone C on Either Side of the Centerline (FIGURE 9-9)
This test is generally described as reflecting an issue with stratification. Stratification usually indicates that two or more different causal systems are present in every subgroup. This pattern of stratification is also known as “hugging the centerline” because there is a run of 15 or more data points within 1 sigma of the CL (i.e., in Zone C above or below the CL) and the variation is relatively small for these data points compared to the width of the UCL and LCL. Stratification
occurs most often because the data collection plan was flawed. For example, you will find a stratification pattern when two separate distributions of data have been collected (e.g., day shift turnaround time was combined with night shift turnaround time) or the sample of data points was drawn from two different distributions of data.

FIGURE 9-9 Rule #5: 15 consecutive points in Zone C on either side of the centerline
[Chart with zones A through C on either side of the CL, between the UCL and LCL.]

Even though we have these rather specific statistical rules for determining special cause variation that are grounded in decades of testing in manufacturing settings, I think it is important that when we apply these rules to healthcare situations we apply them with a serious dose of common sense. For example, if we are trying to improve food tray delivery time we may be willing to fully accept six data points constantly increasing as a trend. But, on the other hand, if we are dealing with wrong site surgeries we may not want to wait until we have six occurrences of wrong site surgery to declare a trend and then take action. As my colleague Dr. Ray Carey (2003, p. 19) wrote: “When the well-being of patients is at risk, a case can be made for using 2-sigma limits as ‘early warning limits’ or for using 6 rather than 8 points to detect a shift.” In these situations, clinicians would still be looking for signals of special cause so they do not overreact to a single data point. But, they would use the data not necessarily to justify changing the system but rather as a basis to investigate potential instability in their process that could cause harm to patients. Wheeler and Chambers (1992) refer to this as having a process on the “brink of chaos.” Statistical decisions must be moderated with and filtered through a healthy dose of common sense and rational thinking.

▸▸ Deciding Which Shewhart Chart Is Most Appropriate
Although there is only one way to make a run chart, there are numerous ways to make a Shewhart chart. The basic design and look of any Shewhart chart is essentially the same as shown in Figure 9-1 (i.e., data plotted over time, the mean of the indicator as the CL and the calculation of the UCL and LCL). Furthermore, the charts are all grounded in established statistical theory and are all interpreted in terms of the fundamental ideas related to common and special causes of variation. But, there are many different types of Shewhart charts and the user needs to know which one is most appropriate for the indicator being studied. The variety of Shewhart charts is summarized by Benneyan et al. (2003, p. 16):

    There are at least a dozen different types of control charts in common use in manufacturing and other industry, with three or four new types being developed each year. The various types differ by the statistic plotted (e.g., averages, percentages, counts, moving averages, cumulative sums, etc.) and the distribution assumed (e.g., normal, binomial, Poisson, geometric, etc.). All have different formulae for calculating centerlines and control limits.
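
Before going further with chart selection, it may help to see the five special-cause rules from the preceding pages written out as code. What follows is only a minimal sketch in Python, not a validated implementation. It assumes the centerline (`cl`) and one sigma (`sigma`, for example one third of the distance from the CL to the UCL) have already been calculated for the chart. The function names, the choice to report the index of the point that completes each pattern, and the treatment of points that fall exactly on the CL or on a zone boundary are all assumptions made for illustration.

```python
from typing import List, Sequence


def _runs(flags: Sequence[bool], length: int) -> List[int]:
    """Indices at which `length` or more consecutive True flags have accumulated."""
    hits, count = [], 0
    for i, flag in enumerate(flags):
        count = count + 1 if flag else 0
        if count >= length:
            hits.append(i)
    return hits


def rule1(values: Sequence[float], cl: float, sigma: float) -> List[int]:
    """Rule 1: a single point outside the +/- 3 sigma limits."""
    return [i for i, x in enumerate(values) if abs(x - cl) > 3 * sigma]


def rule2(values: Sequence[float], cl: float, run: int = 8) -> List[int]:
    """Rule 2: `run` consecutive points on the same side of the CL.
    Assumption: a point exactly on the CL breaks the run."""
    above = _runs([x > cl for x in values], run)
    below = _runs([x < cl for x in values], run)
    return sorted(above + below)


def rule3(values: Sequence[float], run: int = 6) -> List[int]:
    """Rule 3: `run` consecutive points steadily increasing or decreasing.
    Repeated values are ignored: they neither extend nor break a trend."""
    hits, count, direction = [], 1, 0
    for i in range(1, len(values)):
        step = (values[i] > values[i - 1]) - (values[i] < values[i - 1])
        if step == 0:
            continue  # duplicate point, skip it
        count = count + 1 if step == direction else 2
        direction = step
        if count >= run:
            hits.append(i)
    return hits


def rule4(values: Sequence[float], cl: float, sigma: float) -> List[int]:
    """Rule 4: 2 of 3 successive points beyond 2 sigma on the same side of the CL."""
    hits = []
    for i in range(2, len(values)):
        window = values[i - 2 : i + 1]
        high = sum(x > cl + 2 * sigma for x in window)
        low = sum(x < cl - 2 * sigma for x in window)
        if high >= 2 or low >= 2:
            hits.append(i)
    return hits


def rule5(values: Sequence[float], cl: float, sigma: float, run: int = 15) -> List[int]:
    """Rule 5: `run` consecutive points within 1 sigma of the CL (Zone C)."""
    return _runs([abs(x - cl) < sigma for x in values], run)


if __name__ == "__main__":
    # Hypothetical data: turnaround times with an assumed CL of 10 and sigma of 4.
    data = [10, 12, 9, 11, 25, 10, 11, 12, 13, 14, 15, 16]
    print("Rule 1 signals at:", rule1(data, cl=10, sigma=4))
    print("Rule 3 signals at:", rule3(data))
```

Changing the `run` argument (for example, 7 or 9 instead of 8 for Rule 2) is one way to explore the trade-off between type I and type II errors discussed under Rule 2. As the text stresses, any signal these functions flag should first prompt a check of the operational definition, sampling, and stratification behind the data point.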
If there are all these different types of Shewhart charts how do you decide which one is the most appropriate for your data? The decision involves two rather simple steps: (1) deciding on the type of data you have collected and (2) deciding which of the various Shewhart charts is the most appropriate for this type of data.

▸▸ Types of Data
The first step in selecting a Shewhart chart is to determine the type of data you are collecting. There are basically two types of data: (1) variables data (also known as continuous, interval, ratio, or measurement data depending on your background and training) and (2) attributes data (also known as classification or count data). The term used to identify the type of data is a matter of taste and preference. Most SPC books will use the terms variables or continuous data and attributes, classification, or count data. What is more important than the terms you choose to use are the concepts the terms are capturing and how you apply them to your data. In this text, I use the terms variables and attributes data as the primary categories. FIGURE 9-10 provides examples of these two types of data.

Variables data can be measured along a continuous scale. In Figure 9-10 this type of data is depicted as money, time, weight, length, and temperature. Consider the ruler as a form of variables data. It has equal appearing intervals that can be divided into as many subdivisions as your calibration instruments will permit. With variables data you can perform all the mathematical functions. Data measured this way can be either counts of whole numbers or they can have decimals or fractional parts. Examples of variables data include:

■■ Wait times in the emergency department (ED)
■■ Turnaround time for a lab test
■■ Blood pressure readings
■■ Newborn weight (measured in grams or pounds and ounces)
■■ International normalized ratio (INR) and prothrombin times (PTs)
■■ Blood glucose readings
■■ The number of procedures or tests performed
■■ The number of surgeries done each day
■■ Financial measures such as revenue, operating margin, or expenses
■■ Duration of a surgical procedure in minutes or hours

Attribute data are essentially counts of events that can be placed into discrete categories. Unlike measuring a patient’s weight on a continuous scale, attribute data are looking at characteristics that can be classified and placed into categories or “buckets.” For example, any time you are measuring mortality you are using attribute data (the patient is either alive or dead). Similarly, pregnancy is an attribute indicator. There are only two outcomes: the woman is either pregnant or not pregnant. A woman does not proudly announce that she is 53.9% pregnant. This is essentially a binomial outcome. Attribute data can be further divided into two subdivisions, defectives and defects.

Defectives (also known as nonconforming units) require that you have a count of the total number of items or events being observed or produced and the number of items from this total that were not acceptable. The unacceptable items or events become the numerator and the denominator is the total number of items or events observed. When you know how many items out of the total are unacceptable you can either plot the number of defective items on your Shewhart chart or you can compute the percent of defectives. When we compute a percentage, therefore, we are basically determining what proportion or percentage the numerator is of the denominator. The standard terminology used in most SPC books to define defectives is that you know both the occurrences of the defective product or service (the numerator) and the nonoccurrences (the nondefectives); the defectives plus the nondefectives, added together, form the denominator. Knowing these two pieces allows you to calculate a percentage or proportion of defectives. Keep in mind that when you use percentages you are comparing the same types of items, products, or services. If you are looking at the percentage of food trays delivered late to the
FIGURE 9-10 Examples of variables and attributes data
[Variables data: pictured as money, time, weight, length, and temperature. Attributes data: Defectives (occurrences plus non-occurrences; nonconforming units) and Defects (occurrences only; nonconformities). Image credits: © Pedjami/Shutterstock; © Ultrashock/Shutterstock; © Butterfly Hunter/Shutterstock; © Paul Velgos/Shutterstock; © Lipskiy/Shutterstock; © HeinzTeh/Shutterstock.]

patient, for example, you will have the number of late food trays as the numerator and total number of food trays produced as the denominator. In this case, you have trays divided by trays: like divided by like. The only attribute that is different for classification purposes is whether the food tray was delivered late to the patient. This is an important distinction to keep in mind because as we move next to define defects, this condition will not hold. In summary, data classified as defective can be divided into one of two categories (i.e., a binomial situation) when you know both the occurrences and the non-occurrences of an event. Examples of this form of classification include conforming/not conforming to standards, harm/no harm, go/no-go, pass/fail, OK/not OK, complete/incomplete or present/absent.

Defects pose an interesting challenge. Defects occur and can be counted. But, how do you count all the nondefects? Stated differently, you know when a defect occurs (the occurrence of an event) but you do not know when the nondefects or nonoccurrences happen. I know, at this point you are thinking, “This makes no sense.” When I first heard this statement it did not make a lot of sense to me either. Examples should help to clarify this concept. Look down at the rug in your office or in your family room. How many spots, stains, or snags do you find in the carpet? For argument’s sake I’ll imagine that you found three dirt spots, two coffee stains, and four snags on the carpet. Now, count the number of nonstains on the carpet. How did you do? You cannot count the nonstains or blemishes on the carpet. This is an unknown. Similarly, when the highway department records traffic accidents they do the same thing. They can count how many accidents occurred today on a particular segment of the highway but they have no idea how many nonaccidents there were today. There are times, therefore, when we know the occurrence of an event when the nonoccurrences are unknown and unknowable.

In health care, we experience this situation with patient falls, needle sticks, nosocomial infections, medication errors, and liability cases. We know only when the event happens. Think of needle sticks. A staff member comes into your office to report that she just stuck herself. After you try to calm her down and explain the next steps you will take, you do not say, “Oh by the way, how many times didn’t you stick yourself today?” Similarly, if a nurse asked a patient, “How many times didn’t you fall today?” she would probably get a rather confused look from the patient.

When you are dealing with defects you need to remember that a count of the number of falls, needle sticks, or medication errors gives you a numerator but you do not have a denominator, so you cannot compute a percentage. Instead, you either just count the number of defects as whole numbers (e.g., the total number of falls today was eight) or you create a falls rate. A rate is a ratio (i.e., it has a numerator and a denominator just like a proportion or percentage) but the two numbers you are using to form this ratio are not alike. For example, when we compute an inpatient falls rate by month we have the number of inpatient falls (including multiple falls) for the month as the numerator and the denominator is usually the total number of inpatient days for the month. Now we have falls divided by days, two unlike things. The resulting number is reported as so many falls per 1,000 patient days. Any time you report that there are so many defects per 1,000, 10,000 or 100,000 units (e.g., inpatient days, medication orders, lab tests, or surgeries) you have just created a rate. Note that when you see the little word per included with the name of a measure you know that it is a rate and not a percentage. Most of the patient safety indicators as well as epidemiology indicators are constructed as rates (e.g., patient fall rate, restraint rate, surgical site infection rate, ventilator-associated pneumonia (VAP) rate, needle stick rate, or medication error rate).

The other characteristic of a rate is that the numerator of a rate can be larger than the denominator. For example, if you had a 20-bed unit and each patient fell two times you would have 40 falls for 20 patients. What do you call this? Is it 200% falls? No. If you wanted to use a percentage you would have to make a different indicator, which would be the percent of patients who fell once or more while they were with us. In this case, we do not care about the total number of falls, which
includes duplicates. All we are concerned with is at the time of discharge did this patient fall once or more, yes or no? Basically if the indicator is the percentage of patients who fell we do not care if they fell more than once. But, because patients can fall more than once, and we are concerned about this problem, we generally do not use the percentage of patients who fell as a binomial indicator (i.e., fell/did not fall). If we are concerned about the magnitude of the falls and severity we typically track all falls, which means that we have the possibility of having a numerator that is larger than the denominator. When this can occur, we normalize the total number of falls by creating a rate (e.g., 3.2 falls per 1,000 inpatient days). A final point about defects is that they usually occur less often than indicators measured by a percentage.

FIGURE 9-11 provides an easy way to remember the differences between defectives and defects. When cars come off the assembly line they get inspected. If the car is determined to be acceptable by the inspectors it is fit to be shipped to a dealer. But, if the inspectors find one or more things wrong with the car it is not fit to be shipped. In this case, the car would be classified as being defective. This fit to ship determination is a binomial decision: the car is okay or the car is not okay to be shipped. At the end of the shift the inspectors take all the defective cars and provide a summary of why each car was classified as defective (i.e., as a nonconforming unit). This is where the defects come into the picture. The first defective car has a summary report pasted to the windshield. It reads that this defective car has three defects: one headlight does not come on (defect 1), the driver’s side door does not close flush with the body (defect 2), and the driver’s seat moves backwards but not forward (defect 3). The entire car is classified as defective but three defects have been discovered. The next car is also classified as defective but it has only one defect (the oil pressure warning light on the dashboard does not go out after the specified period of time). In summary, defects or nonconformities are the specific things that make a product or service defective. Once you understand the distinctions between defectives and defects you will be well on your way to selecting the most appropriate Shewhart chart. To help you in building your skills in differentiating between defectives and defects refer to EXERCISE 9-1. For each indicator listed decide if it is describing a defective or defect. The answers to this exercise can be found at the end of this chapter.

FIGURE 9-11 Defectives versus Defects
[OK? If Yes, then the car is fit to be shipped out! Not OK? If No, then the car is classified as being “defective” but we do not know why it is defective (not fit to be shipped) until we inspect it and count the number of specific “defects” that make the car “not OK” or defective. Image credit: Ed Aldridge/Shutterstock.]

▸▸ Types of Shewhart Charts
After determining whether your data are variables or attributes, the next step is to decide which Shewhart chart is most appropriate for the type of data you have collected. Seven basic control charts are regularly described in the literature and taught in most classes and seminars on SPC. After working with the charts for over 15 years, however, I have found that five of the seven charts are the most
relevant and frequently used with healthcare, social services, and educational indicators. I focus on these five Shewhart charts but encourage you to explore the full range of charts as discussed in the Western Electric Statistical Quality Control Handbook (1985), Wheeler (1993, 1995), Wheeler and Chambers (1992), Carey and Lloyd (2001), Carey (2003), Duncan (1986), Pyzdek (1990), Kume (1996), and Provost and Murray (2011).7

FIGURE 9-12 presents the Shewhart chart decision tree with the five control charts that have the most relevance to health care, social service and educational indicators. Two of the five charts are used with variables data (i.e., X-bar and S chart and the XmR chart) and three of the charts are used with attributes data (i.e., p-chart, u-chart, and c-chart). Each of the five charts is described next and examples of how to apply the charts are offered. I would suggest, however, that you read beyond what I summarize in this chapter. As I mentioned earlier in this chapter, by reading the explanations of different authors describing Shewhart charts and their uses you will build knowledge on how to use them with your own improvement efforts.

EXERCISE 9-1 Defective or defect? You make the call!
For each indicator, mark either Defective (Classification) or Defect (Count).

Indicator
1. Number of accidents per 1,000 employee days
2. Number of errors per 25 food trays
3. Percentage of acute myocardial infarction (AMI) patients receiving aspirin within 24 hours of arrival in ED
4. Percentage of inpatient deaths each month
5. Number of surgical complications per 1,000 surgeries performed
6. Proportion of hand hygiene observations done incorrectly
7. Number of falls per 1,000 patient days
8. Number of medication errors per 10,000 doses dispensed

▸▸ Defining the Key Terms
Before addressing the details of each of the five basic Shewhart charts shown in Figure 9-12, however, it is necessary to review three key terms that play a critical role in helping you work your way successfully through the Shewhart chart decision tree. These key terms are subgroup, observation, and area of opportunity and are summarized in FIGURE 9-13.
Decide on the type of data
  Variables data: More than one observation per subgroup?
    Yes: X-bar & S chart (average and standard deviation)
    No: XmR chart (individual measurement)
  Attributes data: Occurrences and nonoccurrences?
    Yes: p-chart (the proportion or percentage of defectives)
    No: Is there an equal area of opportunity?
      Yes: c-chart (the number of defects)
      No: u-chart (the defect rate)

FIGURE 9-12 The Shewhart chart decision tree
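
The decision tree in Figure 9-12 can also be read as a short series of yes/no questions. The sketch below is a hedged illustration in Python rather than part of the original method: the function names `choose_chart` and `rate_per`, the argument names, and the example numbers (40 falls over 12,500 inpatient days) are hypothetical and chosen only to mirror the questions in the figure and the "falls per 1,000 inpatient days" style of rate described earlier.

```python
def choose_chart(
    data_type: str,
    *,
    multiple_observations: bool = False,
    knows_nonoccurrences: bool = False,
    equal_area: bool = False,
) -> str:
    """Walk the questions of the decision tree and return a chart name.

    data_type: "variables" or "attributes".
    multiple_observations: more than one observation per subgroup? (variables data)
    knows_nonoccurrences: are both occurrences and nonoccurrences known? (attributes data)
    equal_area: is the area of opportunity relatively equal? (defects only)
    """
    if data_type == "variables":
        return "X-bar & S" if multiple_observations else "XmR"
    if data_type == "attributes":
        if knows_nonoccurrences:
            return "p-chart"  # defectives: a proportion or percentage
        return "c-chart" if equal_area else "u-chart"  # defects: a count or a rate
    raise ValueError("data_type must be 'variables' or 'attributes'")


def rate_per(events: int, opportunity: float, per: int = 1000) -> float:
    """Defects divided by an unlike denominator, scaled to 'per' units."""
    return events / opportunity * per


if __name__ == "__main__":
    # Falls are defects: nonoccurrences are unknowable, and the number of
    # inpatient days (the area of opportunity) varies from month to month.
    print(choose_chart("attributes", knows_nonoccurrences=False, equal_area=False))
    # Hypothetical month: 40 falls over 12,500 inpatient days.
    print(round(rate_per(40, 12500), 1), "falls per 1,000 inpatient days")
```

Keeping the three questions as separate arguments mirrors the branches of the figure, so an answer that is not yet known can be settled with the team before any chart is drawn.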

Subgroup
■■ How you organize your data (e.g., by day, week, or month)
■■ The label of your horizontal axis
■■ Can be patients in chronological order
■■ Can be of equal or unequal sizes
■■ Applies to all the charts

Observation
■■ The actual value (data) you collect
■■ The label of your vertical axis
■■ May be single or multiple points
■■ Applies to all the charts

Area of Opportunity
■■ Applies to all attributes or counts charts
■■ Defines the area or frame in which a defective or defect can occur
■■ Can be of equal or unequal sizes

FIGURE 9-13 Defining Subgroup, Observation, and Area of Opportunity

Subgroup
The subgroup defines how you have organized your data and usually captures some dimension of time such as when patients show up for an appointment, day of the week, week, or month. The subgroup will be the label you place on the horizontal or x axis of the chart. The subgroups will be arranged in chronological order of occurrence. When deciding on a subgroup you should strive to select them so that if special causes exist the chances for differences between subgroups will be maximized, whereas the chances for differences due to special causes within
a subgroup will be minimized (Duncan, 1986; Montgomery, 1991). The traditional subgroups for Shewhart charts have been:

■■ An individual patient as the subgroup in which case you would order the patients along the x axis of the chart in the order that they presented themselves in the office. Patient 1 arrived at 9:00 a.m., patient 2 at 9:25 a.m., patient 3 at 9:50 a.m. and so on.
■■ A day as your subgroup in which case you would have Monday, Tuesday, Wednesday, etc. across the x axis. Then each day you would select either all of the patients or a sample of them and record their wait times.
■■ A week as the subgroup and you would label the x axis as Week 1, Week 2, Week 3, etc. You would then have to decide if you were going to track the wait time for all the patients in a week or just a sample. Usually when a week or even a 2-week period is selected as the subgroup and patient wait time is the indicator of interest you would probably want to draw a sample of the patients. A total enumeration would probably provide more data than you need and create data collection challenges.
■■ A month is a frequently used period of time for a subgroup. But it is not always the best block of time in which to think about improvement or understanding variation. Remember that the Shewhart charts are designed to help you understand the variation in a key process indicator as close to the production of the indicator as possible. In manufacturing, they evaluate products and services on an hourly, shift, or daily basis. Although they may aggregate the key indicators for management reports by month or quarter the ability to improve quality and ensure reliability does not come by looking at monthly or quarterly averages. It comes by looking at production almost as it happens. In health care and many social services, there is a strong tendency to track indicators by month or even quarter. This is what I refer to as the tyranny of monthly data. Think of it this way: in a month we have 30 or 31 days, upwards of 90 shifts in a hospital, and approximately 720 hours in which to deliver care. Why would we want to aggregate all this activity into a monthly average or monthly total? Monthly data frequently lead people down the path of judgment or accountability not quality improvement (QI). In my view, a primary reason we have so many healthcare indicators structured around monthly subgroups is that this is how financial and resource allocation systems are organized. In health care, work is being produced every minute of the day not in monthly blocks. Patients are waiting to be seen, have tests performed, or surgery started. Their focus is on minutes or possibly hours not months. Administrators and managers think in terms of months but patients think about the here and now not in monthly aggregates.8 The other challenge with using month as a subgroup is that the variation in the indicator of interest is usually not visible because the data is aggregated into an average. Although administrators, managers, and policy makers frequently rely on aggregated data and summary statistics to make decisions, no customer, patient, or service user cares about the average. They are concerned about why they or their loved one are not getting service or treatment now. A patient takes no comfort in being told that the average wait time to see the doctor last month was only 49 minutes. Or a physician waiting for her stat lab result to come back will rightfully be irritated if she is told “We don’t have your result quite yet but don’t worry, the average time to get a result last month was only 63.5 minutes.” My point is that although we have a tendency to fall back on making month the subgroup for many healthcare indicators there is no reason to do so. I have made many charts for teams that are collecting monthly data. But each time this happens I make sure we have a discussion about what is the smallest unit of time that we
Defining the Key Terms 233

could gather data on. Using month as a subgroup should be a fallback option, not the first choice.
■■ Finally, it is probably quite evident at this point that it is not advisable to use quarters as a subgroup choice because quarterly data represent a very long time period and the variation you are interested in understanding has been aggregated and therefore severely dampened. Quarterly data can lead only to judgment, not improvement.

Observation
As the term implies, this is the actual piece of data or the measurement that you record or observe (e.g., turnaround time for a lab test or medication order, blood glucose readings for a patient, or time to administer beta blockers to heart attack patients in the ED). The vertical axis label on the chart defines the observation and the units of measurement along the y axis show the potential distribution of these values. An observation can be classified as either variables or attribute data (e.g., time, money, weight, a percentage of defectives, a count of defects, or a defect rate). For example, if your indicator is wait time to see the doctor in a clinic, your observation will be the actual wait time in minutes that occurs from when the patient checks in at the registration desk until she is seen by the doctor. This amount of time will be what gets plotted on the chart. Therefore, the dot on the chart, or the “doink” as I like to refer to it, represents the quantitative aspect of the indicator you are observing during the defined period of time (i.e., the subgroup).

Area of Opportunity
All Shewhart charts must have a subgroup and an observation clearly defined or the chart cannot be constructed. When we move to the right side of the decision tree (Figure 9-12) and consider the attributes charts, discussed in detail later, a third term comes into play. Notice that the first decision point when dealing with attributes charts is determining whether you have both the occurrence and nonoccurrences of the events being studied. If you answer “yes” to this question, you will be able to calculate either the proportion or percentage of defectives and proceed to make a p-chart. If you respond that you do not have the occurrences and nonoccurrences, you are left with having only the occurrence of an event when the nonoccurrences are not known. As was mentioned previously, this gives you a count of the defects and you will make either a c-chart or a u-chart. The decision to make a c- or u-chart is based on your answer to the following question: “Is there a relatively equal area of opportunity for the defect to occur?” If you respond “yes” to this question, you will make a c-chart, which is a plot of the number of defects occurring within each subgroup (e.g., a count of the number of falls occurring each day). If you respond “no” to this question (i.e., there is not an equal area of opportunity for a fall to occur), then you would make a u-chart, which would be a plot of the defect rate by subgroup (e.g., 3.2 falls per 1,000 patient days). So, it really does not matter if you respond “yes” or “no” to the area of opportunity question. Consider it essentially a filtering question that will help you select the correct chart for your indicator. As each chart type is explained, the use of these terms is demonstrated.
The terms used in the Shewhart decision tree (Figure 9-12) and summarized in Figure 9-13 are central to understanding SPC theory. From a practical perspective, understanding the terms subgroup, observation, and area of opportunity is also essential in the operation of SPC software packages. Many of the SPC software packages I have used explicitly ask you to identify the subgroup and the observation or some variant of these terms. Although most software applications do not ask you the “area of opportunity” question, understanding this concept is critical to selecting the most appropriate chart for your indicator. With these basic terms in mind, we can start using the Shewhart chart decision tree to understand the conditions that will lead us to select each chart. We will start on the left side of the decision tree and address the variables data charts, then move over to the attributes charts.
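
The filtering questions described above can be written down as a small function. This is only a sketch of the decision logic as the text states it, not a reproduction of Figure 9-12; the function name and its arguments are hypothetical and chosen for illustration.

```python
def select_shewhart_chart(data_type, multiple_observations=None,
                          has_nonoccurrences=None, equal_area=None):
    """Walk the decision-tree questions described in the text.

    All argument names are illustrative assumptions, not terms from any
    SPC software package.
    """
    if data_type == "variables":
        # "More than one observation for each subgroup?"
        return "X-bar and S chart" if multiple_observations else "XmR chart"
    if data_type == "attributes":
        # Occurrences AND nonoccurrences known: defectives, so a p-chart
        if has_nonoccurrences:
            return "p-chart"
        # Only occurrences known: defects, so ask the area-of-opportunity question
        return "c-chart" if equal_area else "u-chart"
    raise ValueError("data_type must be 'variables' or 'attributes'")


# Daily count of falls with a roughly constant area of opportunity
print(select_shewhart_chart("attributes", has_nonoccurrences=False, equal_area=True))
```

The area-of-opportunity argument is consulted only when the nonoccurrences are unknown, which mirrors the point that it is a filtering question rather than a right-or-wrong one.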
234 Chapter 9 Understanding Variation with Shewhart Charts

X-Bar and S Chart

The left side of the Shewhart decision tree (Figure 9-12) follows a pathway to two charts. The first one is referred to as the average (X-bar) and SD (S) chart. It is the most powerful of the five Shewhart charts because it has multiple observations of continuous data that have been organized into subgroups. In this case, the “doink” on the chart (i.e., the plotted dot) has multiple “doinkettes” (i.e., observations) going into it. For an X-bar and S chart the subgroups can be of equal size or unequal size. If the subgroups are of equal size (e.g., a stratified random sample of 15 patients is selected each day and their wait times to see the doctor are recorded), then the UCL and LCL on the chart will be straight lines as shown in FIGURE 9-14. If the subgroups are of unequal size, however (e.g., on Monday we sampled 10 patient wait times, on Tuesday we had 15, and on Wednesday we collected 20 wait times), the UCL and LCL will not be straight lines. Instead, they will be what are called “stair-step” control limits as shown in FIGURE 9-15. With an unequal subgroup size the amount of data varies within each subgroup and so the dots on the chart (i.e., the observations) each have their own individual UCL and LCL calculated. With more data the limits are tighter and with less data in each subgroup the limits are wider, as shown in Figure 9-15. Day 4 in Figure 9-15, for example, has tighter limits, indicating that there is more data being collected on this day. Day 9, on the other hand, has wider limits due to less data being collected on this day.
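
Why more data gives tighter limits can be seen in a short calculation. The sketch below uses the standard textbook form of the X-bar limits, grand mean ± 3·S-bar / (c4·√n), which is an assumption on my part about what a given SPC package computes; the centre line (40 minutes) and average SD (6 minutes) are hypothetical values chosen only for illustration, and a single average SD is applied to every subgroup size for simplicity.

```python
import math


def c4(n):
    """Bias-correction constant for the sample SD of n normal observations."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


def xbar_limits(grand_mean, s_bar, n):
    """Three-sigma limits for a subgroup average based on n observations."""
    half_width = 3.0 * s_bar / (c4(n) * math.sqrt(n))
    return grand_mean - half_width, grand_mean + half_width


# Hypothetical centre line and average SD (minutes), for illustration only
grand_mean, s_bar = 40.0, 6.0

widths = {}
for n in (10, 15, 20):  # the Monday, Tuesday, Wednesday sample sizes from the text
    lcl, ucl = xbar_limits(grand_mean, s_bar, n)
    widths[n] = ucl - lcl
    print(f"n={n:2d}  LCL={lcl:5.1f}  UCL={ucl:5.1f}")
```

Because the half-width shrinks roughly with the square root of the subgroup size, each subgroup gets its own pair of limits, which is what produces the stair-step appearance.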

[Figure 9-14: two stacked panels. Top panel, “X-bar chart: patient wait time”: wait time in minutes (vertical axis) by week (horizontal axis, Weeks 1–24), with straight UCL and LCL lines. Bottom panel, “S chart: patient wait time”: standard deviation (vertical axis) by week, with a straight UCL line.]

FIGURE 9-14 X-bar and S chart with straight control limits due to equal subgroup sizes

[Figure 9-15: two stacked panels plotted by day (Days 1–25). Top panel: average in mmHg, with CL = 127.333 and stair-step limits labeled UCL = 135.22 and LCL = 119.44. Bottom panel: sigma, with CL = 6.129 and stair-step limits labeled UCL = 12.071 and LCL = 0.186.]

FIGURE 9-15 X-bar and S chart with stair-step control limits due to unequal subgroup sizes

When you make an X-bar and S chart, most software programs will give you the option of producing two charts as shown in Figures 9-14 and 9-15. The top chart is the X-bar chart or average chart and the bottom chart is the S chart or SD chart. The X-bar chart is considered to be the primary chart. Both charts have three main components: (1) the CL or average, (2) the UCL, and (3) the LCL. The X-bar chart will show the average of the data within each subgroup and the lower chart (the S chart) shows you the SD for each subgroup (i.e., each dot) plotted on the X-bar chart. In Figure 9-14, for example, Week 1 on the X-bar chart has an average wait time of 38 minutes and an SD of 5 minutes (seen on the bottom chart). On Week 2 the average wait time is 39.7 minutes and the SD is 7.2 minutes. So for each week we can see what the average wait time is and the amount of spread around that average for this week as measured by the SD.
In Figure 9-15 the indicator of interest is a patient’s systolic blood pressure. The patient recorded several blood pressure readings each day (a minimum of three and a maximum of five each day). As a result the subgroups are of unequal sizes, which produces the stair-step control limits. If the patient had recorded exactly the same number of blood pressure readings (observations) each day (e.g., four), then the UCL and LCL would be straight lines. As you will see in subsequent examples, stair-step control limits will also be found on p- and u-charts. As was mentioned above, on those days when more data were collected (e.g., Day 4 in Figure 9-15) the control limits will be tighter. On days when fewer data were collected (e.g., Day 9) the limits will be wider.
The upper chart in Figure 9-15 reveals the average systolic blood pressures by day and the overall average. The lower chart shows the SD for each day as well as the average SD across all 25 days.9 The average systolic blood

pressure shown in the top chart for the patient in Figure 9-15 is 127. Note that the decimal places on the chart can be ignored in this case because this is entirely too finite a reading for blood pressure results. The degree to which you can control the decimal places on a chart will depend on the software being used. The average UCL is 135 whereas the average LCL is 119. Because this chart reveals only common cause variation, the way to describe the performance of this patient’s systolic blood pressure on this chart is as follows: “On the average this patient’s systolic blood pressure is 127. It could go up as high as 135 on any given day or down to 119 and that is the natural rhythm of this patient’s systolic blood pressure process.”
The lower chart is the S-chart. This chart has two primary purposes. First, it helps you to understand the variation that exists within each subgroup (i.e., day). For example, the SD for Day 4 is around 4. On Day 9 the standard deviation is about 3 mmHg. As you look at each day, therefore, you will see that there is a different average and standard deviation, which reflects the variation in this patient’s blood pressure over time. The second purpose of the S-chart (i.e., the bottom chart) is that the average SD (the CL) is used in the calculation of the control limits for the average (upper) chart. In this case, the average systolic blood pressure (i.e., CL) is 127. What is important to realize is that if the SD chart reveals wide variation, then the average SD will likewise be large. Because the average SD is used to compute the UCL and LCL, a large SD will also contribute to making the control limits of the top chart (the average chart) wider. The relationship of the two charts must be understood together.
I do not intend to elaborate on the statistical formulae for the calculation of the control limits. It is important to realize, however, that each Shewhart chart has a different set of formulae to calculate the chart’s UCL and LCL. For additional details on calculating the statistical parameters for the various Shewhart charts, readers should refer to Wheeler (1995), Wheeler and Chambers (1992), Western Electric (1985), and Provost and Murray (2011).

XmR Chart or the I-Chart
The XmR chart is properly known as the Individuals and Moving Range chart. But it can also be referenced as the Individuals chart or simply the I-chart. The key characteristic of this chart is that each subgroup contains one and only one individual observation or bit of data (i.e., the “doink” on the chart has only 1 bit of data and no “doinkettes” as we discovered in the X-bar and S chart). In the Shewhart decision tree (Figure 9-12), this decision point is identified by the question “More than one observation for each subgroup?” When the answer is “no,” then the chart of choice is the XmR chart. Like the X-bar and S chart, you will typically get two charts when you request this type of chart from your SPC software. The X chart shows the values for the individual data points as well as the average for all the individual data points. The mR chart documents the “moving range.” The XmR chart is typically used when you are interested in answering questions such as:
■■ How many surgeries do we do each week?
■■ What is the cost of each knee replacement surgery?
■■ How long does each patient wait before being seen by the doctor?
■■ How many home care visits do we conduct each week?
■■ How many calories do I eat each day?
■■ What is the length of stay for each coronary artery bypass graft (CABG) patient?
This chart is used frequently to address questions related to volume, frequency of events, or financial issues. Note that you are not interested in finding out what percentage of surgeries started late (this would be considering a late surgery start classified as a defective, which would require a p-chart), but rather you merely want to know how many surgeries are done in the course of a day or a week. In this case, the day or week becomes the

subgroup (x axis label) and the total number of surgeries completed each day or week becomes the individual observation for that day or week (i.e., the dot on the chart). In short, the XmR chart can be used in many situations. Remember, however, that the indicator being placed on the XmR chart is not being classified as a defective or defect. When you use the XmR chart you will usually be asking a more neutral question such as “How many of procedure X do we do?”
FIGURE 9-16 provides an example of an XmR chart. In this particular example, the indicator of interest is the total number of U.S. dollars saved each month as a result of implementing a new transcription system for radiology. Note that like the X-bar and S chart there are two charts. The top portion of the chart provides a plot of the individual data points along with the average of all the data points and the UCL and LCL. This chart also has the zones identified. These are the dashed lines that divide the chart into three areas above and below the CL. As was discussed earlier in this chapter, the zones are used to assist in identification of special causes. Typically, and you will see exceptions to this point, the zones are used when you have a chart with equal subgroup sizes.
The bottom chart is referred to as the moving range chart. The moving range is derived by calculating the simple difference between two successive data points on the Individuals chart and then plotting this difference on the mR chart. This is also referred to as creating an artificial subgroup of 2 since each subgroup on the chart initially contains only 1 bit of data. These steps are highlighted in Figure 9-16 by the circles drawn around each neighboring data point on the top chart and the corresponding arrows that point to the mR value between the coupled data points on the lower chart. Notice that the first three data points on the mR chart (Months 2–4) are relatively close together. This is due to the fact

[Figure 9-16: two stacked panels plotted by month (Months 1–21). Top panel: dollars saved, with UCL = 5470.10, CL = 4360.90, and LCL = 3251.70, and zones A, B, and C marked above and below the CL. Bottom panel: moving range, with UCL = 1362.79, CL = 17.20, and LCL = 0.00.]

FIGURE 9-16 XmR chart for the total amount of dollars saved each month in radiology transcription

that there are small differences between the first four data points that have been coupled together on the Individuals chart. If you look at data points for Months 15 and 16, however, you see a very different picture. The difference (i.e., the range) between Months 15 and 16 is much larger ($938 to be exact) than any of the ranges found when the first four data points were compared. In short, the individual values when coupled together produce an artificial subgroup of two, which you must have in order to calculate a range and subsequently the moving ranges. One final thing to note about the XmR chart: the mR chart will always have one fewer data point on it than the Individuals chart. This is due to the fact that the first data point (Month 1) does not have a neighboring data point to be compared with until the data for Month 2 is posted. Therefore, there is no moving range for Month 1 on the mR chart, which is identified by the triangle surrounding Month 1. This will always be the case when you use an XmR chart.
EXERCISE 9-2 (You make the call: Is it an X-bar and S chart or XmR chart?) will test your ability to determine whether a particular indicator should be placed on an X-bar and S chart or an XmR chart. Answers to this exercise may be found at the end of this chapter.
When we move to the Attributes side of the Shewhart chart decision tree (Figure 9-12), we need to address two questions:

EXERCISE 9-2 Is it an XmR (I) or X-bar and S? You make the call!

Indicator | X-Bar and S Chart | XmR (I Chart)
Time to clean an inpatient room (in minutes) | ____ | ____
Patient satisfaction scores for subgroups of 15 patients in the outpatient clinic | ____ | ____
Average turnaround time for all STAT labs done each day | ____ | ____
Cost for each normal delivery | ____ | ____
A diabetic patient’s 3x a day blood sugar readings | ____ | ____
Average length of stay for a subgroup of 20 intensive care unit (ICU) patients | ____ | ____
The distance (in feet) that a sample of 10 knee replacement patients can walk in 15 seconds | ____ | ____

■■ Do we have the occurrence and nonoccurrences of an event? If “yes,” then we make a p-chart (i.e., a percentage chart).
■■ If the answer is “no,” meaning that we have only the occurrence of an event (i.e., a defect when we do not know the nondefects), then we need to ask, “Is there an equal opportunity for a defect to occur?” If we have an equal opportunity for a defect to occur we make a c-chart. If not, then we make a u-chart. The details are explained next.
We start with the first question and address the use of the p-chart.
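
Before turning to the p-chart, the moving-range steps described for the XmR chart can be sketched in code. The constants 2.66 and 3.267 are the standard textbook factors for moving ranges of two points; whether a particular SPC package uses exactly these is an assumption. The monthly savings values are hypothetical and are not the data behind Figure 9-16.

```python
def xmr_limits(values):
    """Return moving ranges, centre lines, and limits for an XmR chart."""
    # Each moving range is the absolute difference between successive points,
    # so there is always one fewer moving range than individual values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "moving_ranges": moving_ranges,
        "x_cl": x_bar,
        "x_ucl": x_bar + 2.66 * mr_bar,
        "x_lcl": x_bar - 2.66 * mr_bar,
        "mr_cl": mr_bar,
        "mr_ucl": 3.267 * mr_bar,
    }


# Hypothetical monthly dollars saved, for illustration only
savings = [4200, 4350, 4300, 4410, 4900, 4150]
result = xmr_limits(savings)
print(result["moving_ranges"])
print(round(result["x_lcl"], 1), round(result["x_cl"], 1), round(result["x_ucl"], 1))
```

Six individual values yield five moving ranges, which is the “one fewer data point” property noted for the mR chart.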

P-Chart

The p-chart derives its name from the fact that either a percentage or a proportion is what you actually are plotting on the chart. Most of the time the percentage will be the statistic of interest rather than the proportion. When you make a p-chart, or any other attributes chart, you will get only one chart (unlike the variables charts, which give you two charts). The p-chart is used to monitor the proportion or percentage of defectives when you know the occurrence of the defective product, unit, event, or service (the numerator) and the total being observed, which is the occurrences plus the nonoccurrences (the denominator). This chart is used frequently in healthcare settings because we track many indicators that look at accuracy, completeness, errors, or the percentage of something done or not done (e.g., cesarean sections, completed history and physical reports, proper hand washing, or compliance with a standard protocol). FIGURE 9-17 provides an example of a p-chart with stair-step control limits. In this case, the indicator is the percentage of hospital readmissions for home healthcare patients. The denominators for each dot on the chart are not equal and spread from a minimum of 326 cases (Month 3) to a maximum of 1,041 (Month 15). Notice that the smallest denominator (326 for Month 3) has the widest control limits whereas the largest denominator (1,041 for Month 15) has the tightest set of control limits. The numerators go from a low of 75 readmissions (Month 3) to a high of 249 in Month 16. If the distance or spread between the stair-step limits is relatively small, this means that the denominators are relatively close in size.
Finally, it should be noted that if the subgroups (the denominators) were of equal size, the control limits on the p-chart would be straight. But because most healthcare indicators that are defined as percentages differ from one subgroup (i.e., time period) to another (e.g., we do not have the same number of deliveries each month, produce the same number of food trays each day, or have the same number of patients visit the clinic each day), we usually do not have equal subgroups when calculating percentages or proportions. Therefore, most p-charts will have stair-step control limits.
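
The link between denominator size and limit width can be checked with the usual textbook p-chart formula, p-bar ± 3·√(p-bar·(1 − p-bar)/n). This is a sketch under that assumption, not necessarily the exact computation behind Figure 9-17. The centre line of 20.64% is read from the figure; in practice it would be calculated as total readmissions divided by total cases across all months.

```python
import math


def p_chart_limits(p_bar, n):
    """Three-sigma limits for a proportion observed on n cases."""
    half_width = 3.0 * math.sqrt(p_bar * (1.0 - p_bar) / n)
    # A proportion cannot fall below 0 or rise above 1
    return max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)


p_bar = 0.2064  # centre line as labeled on Figure 9-17 (20.64%)

limits = {}
for n in (326, 1041):  # smallest and largest monthly denominators in the text
    limits[n] = p_chart_limits(p_bar, n)
    lcl, ucl = limits[n]
    print(f"n={n:4d}  LCL={100 * lcl:5.2f}%  UCL={100 * ucl:5.2f}%")
```

With 326 cases the limits sit about 6.7 percentage points either side of the centre line; with 1,041 cases they sit about 3.8 points either side, which is the widest-versus-tightest pattern described above.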

[Figure 9-17: percentage of readmissions (vertical axis, 0.0 to 35.0) by month (horizontal axis, Months 1–19), with CL = 20.64 and stair-step limits labeled UCL = 24.59 and LCL = 16.68. Annotations mark the point with denominator = 326 (widest limits) and the point with denominator = 1041 (tightest limits).]

FIGURE 9-17 p-chart on the percent of hospital readmissions for home healthcare patients

C-Chart

The c-chart and the u-chart are the Shewhart charts of preference when you are tracking defects. The c-chart is the appropriate chart when you have an equal area of opportunity for a defect to occur. As shown in Figure 9-13, an area of opportunity has the following characteristics:
■■ It applies to all attributes or count charts (c, u, and p)
■■ It defines the frame or area in which a defective (i.e., a nonconforming unit) or a defect (i.e., a nonconformity) can occur
■■ It can be of equal or unequal sizes.
A manufacturing example may help clarify this concept. Imagine that you work on a paint line at an automobile manufacturing plant. If you were assigned to paint the hoods of a single model of a car (e.g., a Ford Taurus), there would be an equal area of opportunity for a paint blemish because all Ford Taurus hoods are the same size. In this case, we would make a c-chart and plot the total number of paint blemishes (defects) on each hood you paint. Because each hood has the same number of square inches of surface area, there is a constant area of opportunity for a paint blemish.
The challenge now becomes determining when this equal area of opportunity condition exists in healthcare settings. One of the more frequently used examples of how this might occur is with monitoring patient falls. If you conclude that there is basically an equal opportunity for a patient to fall each day of the week at your hospital, rehabilitation facility, or long-term care facility, then you would merely count the number of falls occurring each day, week, or month and plot the number of falls on a c-chart. Other indicators that could be placed on a c-chart if the equal area of opportunity assumption was met include the number of:
■■ Patient restraints
■■ Lawsuits
■■ Patient complaints
■■ Needle sticks
■■ Medication errors
■■ Central line infections
Figure 9-1 was used to show the elements of a Shewhart chart. It also provides an example of a c-chart. In this example, the customer service manager of a large medical group is interested in charting the total number of patient complaints received each week at 17 sites of care. Because each patient could file more than one complaint, complaints are viewed as defects. The alternative, considering the registration of a complaint as a defective, is not selected because this would preclude counting more than one complaint from an individual patient. Remember that defectives are based on the binomial distribution (i.e., the patient complained or did not complain). If you approached this indicator as a defective, you would not be concerned with the magnitude of the complaint problem (i.e., the total number of complaints) but rather with the fact that a patient complained or did not complain, and you would not care if a patient complained more than once. Measuring complaints as a defective would produce a percentage of patients who complained (a p-chart). As a defect, however, we are concerned with the magnitude of the problem so we count the total number of complaints, including multiple complaints from the same patient. The c-chart is selected because the volume of patients seen at the 17 clinics Monday through Friday remains fairly constant each week and the number of sites included in the study does not change. These two conditions allow the manager to assume an equal area of opportunity for a complaint to occur. She merely counts the total number of complaints received each week and plots this number on a c-chart. The chart produces a CL (average number of complaints) and a UCL and LCL.
A frequent challenge with using the c-chart for healthcare applications is that the condition of equal area of opportunity may not be met. In healthcare settings there are few indicators that have equal areas of opportunity. The severity of a patient’s condition can change

quickly, the census can show fluctuations, the clinic is not a 7-days-a-week operation, the volume of orders may change rapidly, and the ED may have to go on bypass because there are no more inpatient beds available on a Friday or Saturday night. So if the assumption of an equal area of opportunity is violated, what do we do next? The answer is simple: you make a u-chart.

U-Chart
This chart is used frequently in health care, especially now that there has been a more concentrated effort to track patient safety indicators. The u-chart, like the c-chart, is used to track defects. The difference is that the u-chart is selected when you conclude that there is not an equal area of opportunity for the defect to occur. Let us return to the paint line at the Ford plant for a moment. Although you have in the past painted one model of car at a time, today you have been told that the line will have a mixture of cars and a mixture of hood sizes. So, how do you count the paint blemishes on the hoods of a Ford Escort, a Taurus, a Mustang, and an Expedition? Each hood has a different number of square inches, takes a different volume of paint to cover the surface of the hood, and has a varying probability of experiencing a paint blemish. The u-chart takes care of this problem very quickly by computing a defect rate. The number of paint blemishes is used as the numerator and the number of square inches of the hood’s surface is used as the denominator. The resultant ratio provides the number of blemishes per so many square inches of hood area. The rate essentially normalizes the differences in denominator size (i.e., the area of opportunity for a blemish to occur).
One technical point about rates. Explaining a rate-based statistic can be a little challenging. It is much easier to say, “This past month we had 21 inpatient falls” than to say, “This past month we experienced an inpatient falls rate of 4.4 falls per 1,000 inpatient days.” Some in your audience may struggle with what this rate-based statistic actually means. An example should help clarify how these are opposite sides of the same coin. If we have 21 inpatient falls, this number becomes the numerator of the rate-based statistic. When we place this count of 21 falls over the total number of patient days for the month (e.g., 4,775), we have a ratio of two different numbers that produces a result of about 0.00440 (i.e., 21/4,775 ≈ 0.00440). Because the number of inpatient days is in the thousands, we multiply the resultant value of 0.00440 by 1,000 to produce the inpatient falls rate of 4.4 falls per 1,000 inpatient days. The number of places you shift the decimal point on the resultant ratio depends on how large your denominator is. In this case, we had 4,775 inpatient days so we shift the decimal point three places to the right by multiplying the value of 0.00440 by 1,000. If you had patient days in the tens of thousands, you would shift the decimal point four places to the right and have 44.0 falls per 10,000 inpatient days. Or you could go out to 100,000 inpatient days and say “I’m sorry but we had 440 inpatient falls per 100,000 inpatient days.” Or if you really wanted to depress the senior management team or board, you could report 4,398 inpatient falls per 1 million inpatient days. You can adjust the result of the ratio of 21/4,775 very easily for any value you place in the denominator position. The general rule for rates, however, is that the denominator you use should be based on the volume you are observing on a regular basis. In the case of inpatient falls, most hospitals are dealing with inpatient days that are in the thousands so this is what should be used to calculate the final rate-based statistic of 4.4 falls per 1,000 inpatient days. If, on the other hand, you were tracking medication errors, you would most likely be justified in making the number of errors per 10,000 doses dispensed or scripts written because an average-size hospital will generally dispense 10,000 or more doses each month. Finally, if your measure was the neonatal death rate for a state, province, or region, then the proper denominator size might be per 100,000 live births.
Because it is extremely rare to have the same number of medication orders each week,

the same patient census, or the same number of central line days in the ICU, the u-chart is used more often in healthcare settings than the c-chart. Furthermore, because epidemiologists frequently produce rate-based statistics (e.g., the neonatal death rate or the VAP rate), the use of terms associated with the u-chart should sound familiar to many healthcare professionals. Examples of u-chart applications are provided in the case study chapter (Chapter 10).
TABLE 9-1 provides an overview of the five charts just described and offers examples of indicators that could be placed on each type of chart. Other useful tables that summarize how charts should be set up and their various uses can be found in Statistical Quality Control Handbook (Western Electric, 1985) and Benneyan (2001).10 Readers wishing to gain additional insights about the selection of control charts should consult Wheeler (1995), Montgomery (1991), Pyzdek (1990), Ishikawa (1989), Duncan (1986), Carey and Lloyd (2001), Carey (2003), and Provost and Murray (2011).

▸▸ You Make the Call

Now that you are familiar with the basic ideas behind the Shewhart charts, the next step is to apply this knowledge to your own indicators. The study questions in BOX 9-1 will serve as a quick overview of some of the central issues related to Shewhart chart development and as a test of your current knowledge. If you struggle with some of the questions, you can review the material presented in this chapter and then explore some of the listed references for additional explanations. Another way to enhance your knowledge base is to attend workshops on SPC. The ASQ, for example, offers public seminars on SPC. You may want to check with your local ASQ chapter to see when such courses will be offered. The IHI also offers workshops on building effective measurement systems and SPC. The various program offerings that I and my colleagues teach throughout the year can be reviewed on the IHI home page (www.ihi.org). Finally, if you have the opportunity to attend a local or national quality conference (e.g., the IHI National Forum on Quality Improvement in Healthcare or the IHI-BMJ International Forum on Quality and Safety in Healthcare), make sure that you sit in on sessions that are discussing Shewhart charts and SPC. Hearing about control charts from multiple sources will be very beneficial.
You can also test your knowledge of the various charts by completing the You Make the Call exercise found in EXERCISE 9-3. When I teach my classes on Shewhart chart applications, I give the participants this exercise at the end of the class to provide a final test of their understanding of the selection of appropriate Shewhart charts. It gives them a chance to “make the call!” and tests their control chart knowledge. The indicators listed in this exercise are taken from actual teams I have had the opportunity to facilitate or coach. Start the exercise by determining the subgroup. Remember that the subgroup is the label for the horizontal axis and reflects how you have organized your data (e.g., by day or week). Next decide if you have variables or attributes data. Finally, list the chart you think is most appropriate for this situation. You may want to refer to the Shewhart Decision Tree shown in Figure 9-12 to assist you in thinking through the chart options. The answers to the You Make the Call exercise can be found at the end of this chapter.

TABLE 9-1 Shewhart chart summary

Type of control chart: X-bar and S chart
This is known as the Average (X-bar) and Standard Deviation (S) chart. Most SPC software programs will give you two charts when you select this chart: one for the X-bar portion and one for the S portion. This is considered to be the most statistically powerful of all the charts. The X-bar and S chart can have straight or stair-step control limits.

Type of data and data collection issues: Variables data
The X-bar and S chart usually involves drawing a sample of observations (e.g., 3–10 per subgroup). Rational subgrouping is frequently used with this chart. The statistical principles behind this chart are based on the assumptions of the normal (Gaussian) bell-shaped distribution.

Examples of indicators used on this type of chart:
■■ Actual turnaround time for five lab tests or three pharmacy orders each day
■■ Blood pressure readings (e.g., three to five per day)
■■ Diabetes monitoring (e.g., three fasting blood sugar readings each day)
■■ Anesthesia time for a sample of cases each day
■■ Patient satisfaction scores

Type of control chart: XmR chart
This chart is known as the Individual values (X) and moving range (mR) chart. Sometimes it will be referred to as the Individuals or I-chart. It does not have the statistical rigor or power of the X-bar and S chart because each dot on the chart is representing only one observation. This chart is used frequently to answer questions related to volume, for example, “How many surgeries did we do this week?” The XmR chart does not address the question as to whether these surgeries were started on time (this would require a p-chart). Instead, the XmR chart is answering a neutral question, “How many?” or “How much?” The XmR chart will always have straight control limits.

Type of data and data collection issues: Variables data
The XmR chart is used when you have a single observation for each subgroup (i.e., n = 1). Sampling typically is not done but might be if the process being monitored has an extremely large volume. Because this chart frequently uses aggregates as the plotted number (e.g., days in accounts receivable this month), it is important to make sure that the data are consistently collected from one time period to the next. This chart is used to evaluate questions related to process outcomes (volumes), with no concern as to whether the outcomes of the process are acceptable or not acceptable.

Examples of indicators used on this type of chart:
■■ Patient wait time to see the physician or to be seen in the ED
■■ The number of days to mail a patient bill after discharge
■■ The number of calls coming into a clinic each day
■■ Average length of stay by week for a particular diagnosis-related group (DRG)
■■ The number of surgeries done each week
■■ Operating margin by month
■■ Pounds of laundry each day
■■ Average turnaround time by day
■■ The number of food trays produced
■■ Patient satisfaction score


Type of control chart: p-chart
The p-chart is used frequently in health care to compute the percentage (or proportion) of defective products or services. The p-chart requires being able to count both the numerator and the denominator. The p-chart is the weakest of the attributes charts because it is based on the binomial distribution (i.e., there are only two outcomes such as yes/no, acceptable/not acceptable, or complete/not complete). The p-chart can have straight or stair-step control limits.

Type of data and data collection issues: Attributes data
These data are classified as defectives or nonconforming units because they reflect the percentage (or proportion) of things or events that do not meet specifications or criteria (the numerators). The denominators usually (but not always) are of varying sizes, which produce stair-step control limits. Data of this type reflect the binomial distribution. The denominators need to be sufficiently large (e.g., usually greater than 12) to enable a reasonable percentage to be calculated yet not too large (e.g., over 5,000).

Examples of indicators used on this type of chart:
■■ Percentage of cesarean sections
■■ Percentage of late food trays
■■ Percentage of incomplete charts
■■ Percentage of late surgery starts
■■ Percentage of bills that are inaccurate
■■ Percentage of mortality
■■ Percentage of staff turnover
■■ Percentage of patients responding “Very Good” to a survey question
■■ Percentage of x-rays that had to be redone
■■ Percentage of did not attends (DNAs) at an outpatient clinic

Type of control chart: c-chart
The c-chart is used to count the number of defects that occur within an equal area of opportunity when the nondefects are unknown. In this case, each observed unit (e.g., a patient) can have multiple defects (e.g., falls). Generally speaking, defects are the specific reasons why a product or service is classified as defective (i.e., a defective product or service will suffer from one or more defects). Generally speaking, indicators appropriate for a c-chart should be considered “rare events.” The c-chart will always have straight control limits.

Type of data and data collection issues: Attributes data
The key to using a c-chart is that there should be an equal area of opportunity for a defect to occur. This condition frequently makes it difficult to use this chart in health care because the conditions under which we provide care do not always remain constant. One way to address this inequality in the area of opportunity is to apply stratification. For example, if the conclusion is that there is not an equal area of opportunity for an inpatient fall because the hospital functions differently on weekends than weekdays, then separating the data by weekdays versus weekends may be sufficient to conclude that there is a relatively equal area of opportunity for a fall during each of these periods. The c-chart is based on the Poisson distribution.

Examples of indicators used on this type of chart:
■■ The number of falls
■■ The number of restraints
■■ The number of needle sticks
■■ The number of lawsuits filed
■■ The number of ventilator-associated pneumonias
■■ The number of nosocomial infections
■■ The number of medication errors
■■ The number of returns to surgery
■■ The number of surgical site infections
■■ The number of violent events in a mental health ward
■■ The number of central line infections

Type of control chart: u-chart
The u-chart is used to track defects when the area of opportunity is not equal. For this reason, the u-chart is typically used more often in health care than the c-chart. This chart is based on rates rather than simple counts. The u-chart can have straight or stair-step control limits.

Type of data and data collection issues: Attributes data
The Poisson distribution is also used as the frame of reference for this chart. The u-chart presents rates (e.g., so many falls per 1,000 patient days). Knowledge of how to collect data to form rates is essential.

Examples of indicators used on this type of chart:
■■ Medication errors per 10,000 doses dispensed
■■ VAP per 1,000 vent days
■■ Total falls per 1,000 patient days
■■ Total readmissions per 1,000 discharges
■■ Bloodstream infections per 1,000 line days
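
The table's remarks about straight versus stair-step limits can be made concrete with a small sketch. The Python below is illustrative only: it uses the conventional three-sigma textbook formulas for the binomial-based p-chart and the Poisson-based c- and u-charts, which are not spelled out in this table, and the function names and data are made up for the example. When a lower limit would fall below zero it is reported as `None` (no LCL exists).

```python
import math


def _three_sigma(center, sigma):
    """Return (LCL, UCL); LCL is None when it would fall at or below zero."""
    lower = center - 3 * sigma
    return (lower if lower > 0 else None, center + 3 * sigma)


def p_chart_limits(defectives, sizes):
    """p-chart: one (LCL, UCL) pair per subgroup, based on the binomial."""
    p_bar = sum(defectives) / sum(sizes)  # centerline
    limits = [
        _three_sigma(p_bar, math.sqrt(p_bar * (1 - p_bar) / n)) for n in sizes
    ]
    return p_bar, limits


def c_chart_limits(counts):
    """c-chart: a single pair of straight limits, based on the Poisson."""
    c_bar = sum(counts) / len(counts)  # centerline
    return c_bar, _three_sigma(c_bar, math.sqrt(c_bar))


def u_chart_limits(defects, areas):
    """u-chart: rate per unit of opportunity, one limit pair per subgroup."""
    u_bar = sum(defects) / sum(areas)  # centerline
    limits = [_three_sigma(u_bar, math.sqrt(u_bar / a)) for a in areas]
    return u_bar, limits


if __name__ == "__main__":
    # Hypothetical data: late surgery starts out of total surgeries per week.
    p_bar, p_limits = p_chart_limits([5, 8, 4], [50, 100, 50])
    print("p-chart centerline:", p_bar)
    for week, (lcl, ucl) in enumerate(p_limits, start=1):
        print("week", week, "LCL:", lcl, "UCL:", ucl)
```

Because the second week has a larger denominator than the other two, its limits sit closer to the centerline, which is the stair-step pattern the table describes. With equal denominators every subgroup would get the same pair of limits and the lines would be straight.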

BOX 9-1 Shewhart charts study questions

■■ When is it appropriate to use Shewhart charts? Should I use them in place of descriptive statistics?
■■ What is the relationship between Shewhart charts and tests of significance?
■■ How many data points do I need to make a Shewhart chart? What do I do if I do not have
enough data?
■■ Which is better, attributes or variables data?
■■ What is a subgroup? Do I have to have one to make a Shewhart chart?
■■ Can I make a Shewhart chart with only single data points?
■■ Do my subgroups have to be of equal size when I make Shewhart charts?
■■ Much of the data I get does not have the date on it. So, does it really matter if the data points are
not in chronological order?
■■ I still don’t get this distinction between a SD and a sigma limit. Why aren’t they the same? Does it
really matter? My spreadsheet software will give me a SD. Why can’t I just multiply this number by 3
and then add and subtract this product from the mean to get the control limits?
■■ Why do I have to use 3 sigma control limits? Why can’t I use two or maybe 1.5 sigma limits?
■■ Do defects add up to make defectives or is it the other way around?
■■ When I make a p-chart, does the size of the denominator make a difference? Can I have, for
example, 4 or 5 in my denominator?
■■ What is the difference between a proportion, a percentage, and a rate?
■■ Should I view common cause variation as “good” variation and special cause variation as “bad”
variation?
■■ Do I really have to investigate a special cause? Can’t I just remove the data point from the chart and
get on with making changes?
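
For readers who want to experiment with the SD-versus-sigma question above, here is a minimal sketch in Python. It assumes the conventional XmR formula (centerline plus or minus 2.66 times the average moving range, where 2.66 = 3/1.128), which is standard in the SPC literature but is not stated in this box. The function names and the data series are hypothetical.

```python
import statistics


def xmr_limits(values):
    """Individuals-chart limits from the average moving range."""
    center = statistics.fmean(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for ranges of two observations.
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def naive_sd_limits(values):
    """Mean plus or minus three times the overall sample SD."""
    center = statistics.fmean(values)
    sd = statistics.stdev(values)
    return center - 3 * sd, center, center + 3 * sd


if __name__ == "__main__":
    # Made-up series whose level shifts halfway through.
    values = [10, 11, 9, 10, 11, 10, 20, 21, 19, 20, 21, 20]
    print("XmR limits:     ", xmr_limits(values))
    print("3 x SD 'limits':", naive_sd_limits(values))
```

In this made-up series the overall SD is inflated by the shift itself, so the mean-plus-or-minus-3-SD band is much wider than the XmR limits and the shift goes unflagged. The moving range looks only at point-to-point variation, so it is far less affected.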

EXERCISE 9-3 You make the call! Selecting the right chart

Columns: Situation | Subgroup? | Type of Data? | Type of Chart?

1. Each day you record the number of films processed in the radiology department.
   Subgroup? | Type of Data? V or A | Type of Chart?

2. Each day you record the number of films requested and the number that cannot be found in the radiology library.
   Subgroup? | Type of Data? V or A | Type of Chart?

3. The number of inpatient restraints each month is placed over the total inpatient days each month.
   Subgroup? | Type of Data? V or A | Type of Chart?

4. Each day you pull a stratified random sample of 15 complete blood counts (CBCs) and record the turnaround time (in minutes) for each CBC.
   Subgroup? | Type of Data? V or A | Type of Chart?

5. The number of minutes it takes to get a stat med order administered to the patient (order time to administration time).
   Subgroup? | Type of Data? V or A | Type of Chart?

6. Every 2 weeks you pull a sample of 30 medication orders and count the total number of orders that have one or more errors.
   Subgroup? | Type of Data? V or A | Type of Chart?

7. The wait time in the ED (door to discharge) is tracked for each patient.
   Subgroup? | Type of Data? V or A | Type of Chart?

8. The clinic receptionist notes the time of check-in for each patient. The physician notes the time when he/she first sees the patient in the exam room. An analyst compiles the data daily and reports the percentage of patients who had to wait more than 30 minutes.
   Subgroup? | Type of Data? V or A | Type of Chart?

9. The director of surgery keeps track of the total number of surgical procedures performed each week.
   Subgroup? | Type of Data? V or A | Type of Chart?

10. The dietary department records the number of food trays that come back uneaten each day and the total number of trays they produced for that day.
   Subgroup? | Type of Data? V or A | Type of Chart?

11. You are interested in the average time patients spend in your waiting area, so every day a student randomly picks eight patients and measures their actual waiting time in whole minutes.
   Subgroup? | Type of Data? V or A | Type of Chart?


12. The ICU nurses want to evaluate the ventilator-associated pneumonia (VAP) rate. So every 2 weeks they record the total number of pneumonia episodes and the total number of vent days.
   Subgroup? | Type of Data? V or A | Type of Chart?

13. Each week patient satisfaction scores for three units are compiled and an average is calculated for the three units.
   Subgroup? | Type of Data? V or A | Type of Chart?

14. The finance department tracks the total number of business days it takes to process a vendor’s request for payment. Process time starts when the request for payment is received in the finance department and ends when the payment is sent (electronically or posted in the mail) to the vendor.
   Subgroup? | Type of Data? V or A | Type of Chart?

15. Every week each medication order is checked against five potential types of errors. The total number of errors for the week is divided by the total number of orders submitted that week.
   Subgroup? | Type of Data? V or A | Type of Chart?

16. You know the number of people who come to the ED complaining of chest pain and the number who are actually diagnosed with an AMI or unstable angina.
   Subgroup? | Type of Data? V or A | Type of Chart?
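
The three-step routine used in this exercise (subgroup, type of data, type of chart) can be written down as a tiny helper. The Python below is only a simplified sketch of the distinctions drawn in Table 9-1. It is not a reproduction of the Shewhart Decision Tree in Figure 9-12, and the function name and parameter names are hypothetical.

```python
def suggest_chart(data_type, subgroup_size=1, counting="defectives",
                  equal_area=True):
    """Suggest one of the five basic Shewhart charts.

    data_type     -- "variables" or "attributes"
    subgroup_size -- observations per subgroup (variables data only)
    counting      -- "defectives" (numerator and denominator are both
                     counted) or "defects" (occurrences are counted)
    equal_area    -- whether the area of opportunity is the same for every
                     subgroup (defects only)
    """
    if data_type == "variables":
        # One observation per subgroup: XmR; several: X-bar and S.
        return "XmR chart" if subgroup_size == 1 else "X-bar and S chart"
    if data_type == "attributes":
        if counting == "defectives":
            return "p-chart"
        if counting == "defects":
            return "c-chart" if equal_area else "u-chart"
    raise ValueError("unrecognized combination of inputs")


if __name__ == "__main__":
    # Indicators taken from Table 9-1 rather than from the exercise itself.
    print(suggest_chart("variables", subgroup_size=3))   # blood pressure, three readings per day
    print(suggest_chart("attributes", counting="defectives"))  # late surgery starts
    print(suggest_chart("attributes", counting="defects", equal_area=False))  # falls per 1,000 patient days
```

The helper deliberately ignores the rare-event charts and the other alternatives discussed in the next section. It only encodes the variables/attributes and defectives/defects splits.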

▸▸ Additional Shewhart Charts

In addition to the five basic Shewhart charts described previously, there are many other charts that have their roots in manufacturing but have proven to be very useful in certain healthcare situations. Some of these alternative Shewhart charts include:
■■ Median chart
■■ t-chart
■■ g-chart
■■ Moving average chart
■■ Cumulative Sum chart (CUSUM)
■■ Exponentially weighted moving average chart (EWMA)
■■ Standardized Shewhart Chart
■■ Multivariate Shewhart-type Charts
■■ P primed chart (p′-chart)
■■ U primed chart (u′-chart)

Provost and Murray (2011) do a very good job of not only describing these alternative charts but also providing examples of their use. I do not intend to go into depth about these various charts but I do want to make a few comments about the t- and g-charts that are being used more and more in healthcare improvement work.

The t- and g-charts are designed to address the occurrence of rare events. I know, you are wondering, “What is the operational definition of a rare event?” When I was first learning about these charts the instructor used a simple example. He would ask, “What is the probability of looking out the window and seeing a car go by?” Everyone would respond, “High.” Then he

would ask, “What is the probability of looking out the window and seeing an accident occur?” Everyone would respond, “Low.” He then would proudly announce, “You now understand a rare event.” Now this is a pretty casual explanation of a rare event but I think it helps to set the context for thinking about rare events. If you wish to get very statistical about rare events you can study what is called the “rare event rule for inferential statistics.” Within this body of statistical theory you will be reacquainted with probability theory that you were exposed to relatively early in your statistical training. Most of the time this is explained by using the probabilities associated with rolling various combinations on dice or getting a particular combination of cards while playing blackjack or poker or betting on a roulette wheel.

At the IHI we use a practical approach to defining rare events that is grounded in statistical theory but does not require detailed computations. Simply stated, if you have more than 25% of the data on a p-, c-, or u-chart at zero (or conversely at 100%) you need to consider moving to a t- or g-chart. With 25% or more of the data points at zero the use of the traditional rules for detecting special causes on a Shewhart chart becomes questionable (Provost & Murray, 2011). It also is a practical issue. If you do not have sufficient nonzero data for an attribute chart the LCL may not exist, which makes the interpretation of the chart difficult. In these situations, you should consider moving to the time between chart (t-chart) or the cases between chart (g-chart). The t-chart (the t part of the name refers to “time”) or time between chart shows you how much time has gone by since the last adverse event. Nelson (1994) provided the details on how this chart is constructed. When you use this chart you have to reorient the way in which you explain the chart. For example, the horizontal axis on the t-chart is a discontinuous time sequence. If you start next Monday to begin tracking patient falls but a fall does not occur until Wednesday then you would place a dot on the chart’s vertical axis at 2 (i.e., 2 days have gone by before a fall occurred). When an event happens (e.g., a fall, a pressure ulcer, a surgical site infection) you basically reset the counter and begin counting the number of days again until the next event occurs. This is the same approach that factories use to track the number of days that have gone by without an accident in the factory. If you never had an event you would never have a dot on the chart. Because you are counting the number of days that have gone by since the last event (i.e., a defective or a defect) the horizontal axis will not have Monday, Tuesday, Wednesday or January, February, March, etc. marked. Instead, when an event occurs you place the date of the event on the horizontal axis, so the dates will not fall at equal intervals of time. The indication of improvement on a t-chart is when you observe an ever-increasing run of days without the occurrence of an event.

As healthcare providers have become more focused on safety indicators and reducing harm the t-chart has become increasingly popular. But a word of caution is in order. The statistical basis for properly calculating the limits on a t-chart is a little involved. First, you need to realize that a distribution of rare events does not follow a normal Gaussian bell curve. A Poisson distribution is a better referent for rare events. The Poisson distribution is appropriate as a referent for the c- and u-charts as well as the time between chart. In the case of the t-chart, however, the form of the Poisson distribution is actually an exponential distribution, which is in turn highly skewed. Second, the skewness of the exponential distribution is not a major problem and is addressed by transforming the time between events (i.e., days gone by) into a quasi-normal or symmetric distribution by performing what is called a Weibull transformation. Third, once the data are more or less approximating a normal distribution the UCL and LCL can be calculated by using the formulae for the XmR chart. Finally, after the limits and CL are calculated they are transformed back to their original state for plotting on the chart. I know, this all sounds rather complicated. The detailed steps for constructing the t-chart

are clearly discussed by Provost and Murray (2011) and Nelson (1994). Also take heart in knowing that any reasonably good SPC software package will do all the calculations for a t-chart quickly and easily once you create the time between data file. Your biggest challenge will be to explain how to interpret the t-chart. But because its use is growing in popularity in healthcare settings, it is well worth your time to gain more knowledge of the time between chart.

The g-chart (or geometric chart) is similar in principle to the t-chart. It too is a chart for tracking rare events except that instead of plotting the amount of time (e.g., days) between rare events, the g-chart plots the number of cases regarded as successful between cases considered to be failures. A failure in this situation might be a surgical site infection, patients experiencing a medication error, or a return to surgery within 24 hours. As with the t-chart, success is determined by having a long run of successful cases with no failures or adverse events. Although the t-chart is modeled after an exponential distribution the g-chart referent is a geometric distribution. Again, the steps for computing the limits and the CL on a g-chart are nicely laid out in Provost and Murray (2011). There is also a considerable body of literature on both the exponential and geometric distributions that can be found in the ASQ's Journal of Quality Technology. The g-chart is also becoming a standard offering in most SPC software packages. Once again, however, the challenge is making sure you have at least a moderate foundation in being able to explain the chart and how to interpret it.

▸▸ Using Shewhart Charts Effectively

At some point after reading various books on run and Shewhart charts and listening to others explain control chart theory, you will all of a sudden proclaim, “I think I get it!” Once you reach this point it is now time to start applying this knowledge to actual improvement opportunities. But be careful. I have seen some people become so enthusiastic about the various charts that they start making graphs on any process that produces data. It is at this stage that I remember an old adage: if you give a child a hammer, the whole world looks like a nail! The charts play a valuable and central role in all QI efforts. It is important to realize, however, what they can do and what they cannot do.

First, appreciate the fact that the charts do not answer the following questions:
■■ What is the reason for a special cause?
■■ Should a common cause process be improved?
■■ What should I do to improve the process?

The answers to these questions do not come from the charts or statistics. They come from the will, ideas, and ability of the team to execute tests of change. I have seen too many teams feel that once they have created a chart their work is finished. I think that this occurs because the chart is a tangible thing that can be pointed to and shown to others. Improvement strategies, on the other hand, are not as finite or discrete. Developing improvement strategies is actually much more difficult than mastering control chart theory and construction because you are dealing with people, behaviors, and culture, not numbers.

Second, after you make a chart and decide whether the process exhibits common or special cause variation, you then need to decide how you are going to approach the variation you have identified. Do you need to merely reduce variation in the process or fundamentally redesign the process and change the way in which work is envisioned and delivered? All improvement strategies emanate from an understanding of variation. If the process exhibits special cause variation the appropriate decision is to investigate the special cause(s) and determine why they have made the process unstable and unpredictable. Just as we would investigate a patient safety event

(i.e., a sentinel event) by conducting a root cause analysis, we also need to do the same thing when a special cause is detected on a run or Shewhart chart. Ignoring a special cause will guarantee that it will rear its ugly head at some point in the future. We cannot predict exactly when a special cause will occur but you can be sure that it will pop up again if you choose to ignore it.11

The other aspect of a special cause is that not every special cause is negative and undesirable. Remember that special causes are not bad and common causes are not good. The key point is that special causes make a process unstable and unpredictable. It is very likely that you will observe a special cause that you want to emulate (e.g., when lab turnaround time is much faster than it has been for the past 15 days). In this case, you want to investigate why the process worked so well on those days and see whether these conditions can be replicated. Common causes on the other hand are not inherently good. Common cause variation merely means that the process is stable and predictable (i.e., predictable within the boundaries of the UCL and LCL). Just as you can have a special cause that you might want to emulate, you can also have common cause variation that is unacceptable (e.g., when a patient’s blood pressure is running at a very high level and staying there or when the wait time to see your family physician is consistent and predictable but it is at such a high level that it is predictably bad and unacceptable).

QI starts with making the correct decision about the variation that lives in your data. Walter Shewhart introduced the control chart and the notions of common and special causes of variation in 1924 (Hare, 2003). Since then SPC has become the foundation for all successful QI initiatives. It is a key component of the Baldrige criteria, Six Sigma, Lean, and International Organization for Standardization (ISO). Without a clear understanding of variation and its causes, however, individuals and organizations will continue to suffer from numerical illiteracy.

Notes

1. Historically these charts have been known as control charts. Shewhart himself even referred to them as control charts as have many writers since Shewhart’s time. But as Blank (1998, p. 1) points out, “It is important to understand that SPC does not control processes. People control processes. SPC is merely a tool that provides you with information you need to reduce variation and tell you whether or not your processes can meet the customer’s expectations.” In more recent times, the charts have been referred to more and more as Shewhart charts (Provost & Murray, 2011, p. 113) to emphasize their use primarily in understanding variation and to facilitate learning about process capability rather than conveying images of “control.” The term Shewhart chart is also used to recognize the significant contributions of Dr. Shewhart to the field of SPC. A final note on the use of the word “control.” The ASQ was originally called the American Society for Quality Control (ASQC). In 1997, the membership voted to drop the word “control” from the organization’s name. This was to recognize that quality was becoming a broader concept and used in many other fields besides manufacturing where initially in the early 1900s control was used as a key operative word. Shewhart’s book, Economic Control of Quality of Manufactured Product (1931), provides a classic reference to the initial use of the term “control.” So for a variety of reasons I use the term Shewhart chart(s) in this text rather than control charts.
2. The USL and LSL are frequently referred to in manufacturing as “tolerance limits” and are also frequently referred to as the voice of the customer (VOC, i.e., what the customer wants, needs, or expects from the product or service). There are many different types of indices that have been developed to statistically capture process

capability. The three basic process capability indices are the process capability index (Cp), the minimum process capability index (Cpk), and the process capability index to the mean (Cpm). The traditional statistical use of process capability (Cp) is to indicate whether or not the process can meet the predetermined specifications (Blank, 1998). There are numerous variations on the Cp statistic, all of which are designed to help the quality control (QC) researcher investigate special causes and get the process to perform as closely as possible to the expectations of the customer (i.e., the specifications).
3. I have calculated Cp and Cpk statistics only once for a healthcare indicator. It was when I was helping to set up an outpatient clinic designed to manage patients on anticlotting medication (i.e., warfarin sodium). Several key indicators are used in assessing clotting issues. The PT, along with its derived measures of prothrombin ratio (PR) and INR, are assays evaluating the extrinsic pathway of coagulation. This test is also called “ProTime INR” and “PT/INR” (MedlinePlus medical encyclopedia, https://medlineplus.gov/ency/article/003652.htm). Because there are defined therapeutic limits associated with these measures they can be regarded as USL and LSL. These values would be set on the Shewhart charts as reference lines. Then the patient’s actual results on the PT and INR would be plotted on the chart, and the UCL and LCL of the patient would then be compared to the USL and LSL. Because we had both a USL and an LSL and control limits the capability statistics could be calculated to determine how well the patient was conforming to the therapeutic limits (USL and LSL) of the drug. But in most instances in healthcare settings, there is only a single target or goal rather than the USL and LSL. Healthcare simply does not currently function at the same level of precision for many of its indicators as the manufacturing industry. So, when I use the term process capability I am using it in a general sense to describe the variation in the process as defined by the mean (CL) and the UCL and LCL. These numbers define how well the process is performing relative to the target or goal.
4. Some of my colleagues may disagree with these guidelines. I have found over the years that there are two general issues that need to be balanced against each other: statistical purity and practicality. The science of improvement (SOI) is, as Shewhart referred to it, an “applied science.” Therefore, in my work I have always tried to balance the precision of statistical requirements with a heavy dose of practicality. For example, I have worked with wonderful people in the National Health Service (NHS) of Scotland for over 12 years. During this time we have developed a variety of health and social service measurement systems. Most of the data are collected monthly and many of the indicators were not collected historically. So, we were starting out with no data on selected indicators and had to build charts as we went along. In this case, trial control limits were essential. We also made a very practical decision to use the first 6 months of data as baseline for indicators that had no history. Again some would argue that this is not enough data to establish a baseline but it was sufficient to get us started on the road to improvement.
5. In one of my measurement workshops a few years ago, this confusion was highlighted very clearly. A young woman near the front of the room raised her hand after I was done explaining that a sigma was not equivalent to a SD. She had a bit of a wrinkled brow and looked concerned. She said, “I was told that the UCL and LCL were calculated as SD. Is this not correct?” I drew the formula for

the SD on a flipchart and asked her if this is what she used to calculate the three sigma control limits. She said “Yes, that is what I was told to use.” I then proceeded to politely tell her that the control limits on her charts were wrong. The UCL and LCL would either be too wide or too narrow if she used the SD of the data. She got a very strange look on her face, was quiet for a moment, then burst out, “But this means I have been giving the charts to the senior management team and to the board! What am I going to do?” I asked her if anyone had ever noticed or commented on the fact that the limits were not properly calculated. She responded, “No.” I suggested that she learn how to make the charts correctly with SPC software (she was merely using Excel with no SPC add-on software that properly computes sigma limits) and then submit the correct charts to the senior management team and board the next time around. She still looked a little perplexed, however. She was concerned that she would lose credibility with the management team when they found out the charts were wrong. I told her that unless she told them that her original charts had the wrong limits, it did not sound like anyone on the board or the senior management team had sufficient grounding in Shewhart charts to actually discern that the charts were different. I told her to let me know how it went when she showed them the correct charts. She wrote back and said that no one asked any questions.
6. API develops methods, works with leaders and teams, and provides education and training to help organizations improve their products and services and to build their capability for ongoing improvement. The principals of API have worked in industrial, educational, health, and social service settings. They have worked extensively with Dr. Deming and provided support for his 4-day seminars. They have written a number of key books on QI including The Improvement Guide (Langley, Moen, Nolan, Nolan, Norman, & Provost, 2009), Quality Improvement Through Planned Experimentation (Moen, Nolan, & Provost, 2012), and The Health Care Data Guide (Provost & Murray, 2011).
7. When Dr. Ray Carey and I first started teaching control chart applications to healthcare professionals in 1992, we taught the traditional list of seven control charts. In 1995, we wrote a book that described these seven charts and their use (Carey & Lloyd, 2001). In December 2002, Dr. Carey and I taught a minicourse and two workshops on control charts at the 14th National Forum on Quality Improvement in Health Care sponsored by the IHI. This was the first time in the 12 years that we had been teaching for the IHI that we reduced the number of charts we taught from seven to five. The sessions were well received, and the participants found the more simplified approach to be appealing. The two charts we dropped were the X-bar and R chart and the np-chart. Our reasoning for doing this was that the X-bar and S chart can be used in any situation that calls for the X-bar and R chart (when the subgroup is greater than 2). The np-chart, which is a count of the number of defectives, requires equal subgroup sizes (i.e., the denominators), which do not happen very often in healthcare settings. The p-chart can be used effectively, however, in any situation where an np-chart could be used. If there are equal subgroup sizes then the p-chart will have straight control limits. If, on the other hand, the measure has unequal subgroups then the p-chart will have what is known as “stair-step” control limits. In this case, the control limits are different for each data point. The closer in size the denominators the smaller the
Answers to the Chapter 9 Exercises 253

“steps” between each of the control limits. 11. There are many good examples of how
If there are large differences between the people have ignored special causes when
denominators the “steps” will be greater they first occurred and then decided to
between the individual data points. deal with them when they popped up
8. I wrote a commentary in JAMA a few again. The terrorist attacks on our nation
years ago titled “A Matter of Time” (Lloyd on September 11, 2001 provide a classic
&G ­ oldmann, 2009) highlighting how example. Several years prior to 9/11
clinicians, researchers, patients, and the World Trade Center was bombed
improvement specialists all have very by terrorists. Although this seemed to
different views of time. To these four draw the nation’s attention for a while,
categories I could add management time, interest in this special cause soon faded
which focuses on monthly aggregates into the “old news” category and steps
of data. were not taken to extricate the factors
9. The control chart examples presented that led to the special cause. The condi-
in this chapter have been developed to tions for 9/11 were still existing within
demonstrate the five different charts. The our system. The September 11 special
substantive importance of the various cause, however, generated a completely
charts is not the focus of this chapter. The different reaction. Our nation mobilized
charts have been developed for heuristic not only to investigate the special cause
purposes, and the clinical or operational but also take steps to literally try to
impacts of the indicators presented on eliminate the origin of the special cause.
the charts are not the primary objective Every day there are stories in the news
in this chapter. Analysis and interpreta- that should prompt a discussion as to
tion of control charts are addressed in whether the event is a special cause or
Chapter 10. part of a common cause system. All too
10. The idea for creating this table came from often, however, we overreact to a special
Dr. James Benneyan of Northeastern cause and want to change the system
University in Boston. In a paper titled without fully investigating the reasons
“Design, Use, and Preferences of Statistical why it occurred. Other times, however,
Control Charts for Clinical Process Im- we ignore a special cause and “hope” that
provement” (September 16, 2001), he used it will not happen again. Hope is not a
a table to summarize the various charts. plan. Knowing how to appropriately react
After reading this paper, I realized that to common and special causes is a much
the table was something I had not used better approach than hoping a special
to summarize the control charts. I believe cause will not pop up again.
a table format works nicely to augment
the utility of the decision tree shown Answers to the Chapter 9
in Figure 9-12 and the textual details.
Dr. Benneyan has written extensively Exercises
on the topic of control charts in health This section provides the answers to the exer-
care and I would encourage readers to cises presented earlier in this chapter. The first
review his work. He can be reached at the (EXERCISE 9-1) deals with differentiating defectives
­following address: MIME Department, 334 from defects. EXERCISE 9-2 provides indicators
Snell Engineering Center, Northeastern that could be placed on either an X-bar and
University, Boston, MA 02115; phone 617- S chart or an XmR chart. The answers to these
373-2975; email benneyan@coe.neu.edu. two exercises are shown here.

EXERCISE 9-1 Defective or defect? You make the call! (Answers)

Indicator                                                     Defective           Defect
                                                              (Classification)    (Count)

1. Number of accidents per 1,000 employee days                                    *
2. Number of errors per 25 food trays                                             *
3. Percentage of AMI patients receiving aspirin within        *
   24 hours of arrival in the ED
4. Percentage of inpatient deaths each month                  *
5. Number of surgical complications per 1,000 surgeries                           *
   performed
6. Proportion of hand hygiene observations done incorrectly   *
7. Number of falls per 1,000 patient days                                         *
8. Number of medication errors per 10,000 doses dispensed                         *

EXERCISE 9-2 You make the call: Is it an X-bar and S chart or XmR chart? (Answers)

Indicator    X-bar and S Chart    XmR (I Chart)

Time to clean an inpatient room (in minutes) *
Patient satisfaction scores for subgroups of 15 patients in the outpatient clinic *
Average turnaround time for all STAT labs done each day *
Cost for each normal delivery *
A diabetic patient’s 3x a day blood sugar readings *
Average length of stay for a subgroup of 20 ICU patients *
The distance (in feet) that a sample of 10 knee replacement patients can walk in 15 seconds *

The final exercise (EXERCISE 9-3) brings together the key issues related to selecting the most
appropriate Shewhart chart for different measurement situations. In this exercise, the subgroup,
type of data, and type of chart all need to be specified. Depending on how you interpret the
wording describing the situations in Exercise 9-3, you might think that a type of chart other than
that I have listed could be selected. A key learning point for this exercise is that slight changes
to the wording of the situation could lead you to selecting a different chart. For example, take a
close look at situations 8 and 11 in Exercise 9-3. The wording for situation 8 points you to select
a p-chart because they decided to focus on patients who had to wait more than 30 minutes. Even
though they had variables data (i.e., time) they basically turned it into attributes data because of
the 30-minute target. They have taken the more powerful form of data (variables data) and
relegated it to a binomial condition, over 30 minutes and under 30 minutes. They will never
understand the true variation in wait time. What is the longest wait? You have no idea. All we
know is that a certain percentage of patients had to wait more than 30 minutes. The longest wait
could be 31 minutes or 13,184 minutes. The more appropriate approach is found in situation 11.
Here they are taking a sample of eight patients each day and recording their actual wait times.
The chart of preference in this situation is the X-bar and S chart. We will now have the average
wait time for a given day and the SD from this average. We can lay a separate line on the chart
showing the target of 30 minutes. This gives us much more information about the process
variation and how capable it is of achieving the target, which cannot be determined by using the
p-chart. The final aspect of this situation is that if the target is to have all patients be seen in 30
minutes or less, the 30-minute target actually needs to be the UCL of the X-bar and S chart, not
the average. If 30 minutes is the average on the chart you will naturally have some patients
waiting more than 30 minutes and some waiting less. A target is useful on a chart but it needs to
be understood in light of the actual variation in the process and the capability of the current
process to achieve the target. The Shewhart chart can help you determine the magnitude of
improvement needed to achieve the target, but in the case of improving wait time, this is best
accomplished by not turning variables data into attributes.

The most appropriate chart for each situation described in Exercise 9-3 is shown here. Note that
situation 16 is a trick question. Did you determine that a chart cannot be identified? Why?
Because there is no subgroup identified in the situation description. Remember, a Shewhart chart
must have a subgroup and an observation as minimum requirements. In this situation, there is no
subgroup. But if the situation had been worded as follows then we would have a subgroup: “You
know the number of people who come to the ED complaining of chest pain EACH MONTH and the
number who are actually diagnosed with an AMI or unstable angina.” Now you would be able to
determine which chart is most appropriate. In this situation, the Shewhart chart of choice would
be the p-chart because we know the denominator (i.e., the number of people coming to the ED
complaining of chest pain) and the numerator (i.e., the number who were actually diagnosed with
an AMI or unstable angina). Without a subgroup, however, we cannot make a decision about
which chart is most appropriate.
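
The contrast between situations 8 and 11 can be sketched in a few lines of Python. This is only an illustration: the wait times are hypothetical (they do not come from the exercise), the function names are invented for this sketch, and the limits use the standard X-bar and S formulas with the c4 constant, assuming equal subgroups of eight patients per day.

```python
import math
from statistics import mean, stdev


def c4(n):
    """Bias-correction constant for the sample SD of n normal observations."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def xbar_s_limits(subgroups):
    """Return (LCL, CL, UCL) for the X-bar chart and for the S chart.

    Assumes every subgroup has the same size (eight patients per day here).
    """
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("this sketch assumes equal subgroup sizes")
    grand_mean = mean(mean(g) for g in subgroups)
    s_bar = mean(stdev(g) for g in subgroups)
    a3 = 3.0 / (c4(n) * math.sqrt(n))
    spread = 3.0 * math.sqrt(1.0 - c4(n) ** 2) / c4(n)
    xbar = (grand_mean - a3 * s_bar, grand_mean, grand_mean + a3 * s_bar)
    s = (max(0.0, 1.0 - spread) * s_bar, s_bar, (1.0 + spread) * s_bar)
    return xbar, s


def share_over_target(subgroups, target):
    """Situation 8 view: fraction of each day's waits that exceed the target."""
    return [sum(w > target for w in g) / len(g) for g in subgroups]


# Hypothetical wait times (whole minutes): eight patients per day, five days.
waits = [
    [12, 25, 31, 18, 44, 27, 22, 35],
    [20, 15, 28, 33, 19, 24, 52, 30],
    [17, 29, 26, 38, 21, 14, 23, 41],
    [30, 22, 36, 27, 16, 48, 25, 19],
    [24, 33, 18, 29, 61, 20, 26, 15],
]

xbar_limits, s_limits = xbar_s_limits(waits)
daily_share = share_over_target(waits, 30)
longest_wait = max(max(g) for g in waits)

print("X-bar chart (LCL, CL, UCL):", [round(v, 1) for v in xbar_limits])
print("S chart (LCL, CL, UCL):", [round(v, 1) for v in s_limits])
print("Share over 30 minutes per day:", daily_share)
print("Longest wait in the sample:", longest_wait)
```

Once the data are collapsed to a daily share over 30 minutes, the longest wait and the day-to-day spread can no longer be recovered; the variables view keeps both.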

EXERCISE 9-3 You make the call!: Selecting the right chart (Answers)

Situation                                                   Subgroup?              Type of Data?  Type of Chart?

1. Each day you record the number of films processed in     Day                    V              XmR
   the radiology department.

2. Each day you record the number of films requested        Day                    A              p-chart
   and the number that cannot be found in the radiology
   library.

3. The number of inpatient restraints each month is         Month                  A              u-chart
   placed over the total inpatient days each month.

4. Each day you pull a stratified random sample of          Day                    V              X-bar & S
   15 CBCs and record the turnaround time (in minutes)
   for each CBC.

5. The number of minutes it takes to get a stat med         Stat med               V              XmR
   order administered to the patient (order time to order
   administration time).

6. Every 2 weeks you pull a stratified sample of 30         Two weeks              A              p-chart
   medication orders and count the total number of
   orders that have one or more errors.

7. The wait time in the ED (door to discharge) is tracked   Patient                V              XmR
   for each patient.

8. The clinic receptionist notes the time of check-in for   Day                    A              p-chart
   each patient. The physician notes the time when he/
   she first sees the patient in the exam room. An analyst
   compiles the data daily and reports the percentage
   of patients who had to wait more than 30 minutes.

9. The director of surgery keeps track of the total         Week                   V              XmR
   number of surgical procedures performed each week.

10. The dietary department records the number of food       Day                    A              p-chart
    trays that come back uneaten each day and the total
    number of trays they produced for that day.

11. You are interested in the average time patients spend   Day                    V              X-bar & S
    in your waiting area, so every day a student randomly
    picks eight patients and measures their actual waiting
    time in whole minutes.

12. The ICU nurses want to evaluate the ventilator-         Two weeks              A              u-chart
    associated pneumonia (VAP) rate. So every 2 weeks
    they record the total number of pneumonia episodes
    and the total number of vent days.

13. Each week patient satisfaction scores for three         Week                   V              XmR
    units are compiled and an average is calculated for
    the three units.

14. The finance department tracks the total number of       A request for payment  V              XmR
    business days it takes to process a vendor’s request
    for payment. Process time starts when the request
    for payment is received in the finance department
    and ends when the payment is sent (electronically or
    posted in the mail) to the vendor.

15. Every week each medication order is checked against     Week                   A              u-chart
    five potential types of errors. The total number of
    errors for the week is divided by the total number of
    orders submitted that week.

16. You know the number of people who come to the           Unknown*               A              Unknown*
    ED complaining of chest pain and the number who
    are actually diagnosed with an AMI or unstable
    angina.

*NOTE: Item 16 is a trick question. A subgroup is not specified. Without a subgroup you cannot make a decision about the most appropriate chart. If
this description indicated that “You know the number of people who come to the emergency department EACH MONTH . . .” you would have a subgroup.
The chart of choice would then be a p-chart.
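
The reasoning behind these answers can be sketched as a small Python function. This is an illustration of the usual selection logic for the five charts taught in this chapter, not a reproduction of the decision tree in Figure 9-12; the function name and its parameters are hypothetical and were chosen only for this sketch.

```python
from typing import Optional


def select_chart(subgroup: Optional[str],
                 data_type: str,
                 subgroup_size: int = 1,
                 counts_defects: bool = False,
                 equal_opportunity: bool = False) -> Optional[str]:
    """Suggest a Shewhart chart for one measurement situation.

    subgroup          -- how the data are organized (e.g., "Day"); None if unspecified
    data_type         -- "V" for variables data, "A" for attributes data
    subgroup_size     -- observations per subgroup (variables data only)
    counts_defects    -- True when occurrences are counted (defects) rather than
                         units classified as defective or not
    equal_opportunity -- True when the area of opportunity is the same every time
    """
    if subgroup is None:
        # No subgroup, no chart: this is the trick in situation 16.
        return None
    if data_type == "V":
        return "XmR" if subgroup_size == 1 else "X-bar & S"
    if data_type == "A":
        if not counts_defects:
            return "p-chart"
        return "c-chart" if equal_opportunity else "u-chart"
    raise ValueError("data_type must be 'V' or 'A'")


# A few of the situations from Exercise 9-3.
print(select_chart("Day", "V"))                          # situation 1
print(select_chart("Day", "V", subgroup_size=15))        # situation 4
print(select_chart("Day", "A"))                          # situation 8
print(select_chart("Month", "A", counts_defects=True))   # situation 3
print(select_chart(None, "A"))                           # situation 16
```

The sketch also makes the point about wording visible: changing a single argument, such as the subgroup size or whether defects are being counted, changes the chart that is returned.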

References

ASQ Statistics Division. Glossary and Tables for Statistical Quality Control, 4th ed. Milwaukee:
    ASQ Press, 2005.
Benneyan, J. “Design, Use and Performance of Statistical Control Charts for Clinical Process
    Improvement.” Unpublished paper, Northeastern University, September 2001. For access to
    this paper contact Professor James C. Benneyan, Ph.D., MIME Department, 334 Snell
    Engineering Center, Northeastern University, Boston, MA 02115; tel: 617-373-2975;
    fax: 617-373-2921; e-mail: benneyan@coe.neu.edu.
Benneyan, J., R. Lloyd, and P. Plsek. “Statistical Process Control as a Tool for Research and
    Health Care Improvement.” Journal of Quality and Safety in Healthcare 12, no. 6
    (December, 2003): 458–464.
Blalock, H. Social Statistics. New York: McGraw-Hill, 1960.
Blank, R. The SPC Troubleshooting Guide. New York: Quality Resources, 1998.
Blumenthal, D. “Total Quality Management and Physicians’ Clinical Decisions.” Journal of the
    American Medical Association 269 (1993): 2775–2778.
Carey, R. Improving Healthcare with Control Charts. Milwaukee: Quality Press, 2003.
Carey, R., and R. Lloyd. Measuring Quality Improvement in Healthcare: A Guide to Statistical
    Process Control Applications. Milwaukee: Quality Press, 2001.
Deming, W. E. The New Economics, 2nd ed. Cambridge, MA: Massachusetts Institute of
    Technology, Center for Advanced Studies, 1994.
Deming, W. E. “Quality, Productivity and Competitive Position.” Indianapolis, Indiana,
    August 11–14, 1992.
Duncan, A. J. Quality Control and Industrial Statistics, 5th ed. Homewood, IL: Irwin Press, 1986.
Gonick, L., and W. Smith. The Cartoon Guide to Statistics. New York: Harper Perennial, 1993.
Grant, E., and R. Leavenworth. Statistical Quality Control. New York: McGraw-Hill, 1988.
Hare, L. “SPC: From Chaos to Wiping the Floor.” Quality Progress (July 2003): 58–63.
Ishikawa, K. Guide to Quality Control. White Plains, NY: Quality Resources, 1989.
Kume, H. Statistical Methods for Quality Improvement. Tokyo: Association for Overseas
    Technical Scholarship, 1996.
Langley, G. L., R. Moen, K. M. Nolan, T. W. Nolan, C. L. Norman, and L. P. Provost. The
    Improvement Guide: A Practical Approach to Enhancing Organizational Performance, 2nd ed.
    San Francisco, CA: Jossey-Bass, 2009.
Levine, D., and D. Stephan. Even You Can Learn Statistics: A Guide for Everyone Who Has Been
    Afraid of Statistics. Upper Saddle River, NJ: Pearson-Prentice Hall, 2005.
Lloyd, R., and D. Goldmann. “A Matter of Time.” Journal of the American Medical Association
    302, no. 8 (2009): 894–895.
Moen, R., T. Nolan, and L. Provost. Quality Improvement Through Planned Experimentation,
    3rd ed. New York: McGraw-Hill, 2012.
Mohammed, M., P. Worthington, and W. Woodall. “Plotting Basic Control Charts: Tutorial Notes
    for Healthcare Practitioners.” Quality and Safety in Health Care 17 (2008): 137–145.
Montgomery, D. C. Introduction to Statistical Quality Control, 2nd ed. New York: John Wiley &
    Sons, 1991.
Nelson, L. “Interpreting Shewhart Average Control Charts.” Journal of Quality Technology 17
    (1985): 114–116.
Nelson, L. “A Control Chart for Parts-Per-Million Nonconforming Items.” Journal of Quality
    Technology 26, no. 3 (1994): 239–240.
Perla, R., L. Provost, and S. Murray. “The run chart: a simple analytical tool for learning from
    variation in healthcare processes.” British Medical Journal Quality and Safety in Healthcare
    20, no. 1 (2011): 46–51.
Provost, L. P., and S. Murray. The Health Care Data Guide. San Francisco: Jossey-Bass, 2011.
Pyzdek, T. Pyzdek’s Guide to SPC. Vol. 1: Fundamentals. Milwaukee: Quality Press, 1990.
Shewhart, W. Economic Control of Quality of Manufactured Product. New York: D. Van Nostrand,
    1931. Reprint, Milwaukee: Quality Press, 1980.
Shewhart, W. Statistical Method from the Viewpoint of Quality Control. New York: Dover
    Publications, 1986.
Western Electric Company. Statistical Quality Control Handbook. Indianapolis: AT&T
    Technologies, Inc., 1985.
Wheeler, D. Understanding Variation: The Key to Managing Chaos. Knoxville, TN: SPC Press, 1993.
Wheeler, D. Advanced Topics in Statistical Process Control. Knoxville, TN: SPC Press, 1995.
Wheeler, D., and D. Chambers. Understanding Statistical Process Control. Knoxville, TN: SPC
    Press, 1992.
Woodall, W. “The Use of Control Charts in Health-Care and Public-Health Surveillance.” Journal
    of Quality Technology 38, no. 2 (2006): 89–104.
CHAPTER 10
Applying Quality Measurement Principles

This chapter provides case studies that demonstrate the quality measurement concepts,
principles, and tools discussed in previous chapters. The intent is to show brief practical
applications of these ideas to daily work. Some of the case studies are short and address issues
related to indicator identification, operational definitions, stratification, and sampling. All of the
examples discussed are based on real-life situations that I have encountered with improvement
teams. Teams that have given me permission to show their actual data or tell their story are
noted. When a team asked to remain anonymous, however, I have adjusted the story and their
data slightly to honor their request for anonymity.

Each case study is divided into two sections. First, the quality measurement challenge is
described in the Situation section. This is followed by a Discussion section in which options for
remedying the situation are presented. In some cases, the discussion section is fairly brief (e.g.,
if the challenge is to clarify an operational definition). In other cases (e.g., the Shewhart control
chart case studies), the discussion section goes into a little more depth, because the example
involves the actual presentation, analysis, and interpretation of data. Finally, note that there is
no particular order or grouping of the case studies relative to their content or the issues being
explored.

CASE STUDY #1: Predicting a Cardiovascular Event¹


Situation
The medical director of a 165-member medical group decides to evaluate how good his physicians are at
predicting whether a cardiac catheterization is necessary. Within the ranks of those who make these predictions
are two groupings: (1) nuclear medicine–trained cardiologists and (2) nuclear medicine–trained
radiologists. Inappropriate cardiac catheterization not only places patients at unnecessary levels of risk
but also wastes valuable resources and physician time that could be directed to more seriously ill patients.


The total number of cases referred for possible cardiovascular (CV) intervention constitutes the
patient population. From this population, a subset of patients is actually referred for a thallium treadmill
stress test in nuclear medicine. Once the films are read, a certain number of patients are then identified
as being at high risk for coronary artery disease. These high-risk patients are recommended to the
cardiologist for a cardiac catheterization procedure. Once the catheterization procedure is completed a
determination is finally made as to whether the procedure was appropriate or not necessary.
The medical director is grappling with two questions. First, was the catheterization procedure
warranted? This is actually a rather straightforward question based on the data obtained during the
procedure related to the percentage of blockage discovered in the cardiac arteries. The second question,
however, is a more delicate one. Specifically, “What is the accuracy of the nuclear-trained cardiologist
compared to the nuclear-trained radiologist in predicting a cardiac catheterization that shows significant
(greater than 50% occlusion) coronary artery disease?” The literature is not definitive on this issue, but
the general belief is that the predictive percentage (i.e., true positives) should be somewhere between
70% and 90%. Again, this question can be answered objectively by reviewing the predictive percentages
of the cardiologists and the radiologist. But here is the medical director’s real dilemma. There are seven
physicians who read the thallium stress tests and make recommendations to proceed or not proceed
with a cardiac catheterization. Six of these physicians are cardiologists and one is a radiologist. Historically,
the cardiologists were the only ones who read the thallium stress test results and determined whether
a catheterization was appropriate. When the radiologist was added to the mix last year the group
dynamics began to change. Several of the cardiologists took issue with allowing the radiologist to make
decisions about cardiac issues. The tension in the group was increased when the most senior cardiologist
presented data from a 2-week period that he claimed “proved” that the radiologist had a “significantly”
lower predictive percentage than all of the cardiologists. He even had a p-value to “prove” significance
of the differences at the 0.05 level. This situation became even more complicated because all of the
cardiologists were male and had been with the medical group for years. The new radiologist was a
female and had been with the group for only a year and a half. You can see that this situation moved
very quickly beyond the data and was going to require not only analytic skills but also a delicate dose of
social psychology. The medical director felt that the data analysis prepared by the cardiologist was not
providing an accurate picture of the results. This is where I entered the story to shed some light on the
analytic side of things. I left the social psychology challenge and group dynamics to the medical director.
I asked the medical director whether he could obtain data for the entire time period in which
all seven physicians were employed by the medical group. When we next met he provided data for
the past 18 months. The data are shown in TABLE 10-1. During the year and a half period, the seven
physicians read a total of 505 stress test results (column 2 in Table 10-1), of which 303 (60%) were done
in an outpatient setting and the remaining 202 (40%) were performed in an inpatient setting.

Discussion
■■ When looking at an outcome that is either correct or not correct, what type of data have we
collected—attributes or variables?
Indicators that are based on only two possible outcomes (e.g., correct or not correct) form a
binomial distribution, which is a form of attributes data. In this present case, the measure of
interest is whether the CATH procedure was appropriate (yes or no?).
■■ Do we have equal or unequal subgroups?
The subgroups are of unequal size because each physician reviewed a different number of stress
test results during the 18 months (i.e., between 58 and 85, column 2 in Table 10-1).


TABLE 10-1 Cardiac CATH Data by Physician

Column 1: Physician number
Column 2: Total number of stress test results read by each physician
Column 3: Number of stress test results initially thought to be “high risk” and recommended for a CATH by each physician
Column 4: Percentage of stress test results initially thought to be “high risk” and recommended for a CATH (Col 3/2)
Column 5: Number of CATHs determined to be appropriate after the procedure
Column 6: Percentage of CATHs initially thought to be “high risk” that were actually found to be appropriate after the CATH procedure (Col 5/3)

Column 1   Column 2   Column 3   Column 4   Column 5   Column 6
1          77         50         64.9       30         60.0
2          69         59         85.5       44         74.6
3          85         49         57.6       30         61.2
4          58         35         60.3       28         80.0
5          63         56         88.9       23         41.1
6          81         57         70.4       32         56.1
7          72         41         56.9       34         82.9
Total      505        347        68.7       221        63.7
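
The two percentage columns in Table 10-1 follow directly from the three count columns, which can be checked with a few lines of Python. The variable and function names below are illustrative only; the counts are taken from columns 2, 3, and 5 of the table.

```python
# Counts from Table 10-1, physicians 1 through 7.
results_read = [77, 69, 85, 58, 63, 81, 72]   # column 2
recommended = [50, 59, 49, 35, 56, 57, 41]    # column 3
appropriate = [30, 44, 30, 28, 23, 32, 34]    # column 5


def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100.0 * numerator / denominator, 1)


col4 = [pct(r, n) for r, n in zip(recommended, results_read)]   # Col 3/2
col6 = [pct(a, r) for a, r in zip(appropriate, recommended)]    # Col 5/3

total_col4 = pct(sum(recommended), sum(results_read))
total_col6 = pct(sum(appropriate), sum(recommended))

print("Column 4:", col4, "Total:", total_col4)
print("Column 6:", col6, "Total:", total_col6)
```

Note that the totals are computed from the summed counts, not by averaging the seven physician percentages; the same pooled values serve as the centerlines of the p-charts.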

■■ What is the preferred chart for these data?


The preferred chart for this type of data is the p-chart. The key outcome indicator, the positive
predictive percentage, is a binomial condition (i.e., the prediction that a CATH procedure was
warranted is either “yes, it was appropriate” or “no, it was not appropriate”). A p-chart requires a
numerator and a denominator and produces either a percentage or proportion. There are two
key percentages that can be considered in this case:
1. The percentage of stress test results initially thought to be high risk and recommended
for a CATH (column 4 in Table 10-1). This measure reflects what each physician “thought”
the results of the stress test were revealing. If the physician thought there was sufficient
blockage in one or more of the patient’s arteries then the patient was recommended for a
CATH procedure.
2. The second percentage is the one the medical director was most interested in. This is what they
referred to as the “predictive percentage.” It reflects how well each physician was able to predict
a true positive (i.e., the CATH was deemed appropriate after the procedure was completed).
Specifically, this measure, shown in column 6 of Table 10-1, is the percentage of CATHs initially
thought to be high risk that were actually found to be appropriate after the CATH procedure.
The numerator for this measure is the number of catheterizations that were determined to
be appropriate (i.e., the cardiologist confirmed greater than 50% stenosis). These numbers are
shown in column 5 of Table 10-1. The denominator is the number of catheterizations performed
(column 3 in Table 10-1). The objective is to be as close to 100% as possible, which means that
every catheterization that was done was required and determined to be appropriate by the
cardiologist after the procedure was completed. Every time a catheterization is performed and
it is discovered after the procedure that the patient did not need the catheterization, this is
considered to be an incorrect reading of the thallium stress test and therefore a wrong decision
(i.e., a defective decision), which again is the appropriate concept to be placed on a p-chart.
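
The numerator and denominator just described are enough to compute the p-chart for the predictive percentage. The following is a minimal sketch, assuming the standard p-chart formula (centerline equal to the pooled proportion, with 3-sigma limits that depend on each subgroup's denominator). The function name is hypothetical, and the sketch checks only for points outside the limits, not for any other special cause rules.

```python
import math


def p_chart(numerators, denominators):
    """Return the centerline and per-subgroup 3-sigma limits for a p-chart."""
    p_bar = sum(numerators) / sum(denominators)
    rows = []
    for unit, (x, n) in enumerate(zip(numerators, denominators), start=1):
        sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
        lcl = max(0.0, p_bar - 3.0 * sigma)
        ucl = min(1.0, p_bar + 3.0 * sigma)
        p = x / n
        rows.append({"unit": unit, "p": p, "lcl": lcl, "ucl": ucl,
                     "outside": p < lcl or p > ucl})
    return p_bar, rows


# Columns 3 and 5 of Table 10-1, physicians 1 through 7.
caths_recommended = [50, 59, 49, 35, 56, 57, 41]
caths_appropriate = [30, 44, 30, 28, 23, 32, 34]

center_line, rows = p_chart(caths_appropriate, caths_recommended)
flagged = [r["unit"] for r in rows if r["outside"]]

print(f"CL = {center_line:.1%}")
for r in rows:
    print(f"physician {r['unit']}: p = {r['p']:.1%}, "
          f"LCL = {r['lcl']:.1%}, UCL = {r['ucl']:.1%}")
print("outside the limits:", flagged)
```

Because each physician has a different denominator, each gets its own pair of limits. SPC software may round or display the limits slightly differently, so values that sit very close to a limit should be read with care.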
■■ Why are the dots on the p-charts not connected by a line?
The dots are not connected because the seven physicians do not represent a process
displayed over time. Most control chart examples show some unit of time (e.g., day, month,
or patients in chronological order) displayed along the horizontal axis. This is the traditional
format for a control chart. In this example, however, the horizontal axis displays the individual
physicians numbered 1–7. This p-chart is known, therefore, as a “comparative chart” because
it is comparing each physician’s percentage of correctly interpreted films at a fixed point in
time. Prior to making this comparative chart, a p-chart for each physician’s performance over
the last 18 months should be prepared. These physician-specific charts (not shown) should
all reflect common cause variation before proceeding to make the aggregated comparative
charts shown in FIGURES 10-1 and 10-2. If one or more of the physician-specific charts
contained special causes, then the comparative p-chart should not be constructed. The
reasoning behind this decision is straightforward: if the constituent parts (i.e., the individual
physicians) reflect special cause variation, then the aggregated comparative chart will
similarly reflect an unstable process. For additional applications of the comparative chart see
Carey (2003).
■■ Is there any value in knowing the numbers of films done on an outpatient and an inpatient basis?
How could these data be used to further our understanding of the accuracy of the reading process?
The data in Table 10-1 show the total number of stress tests read by each physician (column 2).
Once the overall pattern has been analyzed, the next step would be to see if there are any notable
differences between the positive predictive percentages for inpatient and outpatient settings.
Stopping any analysis after analyzing only the totals can be misleading. Stratification enables you
to drill down to the constituent parts of a process. In this case, there could be differences between
the inpatient and outpatient settings because of differences in the severity of the patients or
possibly the amount of time the physicians spend on evaluating inpatient and outpatient stress
tests. Various theories would have to be explored with the subject matter experts.
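
A stratified look at the predictive percentage could be sketched as follows. The case study reports only the overall inpatient and outpatient totals, so the records below are entirely hypothetical and exist only to show the mechanics; the function name and record layout are likewise assumptions made for this sketch.

```python
from collections import defaultdict


def predictive_pct_by_stratum(records):
    """Compute the predictive percentage within each stratum.

    records -- iterable of (stratum, was_appropriate) pairs, one per CATH performed
    """
    totals = defaultdict(lambda: [0, 0])  # stratum -> [appropriate, performed]
    for stratum, was_appropriate in records:
        totals[stratum][0] += int(was_appropriate)
        totals[stratum][1] += 1
    return {s: 100.0 * ok / n for s, (ok, n) in totals.items()}


# Hypothetical CATH records: (setting, found appropriate after the procedure?)
records = [
    ("outpatient", True), ("outpatient", True),
    ("outpatient", False), ("outpatient", True),
    ("inpatient", True), ("inpatient", False),
    ("inpatient", False), ("inpatient", True),
]

by_setting = predictive_pct_by_stratum(records)
for setting, value in by_setting.items():
    print(f"{setting}: {value:.1f}% of CATHs found appropriate")
```

With real data, each stratum would then be placed on its own p-chart before any conclusion is drawn about differences between the settings.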
■■ What do you conclude about the seven physicians who read the films? Do they form a common cause
system or are there special causes present?


FIGURE 10-1 Percentage of stress test results initially thought to be of “high risk” and recommended
for a CATH (p-chart). Horizontal axis: physician number (1–7); vertical axis: percent (0%–100%).
CL = 68.7%, with the UCL and LCL plotted for each physician; one point is annotated as a special
cause (3 sigma violation).

FIGURE 10-2 Percentage of CATHs initially thought to be of “high risk” that were actually found to be
appropriate after the CATH procedure (p-chart). Horizontal axis: physician number (1–7); vertical
axis: percent (0%–100%). CL = 63.7%, with the UCL and LCL plotted for each physician; the
reference range (70%–90%) is marked and one point is annotated as a special cause (3 sigma
violation).

Figures 10-1 and 10-2 provide the p-charts for the performance of the seven physicians. We see
in Figure 10-1 that physician 5 is a special cause relative to the other six physicians. This physician
predicted that 88.9% of the thallium stress test results required a cardiac CATH. All of the other
physicians’ initial conclusions fell between the upper control limit (UCL) and the lower control limit
(LCL). Note that physician 2’s value of 85.5% fell exactly on the UCL but did not exceed it. This is
not a special cause. So, physician 5 had the highest percentage of recommended CATHs. Now the
question becomes, how did each physician’s positive predictive percentage compare to what they
initially concluded about the necessity of a CATH? The answer to this question can be found in
Figure 10-2. This p-chart shows us the percentage of CATHs initially thought to be high risk that were
actually found to be appropriate after the CATH procedure. The positive predictive percentage for
physician 5 is below the LCL signaling a special cause. This physician initially thought that 88.9% of
the stress tests he or she reviewed required a CATH yet only 41.1% of them actually were appropriate
for a CATH after the procedure was completed. The remaining six physicians’ percentages all fell
between their UCL and LCLs. Note that because this is a p-chart, each dot (which represents
a physician) has its own control limits. The width of these limits is based on each physician’s
denominator (i.e., the total number of stress test results reviewed that were initially thought to
be appropriate for a catheterization, column 3 in Table 10-1). The more stress test results read, the
tighter the control limits, whereas the smaller the denominator, the wider the limits. Physician 4, for
example, had the widest limits because he or she had only 35 films as a denominator. On the other
hand, physician 2 had the largest denominator with 59. Even though physician 5 falls below the LCL,
this does not mean that this physician’s performance is necessarily “bad” and that of the other six
physicians is “good.” Good and bad are not appropriate concepts for interpreting Shewhart charts.
The more appropriate terms are stable and predictable (common cause variation) versus unstable
and unpredictable (special cause variation). With one physician falling below the LCL, therefore, the
primary interpretation of this chart is that when it comes to the percentage of positive predictive
outcomes, this group of physicians does not exhibit common cause variation. Stated otherwise, they
do not perform as a system. The secondary interpretation is that one physician’s performance (i.e.,
physician 5) is notably different from the rest (i.e., this physician is a special cause). Why he or she
is different or whether this difference is clinically meaningful remains to be seen. Remember that
control charts do not tell you why a process is unstable or not performing in an acceptable manner.
The control chart only tells you (1) whether the process is stable and predictable and (2) what
impact an intervention had on the performance of the process.
The two p-charts for these measures are shown in Figures 10-1 and 10-2.
■■ Of what value is the reference range for positive predictive percentage accuracy (70–90%)?
The reference range provides an opportunity to see how well this process is performing
against an established norm or comparative reference data. These numbers also can be viewed
as the lower specification limit (LSL = 70%) and upper specification limit (USL = 90%) for
the process. In this case, if the UCL and LCL coincide with the USL and LSL, then the process
would be performing in line with the expectations for the process. When the control limits are
wider than the specification limits (SLs), then the process is not performing in line with the
expectations for the process. Conversely, if the process control limits are narrower than the SLs, then
the process is functioning better than expected. As you can see from Figure 10-2, only three
physicians (2, 4, and 7) are performing within the reference range of 70–90%. The remaining
four are performing below the LSL of 70%, and one of these has fallen below the LCL. The
challenge in this case is deciding whether the SLs (i.e., 70–90%) are clinically appropriate for
the population of patients being served by these physicians. For example, this range allows for
10% error because 90% is the USL. Why would we not expect this process to function in the
range of 70–100%? Why was 70% selected as the minimum acceptable standard? And why
are these both whole numbers that are divisible by 5? This seems too convenient. Why not a
spread of 73.4% to 91.9%? As I mentioned previously, targets and goals are frequently set not in
a statistical manner but because people have a tendency to think in terms of whole numbers
that are divisible by 5. When a team starts to discuss these questions, then they have opened
the door for a dialogue as to whether the process is capable of achieving the targets and goals
being set relative to the current performance of the system. The Shewhart chart does not
necessarily answer all the questions, but it should certainly serve as a foundation for raising
questions. One final note to this story. Physician 5 was the senior cardiologist who claimed that
the female radiologist (physician 7) had a “significantly” lower predictive percentage than all
of the cardiologists. As can be seen from Figure 10-2, the radiologist actually has the highest
positive predictive percentage. When I presented the charts to the medical director, he was
pleased with the analysis and the results. I told him that the charts were the easiest part. I said
that he now had the difficult challenge of explaining these results to the group of physicians.
He looked at me and started to smile. “Yes,” he said, “you and I do have a challenging meeting
awaiting us.”
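The point made earlier in this case study, that each physician on a p-chart has his or her own control limits and that the limits tighten as the denominator grows, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the conventional 3-sigma p-chart formula, the centerline of 63.7% shown in Figure 10-2, and the two denominators quoted in the text (35 for physician 4 and 59 for physician 2). The function name `p_chart_limits` is hypothetical, and the software used to produce the figures may round or compute slightly differently.

```python
import math

def p_chart_limits(p_bar, n, sigma=3.0):
    """Return (LCL, UCL) for one subgroup of size n on a p-chart.

    Assumes the conventional formula p_bar +/- sigma * sqrt(p_bar * (1 - p_bar) / n).
    """
    half_width = sigma * math.sqrt(p_bar * (1.0 - p_bar) / n)
    return p_bar - half_width, p_bar + half_width

CL = 0.637  # centerline reported in Figure 10-2

# Denominators quoted in the text for physicians 4 and 2.
for physician, n in [(4, 35), (2, 59)]:
    lcl, ucl = p_chart_limits(CL, n)
    print(f"physician {physician}: n={n}, LCL={lcl:.3f}, UCL={ucl:.3f}")
```

With the same centerline, the physician who read 35 results gets visibly wider limits than the physician who read 59, which is why the limits on a p-chart look like a staircase rather than two straight lines.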

CASE STUDY #2: Sampling Central Line Infections


Situation
The director of infection control is interested in getting a clear understanding of the central line
infections occurring within her hospital. So each day she and her staff review all the charts that
have notes indicating that the patient has a central line. They look for any sign of infection noted in
the charts. If they detect any potential characteristics of an infection, they request pathology to test
cultures from any suspicious lines to see if an infection is actually present. Each day a patient has a
central line in place it is counted as 1 central line day. Because keeping track of central line days is a
very time-consuming task, the staff have been considering doing a sample of the central lines but are
concerned that they might miss something.
After attending a national infection control conference, the director of infection control tells her
staff what she learned about sampling at one of the conference’s workshops. The presenter told the
group that the easiest method of sampling for central line infections, or any infection for that matter, is
to take a sample of 2 random days during the month and review all the central lines on these 2 days.
Then she said the presenter told them to take the total number of central line infections found in these
2 days and multiply this number by 15 to get the estimated number of central line infections for the
month (the reasoning being that 2 days × 15 = 30 days in the month). The total number of central line
days could be estimated in a similar manner. Does this seem like a reasonable approach to sampling?
Will such an approach produce data that properly represent the central line infection process? What
challenges do you see with this approach to sampling?


Discussion
Although I do not disagree with the idea of possibly applying sampling to central line indicators,
several points need to be considered prior to devising a sampling strategy:
■■ First, I would want to have a discussion with the director of infection control to discover the
number of central lines being inserted and managed each day or week. By knowing the actual
volume we are dealing with we can decide if sampling is even necessary. For example, if only
five central lines are inserted each week, which would not be even one each day, then I would
recommend taking all occurrences and not engaging in sampling. When volume increases and
the clinical staff is inserting 10 or more new central lines a day and attempting to manage over
50 each week, then we can have a discussion about a sampling strategy.
■■ Related to the issue of volume is the subgroup. When data are pulled will they be organized as a
subgroup by day, week, every 2 weeks, or month?
■■ Another discussion should occur about the primary concern with central lines. This could have
an impact on a sampling strategy. For example, is the concern with the insertion of a central
line, the maintenance of the central line, or both issues? If the director of infection control wants
to stratify the entire central line population into insertion and maintenance categories then
the volume of patients in each category would have to be reviewed and decisions made as to
whether a total enumeration of all cases is considered or a sample.2
■■ Finally, if sampling is determined to be a viable path forward, then a discussion about the
advantages and disadvantages of the various sampling approaches outlined in Chapter 4 needs
to occur. What are the sampling options for studying central line infections? The first
thing I would do is not to follow the recommendation of the individual who presented at
the infection control conference. Selecting 2 random days out of the month, reviewing all
the central line cases in the hospital on these days, noting any with infections, and then
multiplying this number by 15 to get a monthly estimate of the number of central line
infections and central line days will produce misleading numbers. I believe the director of
infection control would be better off just picking a simple random sample of 5–10 patients
(possibly stratified by new central lines inserted versus those that are in a maintenance status)
each day or each week (depending on the volume of central lines) and then applying the
review criteria for central line infections to this sample of patients. Another approach could
be to select a random day each week and review all the central line cases on these selected
days. If the hospital had a large volume of central line patients it would then be feasible to
draw a random sample on the randomly selected day of the week. All of this could be figured
out ahead of time so that a schedule could be produced to minimize the burden on the staff
pulling the sample of patients and conducting the reviews. Now some readers might say that
this approach sounds similar to what the speaker at the infection control conference was
suggesting. It is in the sense that selecting a random set of days is not a bad idea. The issue I
have with the speaker’s suggestion is that selecting only 2 random days in a month is probably
too few days to get an accurate picture of the central line process. Selecting a random day
each week provides a more representative sampling approach. What I totally disagree with,
however, is the speaker’s suggestion to take the central line infections on the 2 randomly
selected days and merely multiply the results for these 2 days by 15 to get an estimate of
what the month’s infections might be. Finally, given the seriousness of central line infections, it
might be advisable not to apply sampling to this indicator at all.
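To see why multiplying a 2-day count by 15 can mislead, it helps to work through a small example. The Python sketch below is a hypothetical illustration, not data from any hospital: it assumes a 30-day month in which one infection is detected on each of three arbitrarily chosen days, then lists the estimate that every possible pair of sampled days would produce.

```python
from itertools import combinations
from collections import Counter

# Hypothetical month: 30 days, one infection detected on each of three days.
daily_infections = [0] * 30
for day in (4, 13, 22):  # arbitrary illustrative days (0-based)
    daily_infections[day] = 1

true_total = sum(daily_infections)

# Every possible pair of sampled days, scaled up by 15 as the speaker suggested.
estimates = [15 * (daily_infections[a] + daily_infections[b])
             for a, b in combinations(range(30), 2)]

distribution = Counter(estimates)
share_zero = distribution[0] / len(estimates)

print("true monthly total:", true_total)
print("possible estimates:", sorted(distribution))
print(f"share of samples reporting zero infections: {share_zero:.1%}")
```

In this made-up month the true total is 3, yet the only estimates the method can return are 0, 15, or 30, and roughly four out of five possible samples report no infections at all. The estimate averages out to the true value across all possible samples, but any single month's figure is far from it, which is the practical problem with the speaker's shortcut.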

CASE STUDY #3: Sampling Medicare Insurance Audits3


Situation
You are the director of performance improvement at a medium-sized hospital. You receive a call from
the manager of admissions and registration asking how many Medicare claims her staff needs to
pull in order to obtain a “significant” sample that “proves they are 95% accurate.” In a single day, there
are around 150 new Medicare claims that are eligible for review. This includes both inpatient and
outpatient activity. You think this seems like one of those questions that you really cannot answer with
a single number. You also feel there is a need to clarify a few questions about the ultimate objective
behind this request. What would you recommend to the woman in registration?

Discussion
The first thing the director of performance improvement should do is to find out what the ultimate
purpose of this request is. Her initial request has some confusing and conflicting aspects. She said she
wants a “significant” sample that “proves they are 95% accurate.” People frequently confuse accuracy,
reliability, and significance. These are three very different concepts. What is the operational definition
the manager is applying to each of these concepts? For example, I am not sure what the manager
means when she says she wants a “significant” sample. Significance is not a relevant term when
discussing sampling options. The key concept for sampling is representativeness (i.e., is the sample
representative of the population?). Significance, at least from a statistical point of view, is used when
conducting tests of significance to determine if two numbers are statistically different from each
other. This is when the p-value comes into play and we talk about significance at the 0.05 or 0.01
level of significance. Sometimes people will use the term significance as a synonym for meaningful or
important or even accurate. Another aspect of sampling that sometimes comes into play is the notion
of “power.” Specifically, researchers will conduct a power analysis to determine the sample size required
to detect an effect of a given size with a given degree of confidence. Statistical power is affected chiefly
by the size of the effect of an intervention on the test subject and the size of the sample used to detect
it. Large samples offer greater test sensitivity than small samples. But it is very unlikely that the manager
was thinking of these statistical references when she was using the word “significance.” Furthermore,
she has also stated that she wants to “prove that they are 95% accurate.” Accurate in terms of what?
Are they confusing accuracy with completeness? A Medicare claim could be 100% complete but also
100% inaccurate. Why 95% accurate? Shouldn’t all claims be 100% accurate? She may be thinking
something to the effect that if she pulled repeated samples of Medicare claims, these claims would
be “representative” of the entire population 95% of the time. If this is the case then accuracy is not the
concept of interest and representativeness is. I do not mean to confuse you with these various terms.
My point is that people frequently use terms that have rather precise definitions statistically but they
are referring to the terms in a broad and general sense. The challenge for the improvement specialist is
to explore what the requester means by the terms. Once the terms and objectives for collecting data
have been clearly defined then it is time to discuss the specifics of the data collection milestone.
If we look at just the numbers, however, we can make some recommendations on a sampling
strategy. Given that about 150 new Medicare claims are processed each day, there is clear justification
to consider a sampling approach when doing the claims reviews to determine “accuracy.” Here are
some key considerations when discussing sampling options:
■■ Depending on the discipline in which you are trained there are slightly different positions on
how much data you need before a stable distribution starts to form. Generally speaking, a
distribution will start to form and be reasonably stable with about 25–30 data points. By the
time you get 50 data points most disciplines will agree that there are enough data to form a
reasonably stable distribution.
■■ The manager was correct in asking what is practical. Data collection requires time and effort. So it
is important to make sure that the data collection procedures are not creating a burden on staff.
■■ When it comes to answering the question “how much data do I need to collect?” there is no
simple number as an answer. Sampling should be structured around a very simple guideline—
pull as much as you must and as little as you dare. In the current situation with about 150 new
Medicare claims a day, it would be reasonable to first of all have a discussion about stratifying
these claims into a few relevant categories. Are all the charts the same? No, they are not all the
same. So, let us look at how to divide them into more relevant buckets (e.g., medical versus
surgical, male versus female, age splits, complications, readmissions versus first-time admits,
or total costs). With around 150 new occurrences every day there are clearly enough data to
stratify into relevant categories. The staff will know best what these categories are. When the
stratification categories are selected and applied to the data, the volume within each category
should be evaluated to determine the sample size that is feasible within each category.
■■ Once the stratification issues are addressed a random sample of the Medicare claims could be pulled
each day within each category. A random sample of three to five claims from each stratification
category each day would be sufficient as a starting point. As the samples are drawn, they should be
regularly reviewed and studied to see how well they match the characteristics of the stratification
category they are from and the total population. Sampling decisions are not a one and done
situation. Sampling plans need to be evaluated continuously throughout the life of an improvement
project to make sure the sample data are in fact representative of the total population.
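The daily stratified random sample described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the claim records, the `type` field, and the two strata (inpatient and outpatient) are hypothetical stand-ins for whatever categories the staff select, and the function name `stratified_daily_sample` is invented for this example.

```python
import random
from collections import defaultdict

def stratified_daily_sample(claims, stratum_of, per_stratum=3, seed=None):
    """Draw a simple random sample of up to per_stratum claims from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for claim in claims:
        strata[stratum_of(claim)].append(claim)
    # If a stratum holds fewer claims than requested, take all of them.
    return {name: rng.sample(members, min(per_stratum, len(members)))
            for name, members in strata.items()}

# Hypothetical day of 150 claims, split into two illustrative strata.
claims = [{"id": i, "type": "inpatient" if i % 3 == 0 else "outpatient"}
          for i in range(150)]

sample = stratified_daily_sample(claims, lambda c: c["type"], per_stratum=5, seed=1)
for name, picked in sample.items():
    print(name, [c["id"] for c in picked])
```

Fixing the seed only makes the illustration repeatable. In practice the draw would be rerun each day, and the resulting samples compared regularly against the characteristics of each category and of the total population, as recommended above.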

CASE STUDY #4: Tracking Patient Falls


Situation
Eight months ago your facility introduced a new program to reduce the number of inpatient falls.
The key question being asked by the Quality Committee of the board is: “Did this new program have
the desired impact?” To answer this question, you decide to take advantage of the knowledge and
experience of the team that created the new program. They have been working on improving this
process since last year and have (1) developed a standardized operational definition of a patient fall
(which was probably a major accomplishment in and of itself ), (2) developed a falls assessment tool,
(3) established and implemented an ongoing data collection plan, (4) established a baseline, and
(5) prepared Shewhart charts to determine whether a change has occurred.

Discussion
Before deciding whether the falls prevention program made a difference we need to decide which
Shewhart chart is most appropriate for the falls data. The key points in selecting a chart for tracking
inpatient falls are as follows:
■■ Patient falls are not something that is desired. Therefore, a patient fall can be classified as
attributes data and viewed as being either a defective or a defect. If falls are viewed as a
defective then the indicator would be the percentage of patients who fell once or more while
in the hospital. If a patient falls at any time during a hospital stay it is considered an undesirable
outcome (i.e., a defective). But if we take this approach we do not care about how many times
a patient falls because a percentage is a binomial opportunity (i.e., the patient fell, yes or no,
and we do not care if they fell multiple times). If we take this approach a p-chart would be the
chart of choice. Because the same patient can fall more than once, however, most improvement
teams want to know the total number of falls, including duplicates for the same patient. When
we take this approach each fall is considered to be a defect that can occur more than once for
the unit of analysis, which in this case is a patient. There are two options for analyzing defects,
the c-chart, which tracks the number of falls, and the u-chart, which tracks the falls rate (i.e., so
many falls per 1,000 patient days). The reader should refer to the Shewhart chart decision tree in
Chapter 9 to review the various charts and the conditions for selecting each one.
■■ The chart selected by the improvement team in this case study to track inpatient falls was the
c-chart. They did this because they decided that there was basically an equal opportunity for a
patient to fall any day of the week. If they had decided that there was not an equal opportunity
for a fall each day of the week then the chart of choice would have been the u-chart. Again, refer
to Chapter 9 for more details on the differences between the c- and u-charts.
■■ Each data point on the charts shown in FIGURES 10-3 through 10-6 represents the total number
of inpatient falls occurring within the facility each month. Figure 10-3 presents the baseline data
for months 1–24 before the components of the falls prevention program were tested. Notice that
the falls process reflected common cause variation with an average of about 53 falls a month. This
means that the falls process is stable and therefore predictable. The control limits indicate that if
nothing was done to try to reduce the number of falls they could predict falls to go as high as 75 a
month or as low as 31. The UCL, LCL, and the mean or centerline on the chart (CL), therefore, define
what this process is capable of producing given current operating conditions. The primary question
for the team and the Quality Committee of the board to discuss is a simple one, “Given current
performance of the falls process, as defined by the baseline UCL, LCL, and CL, is it acceptable?”
[Figure 10-3: c-chart; x-axis: month (1–32); y-axis: number of falls; baseline (months 1–24) with UCL = 74.7, CL = 52.9, LCL = 31.1.]

FIGURE 10-3 Number of falls by month, baseline months 1–24 (c-chart)


[Figure 10-4: c-chart; x-axis: month (1–32); y-axis: number of falls; baseline (months 1–24) with UCL = 74.7, CL = 52.9, LCL = 31.1 extended through month 32.]

FIGURE 10-4 Number of falls by month, baseline months 1–24 with “frozen” control limits and mean
extended (c-chart)

[Figure 10-5: c-chart; x-axis: month (1–32); y-axis: number of falls; baseline (months 1–24) followed by the period after the falls prevention program; UCL = 74.7, CL = 52.9, LCL = 31.1. Note on chart: The rules for detecting a special cause are applied to the new data (months 25–32) with the “frozen” limits and CL as reference points. Annotation: a special cause, 8 data points below the CL, indicating a shift in the process.]

FIGURE 10-5 Number of falls by month, baseline months 1–24 with “frozen” control limits, mean
extended and the number of falls after the falls prevention program, months 25–32 (c-chart)
■■ Everyone agreed that having about 53 falls a month is unacceptable. So, the falls prevention
team initiated its work. Because the baseline reflects only common cause variation they “froze”
the UCL, LCL, and CL of the baseline chart and extended them into the future as reference lines
against which the new data will be plotted and evaluated. They will use the five rules for special
causes to determine if a special cause is detected and their falls prevention program has been
able to reduce the number of falls. Figure 10-4 shows how the baseline parameters have been
extended into the future as reference lines. (See Chapter 9 for more on special cause rules.)


[Figure 10-6: c-chart; x-axis: month (1–32); y-axis: number of falls; baseline (months 1–24): UCL = 74.7, CL = 52.9, LCL = 31.1; after the falls prevention program (months 25–32): UCL = 64.1, CL = 44.1, LCL = 24.2.]

FIGURE 10-6 Number of falls by month, baseline months 1–24 compared to the new control limits and
CL for the number of falls after the falls prevention program, months 25–32 (c-chart)

■■ Figure 10-5 shows the new data (months 25–32) compared to the baseline UCL, LCL, and CL.
The team observed that eight data points were below the CL, which is a signal that there has
been a shift in the process in the desired direction (i.e., fewer falls each month).
■■ The final chart (Figure 10-6) demonstrates how to phase the data into two separate charts.
The left side of the c-chart shows the baseline period (months 1–24) with its respective UCL,
LCL, and CL. The right side of Figure 10-6 shows the new falls process. Not only has the process
shifted to a lower level with a new average of roughly 44 falls a month down from 53, but
the UCL and LCL of the new process on the right side of the chart are a little closer together
indicating less variation in the process.
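The control limits quoted for the two phases of the falls process can be approximated from the centerlines alone. The Python sketch below assumes the conventional c-chart formula, in which the limits sit 3 standard deviations from the centerline and the standard deviation is the square root of the centerline. The function name `c_chart_limits` is hypothetical, and the centerlines are the rounded values printed on the figures, so small differences from the published limits are to be expected.

```python
import math

def c_chart_limits(c_bar, sigma=3.0):
    """Return (LCL, UCL) for a c-chart with centerline c_bar.

    Assumes the conventional formula c_bar +/- sigma * sqrt(c_bar);
    a count cannot be negative, so the LCL is floored at zero.
    """
    half_width = sigma * math.sqrt(c_bar)
    return max(0.0, c_bar - half_width), c_bar + half_width

phases = [("baseline, months 1-24", 52.9),
          ("after program, months 25-32", 44.1)]

for label, c_bar in phases:
    lcl, ucl = c_chart_limits(c_bar)
    print(f"{label}: CL={c_bar:.1f}, LCL={lcl:.1f}, UCL={ucl:.1f}")
```

Because the spread of a c-chart depends only on the centerline, a lower average number of falls automatically pulls the UCL and LCL closer together. That is consistent with the narrower limits seen on the right side of Figure 10-6.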
The information presented in Figure 10-6 does not tell the team whether or not it should continue
its efforts to reduce the number of inpatient falls. It merely tells the team that the initial efforts have
been successful and that the current process produces fewer falls each month than the baseline
process. Should the team decide to introduce another intervention to further reduce the number of
falls? The answer to this question lies with the team and the target or goal they have established as their
aim. As owners of the process, they need to decide if the process is capable of further improvement
and if the resources are available to support this work. If the facility is part of a system, for example,
they could obtain comparative reference data (norms) and see how their performance compares to
that of the other facilities in the system. If such data do exist, then there is an opportunity for internal
benchmarking. The team would also be well advised to continue monitoring the number of patient falls
over the coming months, even if they decided not to introduce any further improvement efforts. The
reason is that the team should be responsible for the performance of the process and confident that
the observed improvement in months 25–32 is being sustained during future months. The control chart
provides the conceptual and statistical foundation for doing this.

CASE STUDY #5: Pressure Ulcer Prevention


Situation
You are the director of care management at a 275-bed hospital and have become curious about
the incidence and prevalence of pressure ulcers (PUs) within the hospital. Pressure ulcers seem
to be one of those topics that everyone takes for granted. It is not unusual, for example, to
hear staff say, “They will occur and there is not a lot we can do about it.” You believe, however,
that there are some things that can be done to reduce the occurrence of PUs, improve clinical
outcomes, and save the hospital money. You begin by preparing a brief summary of the two
types of PUs. The first is nosocomial or hospital-acquired PUs (incidence). The second is known as
community-acquired PUs (prevalence). This summary is distributed to the inpatient nursing staff,
the physical therapists, and the staff at your skilled nursing facility to increase awareness and start
a dialogue on this topic. Next you develop a measurement plan. The primary outcome measure
is the percentage of patients that develop nosocomial PUs during their admission (the incidence
indicator).
The improvement team created to work on reducing hospital-acquired PUs decides that it will
introduce a new type of mattress with a pressure-relieving surface and a downward-slanted heel slope
as their first intervention. They also begin working on developing a new protocol for assessing patients
and preventing PUs before they develop. Based upon a national comparative norm of 7%, you set off
to establish a baseline for the incidence of nosocomial PUs. After establishing a baseline, they began
to test the two initial improvement strategies they developed and see if they made a difference in the
proportion of hospital-acquired PUs. FIGURES 10-7 through 10-9 show the baseline and the results after
the interventions were introduced. What conclusions do you make about the effectiveness of these
two interventions? Did one work better than the other?

[Figure 10-7: p-chart; x-axis: week (1–17); y-axis: proportion of nosocomial pressure ulcers (0.0–0.7); UCL = 0.449, CL = 0.241, LCL = 0.033.]

FIGURE 10-7 Baseline proportion of patients with nosocomial PUs (p-chart)


[Figure 10-8: p-chart; x-axis: week (1–28); y-axis: proportion of nosocomial pressure ulcers (0.0–0.7); baseline (weeks 1–17) with UCL = 0.471, CL = 0.241, LCL = 0.010 extended across weeks 18–28. Annotations: changes in the process began here; a special cause, 8 or more data points below the centerline.]

FIGURE 10-8 Proportion of nosocomial PUs comparing the baseline period to the new process with the
average control limits and the CL (average) based on weeks 1–17 (p-chart)

[Figure 10-9: p-chart; x-axis: week (1–28); y-axis: proportion of nosocomial pressure ulcers (0.0–0.7); baseline (weeks 1–17) followed by results after changes, with UCL = 0.373, CL = 0.170, LCL = 0.000; national comparative norm of .07 marked.]

FIGURE 10-9 Proportion of nosocomial PUs comparing the baseline (weeks 1–17) period to the new
process (weeks 18–28) (p-chart)


Discussion
Figure 10-7 shows the baseline data for the incidence of nosocomial PUs. Note that this is presented
as a proportion, not as a percentage.4 The baseline p-chart reflects common cause variation with a
process average proportion of 0.24 or 24%. The process could operate as high as the UCL of 0.449 (45%)
or as low as the LCL of 0.033 (3.3%). Because the process exhibits common cause variation, we can expect the
process to continue to perform within these parameters until (1) an unexpected special cause enters
the picture and makes the process unstable, or (2) the team plans an intervention that they believe will
move the process to a new level of performance.
The improvement team decided to choose the second option (i.e., the improvement route).
Now the question is whether the interventions they selected had a positive impact on nosocomial
PU prevention. Figure 10-8 provides the answer to this question. This chart displays the baseline data
shown in Figure 10-7 (i.e., weeks 1–17) on the left side of the chart and the proportion of nosocomial
PUs after the team introduced the interventions on the right side of the chart (weeks 18–28). The average
UCL and LCL as well as the average CL are all computed on the baseline period; they have been “frozen”
and then extended across the period when the changes were being tested (note that the averages for
the control limits and the CL in Figures 10-7 and 10-8 are the same). It is quite clear that the process has
shifted to a new level of performance. This is confirmed by having eight or more data points below the
CL from the baseline period.
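The shift rule applied here can be expressed as a short check. The Python sketch below is an illustration under stated assumptions: the rule is taken to mean eight or more consecutive points below the frozen baseline centerline, a point exactly on the centerline is treated as ending the run (conventions differ on this detail), and the weekly proportions are hypothetical values invented for the example, not the data plotted in Figure 10-8.

```python
def longest_run_below(values, centerline):
    """Length of the longest run of consecutive values strictly below the centerline."""
    longest = current = 0
    for v in values:
        current = current + 1 if v < centerline else 0
        longest = max(longest, current)
    return longest

def shift_detected(values, centerline, run_length=8):
    """True when at least run_length consecutive points fall below the centerline."""
    return longest_run_below(values, centerline) >= run_length

BASELINE_CL = 0.241  # frozen centerline from the baseline period (weeks 1-17)

# Hypothetical weekly proportions for weeks 18-28; NOT the book's data.
after_changes = [0.20, 0.15, 0.18, 0.22, 0.12, 0.17, 0.19, 0.14, 0.16, 0.21, 0.13]

print("longest run below CL:", longest_run_below(after_changes, BASELINE_CL))
print("shift detected:", shift_detected(after_changes, BASELINE_CL))
```

The important design point is that the new data are compared against the baseline centerline, not against a centerline recomputed from all 28 weeks. Recomputing would pull the centerline down toward the new data and could hide the very shift the team is looking for.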
Figure 10-9 displays the final step in presenting the data for this case study. Notice that this
chart has two sets of control limits. The left side of the chart is the same baseline we observed in the
previous charts. The right side of the chart (labeled Results after changes) reveals what happened to
the process after the interventions were tested. Not only did the control limits narrow but we also
see that the centerline or process average has dropped to 0.17 or 17%. Both sides of the chart display
common cause variation.
The team has just cause to celebrate its success. But when you compare the new process average
to the national comparative norm of 7%, you realize that there is still considerable work to be done. The
next steps would be to continue to educate staff on the new protocol for detecting and preventing
PUs. Because it is easier to change mattresses than it is to change the behaviors of individuals, it
might be that the impact of the protocol training will take longer to have an influence on the process
average. The control chart methodology provides, however, a very easy and simple way to track the
interventions as the team progresses.
The final question to be addressed is, Did one intervention work better than the other? The team
simultaneously tested the introduction of new mattresses as well as the use of a new protocol for
assessing patients for PUs and a positive response was observed (i.e., fewer PUs). But did one of these
interventions work better than the other? Was the effect seen in Figure 10-9 a result of the mattresses,
the protocol, or a combination of both? Because the team introduced both changes at the same time
it is difficult to tease apart the individual effect of each intervention and the combined interaction of
the two interventions with a single control chart. In order to decide whether one intervention had a
larger effect on reducing PUs than the other or whether it was a combination of the two factors being
introduced simultaneously, the team would need to set up and run a planned experiment (Moen,
Nolan, & Provost, 2012). A planned experiment would allow the team to determine statistically whether
one of the factors (e.g., the mattresses) had a larger effect on the reduction of PUs than the protocol.
The planned experiment would also allow the team to determine the combined effect of introducing
the mattresses and the protocol. At the Institute for Healthcare Improvement (IHI), we teach planned
experimentation (PE) in our Improvement Advisor Professional Development Program. It is not a topic I
cover in this text but if you expect to be a person with deep knowledge of the science of improvement
(SOI), building knowledge about the application of PE is essential.
A final technical point related to the use of a p-chart needs to be mentioned. Note that the right
side of Figure 10-9 (weeks 18–28) has no LCL. This is because when you are using a p-chart many
software programs will not allow the UCL or LCL to exceed the typical minimum (0.0) or maximum
(1.0 or 100%) values for a proportion or percentage. Remember that the UCL and LCL are calculated
to be symmetrical about the mean (CL). As the data approach either the minimum or maximum
theoretical limits for a proportion or percentage the calculated limits may exceed these boundaries.
So, to prevent control limits from going above 100% or below 0%, the software caps the
limits. You may have to explain this as you present your data.
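To make the capping concrete, here is a small sketch of the usual three-sigma p-chart limit calculation with an optional cap at 0 and 1. The function name and the subgroup size of 30 are assumptions made for illustration; only the process average of 0.17 comes from the case study.

```python
import math


def p_chart_limits(p_bar, n, cap=True):
    """Three-sigma p-chart limits for one subgroup of size n.

    p_bar is the overall proportion (the centerline). When cap is True,
    the limits are held inside the 0.0 to 1.0 range.
    """
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
    lcl = p_bar - 3 * sigma
    ucl = p_bar + 3 * sigma
    if cap:
        lcl = max(lcl, 0.0)
        ucl = min(ucl, 1.0)
    return lcl, ucl


# Centerline of 0.17 from the case study; subgroup size of 30 is hypothetical.
print(p_chart_limits(0.17, 30, cap=False))  # calculated LCL falls below zero
print(p_chart_limits(0.17, 30))             # LCL held at 0.0
```

With these inputs the uncapped lower limit is negative, which is why a chart like this one shows no LCL.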

CASE STUDY #6: Evaluating Staffing Effectiveness


Situation
A national regulatory and oversight body has recently raised concerns over the growing number
of nurse vacancies and the potential impact these vacancies are having on patient care outcomes.
There have even been recent articles in the paper and stories on the evening news that nurse
vacancies are leading to “increases in patient harm.” Although many of the stories and opinions
about poor quality are anecdotal there was a situation at one hospital last week where a reporter
claimed she uncovered evidence that “proved” that nurse vacancies lead to an increasing number of
medication errors.
Because of the growing concern over this issue the national regulatory body has proposed that
there be new staffing effectiveness requirements for all hospitals and nursing homes. Although still
being developed, this new requirement is designed to test relationships between two sets of indicators:
human resources indicators and clinical/service indicators. The basic theories driving development of
this new requirement are that changes in the human resources indicators, especially nursing levels,
are negatively correlated with clinical outcomes, patient safety, and patient satisfaction. While these
theories have not been consistently substantiated in research studies they have gained increased
popularity with the public, the press and political leaders. The specifics and requirements related to
assessing staffing effectiveness have not been finalized but your organization has created a team
to be proactive about this issue and it has started to study your own data to see if any patterns or
correlations exist. What steps would you advise the team to take in order to determine whether there
is a relationship between staffing effectiveness and clinical outcomes?

Discussion
The first thing your team would most likely do is to gain as much information about the regulatory
body’s assessment strategy and plan as possible. Unfortunately the regulatory body has not specified
exactly how healthcare providers will have to demonstrate clinical effectiveness and quality outcomes.
In their preliminary news releases, however, they have indicated that each hospital or nursing home
will be expected to test the relationships between various human resources indicators and clinical
outcomes. They have also released a preliminary set of indicators they are planning to use to evaluate
human resource effectiveness and clinical/service outcomes. The indicators by category they are
considering are as follows:
■■ Human resource indicators
•• Overtime hours
•• Staff vacancy rates
•• Staff satisfaction scores
•• Staff turnover rates
•• Nursing hours per patient day
•• Staff injuries
•• On-call per diem use
•• Sick time usage rates
•• Resources used to bring in agency nurses
■■ Clinical/service indicators
•• Patient complaints
•• Family/caregiver complaints
•• Patient satisfaction scores
•• Patient falls
•• Adverse drug events
•• Patient injuries
•• Skin breakdowns
•• Pneumonia rates
•• Urinary tract infections
•• Shock/cardiac arrest
•• Length of stay
One of the main challenges with these two lists is that very few are actually indicators. Some
are still vague concepts that require clear operational definitions. (See Chapter 4.) For example, how
will staff and patient satisfaction be measured? Will individual questions be used or a composite
score? (See Chapter 3.) What is the operational definition of an “adverse drug event”? Even indicators
that sound somewhat specific like “staff turnover rates” or “length of stay” require more detailed
operational definitions. A spokesperson for the regulatory body has indicated that they do not plan
to define precisely every concept or indicator. Each hospital, they have said, must decide which
indicators they wish to evaluate and then decide how they will define terms such as vacancy,
turnover, patient injuries, skin breakdowns, or staff satisfaction. Once the providers decide how they
will define the various indicators, they then must select one human resource indicator and one
clinical/service indicator and test the relationship between these two indicators. As you can imagine,
structuring an assessment program by having the organizations select the indicators upon which
they will be judged and then define these indicators creates a variety of challenges, especially if
the regulatory body wishes to make any comparisons between organizations. But, in light of these
challenges your team presses onward.
The second thing the team should discuss is what it means to “test the relationship between one
human resources indicator and one clinical outcome indicator.” Whenever two measures are matched
and the word “relationship” is used, the objective is to determine if there is a causal relationship
between the two measures. In this case, the human resources indicator (e.g., number of nursing
vacancies) is considered to be the cause (i.e., the independent variable), and the patient safety indicator
(e.g., the number of patient falls) is viewed as the effect or dependent variable. Stated differently, if you
decided to compare the number of nurse vacancies with the number of patient falls, one possible
theory that the regulatory body would expect you to explore is as follows: As the number of nursing
vacancies increases, we expect patient falls also to increase.
Depending on the indicators selected, the expected relationship between the two will have one
of the following outcomes:
■■ There is a positive relationship between the two indicators (i.e., as one indicator increases, the
other indicator also increases).
■■ There is a negative relationship between the two indicators (i.e., as one indicator increases, the
other indicator decreases).
■■ There is no relationship between the two indicators.
Once you have selected the indicators to be assessed, the following steps should be taken to test
the relationships between the two indicators:

Step 1: Organize the Data


In order to determine whether there is a relationship between the two indicators, you need to have
an adequate amount of data. Ideally, the indicators would be based, at a minimum, on monthly
data. Some indicators may even be available on a weekly basis. The least desirable time frame for any
indicator is quarterly. Quarterly data will produce only four data points for the entire year, which is too
little data to successfully test the relationship between the two indicators. Generally speaking, you
should have a minimum of 15–20 data points for each indicator before testing the relationship
between them. Also note that it is best practice to
have both indicators based in the same time frame. For example, you would not want to have patient
falls by week and nurse vacancies by month.
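One way to enforce both points in Step 1 is to pair the indicators by period and count what is left. The sketch below assumes each indicator is held as a dictionary keyed by a period label such as "2017-01"; the function name, the data, and the default minimum of 15 points are illustrative assumptions.

```python
def pair_indicators(x_by_period, y_by_period, min_points=15):
    """Pair two indicators on the periods they share.

    Returns the list of (x, y) pairs in period order and a flag that is
    True when there are at least `min_points` pairs.
    """
    shared = sorted(set(x_by_period) & set(y_by_period))
    pairs = [(x_by_period[p], y_by_period[p]) for p in shared]
    return pairs, len(pairs) >= min_points


# Hypothetical monthly data: 16 months of vacancies and falls.
months = [f"2017-{m:02d}" for m in range(1, 13)] + [f"2018-{m:02d}" for m in range(1, 5)]
vacancies = {month: 4 + (i % 5) for i, month in enumerate(months)}
falls = {month: 10 + 2 * (i % 5) for i, month in enumerate(months)}
falls["2018-05"] = 12  # a month with no matching vacancy figure is dropped

pairs, enough = pair_indicators(vacancies, falls)
print(len(pairs), enough)
```

Quarterly data would fail this check, since a single year yields only four pairs.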

Step 2: Prepare Control Charts


A Shewhart chart should be prepared for each indicator. If an indicator exhibits special cause
variation, then the special causes need to be addressed before the indicator is matched up with the
second indicator. If both of the selected indicators exhibit special causes, then the conclusion about
the relationship between the two indicators will be suspect because both indicators are unstable
and therefore unpredictable. As a result you have little understanding of what the indicator is
capable of producing.
If special causes are present, the first thing to do is to review the data collection procedures.
Inconsistent data collection procedures can produce special causes that are a result of:
■■ Operational definitions that keep changing
■■ Changes in the patient population (e.g., a more severely ill patient population emerged midway
through the year)
■■ Improvement strategies implemented by a team during the data collection period
■■ New employees who did not understand the data collection procedures
Any or all of these factors can create special causes in your data.
As an alternative to preparing control charts, you can develop a histogram for each indicator to
see if the data are reasonably close to forming a normal distribution. Histograms can be made very
easily in most statistical software packages.

Step 3: Develop Line Graphs


There are two basic approaches for testing the relationship between the paired indicators. The first
approach is to make a simple line graph with both indicators plotted on the same graph. The second
approach is to develop scatter diagrams, which are discussed in Step 4.
FIGURES 10-10 and 10-11 show two possible conditions for the relationship between the number
of nursing vacancies and the number of falls each month. Each of these graphs has two y axes: one
for the number of nurse vacancies on the left side of the graph and one for the number of falls on
the right side of the graph. It is possible to have only one y axis on the left side of the graph if the two
indicators being compared have a very similar value range.
Figure 10-10 shows a pattern in which both indicators follow a similar path. Notice that the two
lines have different amplitudes but basically follow the same path over time. As the number of nursing
vacancies goes up so does the number of falls. Similarly, as the nursing vacancies go down so does the
number of falls. This demonstrates that the two indicators are related and follow a similar course (i.e.,
as one goes up the other goes up, and when one goes down the other also goes down). This is the
simplest way to show a relationship between two indicators. In Figure 10-11, we see a different pattern
between the two indicators. As the number of nursing vacancies goes down we observe that the
occurrence of falls goes up. For these two indicators, however, the hypothesis was stated as a positive
relationship: As the number of nursing vacancies increases, we expect patient falls also to increase. Simple
line graphs provide the first level of analysis, but they do not allow you to evaluate the strength of the
relationship between the two indicators. To do this you need to move to scatter diagrams.
FIGURE 10-10 A line graph showing two indicators that are positively related (x axis: month; left y axis: number of nurse vacancies; right y axis: number of patient falls)

FIGURE 10-11 A line graph showing two indicators that are negatively related (x axis: month; left y axis: number of nurse vacancies; right y axis: number of patient falls)

Step 4: Develop Scatter Diagrams


A scatter diagram (also called a scatterplot, a scattergram, or an x,y plot) is a graph of data points based
on two variables, where one variable defines the horizontal (or x) axis and the other defines the
vertical (or y) axis. These graphs enable you to test for possible cause and effect relationships. Note that
scatter diagrams do not prove that one variable causes the other. Instead they allow you to determine
whether a relationship exists and the possible strength of that relationship.
The data for scatter diagrams must be variables data measured along a continuous scale.
Attributes data (e.g., categories such as tall versus short, cesarean section delivery versus vaginal delivery,
surgery started on time or not on time) cannot be placed on a scatter diagram because they are
categorical types of data. The values of the two indicators are then arranged as pairs of coordinates (i.e.,
the x axis is the independent variable and the y axis is the dependent variable or effect). These coordinates
are plotted as dots on the graph. If the dots are clustered close together, the relationship between the two
indicators is considered to be stronger than if the dots are spread farther apart. The more the clustering of
dots looks like a straight line, the stronger the relationship is between the two indicators.
FIGURE 10-12 shows a strong positive relationship (correlation) between the number of falls (the
y axis) and the number of registered nurse (RN) vacancies (the x axis). When the number of nurse
vacancies is low the number of falls is also low. FIGURE 10-13, on the other hand, shows a weak positive
relationship between falls and RN vacancies (the dots are more spread out and not clustered in a tight
pattern). Note that when the two indicators are positively related, the pattern of the dots will rise from
the lower left to the upper right. In this case, as one indicator increases the other one also increases.

FIGURE 10-12 A strong positive relationship between the two variables
FIGURE 10-13 A weak positive relationship between the two variables
FIGURE 10-14 A strong negative relationship between the two variables
FIGURE 10-15 A weak negative relationship between the two variables
FIGURE 10-16 No relationship between the two variables
(In each figure, the x axis is the number of RN vacancies and the y axis is the number of falls, both running from low to high.)

In contrast, negative relationships present patterns that are similar to those shown in FIGURES 10-14
and 10-15. In Figure 10-14, the pattern reveals that when the number of vacancies was high, the number
of falls was low. When the number of falls was high, however, the RN vacancies were low. If you detect
negative relationships, one indicator’s values will increase and the other indicator’s values will decrease.
Again, the stronger the relationship, the tighter the pattern of the dots.
The final outcome of a scatter diagram might be that you discover no relationship between the
indicators. This is shown in FIGURE 10-16. When this happens the dots are randomly scattered on the
graph with no discernable pattern emerging.

FIGURE 10-17 No relationship between the indicators (four panels labeled Strong + r, Strong - r, Weak + r, and Weak - r; x axis: # of RN Vacancies; y axis: # of Falls)

FIGURE 10-18 No relationship between two indicators (panel labeled No correlation, r = ~0; x axis: # of RN Vacancies; y axis: # of Falls)

Note that if you really want to maximize the use of a scatter diagram, you should produce a
correlation coefficient to accompany the scatterplot. This is a single value that tells you the actual
strength of the relationship between the two indicators. The correlation coefficient (usually designated
as r) will range from –1.0 to +1.0. The other option, besides having a correlation coefficient calculated,
is to have the software plot the linear least-squares regression line through the scatterplot as shown in
FIGURES 10-17 and 10-18. If you do go the route of plotting regression lines through a scatterplot realize
that not all relationships are linear in nature as shown in FIGURE 10-19.
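For readers who want to see what the software is doing, the sketch below computes the Pearson correlation coefficient and the least-squares line from first principles. The function names and data are illustrative assumptions; any statistical package will give the same values. The second example is a deliberately curved relationship, to show why a linear r near zero does not by itself mean that no relationship exists.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the linear least-squares regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical paired data: RN vacancies (x) and falls (y), perfectly linear.
vacancy_counts = [1, 2, 3, 4, 5]
fall_counts = [2, 4, 6, 8, 10]
print(pearson_r(vacancy_counts, fall_counts))
print(least_squares_line(vacancy_counts, fall_counts))

# A perfectly curved (quadratic) relationship still gives a linear r of zero.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x * x for x in curve_x]
print(pearson_r(curve_x, curve_y))
```

This is why it pays to look at the scatterplot itself before relying on r or on a fitted straight line.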
FIGURE 10-19 Nonlinear relationships for scatterplots (panels: stopping distance versus speed; Y versus X; fuel used (liter/100 km) versus speed (km/h); performance versus amount of anxiety; reading score versus average number of hours of sleep per night)

In summary, the regulatory agency is interested in knowing whether you can detect relationships
between pairs of indicators. This means that the team assigned to answer this question needs to:
1. Select pairs of indicators representing human resource and clinical/service activities.
2. Develop theories about the possible relationships between the indicator pairs that you have
selected. For example, if you select staff satisfaction as the human resource indicator and patient
satisfaction as the clinical service indicator, what theories or hypotheses do you have about the
possible relationship between these two indicators? Do you think there will be a positive or
negative relationship?
3. Do a quick analysis of the data to see whether they have been collected with consistent
operational definitions and whether the time periods for the indicators all match.
4. Create a control chart for each indicator to determine whether the data represent common or
special causes of variation. If special causes are present, investigate them and try to determine
why they are present in your data. Indicators that reflect special cause variation will create
problems for you when you try to interpret the line graphs and/or scatter diagrams.
5. Do a preliminary analysis by placing both indicators on the same line graph and determining
whether the lines track in a similar or dissimilar fashion.
6. Create scatter diagrams to analyze the pattern of the dots and decide whether there is a positive
relationship, a negative relationship, or no relationship between the indicators.
7. Determine the strength of the relationship. You can do this by merely using common sense
(the tighter the pattern, the stronger the relationship), or if you are comfortable with correlation
coefficients and regression analysis, you can request these statistics from the software program
you used to create the scatter diagram.

CASE STUDY #7: To Flash or Not to Flash—That Is the Question
Situation
You are the director of surgical services at a 180-bed hospital that also has an outpatient surgery
center. Recently during lunch with one of your friends, who just happens to be the hospital’s director
of infection control, you mention that you have noticed an increase in the hospital’s surgical infection
rate. Your friend suggests that one possible area that could be contributing to this increase is the use
of flash sterilization (FS), which is frequently overlooked as a potential cause of surgical infections.5
When you first started working at the hospital, FS was used only in emergency situations (e.g., when an
instrument was dropped during surgery). Now it seems to have become a rather routine practice.
You decide to begin by looking at the FS rate and how it has varied over time. You consult with
the hospital’s measurement expert and with her assistance FIGURES 10-20 and 10-21 were developed.

FIGURE 10-20 Flash sterilization rate by week (u-chart). Plotted for weeks 1 to 40; y axis: flash sterilizations per 100 surgeries; UCL = 0.653, CL = 0.433, LCL = 0.212.

FIGURE 10-21 Flash sterilization rate by week with defined sets (u-chart). Weeks 1 to 40 partitioned into four sets labeled Baseline, New surgeons, Inspection, and Post Inspection; y axis: flash sterilizations per 100 surgeries; labeled limits: UCL = 0.823, CL = 0.570, LCL = 0.317.

Discussion
The data are plotted weekly (the subgroup) on u-charts. The FS rates are calculated as follows:
$$\text{FS rate} = \frac{\text{Total number of FSs done each week}}{\text{Total number of surgeries done each week}}$$
■■ Why do you think a u-chart was used instead of a c-chart or a p-chart?
The u-chart is the chart of preference because each surgery case could have more than one
FS. Each time an FS occurs it is viewed as a defect. A c-chart would not be used because the
c-chart requires that the subgroups for counting the defects (which are weeks in this case
study) have an equal opportunity for a defect to occur. Because a different number of surgeries
are performed each week, this constitutes an unequal subgroup, which requires the use of the
u-chart. A p-chart would be used only if the measurement question was as follows: “Was FS used
during this surgical case?” The answer to this question is simply yes or no. The p-chart does not
address the issue of multiple occurrences of a FS. If a surgical case had 1 FS or 100 FSs, it would
merely be recorded as a “Yes” on a p-chart because the frequency of the event does not matter.
The p-chart is used to measure the percentage or proportion of defectives, but the u- and
c-charts are used to count the actual number of defects.
■■ Do you detect any special causes in the charts? If so, what are they and what do they tell you?
Figure 10-20 reveals numerous special causes owing to shifts in the process. As you look at the
patterns in the data, you notice that there seems to be a cyclical nature to the data. The dots
form a cluster in the lower left side of the chart, then they jump up to a higher FS rate. After
running for a few weeks at a fairly high level, the data drop dramatically to a lower level. Finally,
in the last part of the u-chart the process shifts again to a higher level (actually the highest level
so far). What could be causing these cyclical shifts in the data? The statistician supporting this
initiative suggests that it might be useful to partition the data into sets. Not quite sure what
she is suggesting, you follow her lead and agree. After a few quick strokes on the keyboard, she
produces the chart shown in Figure 10-21. In this chart, all the special causes have disappeared,
and there are four distinct phases that form common cause sets of data. As you ponder
this chart, you notice the dates associated with the four segments. All of a sudden the light
bulb goes on and you identify the four phases as (1) the baseline—that is, when you started
tracking the FS rate; (2) when a new group of surgeons came on board; (3) when the hospital
was gearing up for an inspection by an external regulatory body; and (4) the period after the
regulatory body’s inspection.
This is a clear demonstration of how organizational conditions can have rather dramatic
impacts on the performance of a process. The addition of new surgeons created an entirely new
process. Why these new surgeons had such an impact is unknown at this point but certainly
merits further study. The impact of a pending inspection by an external regulatory body had
another dramatic impact on the process. Even the new surgeons seemed to have changed their
behaviors with respect to FS during this time. But after the inspection was over, the use of FS not
only increased but also moved to an all-time average high of 57 FSs per 100 surgeries.
■■ How will you know whether a planned change improves the current process?
If there is serious concern over the FS rate, then the last time period (weeks 30–40) should
serve as the new baseline. If the objective is to lower the FS rate, then the creation of a team
would be advisable. The team should continue to track the FS rate and begin to gain a better
understanding of why FS is used. In this case, the creation of Pareto diagrams documenting
the reasons for flashing an instrument and the types of surgical instruments being flashed
would help to identify the vital few areas for improvement. The team might also consider
finding out what the FS rates are at other hospitals in the area. If the hospital is part of a system,
then getting this same indicator from other hospitals in the system sets up the opportunity
for internal benchmarking. As the team develops improvement strategies they can freeze the
control limits and CL on the new baseline (i.e., Phase 4) and then use these reference lines as a
basis to decide if their interventions are able to reduce the FS rate.
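The u-chart arithmetic behind this case study can be sketched in a few lines. The function name and the weekly counts below are hypothetical; the formulas are the standard three-sigma u-chart limits, which widen or narrow with the number of surgeries in each week. Holding the lower limit at zero is an assumption that mirrors what most software does.

```python
import math


def u_chart(defect_counts, opportunities):
    """Centerline and per-subgroup limits for a u-chart.

    defect_counts: number of flash sterilizations in each week.
    opportunities: number of surgeries in each week.
    Returns (u_bar, rows), where each row is (rate, lcl, ucl).
    """
    u_bar = sum(defect_counts) / sum(opportunities)
    rows = []
    for count, n in zip(defect_counts, opportunities):
        sigma = math.sqrt(u_bar / n)
        lcl = max(u_bar - 3 * sigma, 0.0)  # a rate cannot be negative
        ucl = u_bar + 3 * sigma
        rows.append((count / n, lcl, ucl))
    return u_bar, rows


# Hypothetical three weeks of data: FS counts and surgeries performed.
fs_counts = [20, 25, 18]
surgeries = [50, 50, 40]

u_bar, rows = u_chart(fs_counts, surgeries)
print(u_bar)
for rate, lcl, ucl in rows:
    print(rate, lcl, ucl)
```

To freeze limits on a new baseline, compute `u_bar` from the baseline weeks only and reuse it when calculating the limits for later weeks.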

CASE STUDY #8: Clarifying the Operational Definition of Readmission
Situation
A home healthcare agency is interested in tracking the percentage of its patients that are readmitted
to the hospital. So, they set about collecting data only to find out that the numbers seem to vary rather
dramatically from what they expected. Do you (1) assume that the data reflect reality and that the
agency staff has been missing something, or (2) go back to square one and check the accuracy
and efficiency of the data collection procedures?

Discussion
Whenever data seem to be at odds with what you think they should be telling you, the first thing to
do is to check the data before you jump to any conclusions about the results. Three key questions
should be explored: (1) Is the operational definition clear? (2) Is everyone interpreting the operational
definition in the same way? and (3) Have the data collectors gathered the data in the same way?
Based on my experience in developing indicators, the most common problem is that the operational
definition is not clear, which sets the stage for the next two issues to occur.
A good operational definition should be clear and unambiguous; it should define the decision
criteria to be applied when collecting the data and the methods to follow when actually collecting
the data. Basically, if you have developed your operational definition correctly, all three questions
will be answered simultaneously. Next consider issues related to defining a readmission. The
major issue I have encountered is first determining if the readmission is for the same diagnosis
or any diagnosis. Most of the time the team’s interest is in determining whether the patient was
readmitted for the same diagnosis. But it is surprising how many times I have found teams in
conflict over this issue. Although the team members usually reach consensus on the fact that
readmission for the same diagnosis makes the most sense, the people pulling the data are often
not told to eliminate all readmissions for a different reason. For example, if a congestive heart
failure (CHF) patient was admitted to the hospital for a CHF-related problem and then readmitted
within 30 days for a broken arm, would this be considered a readmission? Most staff would
conclude that this is not a readmission because the patient was not admitted for a CHF-related
problem. I have seen teams debate this conclusion because some think that the CHF caused the
patient to stagger and then fall on his arm. Therefore, the patient’s CHF was the causal factor
leading to the readmission. The other issue that I have seen enter into discussions of readmission
is whether the team was considering a 30-, 60-, or 90-day readmission. It seems obvious, but I have
seen teams waste considerable time and effort because they did not specify the time frame for a
readmission. So, before assuming that the data do in fact reflect the current performance of the
process, always start by reviewing the operational definition and your data collection procedures.
If the operational definitions and data collection procedures prove to be consistent with expected
practice, then it is time to start understanding the variation inherent in the data both conceptually
and statistically.
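One way to force these decisions into the open is to write the operational definition down as an explicit rule. The sketch below is illustrative only: the function name, the parameter names, and the choice to count days from the discharge date are all assumptions that a team would need to settle for itself.

```python
from datetime import date


def is_readmission(index_discharge, index_dx, next_admit, next_dx,
                   window_days=30, same_diagnosis_only=True):
    """Apply one explicit operational definition of a readmission.

    window_days: the time frame the team agreed on (e.g., 30, 60, or 90).
    same_diagnosis_only: count only readmissions for the same diagnosis.
    Days are counted from the discharge date (an assumed convention).
    """
    days_out = (next_admit - index_discharge).days
    if days_out < 0 or days_out > window_days:
        return False
    if same_diagnosis_only and next_dx != index_dx:
        return False
    return True


# Hypothetical CHF patient who returns 19 days later with a broken arm.
discharged = date(2018, 3, 1)
returned = date(2018, 3, 20)
print(is_readmission(discharged, "CHF", returned, "broken arm"))
print(is_readmission(discharged, "CHF", returned, "broken arm",
                     same_diagnosis_only=False))
```

The same patient is or is not a readmission depending on the two parameters, which is exactly the kind of disagreement that should be resolved before any data are pulled.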

CASE STUDY #9: Managing a Breast Cancer Patient’s Clotting Levels6
Situation
A breast cancer patient and her oncologist decide to have a mediport inserted into her chest to aid
in the delivery of chemotherapy drugs. The insertion of the port goes without incident, but about
a month after it is in place, the patient develops a deep vein thrombosis (DVT) in her left arm. She
is hospitalized for several days on heparin, sent home on a brief regimen of enoxaparin injections,
and then begins oral doses of warfarin sodium. During each visit to a local outpatient clinic (which
takes about 15 minutes), a small amount of blood is extracted via a finger stick and analyzed, and
adjustments are made on the spot. The patient assists the pharmacist by keeping track of her daily
intake of vitamin K (inconsistent intake of vitamin K can have a dramatic impact on the effect of
warfarin sodium). The challenge with managing a patient on this particular drug is that there is a very
narrow SL for the proper management of the effects of the drug (i.e., two to three on the international
normalized ratio [INR] scale). When the patient exceeds the USL of 3.0, there is an increased risk of
bleeding. Conversely, when the patient falls below the recommended therapeutic limit of 2.0, there is
an increased risk of clotting. The pharmacist creates control charts specific to each patient to assist in
managing the patient’s progress on warfarin sodium. FIGURES 10-22 and 10-23 show the patient’s INR
values during and after chemo and radiation.

5.0
4.5 UCL=4.14
UCL USL
4.0
3.5
INR Value

3.0
CL=2.38
2.5 CL
2.0
1.5
1.0 LCL LSL LCL=0.62
0.5
0.0
1 2 3 4 5 6 7 8 9 101112131415161718192021222324252627282930313233
Blood Draw

2.5
UCL 2.16
2.0
Moving Range

1.5
1.0
CL 0.66
0.5
0.0
1 2 3 4 5 6 7 8 9 101112131415161718192021222324252627282930313233
Blood Draw

FIGURE 10-22 INR values for a breast cancer patient on warfarin sodium (XmR chart)

[Figure 10-23 chart. X chart: INR Value by Blood Draw (1–33), with the USL and LSL marked. During chemo and radiation: UCL 4.92, CL 2.73, LCL 0.53. After chemo and radiation: UCL 2.60, CL 1.78, LCL 0.95. mR chart: Moving Range by Blood Draw. During chemo and radiation: UCL 2.70, CL 0.83. After chemo and radiation: UCL 1.01, CL 0.31.]

FIGURE 10-23 INR values for a breast cancer patient on warfarin sodium before and after chemo and
radiation (XmR chart)

Discussion
■■ What chart did the pharmacist make for this patient? Why do you think she used this chart? Is there
another chart that she could have used?
Because the indicator used to measure the effect of warfarin sodium is a continuous measure,
the pharmacist has a choice of making the X-bar and S chart or the XmR chart. The chart of
preference in this case is the XmR chart (a.k.a. the Individuals or I chart) because there is only
one observation per subgroup (i.e., each time the patient visits the clinic, which is weekly, the
pharmacist takes one blood draw for analysis). If multiple blood draws were taken each week,
the pharmacist could make the X-bar and S chart. Because multiple blood draws each week are
not necessary, the appropriate chart is the XmR chart.
■■ Is this patient’s process in control or out of control?
The patient’s process is not in control. Figure 10-22 shows two special causes, which are
highlighted in black. One occurs at blood draw 16 (a 3-sigma violation). The other
occurs after chemo and radiation were terminated following blood draw 21: the process stayed
below the centerline for more than eight data points in a row (data points 22–30), signaling a
downward shift in the INR values.
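The two signals described above can be checked mechanically. The following is a minimal Python sketch, not the pharmacist's actual method: it flags points outside the 3-sigma limits and runs of eight or more consecutive points on one side of the centerline. The function names and the INR series are hypothetical; only the centerline and control limits are taken from Figure 10-22.

```python
def three_sigma_violations(values, lcl, ucl):
    """Return 1-based positions of points outside the control limits."""
    return [i for i, v in enumerate(values, start=1) if v > ucl or v < lcl]


def shifts(values, center, run_length=8):
    """Return (start, end, side) for each run of at least `run_length`
    consecutive points on one side of the centerline (1-based, inclusive).
    A point exactly on the centerline breaks the run."""
    found = []
    start, side = None, None
    for i, v in enumerate(values, start=1):
        here = "above" if v > center else "below" if v < center else None
        if here != side:
            # The previous run (if any) ended at point i - 1.
            if side is not None and (i - start) >= run_length:
                found.append((start, i - 1, side))
            start, side = i, here
    if side is not None and (len(values) + 1 - start) >= run_length:
        found.append((start, len(values), side))
    return found


# Hypothetical INR series (NOT the patient's data); limits from Figure 10-22.
inr = [2.3, 2.3, 2.8, 2.6, 4.6, 2.9, 2.5,
       1.9, 1.7, 1.6, 1.8, 2.0, 1.9, 2.1, 2.2, 2.0]
print(three_sigma_violations(inr, lcl=0.62, ucl=4.14))
print(shifts(inr, center=2.38))
```

The run length is a parameter because different texts use slightly different run rules; eight is the value this case study refers to.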
■■ What do we learn by looking at the moving range (mR) chart?
The mR chart (the bottom chart of Figure 10-22) depicts the variation from one data point to
the next. If there are large swings from one data point to the next, then this chart will have wide
control limits. The mR chart exhibits wider variation during the period when the patient was
receiving chemo and radiation. After these treatments were ended at blood draw 21, the mRs
do not show as much fluctuation (blood draws 22–33). A subtle point: notice that the
mR chart (the bottom chart) has one fewer data point than the top chart (known as the X chart).
This is because calculating an mR requires two data points. The first INR data point
on the X chart is a single measure, which has no variation. Variation requires, at a minimum, the
comparison of two data points, which does not happen until the first INR value is compared to
the second INR value on the X chart. So the mR chart has one fewer data point because an mR
cannot be calculated on the first data point by itself. The first mR on the chart is zero because
blood draws 1 and 2 happen to be the same value (2.3), and there is no range between numbers
of the same value. Because blood draw 2 differs from blood draw 3 (2.3 compared to 2.8),
the second mR on the bottom chart is 0.5. This process of creating what are
called “artificial subgroups of two” by coupling each single data point with its neighbor is key to
understanding how the XmR chart is constructed.
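The construction just described can be sketched in a few lines of Python. This is an illustrative sketch rather than the pharmacist's own calculation: the function names are hypothetical, and the constants 2.66 and 3.267 are the standard XmR multipliers for subgroups of two.

```python
def moving_ranges(values):
    """Absolute difference between each point and the one before it.

    The result has one fewer entry than `values` because the first
    point has no predecessor to be compared with.
    """
    return [abs(b - a) for a, b in zip(values, values[1:])]


def xmr_limits(values):
    """Centerlines and 3-sigma limits for an XmR (individuals) chart.

    2.66 (= 3 / 1.128) scales the average moving range for the X chart;
    3.267 is the upper-limit multiplier for the mR chart.
    """
    mrs = moving_ranges(values)
    x_bar = sum(values) / len(values)
    mr_bar = sum(mrs) / len(mrs)
    return {
        "x_cl": x_bar,
        "x_ucl": x_bar + 2.66 * mr_bar,
        "x_lcl": x_bar - 2.66 * mr_bar,
        "mr_cl": mr_bar,
        "mr_ucl": 3.267 * mr_bar,
    }


# The first three INR values named in the text.
first_three = [2.3, 2.3, 2.8]
print([round(mr, 2) for mr in moving_ranges(first_three)])  # [0.0, 0.5]
```

Applying the same formulas to the centerlines printed on Figure 10-22 reproduces its limits: 2.38 + 2.66 × 0.66 ≈ 4.14, 2.38 − 2.66 × 0.66 ≈ 0.62, and 3.267 × 0.66 ≈ 2.16.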
■■ Note that the chart shows the USL and LSL. Of what value are these lines? How do they differ from the
UCL and LCL?
A number of clinical measures actually have SLs. The healthcare measures that typically have
SLs are those that relate to physiological measures (e.g., hemoglobin levels, white blood cell
counts, neutrophils, prothrombin time, or platelet counts). The effect of warfarin sodium is measured
in terms of clotting time. The standard measure is known as the INR value. The normal INR
for individuals not on warfarin therapy is typically 0.9 to about 1.1. On warfarin therapy,
the INR is usually raised to between 2.0 and 3.0, but most hospital pharmacies and clinical
hematology services will have specific INR goals documented in their treatment protocols.
These therapeutic values (i.e., 2.0–3.0) can be used as the USL and the LSL for the drug. When
the patient’s control limits fall outside these SLs, the patient is at risk. The
principal difference between SLs and control limits is that the SLs represent the desired levels
of performance, sometimes referred to as the voice of the customer (VOC), whereas the control
limits capture the actual levels of variation in the data. SLs can be developed by those making
the chart whereas the control limits are derived from the inherent variation that lives in the
data. Stated otherwise, the maker of the chart does not decide what the UCL and LCL are but
can establish the USL and LSL.
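The distinction between the two kinds of limits can be made concrete by comparing them directly. The sketch below is a hypothetical helper, not a method taken from the case study; it uses the control limits printed on Figure 10-22 and the 2.0–3.0 therapeutic range as the SLs.

```python
def compare_limits(lcl, ucl, lsl, usl):
    """Compare control limits (what the process actually does) with
    specification limits (what is desired) and report any mismatch."""
    findings = []
    if ucl > usl:
        findings.append("UCL exceeds USL: stable variation can run above the spec")
    if lcl < lsl:
        findings.append("LCL is below LSL: stable variation can run below the spec")
    return findings or ["control limits fall within the specification limits"]


# Control limits from Figure 10-22; SLs are the therapeutic INR range.
for finding in compare_limits(lcl=0.62, ucl=4.14, lsl=2.0, usl=3.0):
    print(finding)
```

For this patient both checks fire, which is consistent with the text's point that the process as a whole, not just individual readings, sits outside the desired range.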
■■ Figure 10-23 partitions the data points into two segments. The left side of the chart shows the
INR values for the patient while undergoing chemo and radiation (blood draws 1 through 21).
The second set of control limits shows the INR values for the patient after all chemo and radiation
treatments were completed (blood draws 22–33). Why do you think the pharmacist divided the
chart into two parts? Does it help understand the patient’s process? What could be causing the shift
in the INRs?

The pharmacist knew, from experience, that once a patient completes the chemo and radiation
treatments, the management of warfarin sodium usually moves into a new phase. Specifically,
without the influence of chemo and radiation, the patient’s body began to exhibit its true
response to warfarin sodium. So the pharmacist divided the control chart into a treatment
phase (blood draws 1–21) and an after-treatment phase (blood draws 22–33). Consistent with
previous patients, this patient also exhibited a fairly dramatic drop in the INRs after chemo and
radiation treatments were terminated. This was followed by a gradual increase to a more stable
period that coincided (for the first time) with the INRs being close to the specification limits.
The mR chart after chemo and radiation ended shows much greater stability than the mR chart in
the treatment phase. Frequently the mR chart can be quite valuable when studying clinical issues
at the patient level. Stable INR, blood sugar, or blood pressure readings, for example, are in many
ways easier to address than constant swings between extreme readings. So, when applying
the Shewhart charts, especially the XmR or X-bar and S charts, to physiological measures at the
patient level, the mR chart can be just as valuable as the X chart.
■■ What other patient conditions can you think of that could be monitored with Shewhart charts?
Shewhart charts have great utility for tracking blood pressure; blood glucose levels; the daily
intake of fat, sodium, and cholesterol; or the impact of physical therapy treatments. Take a few
minutes to think about how you could apply Shewhart charts to your discipline.

CASE STUDY #10: Group B Streptococcus in Pregnant Women
Situation
A team of nurses at Advocate Good Samaritan Hospital (Downers Grove, Illinois) was interested in
studying the administration of antibiotics to pregnant women who present for delivery with a positive
culture for group B streptococcus (GBS).7 According to the Centers for Disease Control and Prevention,
about 20–30% of all women have GBS naturally in their bodies. Although GBS is benign to the woman,
the transference of GBS to the newborn during delivery can have extremely negative consequences
for the baby, including death (approximately 1 death per 1,000 deliveries). Women should be tested for
the presence of GBS during weeks 35–37 of pregnancy. If they test positive, they should
receive intrapartum antibiotic prophylaxis (IAP) for at least 4 hours prior to delivery in order
to minimize the risk of transferring GBS to the newborn, which carries a high
probability of sepsis and possibly death of the newborn.

Discussion
The nurse manager for labor and delivery was somewhat dismayed when she called me because she
thought the percentage of women receiving IAP was not improving. She said that she thought the
problem was the result of an increase in the number of women presenting with a positive GBS who
were under the 4-hour threshold. Because you cannot tell a pregnant woman, “We’re going to wait 3
more hours to deliver your baby so we can administer antibiotics,” these women receive less than the
recommended dose of antibiotics. I asked the nurse manager if they were combining all cases (i.e.,
women presenting 4 hours or more before delivery and women presenting less than 4 hours prior to
delivery) in their data analysis. She said that they were. I asked if they could stratify the cases into these
two categories and she said that they could. The issue here is that the increased volume of women
presenting less than 4 hours before delivery is something the hospital cannot control. The
antibiotic protocol should be applied only to the women who present at least 4 hours before they
deliver. These are the women who can experience positive results from prophylactic
antibiotics. Those who present less than 4 hours prior to delivery should be placed in a separate
category (stratum) and dealt with separately.8
FIGURE 10-24 depicts the various levels of stratification developed with this team. Note that the
decision tree places patients into categories or buckets that are mutually exclusive. In this way, the
team could decide which categories it wishes to measure and which are not particularly relevant
to their intended objectives. Stratification aids these decisions because
it helps the team determine whether the data can be broken down into smaller and more
homogeneous comparison groups.
While developing the decision tree in Figure 10-24, the team also started to collect data on the
GBS process. They had only 3 months of data as a baseline. This is not ideal for a baseline but it is what
the team had so it had to suffice. After the decision tree was developed and the patient population
stratified properly they collected 5 more months of data. FIGURE 10-25 shows a p-chart with the data
organized into monthly subgroups, phased to show the baseline compared to the properly stratified
patients receiving IAP and the national average of 89%. The average for the first 3 months when the
two patient populations were combined is 79.9% whereas the average for the 5 months after the
patient population was properly stratified is 96.1%, which is the true percentage of compliance with
the IAP protocol. This is a good example of how a team can believe that its performance is worse than it
really is simply because it has not properly stratified the data into relevant segments.
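For readers who want to see how p-chart limits of this kind are computed, here is a minimal Python sketch. It is an illustration under stated assumptions, not the team's analysis: the function name is hypothetical, and the monthly counts are invented (the source reports only the centerlines), chosen so that the centerline lands near the reported 96.1%.

```python
from math import sqrt


def p_chart(numerators, denominators):
    """Return (p_bar, [(lcl, ucl), ...]) for a p-chart.

    p_bar is total events / total opportunities. Each subgroup gets its
    own 3-sigma limits because sigma depends on that subgroup's
    denominator. Limits are clipped to the 0-1 range.
    """
    p_bar = sum(numerators) / sum(denominators)
    limits = []
    for n in denominators:
        sigma = sqrt(p_bar * (1 - p_bar) / n)
        limits.append((max(0.0, p_bar - 3 * sigma),
                       min(1.0, p_bar + 3 * sigma)))
    return p_bar, limits


# Hypothetical monthly counts for the stratified phase (NOT the hospital's
# data): GBS-positive women presenting at least 4 hours before delivery,
# and how many of them received IAP.
received_iap = [24, 27, 22, 25, 26]
eligible = [25, 28, 23, 26, 27]

p_bar, limits = p_chart(received_iap, eligible)
print(f"CL = {p_bar:.1%}")
for n, (lcl, ucl) in zip(eligible, limits):
    print(f"n = {n}: LCL = {lcl:.1%}, UCL = {ucl:.1%}")
```

Only the stratum of interest goes into the numerator and denominator; women presenting less than 4 hours before delivery would be counted in a separate series.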
The team’s next challenge will be to move this average of 96.1% to be as close to 100% as
possible. The team’s improvement strategies included:
■■ Revising their data collection instruments and procedures to make sure they are all using the
logic shown in Figure 10-24 and properly stratifying the patients into the relevant categories
■■ Evaluating the volume of cases they experience each month to see whether they can use 1 week
or 2 weeks as the subgroup rather than 1 month
■■ Continuing to update the p-chart on the percentage of IAP compliance with GBS positive
patients
■■ Providing continuing education for the nursing staff and patients on the importance of
understanding that IAP should be initiated for women with GBS within 30 minutes of admission
■■ Enhancing collaboration between nursing and clinical management to make sure pitocin
inducement is delayed for women with GBS
■■ Increasing the number of nurses involved with the data collection efforts
■■ Adding GBS status to the mother/infant record and prenatal record

[Figure 10-24 decision tree. Levels shown, top to bottom: total patients for the month; number with complete documentation versus number with incomplete documentation; GBS culture not done versus GBS culture done; women with no risk factors versus women with risk factors, and GBS negative culture versus GBS positive culture; a further "women with no risk factors" node; women presenting less than or equal to 4 hours versus women presenting more than 4 hours prior to delivery; women not receiving versus women receiving the appropriate antibiotic regimen.]

FIGURE 10-24 Stratification of pregnant women with group B streptococcus (GBS)

[Figure 10-25 chart. Percentage of IAP compliance by Month (Jan through Aug), with UCL and LCL shown for each phase: baseline CL = 79.9%, stratified-phase CL = 96.1%, National Average = 89%.]

FIGURE 10-25 Percentage of IAP compliance for patients with group B streptococcus (p-chart)

CASE STUDY #11: Emergency Department Fast Track


Situation
An emergency department (ED) has proudly announced to the community that it now has a “fast
track” in place for emergency visits that are not life threatening. Brochures have been developed and
a banner has been placed across the ED proclaiming that “Fast Track is here!” The ED team set a goal of
75 minutes from ED registration to discharge from the fast track area. They did not use data to establish
this goal; it “just felt right.” Besides, one of the staff had read that a hospital in New York City was able to
get people in and out of its fast track service in 60 minutes. If an urban hospital could achieve
60 minutes or less, then surely a suburban hospital should be able to do it in 75 minutes. Unfortunately,
after a month of operation the fast track process was receiving more complaints than compliments.
What would you advise the ED Fast Track team to do?

Discussion
The QI measurement expert helped the team develop two analytic tools. First they developed a
Pareto diagram to identify the vital few reasons people were presenting to the fast track. As you can
imagine, the reasons were quite varied. Patients were presenting with everything from minor sports injuries, like badly
sprained ankles, to more complex issues requiring stitches or even rule-outs for potential cardiac
events. This analysis allowed the team to decide that two categories within the fast track area would be
established (i.e., visible blood versus no visible blood).
Next the QI measurement expert guided the team on the development of baseline data. She
cautioned the team that establishing a goal, in this case 75 minutes or less, without baseline data was
not a particularly good idea. But this suggestion seemed to fall on deaf ears because the team had
already publicized the 75 minute or less limit not only to management but to the public. The team was
at least willing to start building a baseline to see how close they were to 75 minutes.
The volume of people coming to the ED fast track process during the first 2 weeks of operation
varied from a minimum of 47 to a maximum of 89 each day. Based on these initial numbers, the QI
measurement expert advised the team that they did not need to record the wait time of every patient.
A systematic random or stratified systematic random sample would be sufficient to understand the
variation in the process. After she explained the differences between the two sampling approaches, the
team decided that it would be best to draw a stratified systematic sample each day. They also thought
that two stratification variables would be relevant to this work: (1) day versus afternoon shift (the fast track
process does not operate during the night shift), and (2) type of injury coming to the fast track process
(i.e., visible blood versus no visible blood). This means that the team will have five different charts to
evaluate:
■■ Wait time in the fast track day shift—visible blood patients
■■ Wait time in the fast track day shift—no visible blood patients
■■ Wait time in the fast track afternoon shift—visible blood patients
■■ Wait time in the fast track afternoon shift—no visible blood patients
■■ Total wait time in the fast track—both shifts and both injury types
A systematic sampling plan was laid out to accommodate the four stratification levels, log sheets
were created, and a date was selected to start collecting patient wait times. The day of the week was
selected as the subgroup for the Shewhart chart (i.e., the units placed on the x axis of the Shewhart
chart) including weekdays and weekends.
Each day, at the beginning of each shift, a team member generated a random number from 1 to 10.
This random number then became the starting point for pulling the systematic sample. If a random
starting point is not selected for a systematic sample, then it cannot be considered a random sample.
(Chapter 4 provides the details of pulling a systematic sample.) The team agreed to track the wait
time for every fifth patient presenting in each of the four stratification levels, counting from
the random starting point.
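The sampling rule can be sketched in Python as follows. This is a hedged illustration, not the team's log-sheet procedure: the function name is hypothetical, the patient list is invented, and the sketch assumes that the patient at the random starting position is the first one sampled.

```python
import random


def systematic_sample(arrivals, interval=5, max_start=10, rng=random):
    """Pick a random start between 1 and max_start, then take that
    arrival and every `interval`-th arrival after it (1-based order)."""
    start = rng.randint(1, max_start)
    # Python lists are 0-based, so arrival number `start` is at index start - 1.
    return start, arrivals[start - 1::interval]


# Hypothetical shift: 40 patients in one stratum, numbered in arrival order.
patients = list(range(1, 41))
start, sample = systematic_sample(patients, rng=random.Random(2019))
print(start, sample)
```

Passing a seeded `random.Random` makes a run reproducible for checking; in daily use the default generator would supply a fresh starting point for each shift and each stratum.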
Although a systematic sample of five patients in each of the four stratification levels was
the objective, some days they got more than five patients and other days they got fewer. The
team expressed concern that this variable subgroup size would affect the analysis of the data.
The QI measurement expert assured them, however, that this would not be a problem, because the
Shewhart chart she would be preparing for the team (the X-bar and S chart) can handle varying
subgroup sizes. They collected data for 20 days, all the while trying to get patients through the fast
track process in 75 minutes or less. The baseline data on the X-bar and S chart for the day shift and
patients with no visible blood are presented in FIGURE 10-26. What does this chart tell you about
the fast track process?
[Figure 10-26 chart. X-bar chart: Fast track wait time (in minutes) by Day (M, W, F, ...), UCL = 127.235, CL = 107.958, LCL = 88.680, with one point marked "Special cause (3-sigma violation)". S chart: Sigma by Day, UCL = 28.215, CL = 13.506, LCL = 0.000.]
FIGURE 10-26 Emergency department “fast track” wait time (day shift with minor injuries) X-bar
and S-chart
The first thing that you observe is that on the last Wednesday on the chart (fourth data point from the
right) a special cause occurred. This is a 3-sigma violation because the average for the day (84.4 minutes) fell
below the LCL for this data point. This is actually a special cause that the team would like to see repeated,
because it is the lowest wait time during the entire study period, although it is still not near the goal of 75 minutes.
The team should investigate why the process worked well on this day. Were the patients presenting with
less critical problems? Was the regular ED slow so that staff came over to assist on the fast track side? Could
it be that there was a data-recording mistake on this day and someone transposed a number? Why was
this day different from the other 19 days? There are lessons to be learned with every special cause.
The second thing you should notice is that the control limits are not straight. These are called
stair-step control limits. They are found on any chart that does not have a consistent subgroup size.
Stair-step control limits are seen most of the time on the p- and u-charts, but you will find them on an
X-bar and S chart when the sample size is not consistent over time. In the present example,
the subgroup size ranged from three to eight patients each day.
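Why the limits stair-step can be seen from a small calculation. The Python sketch below is a deliberate simplification and not the procedure used to draw Figure 10-26: it estimates sigma with a pooled standard deviation and omits the small-sample bias-correction constants that SPC software normally applies, so its limits will differ somewhat from software output. The function name and the wait times are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def xbar_limits(subgroups):
    """Per-subgroup 3-sigma limits for subgroup averages.

    Simplification: sigma is the pooled standard deviation with no
    bias correction. Each subgroup needs at least two observations.
    """
    all_values = [v for g in subgroups for v in g]
    grand_mean = mean(all_values)
    pooled_var = (sum((len(g) - 1) * stdev(g) ** 2 for g in subgroups)
                  / sum(len(g) - 1 for g in subgroups))
    sigma = sqrt(pooled_var)
    limits = [(grand_mean - 3 * sigma / sqrt(len(g)),
               grand_mean + 3 * sigma / sqrt(len(g))) for g in subgroups]
    return grand_mean, limits


# Hypothetical wait times in minutes (NOT the team's data),
# with subgroup sizes of 3, 5, and 8.
days = [
    [112, 98, 121],
    [105, 117, 99, 110, 108],
    [101, 109, 115, 96, 120, 107, 111, 104],
]
center, limits = xbar_limits(days)
for day, (lcl, ucl) in zip(days, limits):
    print(f"n = {len(day)}: LCL = {lcl:.1f}, CL = {center:.1f}, UCL = {ucl:.1f}")
```

Because the half-width is divided by the square root of the subgroup size, days with more sampled patients get tighter limits and days with fewer get wider ones, which produces the stair-step shape.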

Finally, you do not need a degree in statistics to see that this process is not operating anywhere
near the desired goal of 75 minutes. In fact, the process is running so high that the chart does not
have room to show the goal of 75 minutes or less. Despite the fact that there is a special cause present,
which basically makes the process unstable and unpredictable, you can still apply the “intraocular test
of significance” to the data (i.e., the old eyeball test) and see that the average time to go through the
fast track process is 108 minutes. The average UCL is 127 minutes and the LCL estimate is 89 minutes.
The complaints of the patients seem to be justified—particularly if word got out that this process was
going to get people in and out in 75 minutes or less.
So what should the team do now? One of the staff suggests taking down the banner that
proclaims that “Fast Track is here!” This idea is considered but then discarded; the banner stays up. The
next idea is to see whether the baseline data for the afternoon shift differ from that of the day shift and
whether the presence of blood or no blood in the presenting patients has an impact on the process
wait time. They needed to look at the other four X-bar and S charts to see what they could learn from the
variation in the entire process. The QI expert suggested that there also may be differences based on
the hour of the day and the volume of patients coming into the ED. In conjunction with the additional
data work, they also decided to develop a flowchart of the process to see whether they could identify
any bottlenecks and delays.
The final thing the team did was to take time to engage in a dialogue about the nature of the
concept of the “fast track.” They began to wonder whether everyone had the same operational definition of this
concept. How fast can the ED fast track process be expected to go? What does the public think about
when they see a sign that tells them there is a fast track process in place? Part of this team’s problem is
simply managing the customers’ expectations. It is a classic example of creating an expectation in the
customer’s mind, publicizing the idea, and then implementing a process that has not been tested under
different conditions and is not capable of even meeting expectations, let alone exceeding them. Quality
does not happen merely because you have good intentions and a colorful banner across the entrance
to the ED. It happens when you understand the VOC and the voice of the process (VOP) and then take
action to make sure that the process is capable of at least meeting minimum expectations. In this case
study, it is not surprising that the fast track team was not near their goal. They did not start out with an
understanding of the capability of the process. Their good intentions were detoured by poor planning.

CASE STUDY #12: Tracking Patient Complaints


Situation
A group practice clinic decides to change the processes for registering patients and scheduling
follow-up visits. Within the center are the following subspecialties: family practice, internal medicine,
and obstetrics/gynecology. In addition, there is an outpatient laboratory, a pharmacy, and a durable
medical equipment service. In the past, each of the medical subspecialties had its own area for
registration, scheduling follow-up visits, and payment (billing/insurance questions, etc.). Driven
primarily by financial factors, the managers of the various subspecialties agreed to consolidate all the
administrative functions. As a result, they decide to have one area to handle the three functions of
registration, scheduling follow-up visits, and handling billing questions. They also decide to physically
rearrange the floor space and the way in which the patients move within the facility.

Discussion
■■ What type of control chart should be used to track the number of complaints?
Complaints are typically regarded as defects (i.e., attributes data). In addition, because one
patient or their representative can make more than one complaint we can eliminate further
consideration of this being a binomial measure (i.e., the patient either complained once or
more or did not complain). Given that we want to count all the complaints including multiple
ones by one patient, the charting options are the c-chart and the u-chart. The c-chart provides
a count of the number of defects (complaints) when there is an equal area of opportunity
for a complaint to occur. The u-chart, on the other hand, is reserved for creating a defect rate
when there is an unequal area of opportunity for the defect to occur. The u-chart produces
a rate-based statistic (i.e., so many complaints per 1,000 patient visits). The c-chart was used
because the daily number of visits to the various subspecialties showed very little fluctuation
and the types of patients visiting the clinic remained reasonably constant. If either or both of
these assumptions were to change, however, the chart of preference would be the u-chart.9
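The limits for both charts follow from simple formulas, sketched below in Python. The function names and the u-chart inputs are hypothetical; the c-chart example uses the centerline printed on Figure 10-27A, and the u-chart figures are invented for illustration only.

```python
from math import sqrt


def c_chart_limits(c_bar):
    """3-sigma limits for a c-chart: c_bar +/- 3 * sqrt(c_bar), floored at 0."""
    half_width = 3 * sqrt(c_bar)
    return max(0.0, c_bar - half_width), c_bar + half_width


def u_chart_limits(u_bar, opportunities):
    """3-sigma limits for one u-chart subgroup.

    `opportunities` is that subgroup's area of opportunity in the same
    units as the rate (for example, thousands of patient visits).
    """
    half_width = 3 * sqrt(u_bar / opportunities)
    return max(0.0, u_bar - half_width), u_bar + half_width


# Centerline printed on Figure 10-27A.
lcl, ucl = c_chart_limits(29.46)
print(round(lcl, 2), round(ucl, 2))  # 13.18 45.74

# Hypothetical u-chart month: 12 complaints per 1,000 visits overall,
# 2,400 visits (2.4 thousand) in this month.
print(u_chart_limits(u_bar=12.0, opportunities=2.4))
```

The c-chart has a single pair of limits because the area of opportunity is assumed constant, whereas the u-chart recomputes the limits for each subgroup as the number of visits changes.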
■■ What conclusions do you make about the administrative changes that were put in place? Did they
have a positive or negative impact?
FIGURE 10-27A shows the results of changing the administrative process, which occurred during
month 9. The upward shift in the number of complaints is obvious. But there are actually
eight specific special causes on this chart. Can you identify all of them? To detect some of the
special causes you will need to refer to the light dashed lines on the chart showing the zones
(designated by the letters A, B, and C). You also may need to refer back to Chapter 9 for a brief
refresher on the use of the zones and applying the special cause rules. All eight special causes
are shown in FIGURE 10-27B. If this chart were to be used in a presentation to the management
team the presenter would merely say, “Ladies and gentlemen, the chart reveals numerous
special causes, which makes the complaint process unstable and unpredictable. The first thing
we need to do is to investigate why we are observing these various special causes.”
[Figure 10-27A chart. Number of Complaints by Month (1–24), UCL = 45.74, CL = 29.46, LCL = 13.18, with zones A, B, and C marked on each side of the centerline.]
FIGURE 10-27A Total number of patient complaints at a group practice clinic (c-chart)
[Figure 10-27B chart. Number of complaints by Month (1–24), UCL = 45.74, CL = 29.46, LCL = 13.18, with zones A, B, and C marked and the note "Numerous special causes are present".]
FIGURE 10-27B Total number of patient complaints at a group practice clinic with special causes
identified (c-chart)
In the overview of this case study, it was noted that the administrative changes were made in
1 month, which is month 9 on Figure 10-27B. After this point the number of complaints started
to increase. When eight data points stayed above the mean (CL), this was evidence that the
complaint process had shifted to a new level of performance. This is justification to partition, or
phase, the data into two segments as shown in FIGURE 10-28. Now it becomes very clear that
there are really two processes. One occurred under the old administrative process and another
with the new process. Under the old arrangement the process produced around 18 complaints
each month. After the changes were made, however, the complaints jumped up to an average
of 36 each month with a UCL of 54 and an LCL of 18, which happens to coincide with the
process average of the baseline. You can see from Figure 10-28 that the original process and the
new process both reflect common cause variation, which means they are stable and therefore
predictable. The problem is that the new process (months 10–24) is predictably bad and seems
to be moving to even higher levels of complaints.
[Figure 10-28 chart. Number of complaints by Month (1–24), phased into Old Process and New Process; limits printed on the chart: UCL = 54.08, CL = 36.07, LCL = 18.05, with zones A, B, and C marked.]
FIGURE 10-28 Total number of patient complaints at a group practice clinic with historical control
limits (c-chart)
■■ Given these results, what are the next steps?
Because there has clearly been a shift in the process, the management team needs to have
a serious discussion about the trade-offs between cost and quality. If the desire is to cut
costs by consolidating the administrative functions and cutting staff, then the customer
satisfaction consequences of these changes need to be evaluated. If the complaints continue
to increase, it is very likely that patients may start to look elsewhere for medical care. Although
patients typically have strong allegiances to their physicians, there comes a point when the
administrative functions serve as a catalyst to push patients away from their current providers.
The clinic staff needs to conduct further data analysis and develop a Pareto diagram of the
reasons for the complaints. They also should develop a series of flowcharts to identify the steps
involved in completing all the administrative functions. This would also help the team focus on
some of the reasons patients are complaining. If they do not gain a better understanding of why
patients are complaining, then they will flounder around in the dark, cursing the ever-growing
number of complaints but not understanding which ones are the major problems or how to
address them. A systems view of this process is clearly needed.
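As a starting point for the Pareto analysis recommended above, the tally behind a Pareto diagram can be sketched in Python. Everything in this example is hypothetical: the function name, the complaint categories, and the counts are invented for illustration and do not come from the clinic.

```python
from collections import Counter


def pareto_table(reasons):
    """Return rows of (reason, count, cumulative percent), largest first."""
    counts = Counter(reasons).most_common()
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for reason, count in counts:
        running += count
        rows.append((reason, count, round(100 * running / total, 1)))
    return rows


# Hypothetical complaint log (categories and counts invented).
log = (["wait at registration"] * 14
       + ["billing question unresolved"] * 9
       + ["could not schedule follow-up"] * 6
       + ["confusing floor layout"] * 4
       + ["other"] * 3)

for row in pareto_table(log):
    print(row)
```

The cumulative percent column is what identifies the vital few: the categories at the top of the table that together account for most of the complaints.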

CASE STUDY #13: Reducing Ventilator-Associated Pneumonia10
Situation
The medical/surgical intensive care units (ICUs) had just celebrated the longest run (20 days) they had ever
experienced without a ventilator-associated pneumonia (VAP). But within 2 days of this celebration
they began to see VAPs occur. The occurrence was sporadic at first, with one VAP, then a
run of 2–3 days without a VAP, then another one or two VAPs. They decided that it was time to revisit
their approach to VAP management. Maybe they had become too comfortable with approaching a
month without any VAPs. Maybe the complexity of patients had changed. Maybe the new staff were not
properly trained in the application of the VAP bundle. They began by reviewing the literature (see for
example the resources we have on the IHI website related to the VAP bundle and management of VAPs:
http://www.ihi.org/Topics/VAP/) and then developed an updated flowchart on how they assess and
treat patients placed on a ventilator. One area that came up repeatedly in the team’s discussions was
that oral care procedures could be a causal factor for the increasing occurrence of VAPs. Upon further
investigation they also discovered that their current policy and practices on oral care for VAP patients
were not particularly clear and not being followed consistently. Finally, they discovered that new oral
health products had come on the market and had received very positive results in both hospitals and
skilled nursing facilities. The team decided to look first at their current oral care process and products
and determine if poor oral care could be a key factor leading to the increased occurrence of VAPs.

Discussion
Two nurses on the QI team partnered to review and then update the hospital’s policy and procedures
for the performance of oral care for all adult mechanically ventilated patients in the medical/surgical
ICUs. Current literature was reviewed to identify recommendations for comprehensive oral care on
ventilated patients. The nurses identified the following best practices:
■■ A daily assessment should be performed to evaluate the level of oral dysfunction and provide
the most appropriate care to keep the patient comfortable and prevent complications.
■■ Brushing a patient’s teeth should occur at a frequency of every 2–4 hours and as needed to
prevent the formation of plaque, which can be a reservoir for respiratory pathogens.
■■ Alcohol-free, antiseptic oral rinse should be used to prevent bacterial colonization of the
oropharyngeal area.
■■ Suctioning of oral secretions in both the oral cavity and the subglottic area (above the cuff )
should be performed to prevent the aspiration of microorganisms.
■■ Application of a water-based mouth moisturizer should be used to maintain the integrity of the
oral mucosa.
The nursing staff also reviewed new oral hygiene products on the market. After a thorough
assessment of several new products, one was selected and a plan was developed to test it.
Once the new oral care policy and procedure statement was finalized, copies were
distributed to ICU nursing staff. Group education on the rationale for the changes and instructions on
proper product usage were provided by the manufacturer. The manufacturer also provided posters,
evaluation forms, and follow-up to further clarify, troubleshoot, and educate the staff. The presence of a
“champion” for this process change (in this case an RN from the ICU) provided leadership, accountability,
and support to test the new process and product.
The specific outcome indicator selected for analysis was the VAP rate. This indicator had been
tracked at the hospital for many years, so there was a solid baseline of 24 months. The VAP rate was
operationally defined as the number of pneumonia occurrences per 1,000 ventilator days. The specific
components of the VAP rate are:
■■ Numerator—Total number of inpatient ICU occurrences of VAP
■■ Denominator—Total number of days ICU patients spent on a ventilator
This resultant value is then multiplied by 1,000, so the rate can be compared with state and national
data (i.e., so many VAPs per 1,000 vent days).
The key process measure was the percentage of compliance with using the new oral health
procedures. This was defined as:
■■ Numerator—the number of VAP oral health cleaning opportunities performed correctly (i.e.,
according to the new procedures)
■■ Denominator—the number of VAP oral health cleaning opportunities
The resultant value from this ratio is then multiplied by 100 to produce the percentage of correctly
performed oral cleanings. Data for this percentage were collected manually each day and tabulated
on a monthly basis to match how the VAP rate was tabulated. The chart selected for analysis of the VAP
rate calculation was the u-chart, because this is the chart of preference for the presentation of rate-
based statistics (e.g., fall rate, medication error rate, nosocomial infection rate, or neonatal death rate).
The process measure of percentage of compliance with the new oral hygiene procedures was placed
on a p-chart (i.e., oral cleaning was done either correctly or not correctly).
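The two indicator definitions and the u-chart limits can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual tooling: the function names and the monthly figures are hypothetical, and the limits use the standard 3-sigma u-chart formula with each month's exposure expressed in units of 1,000 ventilator days.

```python
import math

def vap_rate(vap_count, vent_days):
    """VAPs per 1,000 ventilator days for one subgroup (e.g., one month)."""
    return vap_count / vent_days * 1000

def compliance_pct(correct, opportunities):
    """Percentage of oral care opportunities performed per the new procedure."""
    return correct / opportunities * 100

def u_chart_limits(vap_counts, vent_days):
    """Centerline and per-subgroup 3-sigma limits, all per 1,000 vent days."""
    # Express each subgroup's exposure in units of 1,000 ventilator days.
    exposure = [d / 1000 for d in vent_days]
    u_bar = sum(vap_counts) / sum(exposure)
    limits = []
    for n in exposure:
        sigma = math.sqrt(u_bar / n)
        # A rate cannot be negative, so the lower limit is floored at zero.
        limits.append((max(0.0, u_bar - 3 * sigma), u_bar + 3 * sigma))
    return u_bar, limits

# Hypothetical monthly data, for illustration only (not from the case study).
counts = [6, 8, 5, 7]
days = [220, 260, 200, 240]
center, limits = u_chart_limits(counts, days)
```

Because the limits depend on each month's ventilator days, months with fewer vent days get wider limits, which is why u-chart limits often look stair-stepped.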
FIGURE 10-29 shows the baseline VAP rate for 24 months before the new oral hygiene process
was tested. Because the baseline reflects only common cause variation, the process can be considered
stable and therefore predictable. The best quick prediction of future performance is the average or
centerline on the chart. If nothing is done to change the process, therefore, the team can expect
a rate of about 28 VAPs per 1,000 vent days each month. Also, because the baseline demonstrates only
common cause variation, the baseline centerline can be “frozen” and extended into the future as a
reference line for future data. This is shown in FIGURE 10-30.
FIGURE 10-29 Baseline Ventilator Associated Pneumonia Rate (u-chart)
[u-chart: rate (VAPs per 1000 vent days) by month, months 1–24; CL = 28.3 VAPs per 1000 vent days; UCL and LCL shown]

FIGURE 10-30 Baseline Ventilator Associated Pneumonia Rate with extended centerline (u-chart)
[u-chart: rate (VAPs per 1000 vent days) by month, months 1–36; baseline months 1–24; CL of 28.3 extended from the baseline period; new oral hygiene process started at month 25; UCL and LCL shown]

The team began testing the new oral hygiene process in month 25. They plotted the new data
against the frozen centerline of the baseline to determine whether they were able to create a special
cause. They noticed that the VAP rates were lower than those in the baseline period, but it was not until
they observed eight consecutive data points below the baseline centerline of 28.3 that the team could
declare that a downward shift in the process had actually occurred. FIGURE 10-31 shows the shift in the
process and the fact that the data for subsequent months also remained below the baseline centerline.
The team then phased the data and made two sets of control limits and two centerlines to show how
the new oral hygiene process affected the VAP rate. These chart revisions are shown in FIGURE 10-32.

FIGURE 10-31 Baseline Ventilator Associated Pneumonia Rate with extended centerline and shift in the
process (u-chart)
[u-chart: rate (VAPs per 1000 vent days) by month, months 1–36; CL extended from the baseline; new oral hygiene process started at month 25; downward shift in the process marked (8 or more data points below the “frozen” CL); UCL and LCL shown]

FIGURE 10-32 Baseline Ventilator Associated Pneumonia Rate compared to the VAP rate with the new
process (u-chart)
[u-chart: rate (VAPs per 1000 vent days) by month, months 1–36; baseline CL = 28.3; new CL = 15.3 VAPs per 1000 vent days; separate control limits for each phase]

FIGURE 10-33 Percentage compliance with the new oral hygiene protocol (p-chart)
[p-chart: percent compliance by month, months 25–36; UCL and LCL shown; special cause (3-sigma violations) marked]

Note that in the period after the new oral hygiene procedure was introduced (months 25–36), the
control limits are tighter than those in the baseline period and the centerline has shifted from
28.3 VAPs per 1,000 vent days to 15.3.
The team was also tracking the percentage of compliance with the new oral hygiene protocol. As
can be seen from FIGURE 10-33 the percentage of compliance progressed in a positive upward manner
once the new method was introduced in month 25 culminating in a special cause (the last 3 months
of data exceeded the UCL). This special cause is reflective of a process in transition that frequently will
be observed during the introduction and testing of a new procedure and methods. The team will need
to continue tracking compliance to make sure all staff are properly trained in the new methods and
their application of the new approach is reliable. Because the last three data points exceed the UCL
and seem to be stabilizing around 79% the team decided to phase the results shown in Figure 10-33.
FIGURE 10-34 shows the chart with two phases. The team can use this phased chart to see if they can
now move the process closer to 100% compliance.
FIGURE 10-34 Percentage compliance with the new oral hygiene protocol with phased control limits
(p-chart)
[p-chart: percent compliance by month, months 25–36; UCL and LCL shown]

In summary, the team’s theory about the impact of improved oral hygiene for mechanically
ventilated patients seems to be substantiated. As compliance with the new oral hygiene
protocol increased, they began to see a drop in the VAP rate. This relationship between the
process indicators and the outcome indicator is central to QI initiatives. The process indicators are
essentially the independent variables that are believed to influence or drive changes in the dependent
or outcome indicator.
The team’s next steps should include:
■■ Obtaining comparative reference data or norms on VAP rates within their local area, state, region,
or country.
■■ Continuing to track both the VAP rate and the percentage of compliance with the new
procedure. If the VAP rate starts migrating back toward the higher preprotocol average of 28.3,
then it would appear that (1) the new oral care procedures were not as effective as first thought
and/or that (2) staff behaviors did not change as much as needed in order to sustain the initial
improvements. It is also possible that other untested factors could be influencing the decline
observed in the postprotocol VAP rates. Finally, if the postprotocol VAP rates continue to stay
below the preprotocol average for several more months, then a shift in the process will have
been verified. The team should explore all of these possibilities, however, to make sure that they
fully understand the factors that are driving the hospital’s VAP rate.
■■ Exploring the feasibility of moving away from monthly data to a weekly or every-2-week
period for the subgroup. Waiting for monthly data will not only slow down improvement efforts
but, for some topics such as infections, could also have serious implications for the patients.
The decision to go to a weekly or biweekly time frame will hinge on the volume of vent days and
the number of VAPs occurring in the ICUs. In this case study, the count of VAPs each month was
always in the single digits. If the team were to move to weekly or even biweekly subgroups, they
might have too few occurrences to make viable rates. Also, as they get closer to zero VAPs, the VAP
rate calculation becomes less meaningful. When they do start to have a run of 50 days or more
without a VAP, they will need to move away from the VAP rate and start to plot the days between
VAPs. This requires using a t-chart. T-charts are discussed in Chapter 9.
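The raw values for a days-between chart can be derived from event dates alone. The sketch below covers only that first step, turning a list of VAP dates into the number of days between consecutive events; it does not compute t-chart limits. The function name and the dates are hypothetical.

```python
from datetime import date

def days_between_events(event_dates):
    """Days elapsed between each pair of consecutive events."""
    ordered = sorted(event_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

# Hypothetical VAP dates, for illustration only.
vap_dates = [date(2018, 1, 3), date(2018, 1, 20), date(2018, 3, 15)]
gaps = days_between_events(vap_dates)  # [17, 54]
```

Each gap becomes one plotted point, so the chart gains a new point every time an event occurs rather than every month.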

CASE STUDY #14: Pain Management for Hip and Knee Replacement Patients11
Situation
A suburban hospital known for its high inpatient satisfaction survey results noted a less than desired
overall score with the following survey question: “How well was your pain controlled?” The nursing
staff from the orthopedic department investigated pain control on their units by drilling down further
into the data. Through data analysis and qualitative feedback from both patients and nurses they
determined that hip/knee replacement patients would be an ideal target group for improved pain
management.
Two nurses who worked regularly with these patients decided to investigate the possible causes
of dissatisfaction. They enlisted the assistance of an internal quality measurement consultant and
decided to start tracking the effectiveness of the pain management process. They designed a study to
determine whether pain was being controlled effectively after the postsurgical epidural and/or patient-
controlled analgesia (PCA) pump were removed. Their concern was that once these more automated
systems for the administration of pain medication were removed, gaps were occurring in the manual
administration of pain medications. They believed that these gaps led to (1) lengthy delays in the
administration of pain medications, (2) the undesirable “peak and trough” pain management cycle, and
(3) increased dissatisfaction on the part of patients and the nurses.

Discussion
First the team randomly pulled 50 hip and knee replacement charts from the past 2 months and
evaluated the pain medication administration process. For all 50 patients, the times that they
were administered pain medications were recorded, and a bar graph showing the number of
medications given by hour of the day was prepared (FIGURE 10-35). This bar graph reveals that
the most common time for administering pain medications was between 9 and 10 a.m. Note that
there is a ramping-up effect from midnight to 9 a.m., followed by a decline in frequency with
spikes at 2 p.m. and 10 p.m. The diagram reinforced the nurses’ beliefs that there was considerable
variation in the amount of time between the manual administration of medications and that some
patients were experiencing unacceptable delays in receiving pain medications. This suggested
that the unit took a patient-reactive approach to pain management rather than a more desired
nursing-proactive approach.
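A tally like the one behind this bar graph can be sketched as follows. It assumes the administration times pulled from the charts are available as timestamp strings; the format, the function name, and the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime

def doses_by_hour(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count administrations in each clock hour (0-23) across all charts."""
    counts = Counter(datetime.strptime(t, fmt).hour for t in timestamps)
    # Return a full 24-slot list so that quiet hours show up as zeros.
    return [counts.get(hour, 0) for hour in range(24)]

# Hypothetical administration times, for illustration only.
sample = ["2002-07-08 09:15", "2002-07-08 09:40", "2002-07-08 14:05"]
hourly = doses_by_hour(sample)
```

Keeping the empty hours in the output matters here, because the gaps between administrations are what the nurses were looking for.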
FIGURE 10-35 Number of pain medications administered by hour of the day
[Bar graph: count of pain medications administered (y-axis) by hour of the day, 12 a.m. through 11 p.m. (x-axis); largest count is 27]

Next they developed an indicator to evaluate the effectiveness of the pain management
program. They used a modified version of the Indicator Development Worksheet described in Chapter
4 to specify the key outcome indicator and the related measurement plan (EXHIBIT 10-1). Using the
universal pain scale, the nurses asked the patients to evaluate their levels of pain two times during
each of the day and evening shifts and at least one time during the night shift. This provided a minimum
of five pain assessments for each patient each day. Even though the universal pain scale is based
on a 0–10 “scale” for the evaluation of pain, the data are basically attributes in nature, not variables.
This is because the pain scale is not a true measurement scale that has equal-appearing intervals
between each number on the scale. The pain scale is a very subjective scale that basically produces
a rank order distribution rather than a true interval scale such as a ruler, where there are
equal-appearing intervals between each mark. Time and weight are further examples of true
measurement scales. On the pain “scale,” one patient’s pain level of 5 may be a 7 for another patient
or a 3 for yet another. Therefore, the appropriate way to analyze pain scale data is by computing
a percentage of patients selecting a particular pain level, not by computing averages or standard
deviations. The only decision that needs to be made is where to divide the scale in order to compute
the percentages. The nurses reached consensus that an unacceptable pain level would be any
evaluation that was greater than a 4 on the universal pain scale.

EXHIBIT 10-1 Indicator development worksheet (Ortho Pain Management Team)

1) What Key Quality Characteristic (KQC) does this indicator measure? (e.g., turnaround time,
answering telephones within 3 rings)
Prompt relief from pain for hip & knee patients. Maintain low level of pain. Acknowledgement
of pain.

2) What is the specific name of this indicator? (e.g., Radiology Turnaround time, percentage of
missing charts)
“Quality of Pain Control in post-op Patients”

3) What is the rationale for this indicator? (Why has this indicator been selected? What is the
purpose of the indicator?)
Keeping pain levels low for hip and knee replacement patients is an essential part of the care
process. Low levels of pain will also increase the patient’s satisfaction with their treatment
and will be demonstrated by increased scores on the pain-related question on our patient
satisfaction survey. Observations have also indicated the lack of a standardized approach
to the dispensing of oral pain medications to post-op patients. Oftentimes patients have to
complain of pain, then wait for medication, and finally wait for medication to take effect. This
often results in dramatic shifts of pain levels.

4) Data source collection method:
◽ Medical Record ◾ Patient Satisfaction Reports ◽ Data Logs ◽ System Reports, please specify
◾ Other: recording of pain level on laminated form by nursing staff
Collection Frequency and Duration:
Collected: ◾ Daily ◽ Weekly ◽ Monthly
Reported: after baseline and after intervention
Duration: 07/08/02 to 15–20 patients

5) Operational definition: Remember to include the full definition with all inclusions and
exclusions, specify all required data elements (e.g., patient types, financial class, data dictionary
elements, DRGs, codes, clinical specialty. Given a solid Op Def, 10 different people measuring the
same thing should arrive at identical results.)
Quality Pain Control is defined as: Using a population of total hip/knee replacement patients
concurrently, nursing will use the universal pain scale to record patient’s pain number two (2)
times per shift (standard shift is 7–3, 3–11) and at least one (1) time per night shift (11–7). RN
on each shift will record on laminated collection tool the actual time and pain number when
assessed utilizing the universal pain scale (0–10). At the end of night shift, start a new form each
day. Place completed form in the 2 North nursing office.
The lead ortho nurses (MS and SH) will be accountable for entering data from the laminated
sheets and compiling additional information.
BP, our improvement advisor (IA), will be responsible for generating control charts. Order on
control charts will be based on admit dates (date of surgery). P charts will be used along with a
Pareto chart.

6) Numerator Statement: (if applicable, i.e. number of C-Sections)
number of pain measurements over 4

7) Denominator statement: (if applicable, i.e. number of births)
number of pain measurements obtained

8) What is this indicator measuring?
◽ Rate ◽ Days ◽ Time ◾ Percentage ◽ Other

9) Baseline:
*see attached charts

10) Goal:
a reduction in pain level responses over 4 as demonstrated by control chart mean, UCL, and LCL
*see charts

11) This indicator will satisfy the following objective(s):
◽ Physician Partnership ◾ Customer Satisfaction ◽ Regulatory Requirement
◽ Culture Transformation ◾ Clinical Excellence ◽ Risk Management
◽ Value Enhancement ◽ Operational Excellence ◽ Patient Safety
◽ Market Development ◽ Cost Reduction

12) Which of our 5 Pillars of Organizational Excellence does this indicator affect?
◾ Service ◽ Financial ◽ Growth ◾ Quality ◽ People

The baseline data are shown in FIGURE 10-36. Fifteen patients were initially assessed using the pain
scale, and any response that was greater than 4 was defined as an unacceptable level of pain. The data
used to create the p-chart are shown in the table at the top of the p-chart. Note that the row labeled
“inspected” represents the denominator for the percentage calculation and the row labeled “count”
contains the number of a particular patient’s pain scale ratings that were greater than 4. For example,
patient 1 was asked 31 times to provide a pain scale rating. Of these 31 assessments, 21 were greater
than 4, producing 67.74%. Three conclusions can be derived from Figure 10-36: (1) the baseline data contain
two special causes (patients 1 and 9), both of which exceeded their UCL; (2) there is a fair amount of
variation from patient to patient (i.e., the average control limits go from an LCL = 0.82% to a UCL =
50.69%); and (3) if nothing is done to improve this process, the nursing staff and physicians can expect
about 25% of the pain ratings from their hip and knee surgery patients to indicate serious levels of
pain. This process is basically unstable and unpredictable because of the two special causes, which
essentially confirmed what the nurses thought was occurring with this process. The data also reinforced
their belief that improvements needed to be made. After evaluating the baseline data, the team:
■■ Conducted several inservice sessions for the nursing staff to reacquaint them with the importance
of pain management and the hospital’s commitment to managing it consistently and effectively.
■■ Introduced pain management concepts in their established presurgical education program for
hip and knee patients. This consisted of a presurgery orientation program as well as bedside
education after the surgery. The content of these sessions helped the patients understand the
different methods for managing pain and the different routes for the delivery of medications
(e.g., epidural, PCA, oral medications, and injections).
■■ Created posters that were placed on the units informing the public of the hospital’s
commitment to effective pain management practices and outcomes.
■■ Instituted a Pain Interventions Pledge with the nurses (EXHIBIT 10-2). This was a direct effort not
only to get the nurses to think about the pain management process and its components but
also to emphasize the importance of making a personal commitment to “change my practice in
relation to pain control, specifically in the total joint patients.” Each nurse on the unit was asked
to sign the pledge as a demonstration of their endorsement of this initiative.

Patient 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Inspected 31 21 22 23 31 26 17 26 31 29 25 24 26 25 24
Count 21 0 6 7 4 4 2 5 15 3 2 7 13 4 2
Percent 67.74 0.00 27.27 30.43 12.90 15.38 11.76 19.23 48.39 10.34 8.00 29.17 50.00 16.00 8.33
[p-chart: percentage of ratings over 4 by patient number (baseline patients 1–15); UCL = 50.69, Mean = 24.93, LCL = 0.82; points beyond limits marked]
FIGURE 10-36 Baseline percentage of hip and knee replacement patients indicating that pain was
greater than 4 on a scale of 0–10 (p-chart)

EXHIBIT 10-2 Pain intervention pledge for hip and knee replacement patients
Pain control is very important to our patients and a quality goal that is very important to the 2
North Nursing Staff. Therefore, as an RN on the 2 North Team, I agree to change my practice in
relation to pain control, specifically for Total Joint Replacement Patients:
1. Pain Assessment will be done on my patients at least every 4 hours
2. Be PRO-ACTIVE regarding pain – asking about pain levels on hourly checks and offering
pain medications before the patient has to ask
3. Total joint patients will receive pain meds around the clock (at the minimum at bedtime
and at awaking in the morning) so that therapy and pain control can be obtained
4. PM staff will medicate for pain at the hour of sleep. PM staff will also script: “Would you
like to be awakened during the night when your next dose of pain medication is due?”
If patient does not receive pain meds during the night, pain med will be given when the
patient is awakened in the morning or at the very latest at the breakfast meal. This practice will
increase the ability for our patients to be able to participate more fully in Physical Therapy
sessions
5. Education regarding around the clock pain med administration will be done with my
patients
_____________________________________
Signature of RN
_____________________________________
Date
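The baseline p-chart values quoted for Figure 10-36 can be reproduced from the “inspected” and “count” rows. The sketch below is one plausible reconstruction, not necessarily the software the team used. It assumes the standard 3-sigma p-chart formula, uses the average subgroup size for the summary limits, and uses each patient's own number of assessments when checking for special causes. The function and variable names are hypothetical.

```python
import math

# Baseline data from Figure 10-36 (patients 1-15).
inspected = [31, 21, 22, 23, 31, 26, 17, 26, 31, 29, 25, 24, 26, 25, 24]
over_4 = [21, 0, 6, 7, 4, 4, 2, 5, 15, 3, 2, 7, 13, 4, 2]

def p_chart(counts, sizes):
    """Return the centerline, average-size limits, and flagged subgroups."""
    p_bar = sum(counts) / sum(sizes)  # pooled proportion of ratings over 4

    def limits(n):
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        # A proportion cannot leave the 0-1 range, so clamp both limits.
        return max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)

    flagged = []
    for patient, (c, n) in enumerate(zip(counts, sizes), start=1):
        lcl, ucl = limits(n)  # limits for this patient's own subgroup size
        if c / n > ucl or c / n < lcl:
            flagged.append(patient)
    n_avg = sum(sizes) / len(sizes)
    return p_bar, limits(n_avg), flagged

p_bar, (lcl_avg, ucl_avg), special = p_chart(over_4, inspected)
```

Under these assumptions the centerline is 24.93%, the average upper limit is 50.69%, and patients 1 and 9 are flagged, matching the text. The unfloored lower limit comes out near -0.82%, so the sketch floors it at zero.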
Once the interventions were in place and operational, additional data were collected to see
whether the percentage of patients indicating that their pain levels were over 4 had declined.
FIGURE 10-37 shows the baseline period (patients 1–15) and the patient assessments of pain after the
intervention (data points 16–41). The control limits for the baseline period were “frozen” and extended
across the postintervention period. This was done in order to see whether the interventions were able
to produce a special cause (e.g., a downward trend or possibly a shift in the process).12 FIGURE 10-37
does not reveal that the process has been improved. There are two special causes (3-sigma violations)
that needed to be addressed in the new data (i.e., patients 29 and 34). These two patients exceeded the
UCL and had much higher levels of self-assessed pain than the other patients. Whenever
there are data points that are markedly different from the rest of the data, it is useful to ask:
■■ Was there a data entry error?
■■ Is this a stratification problem (i.e., are these two patients different from the rest of the patients
in the study)?
Inspected 31 21 22 23 31 26 17 26 31 29 25 24 26 25 24 24 21 31 31 28 30 22 20 30 45 38 45 46 42 25 55 47 25 47 43 26 21 31 36 42 28
Count 21 0 6 7 4 4 2 5 15 3 2 7 13 4 2 11 5 9 12 4 12 2 3 9 19 4 8 20 24 2 4 8 6 33 7 5 1 1 7 9 6
Percent 67.74 0.00 27.27 30.43 12.90 15.38 11.76 19.23 48.39 10.34 8.00 29.17 50.00 16.00 8.33 45.83 23.81 29.03 38.71 14.29 40.00 9.09 15.00 30.00 42.22 10.53 17.78 43.48 57.14 8.00 7.27 17.02 24.00 70.21 16.28 19.23 4.76 3.23 19.44 21.43 21.43
[p-chart: percentage of ratings over 4 by patient number (baseline patients 1–15, follow-up patients 16–41); pain contract intervention after 15th patient; UCL = 50.69, Mean = 24.93, LCL = 0.82; annotations: beyond limits, 7 below centerline]
FIGURE 10-37 Pain assessments for hip and knee patients before and after the intervention (control
limits are based on patients 1–15, the baseline period and extended) p-chart

The team did not uncover any data entry errors. What the nurses did discover, however, was that
these two patients were unlike the rest of the patients.13 These two patients, therefore, were removed
from the analysis and the p-chart was recalculated. The result is shown in Figure 10-38. Notice that in
this figure there is a period after the intervention when the percentage of patients who indicated that
their pain was greater than 4 is very similar to the baseline (i.e., compare patients 1–15 with patients
16–28 and you will see similar patterns of variation). Then an interesting thing is observed. A special
cause (i.e., a downward shift in the data) is detected from patient 29 to 36 (i.e., eight consecutive data
points below the frozen baseline mean of 24.93%). Also notice that the next three patients
(37–39) had their percentages of pain ratings also stay below the baseline mean, suggesting that the
improvement was being sustained. You would not expect to see that many data points below the
baseline centerline if this was merely a random process.
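The shift rule applied here (eight or more consecutive points below the frozen baseline centerline) is easy to check mechanically. The sketch below uses the percentages from Figure 10-38 and the baseline mean of 24.93; the function name and the choice to report whole runs rather than only their first eight points are assumptions made for illustration.

```python
# Percent of ratings over 4, from Figure 10-38 (patients 29 and 34 of the
# original series removed, remaining patients renumbered 1-39).
percent_over_4 = [
    67.74, 0.00, 27.27, 30.43, 12.90, 15.38, 11.76, 19.23, 48.39, 10.34,
    8.00, 29.17, 50.00, 16.00, 8.33, 45.83, 23.81, 29.03, 38.71, 14.29,
    40.00, 9.09, 15.00, 30.00, 42.22, 10.53, 17.78, 43.48, 8.00, 7.27,
    17.02, 24.00, 16.28, 19.23, 4.76, 3.23, 19.44, 21.43, 21.43,
]

def runs_below(values, centerline, min_length=8):
    """Return (first, last) 1-based positions of every run of at least
    min_length consecutive points strictly below the centerline."""
    runs, start = [], None
    for i, value in enumerate(values, start=1):
        if value < centerline:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close out a run that continues through the last data point.
    if start is not None and len(values) - start + 1 >= min_length:
        runs.append((start, len(values)))
    return runs

shifts = runs_below(percent_over_4, 24.93)  # [(29, 39)]
```

The single run returned spans patients 29 through 39: the rule is first satisfied at patient 36, and the three later patients extend the same run.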
Inspected 31 21 22 23 31 26 17 26 31 29 25 24 26 25 24 24 21 31 31 28 30 22 20 30 45 38 45 46 25 55 47 25 43 26 21 31 36 42 28
Count 21 0 6 7 4 4 2 5 15 3 2 7 13 4 2 11 5 9 12 4 12 2 3 9 19 4 8 20 2 4 8 6 7 5 1 1 7 9 6
Percent 67.74 0.00 27.27 30.43 12.90 15.38 11.76 19.23 48.39 10.34 8.00 29.17 50.00 16.00 8.33 45.83 23.81 29.03 38.71 14.29 40.00 9.09 15.00 30.00 42.22 10.53 17.78 43.48 8.00 7.27 17.02 24.00 16.28 19.23 4.76 3.23 19.44 21.43 21.43
[p-chart: percentage of ratings over 4 by patient number (baseline patients 1–15, follow-up patients 16–39); pain contract intervention after 15th patient; UCL = 50.69, Mean = 24.93, LCL = 0.82; annotations: beyond limits, 7 below centerline]
FIGURE 10-38 Pain assessments for hip and knee patients before and after the intervention and after
removing patients 29 and 34 (control limits are based on patients 1–15, the baseline period and
extended) p-chart

Why did it take the process so long to respond to the interventions? Is this potentially another
data entry problem, or could it be that the last 11 patients did not fit the profile of the previous
patients? As the team discussed these questions, they arrived at several conclusions:
■■ The data were clean, and patients 29–39 were very similar to those in the previous time periods.
■■ They had instituted four new interventions simultaneously. Typically you would introduce one
change at a time and evaluate the impact of that single improvement strategy. This is the nature
of the Plan–Do–Study–Act (PDSA) cycle. When you implement several changes simultaneously,
however, it is hard to determine which intervention or combination of interventions caused the
change in the process.14
■■ Several of the changes required behavioral adjustments on the part of the nurses and the
patients. Because behavioral changes do not happen instantaneously, there is probably a lag
effect when it comes to observing the impact of the interventions.
■■ Data points 16 through 28 can be regarded as a “learning curve” and a transition phase after the
intervention for both the patients and the nursing staff.
■■ Therefore, the chart should be partitioned into three segments to reflect the baseline, the
transition period, and the current process.
FIGURE 10-39 shows the process segmented into three defined segments or phases. The left side
of the chart (patients 1–15) shows the baseline with an average of about 25% of the patients reporting

(continues)
314 Chapter 10 Applying Quality Measurement Principles

CASE STUDY #14: Pain Management for Hip and Knee


Replacement Patients (continued)

Inspected 31 21 22 23 31 26 17 26 31 29 25 24 26 25 24 24 21 31 31 28 30 22 20 30 45 38 45 46 25 55 47 25 43 26 21 31 36 42 28

Count 21 0 6 7 4 4 2 5 15 3 2 7 13 4 2 11 5 9 12 4 12 2 3 9 19 4 8 20 2 4 8 6 7 5 1 1 7 9 6

Percent 67.74 0.00 27.27 30.43 12.90 15.38 11.76 19.23 48.39 10.34 8.00 29.17 50.00 16.00 8.33 45.83 23.81 29.03 38.71 14.29 40.00 9.09 15.00 30.00 42.22 10.53 17.78 43.83 8.00 7.27 17.02 24.00 16.28 19.23 4.76 3.23 19.44 21.43 21.43

[Figure: p-chart of the percentage of ratings over 4 (y-axis, 0 to 70) by patient number (x-axis, 1 to 39), annotated "Pain contract intervention after 15th patient," with one point marked "Beyond limits." Labeled values: baseline, Mean = 24.93 and UCL = 50.69; after interventions, Mean = 28.71 and UCL = 52.85; follow-up, Mean = 14.78 and UCL = 32.81; LCL = 0.82.]

FIGURE 10-39 Pain assessments for hip and knee patients before and after the intervention and after
removing patients 29 and 34 (baseline [patients 1–15], after interventions [patients 16–28], and
follow-up [patients 29–39], all have separate control limits) p-chart

pain levels greater than 4. The middle section of the chart (patients 16–28) reflects the transition or
learning period. The control limits during this period are not very different from the baseline, but
the centerline is actually a little higher (29%) than the baseline. This is not uncommon for a period of
transition, because people are adapting to new ways of behaving, which frequently has a tendency to
create more variation rather than less. These conditions are compounded when multiple interventions
are implemented simultaneously. In this case, staff and patients are not sure of the steps in the new
process or the expected behaviors, which can lead to increased variation and/or instability.
Finally, the right side of Figure 10-39 (patients 29–39) reveals how the process has shifted to a
new (and more acceptable) level of performance after the transition. During this phase the patients
and staff appear to be settling into the new process. The real test of improvement will be if the average
percentage of patients evaluating their pain levels greater than 4 stays at or below the new process
average of about 15%. Also, is it possible to get the percentage to be even lower? The team will have to
evaluate its aim and decide whether further improvement is possible.
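
The phase-by-phase arithmetic behind Figure 10-39 can be sketched in a few lines of Python. The counts below are copied from the data rows printed with the figure. The function name `p_chart_phase` and the choice to base the limits on the average subgroup size within each phase are assumptions made for illustration; charting software may instead compute a separate limit for every patient.

```python
from math import sqrt

# Data rows printed with Figure 10-39 (39 patients, in order).
inspected = [31, 21, 22, 23, 31, 26, 17, 26, 31, 29, 25, 24, 26, 25, 24,
             24, 21, 31, 31, 28, 30, 22, 20, 30, 45, 38, 45, 46,
             25, 55, 47, 25, 43, 26, 21, 31, 36, 42, 28]
count = [21, 0, 6, 7, 4, 4, 2, 5, 15, 3, 2, 7, 13, 4, 2,
         11, 5, 9, 12, 4, 12, 2, 3, 9, 19, 4, 8, 20,
         2, 4, 8, 6, 7, 5, 1, 1, 7, 9, 6]

# Phases named in the text, as 0-based half-open index ranges.
phases = {"baseline": (0, 15), "transition": (15, 28), "follow-up": (28, 39)}

def p_chart_phase(start, stop):
    """Return (centerline, LCL, UCL) in percent for one phase.

    Assumption: limits use the average subgroup size within the phase.
    """
    n_total = sum(inspected[start:stop])
    p_bar = sum(count[start:stop]) / n_total
    n_bar = n_total / (stop - start)
    sigma = sqrt(p_bar * (1 - p_bar) / n_bar)
    lcl = max(0.0, p_bar - 3 * sigma)  # a proportion cannot fall below 0
    ucl = min(1.0, p_bar + 3 * sigma)  # or rise above 1
    return 100 * p_bar, 100 * lcl, 100 * ucl

for name, (start, stop) in phases.items():
    cl, lcl, ucl = p_chart_phase(start, stop)
    print(f"{name:10s} CL={cl:.2f}  LCL={lcl:.2f}  UCL={ucl:.2f}")
```

Under these assumptions the three centerlines agree with the means printed on the chart, and the baseline and transition upper limits agree with the printed values. The printed limits for the follow-up phase may have been calculated differently, so small differences there should be expected.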

CASE STUDY #15: Hospice/911 Paramedic System Partnership to Improve Care15
Situation
When people with a terminal illness are enrolled in hospice care, the goal is to provide them with
comfort care and dignity in the final stages of life. Patients and their families are instructed during
the hospice intake process to call their hospice provider’s 24-hour phone number with any issues,
concerns, or emergencies. They are told not to call 911. This is so that the patient’s wishes for a “good
death” with minimal suffering and intervention can be honored. Despite these clear instructions, 911 is
activated for hospice patients all over the United States, often by visiting relatives or concerned
friends who were not part of the hospice intake process.
Once 911 paramedics, who are trained and equipped for life-saving care, are involved,
the patient is often transported to a hospital ED by ambulance. The movement involved with
transport often aggravates the patient’s levels of pain and nausea. Most hospital emergency
teams are focused on life saving and disease treatment. Therefore, the comfort care and minimal
intervention approaches of hospice workers are frequently in conflict with what the ED is
designed to accomplish. Admission to the hospital also causes problems with hospice insurance
benefits, resulting in additional suffering and stress for patients and their families.

Discussion
In the fall of 2014, hospice providers and the 911 paramedic system in Ventura County, California,
began a project to address this issue and improve care for hospice patients who had 911 activated
on their behalf. Baseline data indicated that between 250 and 300 hospice patients had 911
activated for them in Ventura County each year with 80% of these patients being transported to the
hospital ED.
The aim of the project was to decrease the percentage of 911 hospice patients transported by
50% within the first year. The primary theory for improvement was that if paramedics were integrated
more thoroughly into the hospice system of care, more appropriate decisions could be made
regarding the need for transport.
The first PDSA in the project involved having a few paramedics ride along with hospice nurses
as they made their rounds visiting patients, entered new patients into the hospice program, and
mediated conflicts between family members of various hospice patients. The prediction was that
paramedics could see themselves acting as part of the hospice team, and this was confirmed as valid
by the paramedics involved.
The second PDSA cycle involved having paramedics attend the weekly case conferences of two
different hospice organizations. These conferences, led by the hospice’s physician medical director,
reviewed situations that came up with their patients the previous week through the lens of how a
well-trained paramedic might support the patient’s care and blend with the hospice team. The hospice
medical directors, case managers, and nurses who participated in these case conferences all agreed
that having paramedics function as members of their team would likely improve care for patients and
keep them within the hospice program.
For the third PDSA, two of the largest hospice providers, Assisted Hospice and Livingston
Memorial Hospice, trained 14 paramedic supervisors from American Medical Response Ventura, Gold
Coast Ambulance, and LifeLine Medical Transportation using a curriculum for new hospice nurses.
To support this initiative, the state of California Emergency Medical Services (EMS) Authority

[Figure: p-chart of percent transported by EMS (y-axis, 0% to 100%) by month (x-axis, Jan 15 through Aug 16), with CL = 80.3% for the baseline segment and CL = 37.1% for the segment after the change.]

FIGURE 10-40 Percentage of Ventura 911 hospice patient calls transported to the emergency room (p-chart)

granted a temporary order to increase the state scope of paramedic practice, and the local institutional
review board (IRB) approved the project.
For hospice patients, this meant that medical control of paramedic practice shifted from the
county EMS medical director to the hospice physician for the patient’s hospice. When 911 was
activated for a hospice patient, one of the specially trained paramedic supervisors would be dispatched
along with a paramedic ambulance and the first responders from the fire department. Rather than
taking their clinical directives from the county EMS medical director, they would follow the direction
of the patient’s hospice physician. The results of putting this new initiative in place are shown in
FIGURE 10-40.
The baseline data shown on the left side of Figure 10-40 show that a little over 80% of the 911
calls received resulted in transport to the emergency room. After the Community Paramedic Project
described in the previous paragraph was tested, however, the new average dropped to 37%. This new
average represented a 54% reduction from the baseline average, exceeding the original aim of a 50%
reduction. Although improvement in the process is noted, the challenge to the team is to work
to standardize the new procedures and not have the process drift back to the original baseline levels.
This concern is based on looking at the right side of Figure 10-40, where the data are starting to drift
upwards. Specifically, the last three data points are above the new average of 37.1%. If this pattern
continues and the next five data points remain above the CL, then a shift in the wrong direction will be
detected. Questions the team discussed included:

■■ Do we have a reliable design for the new Community Paramedic Project?
■■ Has everyone been trained properly in the new way of addressing 911 calls for hospice patients?
■■ Has the severity of the patient population or the mix of conditions and diagnoses changed
recently?
As the team discussed these issues with their IA, they arrived at the following findings:
■■ The upward drift seemed to be caused by a few of the trained community paramedics getting
promoted or moving on to take other positions, which caused a staffing issue, which in turn
caused some long response times.
■■ With the longer response times the first responders from the fire department and the paramedic
ambulance crew were more likely to initiate transport before the arrival of the community
paramedic.
■■ The state gave the participating paramedic organizations permission to train additional
community paramedics (originally they were limited to the initial group). Class will start later this
month.
With help from their IA, Mr. Mike Taigman, the team laid out an ongoing data tracking process. They
agreed that further testing and data analysis were needed before declaring a complete victory. Initial
results, however, do look promising.
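
Two pieces of arithmetic in this case study can be checked with a short sketch: the relative reduction between the two centerlines, and the shift rule the text applies (a run of eight consecutive points on one side of the centerline, that is, the three points already observed plus five more). The function names are illustrative, and the monthly values in the usage example are hypothetical, not the project's data.

```python
def relative_reduction(baseline, new):
    """Percentage reduction of `new` relative to `baseline`."""
    return 100 * (baseline - new) / baseline

def longest_run_above(values, centerline):
    """Length of the longest run of consecutive points above the centerline.

    Simplifying assumption: a point exactly on the centerline breaks the run.
    """
    longest = current = 0
    for v in values:
        current = current + 1 if v > centerline else 0
        longest = max(longest, current)
    return longest

def shift_detected(values, centerline, run_length=8):
    """True when at least `run_length` consecutive points sit above the CL."""
    return longest_run_above(values, centerline) >= run_length

# Centerlines printed on Figure 10-40.
print(round(relative_reduction(80.3, 37.1)))  # 54

# Hypothetical monthly percentages ending with three points above 37.1.
recent = [35.0, 30.2, 33.8, 36.0, 41.5, 44.0, 47.2]
print(longest_run_above(recent, 37.1))  # 3
print(shift_detected(recent, 37.1))     # False
```

With only three points above the centerline, no shift is signaled yet; five more consecutive points above 37.1 would bring the run to eight and trigger the rule.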

CASE STUDY #16: Improving Access to Community Services for Mental Health and Community Health Patients16
Situation
Two key topics drive the strategic dashboard of the East London National Health Service (NHS)
Foundation Trust (ELFT): (1) reducing harm and (2) right care, right place, right time. Four aspects of
care delivery were identified as the major dimensions of these two key topics:
■■ Safety
■■ Clinical effectiveness
■■ Patient experience
■■ Staff experience
ELFT has over 160 teams working on improvement projects that fall into one of these four aspects of
care delivery. The team charged with improving access to community services for mental health and
community health patients is one of the first improvement teams that was chartered by the Trust. This
team has a fairly broad range of responsibilities including the delivery of community-based mental
health services as well as community health in general, which encompasses all aspects of primary
care delivered in the community except general practice (e.g., it includes cardiac rehab, physiotherapy,
district nursing, and home health visiting).
This topic is central to ELFT’s work for several reasons:
■■ Long waits to be seen after being referred from primary care create backups in the system and
delays in providing essential care to the service users.
■■ Delayed access is a major source of poor patient experience, which increases complaints.
Comparative data from the NHS show ELFT as having long waits when compared to other
providers across England.
■■ This topic was almost too “wicked” of a problem to be talked about at ELFT until QI became the
Trust’s central business strategy.
The responses from many staff typically centered on two reasons why access to community services
could not be improved: (1) they were short staffed and their budgets had been reduced (i.e., they were
approaching the issue as a first-order change, with the solution being “give us more staff and more
money”), and (2) delays in access are just part of the way the NHS works.
The team faced several initial challenges. First, they did not know the magnitude or extent of
delays in gaining access to the system. Second, they had to address the attitudinal and behavioral issues
that allowed delays in gaining access to community services to be part of the staff’s general mindset.
Finally, they needed to understand how this issue fit into the Trust’s overall strategy to reduce harm
and provide the right care, right place, and right time. Through ELFT’s QI strategy, however, the team
began to address these three challenges and realized that they were empowered to tackle this issue but
with no extra staff or increased budget. What they did have, however, was support from senior leaders
and the board to improve access and the skills and tools of quality improvement (QI) through a large
capability building program delivered by ELFT and IHI. The access team’s journey is described next.

Discussion
After assembling subject matter experts from the various service lines that dealt with the access issue,
the QI team coach met with the team to start laying out their journey. The first thing they did was to
develop an aim statement specifying how they would work on their theory of improvement and the
methods they would employ. The team’s aim statement is shown in BOX 10-1.
Because of the complexity of the access issue, the team took the advice of their QI coach and
devised a “learning system” that would support their work. They established three key components of
this learning system: (1) Webex learning sets, (2) individual project support structures, and (3) measures.
The details of this learning system are shown in FIGURE 10-41. The time spent discussing how the
team would learn and approach its work was well spent. It allowed the team members and the QI
coach to engage in a rich dialogue about the nature of the challenges that lay ahead and their plan for
addressing them. Too many teams begin their quality journeys without a roadmap or plan. This leads to
a lack of direction, wasted time, and frequent conflicts among team members because they are not
clear about their purpose and how they plan to achieve their aim.
The Webex sessions were held every 6–8 weeks and provided a time-efficient way to bring team
members together virtually. This not only saved time and resources but also helped prepare the team
members for the face-to-face meetings. During the Webex sessions, external case studies and best
practices on the issue of access from other NHS Trusts and from IHI’s work in other countries were
reviewed, current PDSA tests were reviewed, and future tests were planned. One of the new ideas that

BOX 10-1 Aim statement for the improving access team

Aim:
■■ Improving access to services for new patients by increasing uptake, reducing wait times, or
reducing DNAs (did not attend) according to locally set targets by April 2017

Theory:
■■ These three areas affect the start of the care pathway and are interrelated. Improvement in
one area may adversely affect the others.

Contract:
■■ Project team appoints a project lead.
■■ Project team meets regularly as a team and invites QI lead.
■■ At least one member of the team attends collaborative learning sets.
■■ Copy QI team into updates to sponsor/coach.
■■ Use of shared folder as primary location for project documentation.
■■ Use of the three core measures for the collaborative as a minimum.

Method:
■■ Project teams that are part of the learning system will have at least one of these measures as their
main outcome measure and the rest as a balancing measure in addition to their own measures.
■■ Teams develop and test change ideas in their own areas and are encouraged to collaborate
with other project teams in the learning system.
■■ A QI lead regularly meets with the project teams to help them apply the QI tools and
methodology to advance their work and to act as a link across similar projects.
■■ Change ideas that are shown to be effective at reducing wait times and DNAs will be
evaluated by the learning system members and cascaded to appropriate services.
■■ Change ideas that are effective within the learning system will be cascaded to the rest of the
organization.

emerged from these sessions was the need for a monthly newsletter, posted on the ELFT’s intranet,
which provided:
■■ Information on what each team is testing or planning to test
■■ Project team interviews (internal case study)
■■ Information about a QI tool (e.g., flowcharting, data collection or a particular Shewhart chart)
and other internal and external resources that could be of assistance to the improvement teams
■■ Dates, times and locations of upcoming learning sets.
As the access team gained increased knowledge about the SOI, they realized that the sequence
of improvement started with ideas that can be put into PDSA testing, which leads over time to
implementation of reliable changes that have been shown to work and finally to spreading the
improvement work to other locations. To share this piece of learning with other teams at ELFT the

[Figure: Improving Access Learning System, drawn as three linked components.
WebEx Learning Sets (12 projects: 6 wait times projects and 6 increasing uptake projects): sharing progress; case studies; core measures & DD; change ideas & tests; dashboard; newsletter. Facilitated by: Director of Ops, 2x QI Leads, 2x Senior IAs.
Individual Project Support: coaching; testing; dashboard; alignment; documentation; tools; research; strategy. Supported by: 2x QI Leads.
Measures: Wait Times (average days from referral accepted to first face-to-face contact); % DNA (Do Not Attends [DNAs] before first face-to-face contact / total number of appointments booked, excluding cancellations); New Referrals (total number of referrals received from external, non-ELFT referrers). Supported by: QI Data Analyst.]

FIGURE 10-41 ELFT Improving Access learning system

access team developed FIGURE 10-42, which laid out their thoughts on a scale-up and spread strategy for
improving access to care. Even though the team was still in a testing phase, the discussion with their
QI coach about their ultimate destination provided a vision of the end game and helped the team
understand more fully the nature of their journey.
The access team chartered 12 project teams organized under two key headings: (1) reducing
waiting times and (2) increasing uptake. A summary of these teams and the current changes they are
each testing is shown in TABLE 10-2. Each of the 12 teams developed their own aim statement, created
flowcharts to detail the steps involved in each respective process, and then created measures and data
collection plans to understand the variation inherent in each of the respective processes they were
working to improve.
Three indicators formed the core of all 12 teams’ measurement strategies:
■■ Waiting Time—Average days from referral accepted to first face-to-face contact. This measure
looks at external referrals (i.e., non-ELFT referrals from GPs, primary care, or other Trusts)

[Figure: Scale-Up and Spread, shown as three stages with degree of belief increasing from one stage to the next: (1) designing and testing multiple change ideas by teams within the learning system; (2) up-scale and test the bundle within the learning system; (3) recruit and spread the bundle as a menu of options across the organization.]

FIGURE 10-42 ELFT scale-up and spread strategy for improving access to care

■■ Number of Referrals—Total number of referrals received from external referrers. This measure
looks at external referrals (i.e., non-ELFT referrals from GPs, primary care, or other Trusts)
■■ Percentage of DNAs—DNAs before first face-to-face contact divided by total number of
appointments booked (excluding cancellations). This measure looks at external referrals (i.e.,
non-ELFT referrals from GPs, primary care, or other Trusts)
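
As a rough illustration of how the three core indicators could be calculated from referral-level records, consider the sketch below. The record layout (`Referral` and its field names) and the sample rows are hypothetical; only the indicator definitions come from the list above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Referral:
    external: bool                 # referred from outside ELFT
    accepted: date                 # date the referral was accepted
    first_contact: Optional[date]  # first face-to-face contact, if it happened
    booked: int                    # first appointments booked (excl. cancellations)
    dnas: int                      # DNAs before first face-to-face contact

def core_measures(referrals):
    """Return (average wait in days, number of referrals, % DNA).

    Only external referrals are counted, as in the indicator definitions.
    """
    ext = [r for r in referrals if r.external]
    waits = [(r.first_contact - r.accepted).days for r in ext if r.first_contact]
    avg_wait = sum(waits) / len(waits) if waits else None
    booked = sum(r.booked for r in ext)
    pct_dna = 100 * sum(r.dnas for r in ext) / booked if booked else None
    return avg_wait, len(ext), pct_dna

# Hypothetical records for one month (not ELFT data).
sample = [
    Referral(True, date(2016, 1, 4), date(2016, 2, 15), booked=1, dnas=0),
    Referral(True, date(2016, 1, 11), date(2016, 3, 1), booked=2, dnas=1),
    Referral(True, date(2016, 1, 18), None, booked=1, dnas=1),
    Referral(False, date(2016, 1, 20), date(2016, 1, 30), booked=1, dnas=0),
]
avg_wait, n_referrals, pct_dna = core_measures(sample)
print(avg_wait, n_referrals, pct_dna)
```

Each monthly result would then become one point on the corresponding Shewhart chart. Keeping the internal referral in the sample shows that the filter on `external` excludes it from all three measures.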
Data on these three indicators were analyzed with Shewhart charts. Totals for the system as well as
individual teams and borough locations were developed. When appropriate the dashboard indicators

TABLE 10-2 ELFT change ideas being tested by the Improving Access Team (current tests)

Reducing Waiting Times Projects
Project Team | Currently Testing
NH Psychological Therapies Services | After several change ideas, running session in August to develop new change ideas
CH Psychological Therapies Service | Text message reminders and service user focus group
Memory Clinics in East London | Currently making technical changes to have uniform processes across East London memory clinics
Community CAMHS Tower Hamlets | Offering two sessions to some new referrals before final decision
CH Adult Community Mental Health Access Services | Telephone referral and screening; text message reminders; posting appointment letters directly
TH Psychological Therapies Service | DNA policy statement on therapy contract; new referral form; mailshot to service users and carers to attend SU engagement event

Increasing Uptake Projects
Project Team | Currently Testing
NH Psychological Therapies Services | Text message reminders using web platform
Community Sexual and Reproductive Health | Testing manual text messages sent immediately after appointment booked, confirmation texts rather than reminders
Clozapine team and EPCL team | Data cleaning of RiO in preparation for iPlato; testing EE messages for one clinic
Health Visiting, Sickle Cell & Thalassaemia and Children & Young People service, NUHT | Testing an emergency clinic slot to ensure that clients who are at 20 weeks or over gestation are seen within 48 hours; completing a DNA audit and planning to contact a small sample of service users to gather qualitative information about why service users DNA appointments
MSK Physio | iPlato text message reminder system for new referrals and follow-up appointments
TH Adult Community Mental Health Services | Designing leaflet specific for Isle of Dogs GP and community teams

were displayed as small multiples to maximize the amount of information being displayed (Tufte, 1983,
1990). FIGURES 10-43, 10-44, and 10-45 show the Shewhart charts for these three indicators aggregated
across all sites and all teams. You will notice that the charts have phased control limits and extensive

[Figure: dashboard page titled "Average waiting time" (November 2016, baseline data). Top panel: average waiting time from referral to 1st face-to-face appointment (Collaborative, 10/12 teams), X-bar chart; y-axis, average waiting time in days (35 to 70); x-axis, months Jan-14 through Oct-16; phased limits with labeled values 60.66, 53.17, 50.74, and 44.51; annotated with the dates of the learning sets, "Testing begins," "2 new teams join collaborative," and "3 teams leave collaborative." Small multiples (I charts, Jan-14 through Oct-16): Child and Adolescent Mental Health Service (Tower Hamlets); Community Mental Health Teams (City and Hackney & Tower Hamlets); Psychological Therapy Service (City and Hackney, Newham & Tower Hamlets); MHCOP Memory Service (City and Hackney, Newham & Tower Hamlets). Panels are annotated "Initial baseline" and, where applicable, "Decrease identified."]

FIGURE 10-43 ELFT Dashboard: average waiting time from referral to 1st face-to-face appointment (I charts)

[Figure: dashboard page titled "Referrals" (November 2016, baseline data). Top panel: number of referrals received (Collaborative, 10/12 teams), I chart; y-axis, number of referrals (700 to 1,700); x-axis, months Jan-14 through Oct-16; phased limits with labeled values 1,274.14, 1,213.13, and 1,021.71; annotated with the dates of Learning Sets 1 through 9, "Testing begins," "2 new teams join collaborative," and "3 teams leave collaborative." Small multiples (I charts, Jan-14 through Oct-16): Child and Adolescent Mental Health Service (Tower Hamlets); Community Mental Health Teams (City and Hackney & Tower Hamlets); Psychological Therapy Service (City and Hackney, Newham & Tower Hamlets); MHCOP Memory Service (City and Hackney, Newham & Tower Hamlets). Panels are annotated "Initial baseline" and, where applicable, "Increase identified."]

FIGURE 10-44 ELFT dashboard: number of referrals received (I chart)

[Figure: dashboard page titled "Did Not Attend (DNA)" (November 2016, baseline data). Top panel: percentage of 1st face-to-face appointments that were DNAs (Collaborative, 10/12 teams), P chart; y-axis, DNA % (20% to 38%); x-axis, months Jan-14 through Oct-16; phased limits with labeled values 32.21%, 26.39%, and 25.23%; annotated with the dates of Learning Sets 1 through 9, "Testing begins," "2 new teams join collaborative," and "3 teams leave collaborative." Small multiples (percentage of first appointment non-attendance, I charts, Jan-14 through Oct-16): Child and Adolescent Mental Health Service (Tower Hamlets); Community Mental Health Teams (City and Hackney & Tower Hamlets); Psychological Therapy Service (City and Hackney, Newham & Tower Hamlets); MHCOP Memory Service (City and Hackney, Newham & Tower Hamlets). Panels are annotated "Initial baseline" and, where applicable, "Decrease identified."]

FIGURE 10-45 ELFT Dashboard: % of first appointment non-attendance (I chart)

annotations to highlight the changes that were tested and the impacts of these changes.
Each figure shows the aggregated chart on the top and a series of small multiples underneath the
aggregated chart showing the various teams and their locations. When you study the charts you will
see that wait times have gone down, whereas the number of referrals has gone up. This has been
accompanied by a decline in the percentage of DNAs.
This case study demonstrates nearly all the principles that I have discussed in this book. It
started out with a concern for the VOC, specifically that staff observed and service users complained
about long waits and poor referral processes. It then moved to organizing the access team, building
an aim statement, understanding the learning system they were trying to build, defining indicators
that would capture the variation in the key outcomes, and organizing a data collection plan. Once
the data were collected, they proceeded to analyze the three outcome measures on appropriate
Shewhart charts (X-bar and S, XmR, and p-charts), annotated the charts to show when they ran
PDSA tests, and then interpreted the charts for common and special causes of variation. Finally, they
tested their improvement strategies against a baseline to see whether their interventions made a
difference. At first, they were a little discouraged because the improvement did not happen right
away. But they soon realized that process and behavioral changes do not occur overnight. This
case study also helps to explain why ELFT has received numerous awards over the past year for its
application of QI to daily work. It is not an impossible journey but one that does require knowledge,
skills, the will to change, ideas on how to make improvements, the ability to execute improvement
strategies, and constancy of purpose.
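
Of the chart types named above, the XmR (individuals, or I) chart is the simplest to compute by hand: the centerline is the mean of the individual values, and the limits sit 2.66 average moving ranges on either side of it. The sketch below shows that calculation under the usual textbook convention for moving ranges of size two. The function name and the monthly referral counts are made up for illustration and are not ELFT data.

```python
def xmr_limits(values):
    """Return (centerline, LCL, UCL) for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two points to form a moving range")
    center = sum(values) / len(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / 1.128, the usual constant for moving ranges of size 2.
    return center, center - 2.66 * mr_bar, center + 2.66 * mr_bar

# Hypothetical monthly referral counts.
referrals = [100, 110, 95, 105, 90, 100]
cl, lcl, ucl = xmr_limits(referrals)
print(f"CL={cl:.1f}  LCL={lcl:.1f}  UCL={ucl:.1f}")

# Points outside the limits would be candidates for special-cause review.
signals = [x for x in referrals if x < lcl or x > ucl]
print(signals)
```

In practice many more than six points are needed for trustworthy limits, and phased limits like those on the ELFT dashboards would be obtained by calling the function separately on each phase.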
Notes 327

Notes

1. I want to acknowledge Dr. Charles Derus, of Advocate Dreyer Medical Clinic for tutoring me in the clinical aspects of this case study. His technical consultation is very much appreciated.
2. For more information on issues related to central line insertion and maintenance readers can consult the IHI’s website where we have extensive information on the central line bundle and infection issues. The link is http://www.ihi.org/sites/search/pages/results.aspx?k=central+line+bundle.
3. Mr. William Peters, IHI IA, called one day and posed this question on sampling options. Our conversation prompted me to develop this case study. I thank him for the idea.
4. Most control charting software will provide you with the option for presenting the data as a proportion or as a percentage. Remember, to convert a proportion to a percentage you merely multiply the proportion by 100. Thus the average proportion of nosocomial PUs in this figure is 0.241, which translates to 24.1% (0.241 × 100).
5. Like many aspects of health care, this is one of those topics that most people do not think much about on a regular basis. It can have profound implications not only for process efficiency and resource allocation but also, and most important, for patient safety. If you have not looked at this issue recently I would encourage you to at least investigate how often FS occurs.
6. This is a very personal case study because it tells the tale of my wife’s breast cancer journey. My objective in sharing this story is to help highlight the opportunity to apply statistical thinking to the care of individual patients and not just to larger groups or aggregations. My wife was diagnosed in April 2001 and is still cancer free today.
7. I want to thank Carol Burke, RN, Nancy Longino, RN, and Julie Marre, RN, from Advocate Good Samaritan Hospital in Downers Grove, Illinois, for sharing this case study with me and helping to develop the measurement plan and indicators. The potential seriousness of this situation requires a passion and commitment to break down barriers and make improvements. These women have met this challenge with great energy and spirit. They have spent numerous hours with me thinking through the logic of how the levels of stratification influence the various outcomes and spent after-work hours assembling data, analyzing control charts, and developing improvement strategies. I thank them for their dedication and willingness to share their story.
8. I also suggested to the nursing team that they should consider tracking a new indicator: the percentage of pregnant women with positive GBS who deliver in less than 4 hours after arriving at the hospital. This indicator will help them understand if the volume of women who do not meet the criteria is actually increasing and potentially influencing the overall performance of this process. The last recommendation I had was that they should develop a Pareto diagram to document the reasons why the antibiotics were not delivered according to the protocol for those women arriving at the hospital 4 or more hours prior to delivery. The Pareto diagram will enable the team to focus on the major reason(s) why the protocol is not working for the targeted population.
9. If the conditions creating an equal area of opportunity for a complaint changed (i.e., there were wide swings in the daily volume of patients and/or the severity and mix of patients changed constantly), then the chart of preference would be the u-chart. The u-chart would provide a complaint rate where we would have the number of complaints as the numerator and the number of visits as the denominator. This would produce a complaint rate (i.e., the number of complaints per 100 or 1,000 patient visits).
10. I had the pleasure of working with Bonnie Schleder, critical care clinical nurse specialist, and Karen Stott, ICU registered nurse, on this project. I also want to thank the ICU staff for their support of this initiative, Eileen Yaeger, clinical epidemiologist, and Lori Pinzon, QI specialist, for data analysis and technical support. The results of this improvement effort were initially published in Nursing Management (fall 2003) and served as a foundation for other clinical groups to begin their own initiatives to reduce VAPs. The IHI website provides up-to-date literature and references on reducing VAPs (http://www.ihi.org/Topics/VAP/Pages/default.aspx).
11. I want to thank Susan Herrmann, RN, MS, clinical specialist, medical/surgical unit, and Marjie Schoolfield, RN, BSN, OCN, at Northwestern Medicine Delnor Hospital (Geneva, Illinois) and William Peters, IHI IA for sharing the context and data for this case study. They have been very vigilant in applying quality measurement principles and tools to this initiative. They have also been generous in devoting time to explain to me the steps they have taken to track their indicators, interpret the charts, and make improvements. William Peters did a very good job of constructing the charts and partitioning the data to demonstrate the points I wanted to make with this case study. Congratulations to all of you for a job well done.
12. Technically the centerline and control limits should not be “frozen” and extended if the process that produced these limits contains special causes. Why? Because the special causes make the process and the observed variation in the process unstable and, therefore, unpredictable. But this team was eager to see how the new process compared to the baseline, so they acknowledged this issue but still proceeded to freeze the limits and centerline for comparison purposes. Life is full of options!
13. Because these are individual patients, I am not revealing the specific factors that each patient possessed.
14. There are methods to test the simultaneous effects of multiple variables on a process. The most appropriate technique when conducting QI research is known classically as DOE or design of experiments and is frequently used in manufacturing (Sloan, 1997). In healthcare settings, this approach is referred to, however, as PE and is detailed nicely in a book I regularly teach from titled Quality Improvement through Planned Experimentation (Moen et al., 2012). Other approaches that might be considered involve the development of causal models (Blalock, 1971) and multiple regression models (Kerlinger & Pedhazur, 1973; Namboodiri, Carter, & Blalock, 1975).
15. I am indebted to Mr. Mike Taigman, improvement guide for FirstWatch in Encinitas, California for initially developing the idea for this case study. Mr. Taigman is a graduate of the IHI Improvement Advisor Professional Development Program and has been a champion of statistical thinking for years. I appreciate his willingness to share this important story with me and work with him on this important topic.
16. I want to thank Dr. Amar Shah, associate medical director and consultant forensic psychiatrist at ELFT for permission to use this case study. I have had the pleasure of working with Dr. Shah and his team at ELFT for over 3 years and it has been a wonderful journey. They have made great progress in a short period of time to create a culture of quality and improve the lives of those they serve. I am honored to be part of their journey.
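The complaint rate described in note 9 can be sketched as follows. This is a minimal illustration with hypothetical daily counts and an invented function name. It uses the standard u-chart formulas: the center line is total complaints divided by total visits, and each day's limits are the center line plus or minus three times the square root of the center line divided by that day's visits. Setting `per=100` instead of `per=1000` gives the "per 100 visits" scaling, the same multiplication note 4 uses to turn a proportion into a percentage.

```python
from math import sqrt

def u_chart(complaints, visits, per=1000):
    """Return the center line and per-day (rate, lcl, ucl), scaled to `per` visits."""
    u_bar = sum(complaints) / sum(visits)  # overall complaints per visit
    rows = []
    for c, n in zip(complaints, visits):
        sigma = sqrt(u_bar / n)            # limits widen as daily volume shrinks
        lcl = max(0.0, u_bar - 3 * sigma)  # a rate cannot fall below zero
        ucl = u_bar + 3 * sigma
        rows.append((c / n * per, lcl * per, ucl * per))
    return u_bar * per, rows

# Hypothetical daily data (not from the case study).
daily_complaints = [4, 6, 3, 9]
daily_visits = [800, 1200, 500, 1500]

center, rows = u_chart(daily_complaints, daily_visits)
print(f"Center line: {center:.2f} complaints per 1,000 visits")
for day, (rate, lcl, ucl) in enumerate(rows, start=1):
    print(f"Day {day}: rate {rate:.2f}, limits {lcl:.2f} to {ucl:.2f}")
```

Because the limits depend on each day's number of visits, low-volume days get wider limits than high-volume days, which is why the u-chart suits an unequal area of opportunity.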

References

Blalock, H., ed. Causal Models in the Social Sciences. Chicago: Aldine, 1971.
Carey, R. Improving Healthcare with Control Charts: Basic and Advanced SPC Methods and Case Studies. Milwaukee: ASQ Press, 2003.
Kerlinger, F., and E. Pedhazur. Multiple Regression in Behavioral Research. New York: Holt, Rinehart and Winston, 1973.
Moen, R., T. Nolan, and L. Provost. Quality Improvement through Planned Experimentation, 3rd ed. New York: McGraw-Hill, 2012.
Namboodiri, N., L. Carter, and H. Blalock. Applied Multivariate Analysis and Experimental Designs. New York: McGraw-Hill, 1975.
Sloan, D. Using Designed Experiments to Shrink Health Care Costs. Milwaukee: Quality Press, 1997.
Tufte, E. The Visual Display of Quantitative Information. Cheshire, CT: Graphic Press, 1983.
Tufte, E. Envisioning Information. Cheshire, CT: Graphic Press, 1990.
CHAPTER 11
Connecting the Dots
“Transformation is required to move out of the present state, metamorphosis, not mere patchwork on
the present state of management.”
—W. Edwards Deming, The New Economics, 1994: 123

In Chapter 2, I introduced the importance of connecting the dots in order to predict what the collective distribution of data points (i.e., dots) is trying to tell you. Most of us remember being engaged with the connect-the-dots activity as children. It was a great way to help us see the relationship of apparently disparate dots and the image that can emerge if we connect the dots correctly. As adults, it might not be a bad idea to return to those early years and practice connecting the dots once again. Frequently we fail to connect all the dots and begin making connections only between selected dots that confirm our own view of reality. As a result, we not only reinforce our own theory of knowledge but also form an incomplete view of the world, which subsequently leads us to the wrong conclusions.

Just as we need to connect all the dots to make sense of the variation in a set of data, we also need to connect the dots at the organizational level in order to build organizational excellence. If leaders, managers, and staff all take time to deliberately think about the various factors (the dots) that affect their individual and organizational performance and then make the linkages between these dots, they will not only be able to adapt to the myriad of changes facing the health and social services industries but they will also be able to proactively harness these changes and manage them to their benefit.1 We need to move away from the old fragmented ways of thinking and begin to have serious dialogue about the level of transformation needed to achieve the new state of management that Dr. Deming describes in The New Economics (1994).

I believe there are four key activities that will prove beneficial for leaders interested in working on the transformation Deming discusses:
■■ Adopting quality as a business strategy
■■ Developing a learning system to support improvement
■■ Linking measurement to improvement
■■ Building capacity and capability for improvement

The remainder of this final chapter briefly addresses each of these four activities.

© Michal Steflovic/Shutterstock


▸▸ Adopting Quality as a Business Strategy

Dr. Deming was well known for asking leaders short yet very challenging questions:
■■ What business are you in?
■■ By what method will you improve your business?
■■ Will best efforts bring improvement?
■■ What is management’s job?
■■ Who determines quality?

On this last point he was very clear: “Quality is determined by the top management. It cannot be delegated” (Deming, 1994, p. 17). What he observed in many organizations, however, was the lack of constancy of purpose, delegation of quality to a department or a few individuals, and management styles that were in his opinion “the biggest producer of waste, causing huge losses whose magnitudes cannot be evaluated, cannot be measured” (Deming, 1994, p. 22). Dr. Deming’s 14 Points for Management (1992) were developed as a strategy for management to avoid what he called the “heavy losses” and stay in business, protect jobs, and add value to society. In the early 1950s, while working in Japan with members of the Union of Japanese Scientists and Engineers (JUSE), he proposed what he called the “chain reaction” to show the logical consequences of transforming management and making quality the organization’s overarching strategy for the long run. I have adapted the wording he used in his lectures and writings to explain his thinking behind the chain reaction (see, e.g., Deming, 1992, p. 3) to make it more visual and relevant to health and social service organizations. My adaptation is shown in FIGURE 11-1. Notice that this chain reaction begins with improving quality. It does not begin with “hit the targets” or “meet financial goals.” Deming’s logic was quite clear. By improving quality there will be reductions in costs because the organization is more efficient and produces fewer errors and less waste. This in turn leads to increased productivity, greater staff engagement, and better morale among the workforce. The result of this causal chain is that the organization will be more appealing to existing and new customers, stay in business for the long run, generate additional employment opportunities, and most important, add value to society. Now initially, Deming developed his chain reaction for manufacturing industries. But, given the increasing demands for health and social service industries to add value for the money being spent on these services, Deming’s chain reaction provides a logical starting point to think about the role of quality improvement (QI) within an organization.

[Figure: Improve quality → Cost decrease due to less rework, fewer errors and mistakes, better efficiency → Productivity and morale improve → Attract more customers → Stay in business → Provide more jobs → Add value to society]
FIGURE 11-1 An adaptation of Deming’s chain reaction

Although the chain reaction notion provides a good starting point to think about the logical sequence of events required in order for an organization to stay in business and add value to society, what is needed is a way to move beyond a conceptual model and evaluate the specific activities and behaviors that leaders need to support in order to make quality their central business strategy. In 1985, such a framework began to emerge. Members of Associates in Process Improvement (API)2 began to develop a template designed to help leaders incorporate the philosophy and

concepts taught by Dr. Deming into the way they managed their organizations. In 1987, they formalized this framework and named it Quality as a Business Strategy (QBS). The three basic principles behind the development of QBS are that an organization needs to:
1. Establish a foundation of continuously matching products and services to a defined need of the organization and its customers through design and redesign of processes, products, and services
2. Perform as a system to achieve this matching of products and services with the defined needs as the targets or goals of the organization
3. Maintain a set of methods to ensure that changes result in real improvements to the organization3

The details behind QBS can be found in the API publication Quality as a Business Strategy (API, 1998) and Norman (2007). In these books, and their related publications (Langley et al., 1996, 2009), they describe the five activities that leaders need to carry out in order to make QBS a reality:4
1. Establishing constancy of purpose in the organization (mission, vision, and values)
2. Understanding the organization as a system
3. Designing and managing a system for gathering information for improvement
4. Conducting planning for improvement and integrating it with business planning
5. Managing and learning from a portfolio of improvement initiatives

These five activities can be traced directly to Dr. Deming’s classic diagram of production viewed as a system shown in FIGURE 11-2 and described by Dr. Deming in both Out of the Crisis (1992) and The New Economics (1994). This diagram was first used by Dr. Deming in 1950 at a conference in Japan. He developed it to depict how production of products or services needs to be viewed as an interrelated system that requires generation of new ideas, product design, production inputs (e.g., supplies, materials, and equipment), creation and delivery of the product or service, and customer feedback on whether the product or service met their expectations. Although the initial focus and intent of Figure 11-2 was on helping Japanese leaders understand manufacturing production, Dr. Deming was quick to point out, “In a service organization, the sources A, B, C, etc. could be sources of data, or work from preceding

[Figure: flow diagram. Stage 0: Generation of ideas. Suppliers of materials and equipment (A, B, D) → Receipt and test of materials → Production, assembly, inspection (with tests of processes, machines, methods, costs) → Distribution → Consumers → Consumer research → Design and redesign]

FIGURE 11-2 Deming’s production viewed as a system


Deming, W. Edwards. The New Economics for Industry, Government, Education, second edition, figure, page 58, © 2000 Massachusetts Institute of Technology, by permission of The MIT Press.

operations, such as charges (as in a department store), calculations of charges, deposits, withdrawals, inventories in and out, transcriptions, shipping orders and the like” (1994, p. 58). In Out of the Crisis Dr. Deming directly addressed quality and productivity in service organizations in detail. On pages 203 to 205 he offers “suggestions on study of performance in a hospital.”5

As we have used Figure 11-2 with our various strategic partners, we have clarified further that “production” can easily be translated into the diagnosis and treatment of patients, surgical procedures, laboratory and pharmacy services, home care services, rehabilitation services, mental health care, and a variety of educational and other social care services. In 2009, the API team created an adaptation of Figure 11-2 to make the diagram even more relevant to service industries (Langley et al., 2009). All work, regardless of the product or service produced, is a process or a series of processes that create the system.

When we are invited to conduct site visits to assess an organization’s commitment to making quality a central operating strategy, at some point in the visit we will ask the leaders if they are adopting and working on the five QBS activities. Many leaders say they have never even heard of QBS, let alone adopted the five activities. The ensuing dialogue is always interesting. This is where the issues about defining quality start to come out. A typical response from leaders will often be this: “Oh sure, we are very committed to quality. We have three people working in our Quality Department. They handle everything from Department of Health regulations to the Joint Commission accreditation process as well as tracking all of our improvement projects. We’ve delegated the quality work to this department and they are doing a very good job for us. Yes, we are very committed to quality.”

After listening attentively to responses like this we point out that quality is not a department or something to delegate to a department or selected individuals (Lloyd, 2016). We then ask the senior leaders questions such as:
■■ How involved are you in making quality a central strategy for the organization?
■■ Where do the topics of quality, safety, and the patient/family experience fit into your board and senior leader meeting agendas?
■■ Are quality, safety, and patient-centered care the topics at the top of your agenda or do they appear after the discussions about finance, budget, market share, and resourcing have occurred?
■■ Do you start each senior leader or board meeting with a quality, safety, or customer experience story?
■■ How much of senior leadership’s time is devoted to discussions around quality, safety, and the patient/family experience?

The questions that always cause a pause in the dialogue are the following ones, which explore the priority of financial issues against the priority placed on quality-related issues:
■■ How involved are you with the organization’s finances?
■■ Do you delegate responsibility for the financial performance of the organization to a department as you do quality and have no further involvement with the financial issues?
■■ Do you spend at least equal time in board and senior management meetings discussing quality issues and financial issues?
■■ If not, which topics dominate your leadership discussions?

As you can imagine, these questions create an interesting exchange. To facilitate this exchange we refer to the five QBS activities for leaders and ask them where they stand with respect to each one. Deming in The New Economics (1994) was quite blunt about where he felt leadership of the transformation stands. He did not feel that the transformation of overall business strategy and planning has penetrated the ranks of management. He wrote: “Ninety-five percent of changes made by management today make no improvement” (Deming, 1994, p. 38). He maintained that management spends too much time focusing on what he called “unique processes that produce figures” and not enough time on the organization’s overall business strategy or company-wide systems that influence the organization’s future. He concluded that even though management spends more time on the

unique processes that produce figures, this rather singular focus still would produce only about 3% success in meeting organizational long-term goals.

To help leaders in establishing a dialogue on QBS, I developed the brief QBS Assessment Tool shown in EXHIBIT 11-1. In order to establish the foundation for dialogue among your own senior leaders about QBS, I would recommend the following steps:
■■ Distribute the QBS self-assessment to senior leaders.
■■ Ask them to complete the assessment and send it back to an individual for tabulation, preferably in a graphic format.
■■ Schedule a meeting of the senior leadership team to discuss the results.
■■ Be sure to have a designated individual to serve as facilitator.
■■ A key point when interpreting the results is to focus on the alignment of the response patterns rather than the percentage of respondents selecting any given response option. For example, if 50% of the respondents select “Not started yet” for activity 1 (establishing & communicating the purpose of the organization) and the remaining 50% selected “Completed,” then there is poor alignment within the leadership team on this particular item. A discussion should occur as to why half the group selected the one end of the response spectrum and the other half selected the other end. For leaders, alignment around a particular activity is more important than the absolute percentages. Remember that the data should be used for learning and dialogue not judgment.
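The tabulation step above can be sketched in Python. This is an illustration under stated assumptions, not part of the QBS tool itself: the responses are hypothetical, the function name is invented, and "alignment" is measured here as the share of leaders choosing the most common response option, with a hypothetical 75% threshold for flagging an activity for discussion. The source does not prescribe any particular alignment statistic.

```python
from collections import Counter

# Response options as defined in the QBS assessment tool.
OPTIONS = {1: "Not started yet", 2: "In progress", 3: "Completed and firmly in place"}

def tabulate(responses):
    """Summarize responses given as {activity number: [1, 2, or 3 per leader]}."""
    summary = {}
    for activity, picks in responses.items():
        counts = Counter(picks)
        shares = {opt: counts[opt] / len(picks) for opt in OPTIONS}
        # Assumed alignment measure: share choosing the most common option.
        summary[activity] = {"shares": shares, "alignment": max(shares.values())}
    return summary

# Hypothetical responses from six senior leaders for two of the five activities.
responses = {
    1: [1, 1, 1, 3, 3, 3],  # split between the two ends of the scale
    2: [2, 2, 2, 2, 2, 3],  # mostly "In progress"
}

ALIGNMENT_THRESHOLD = 0.75  # hypothetical cutoff, chosen for illustration
summary = tabulate(responses)
to_discuss = [a for a, s in summary.items() if s["alignment"] < ALIGNMENT_THRESHOLD]

for activity, s in summary.items():
    row = ", ".join(f"{OPTIONS[o]}: {share:.0%}" for o, share in s["shares"].items())
    print(f"Activity {activity}: {row} (alignment {s['alignment']:.0%})")
print("Activities to discuss first:", to_discuss)
```

In this example activity 1 reproduces the 50/50 split described above and is flagged, while activity 2 is not. The output is meant as a prompt for dialogue, not as a score.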

EXHIBIT 11-1 Quality as a business strategy assessment tool

Instructions: For each of the Five Quality as a Business Strategy (QBS) Activities listed here select the
one response that best captures the current status of the activity within your organization.

Quality as a Business Strategy                                        Response Options
Five Activities for Leaders                                            1      2      3

1. Establishing constancy of purpose in the organization
   (mission, vision, and values)                                      ___    ___    ___
2. Viewing the organization as a system                               ___    ___    ___
3. Designing and managing a system for gathering
   information for improvement                                        ___    ___    ___
4. Conducting planning for improvement and integrating it
   with business planning                                             ___    ___    ___
5. Managing and learning from a portfolio of improvement
   initiatives                                                        ___    ___    ___

Response Options
1 = Not started yet    2 = In progress    3 = Completed and firmly in place

In summary, QBS can (and many of us argue should) serve as an overarching framework for any organization. A caution is in order, however. According to API (1998, p. 3), “The QBS template cannot simply be installed or implemented like a new computer system in an organization. There is a need for knowledgeable leadership to carry out the strategy and make it successful. Five activities to be led by the top management of the organization provide the structure to begin working on making quality a business strategy. The five activities form a system for the leaders of an organization to focus their learning, planning, and actions.”

▸▸ Developing a Learning System to Support Improvement

Systems thinking is not only a core activity of QBS but it also forms the foundation of what Deming called a system of profound knowledge (SoPK).6 He detailed four key components that allow individuals to gain a clear picture of the system they are trying to improve:7
1. Appreciation for a system
2. Understanding variation
3. Building knowledge
4. The human side of change

These points are detailed nicely by Deming in his classic works (1992, 1994). Other good summaries may be found in Langley et al. (2009), Schultz (1994), and Scherkenbach (1991). A very good short summary of profound knowledge was written by Moen and Norman in 2016. Their article, “Always Applicable: Deming’s System of Profound Knowledge Remains Relevant for Management and Quality Professionals Today,” not only provides an excellent overview of Deming’s system of profound knowledge but also describes why it provides an “essential framework for management and quality professionals to understand the messiness of the business world and develop productive paths forward” (p. 27).

Deming argued that a system cannot understand itself. He said that in order to understand the system you want to improve you need to have an outside view. People involved with the system are too close to it and involved to a level of detail that makes it quite difficult for them to objectively see all the interrelated and interconnected parts of the system. He offered his system of profound knowledge (SoPK) as a “lens” that when looked through and applied can provide

[Figure: a lens with four interconnected components: Appreciation of a system, Theory of knowledge, Human behaviour, Understanding variation]

FIGURE 11-3 Looking through the lens of profound knowledge


© Tom Wang/Shutterstock

all members of the organization with a clearer and more comprehensive understanding of the system they are trying to transform. FIGURE 11-3 is my depiction of the lens Deming referenced. Notice that the four components are all interconnected. Deming pointed out, “One need not be eminent in any part nor in all four parts in order to understand it and apply it” (1994, p. 93). What he stressed as essential, however, was the interrelated nature of the four components. A brief overview of each of the four components of profound knowledge is provided next.

Appreciation of a System

For Deming, this was the starting point for developing deep insight about an organization. His production viewed as a system (Figure 11-2) served as the starting point. The literature on systems thinking is very rich and besides Deming’s views on this topic (1992, 1994) readers would do well to become familiar with the works of Ackoff (1981); Argyris (1990); Churchman (1968); Drucker (1973); Forrester (1971, 1986); Nelson, Batalden, and Godfrey (2007); Senge (1990); Senge, Roberts, Ross, Smith, and Kleiner (1994); and von Bertalanffy (1968). The central theme that runs throughout all these works is that a system is essentially an interdependent series of functions and activities that bring together people, equipment, methods, processes, and procedures to achieve a common purpose or aim. The key idea that characterizes all the writing about appreciation of a system is interdependence of the parts. Think about this from a medical perspective. When clinicians are in training they learn about the body as a system. There is the circulatory system, the musculoskeletal system, the nervous system, the respiratory system, and so on. Systems thinking is a fundamental aspect of medical school training. You would think, therefore, that it would be an easy transition to apply systems thinking to how healthcare organizations function. But this is frequently not the case. When only part of the system is made to function efficiently or effectively other parts of the system will be compromised. Deming called this “suboptimizing the system.”

Understanding Variation

This has been a central theme of this book. But suffice it to say that variation exists. Period! It is critical, therefore, that if you are truly interested in improvement you are able to:
■■ Distinguish enumerative from analytic studies (Chapter 2)
■■ Distinguish common cause variation from special cause variation (Chapter 7)
■■ Build skills in the construction and interpretation of run charts (Chapter 8) and Shewhart charts (Chapter 9)
■■ Make the correct decision when you identify common and special causes of variation (Chapter 7)

If a run or Shewhart chart exhibits common cause variation then it is reflecting a process that is stable and therefore predictable. But the performance of the stable process must now be placed against a second question: Is the process capable of meeting the target or goal that the improvement team has established? Quality improvement is about understanding stability, instability (special causes), and capability. This can be achieved only by applying statistical process control (SPC) methods.

Building Knowledge

Originally, Deming referred to this component as Theory of Knowledge. Langley et al. (2009) changed this to Building Knowledge, which I believe is a more practical label. Essentially what Deming was conveying with this component is that we all have theories as to how and why things occur. Theories lead to predictions, predictions lead to a test, and the results of a test, whether it succeeds or fails, lead to new knowledge. For example, I come up with a theory that if I alter my route to work and take a different combination of roads I will reduce my drive time. I have a theory that leads to a prediction. I test the new route and record the number of minutes this alternative route takes to get me to work. Did it take more time, less, or the same amount of time

as my traditional route? I had a theory, made a prediction, and then tested this prediction against the data I recorded. Deming captured this sequence when he wrote, “Without theory, experience has no meaning. Without theory one has no questions to ask. Hence without theory there is no learning” (Deming, 1994, p. 103). Building knowledge, therefore, is a deliberate and iterative process that involves inductive and deductive approaches to learning (Langley et al., 2009). Do you and the leaders within your organization regularly explore the theories you have about how and why things occur? Do you establish predictions about your theories? Do you build knowledge or make judgments?

The Human Side of Change

Deming originally labeled this component of the lens of profound knowledge as psychology. The API authors (Langley et al., 2009) modified this to be the human side of change to give it a broader meaning. The human side of change involves the points about psychology that Deming emphasized, but it also addresses issues of social psychology, group dynamics, conflict resolution, intrinsic and extrinsic motivation, and systems of hiring, performance review, and rewards. Understanding and embracing the human side of change will help leaders realize more fully that individuals and groups have a wide variety of needs and expectations as Herzberg detailed in his 2003 article on intrinsic and extrinsic motivation. When we teach our yearlong IA Professional Development Program many of the participants struggle more with this component of profound knowledge than with any of the other three. Although they become concerned when we start sessions on Shewhart charts, statistical analysis, and planned experimentation, they actually become nervous when we ask them to have a difficult conversation with an individual they are having a challenge with at work. The human side of change is very difficult and is frequently an area in which staff and managers are not given adequate training and development. We focus on filling out forms and following correct technical procedures but we do not spend an equivalent amount of time dealing with the interaction and motivation of people. I really find this interesting especially because we are involved in an industry that is classified as a service. We are in the people business but all too often we pay little or no attention to the people involved in delivering and receiving care. Leaders and managers need to dedicate time to finding out what factors are most important to their individual staff members. Each person is motivated by different things. Some individuals are more extrinsically motivated whereas others are more intrinsically motivated. If you have not read Herzberg’s (2003) classic article “One More Time: How Do You Motivate Employees?” you should. He succinctly lays out the factors and events on the job that lead to extreme dissatisfaction (hygiene factors) and those that lead to extreme satisfaction (intrinsic motivators). What do you think are the most satisfying aspects of a person’s job? Herzberg’s cross-industry and cross-country investigations have clearly documented that the top six factors leading to job satisfaction include, in descending rank order, achievement, recognition, the work itself, responsibility, advancement, and growth. Six factors that lead to extreme job dissatisfaction include, in descending rank order, company policy and administration, supervision, relationship with supervisor, work conditions, salary, and relationship with peers.8 In addition to the work of Herzberg, readers interested in learning more about intrinsic and extrinsic motivation should become familiar with the works of Alfie Kohn. Kohn’s first book, No Contest: The Case Against Competition (1986), traces the history of competition and argues that “healthy competition” is a contradiction in terms. His second book, Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A’s, Praise and Other Bribes (1993), speaks much more directly to the negative consequences of extrinsic motivation. Deming praised Kohn for highlighting the destructive forces of life (1994) and how they contribute to the “heavy losses” within an organization. Kohn’s work has not been without its critics. His work is controversial but it offers provocations that
leaders and managers need to consider seriously.

In summary, the human side of change poses a major challenge for many leaders and managers. Listening, not just hearing but really listening to employees in order to find out what motivates and demotivates them, is essential not only for good management but is also critical to the overall success of the organization. But this requires that leaders and managers have an open and honest dialogue with each individual they supervise to find out what really matters to each person and how they as managers can contribute to creating joy in work for each employee.

This brief review of the four components provides a start. I strongly recommend building a deeper understanding of Deming’s thinking on this subject if you are serious about making quality a reality within your organization.

▸▸ Linking Measurement to Improvement

Data without a context for action is a waste of time, effort, and money. Yet, considerable data are collected within healthcare and social care settings that have no direct connection to actions aimed at improvement. Staff time and effort are resources not to be wasted. Therefore, a deliberate plan needs to be established that:
■■ Periodically evaluates the relevance and utility of the indicators
■■ Regularly reviews the amount, frequency, and duration of the data being collected on the various indicators
■■ Assigns responsibility for turning the data into useful information for decision making, including understanding common and special causes of variation

Linking your measurement efforts to improvement begins with understanding the milestones in the quality measurement journey (QMJ) and then building skills throughout the organization that allow these milestones to be part of daily work. Although you may identify a small group of individuals who will be the deep knowledge experts in the QMJ milestones, you should be taking steps to build knowledge in all employees about the nature of data for improvement.

To assist you in determining the level of knowledge and your ability to apply the milestones in the QMJ, I developed the Measurement Self-Assessment tool shown in EXHIBIT 11-2. This self-assessment tool provides five rank-order responses. Read the description of each response option (A through E) and then select the one response that best describes your current level of knowledge and skill related to each item. You should administer this assessment tool to individuals designated to serve as IAs, coaches, and improvement team members. There are no right or wrong answers, only an opportunity to gain a better understanding of where you and your colleagues personally stand with respect to the milestones in the QMJ. For example, what would your reaction be if you had to explain why it is preferable to plot data over time rather than using aggregated statistics and tests of significance? Can you construct a run chart or help a team decide which measure is more appropriate for their project? You may not be asked to do all of the things listed in this assessment today or even next week, but if you are facilitating a QI team or expect to be able to demonstrate improvement, sooner or later these questions will be posed. How will you deal with them? The place to start is to be honest with yourself and see how much you know about concepts and methods related to the QMJ. Once you have had this period of self-reflection, you will be ready to develop a learning plan for yourself and those involved with improvement to close the gap between where you are currently in your QMJ and where you would like to be.

As you build new knowledge of and skills with the milestones in the QMJ, it is critical to be cognizant of the fact that data analysis and making run or Shewhart charts can become a very captivating activity. I have seen individuals become very involved in the physical process of collecting data and making charts. The tactile stimulation gained
EXHIBIT 11-2 Measurement self-assessment

Response Options
Measurement Topic or Skill A B C D E
Help people in my organization determine why they are measuring (improvement, judgment, or research)

Move teams from concepts to specific quantifiable measures

Build clear and unambiguous operational definitions for measures

Develop data collection plans (including stratification and sampling strategies)


Explain why plotting data over time (dynamic display) is preferable to using aggregated data and summary
statistics (static display)

Explain the differences between random and nonrandom variations

Construct run charts (including locating the median)

Explain the reasoning behind the run chart rules

Interpret run charts by applying the run chart rules

Explain the statistical theory behind Shewhart control charts (e.g., sigma limits, zones, special cause rules)

Describe the basic seven Shewhart charts and when to use each one

Help teams select the most appropriate Shewhart chart for their measures

Describe the rules for special cause variation on a Shewhart chart

Help teams link measurement to their improvement efforts

A. I’d definitely have to call in an outside expert to explain and apply this topic/method.
B. I’m not sure I could apply this appropriately to a project.
C. I am familiar with this topic but would have to study it further before applying it to a project.
D. I have knowledge about this topic, could apply it to a project, but would not want to be asked to teach it to others.
E. I consider myself an expert in this area, could apply it easily to a project, and could teach this topic/method to others.
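
Two of the skills listed in Exhibit 11-2, locating the median of a run chart and applying a run chart rule, can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the function names and the weekly falls counts are hypothetical, and the shift rule used (six or more consecutive points on one side of the median, with points that fall exactly on the median skipped) is one commonly taught convention rather than a rule taken from this exhibit.

```python
from statistics import median


def longest_run_one_side(values, center):
    """Return (length, side) of the longest run of consecutive points
    that all lie on the same side of the center line.

    Assumed convention: points exactly on the center line are skipped;
    they neither extend nor break a run.
    """
    best_len, best_side = 0, None
    run_len, run_side = 0, None
    for v in values:
        if v == center:
            continue
        side = "above" if v > center else "below"
        if side == run_side:
            run_len += 1
        else:
            run_side, run_len = side, 1
        if run_len > best_len:
            best_len, best_side = run_len, run_side
    return best_len, best_side


def has_shift(values, center, min_run=6):
    """Apply the assumed shift rule: min_run or more points on one side."""
    length, _ = longest_run_one_side(values, center)
    return length >= min_run


# Hypothetical weekly counts of patient falls (illustrative data only).
falls = [7, 9, 8, 10, 9, 8, 11, 9, 5, 4, 6, 5, 4, 5, 3, 4]

center = median(falls)  # center line of the run chart
length, side = longest_run_one_side(falls, center)
print(f"median = {center}")
print(f"longest run: {length} points {side} the median")
print(f"shift signal: {has_shift(falls, center)}")
```

A signal from a rule like this only says that the pattern is unlikely to be random variation. Finding the reason for the shift, and deciding what to do about it, still rests with the team.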

by entering commands on a keyboard and watching multicolored charts print out can give one a sense of accomplishment. But the production of charts or statistical analysis is not the end objective. The charts do not tell you (1) the reason(s) for a special cause, (2) whether or not a common cause process should be improved (i.e., is the performance of the process capable?), or (3) how the process should actually be improved or redesigned. To accomplish these aims you need a framework or roadmap to guide your overall quality journey.

There are many frameworks that can provide a roadmap for your improvement journey.9 At the Institute for Healthcare Improvement (IHI) the framework we use to frame and drive improvement is the Model for Improvement (MFI) developed by Langley et al. (2009). My adaptation of the MFI, shown in FIGURE 11-4, consists of three questions combined with the Plan–Do–Study–Act (PDSA) cycle. The three fundamental questions are:

■■ What are we trying to accomplish? This is the team’s aim statement. How good do they want to be and by when do they plan to achieve this result?
■■ How will we know that a change is an improvement? This is the measurement question and the central theme of this text.
■■ What changes can we make that will result in improvement? This question challenges a team to develop ideas that they believe will lead to improvement and achieve the stated aim.

FIGURE 11-4 The model for improvement (the PDSA cycle at the center, linked to Aim, Ideas, and Measures)
Reprinted by permission of Robert Lloyd.

These questions guide a team’s journey and should be reviewed and evaluated continuously. It is also important to realize that the three questions can be addressed in any order. For example, an improvement project could be initiated because several nurses, while reviewing their unit’s data, discover that the number of falls has been increasing over the past 2 weeks. This could lead to developing an aim statement to reduce the number of falls. Alternatively, a staff member might have attended a conference where she heard a presentation on a new idea that was successful at improving medication reconciliation at time of discharge. Now she has an idea that leads her to explore her unit’s current percentage of med reconciliation at time of discharge, which in turn leads to writing an aim statement for improvement. The three questions, therefore, should not be viewed as a linear progression (i.e., first, develop an aim, then gather some data, and finally, develop ideas for improvement). Instead, they should be viewed as a dynamic set of interrelated questions.

The second part of the MFI is the PDSA cycle. Whereas the three questions provide guidance and checks to help a team maintain its direction and stay on course, the PDSA cycle provides the logical path for testing the ideas that the team believes will lead to improvement. The PDSA cycle has a rich history that has been described and analyzed by many writers, including Langley et al. (2009), Schultz (1994), and Scherkenbach (1991). An excellent overview of the history of the PDSA cycle was written by Moen and Norman (2010). They not only trace the evolution of the scientific method and how it provides the foundation for the PDSA cycle but they also provide a very easy to follow historical journey from the creation of the Shewhart cycle to the emergence of the Deming wheel, the Japanese development of the PDCA cycle (Plan, Do,
Check, Act), and the more contemporary term, the PDSA cycle. FIGURE 11-5 provides a summary of the key activities involved with each step in the PDSA cycle.

FIGURE 11-5 The components of the PDSA cycle
■■ PLAN (What will happen if we try something different?): Objective; questions and predictions; plan to carry it out (Who? When? Where? How?)
■■ DO (Let's try it!): Carry out the plan; collect data; document problems; start data analysis
■■ STUDY (Did it work?): Complete the data analysis; compare result to predictions; summarize the learning
■■ ACT (What's next?): Test again? Test a new idea? Compare theory to predictions; what changes need to be made?

One of the things I find most interesting about teaching the PDSA cycle is how many people know what the letters PDSA stand for. I have conducted workshops and seminars with large audiences and said: “If you know what the letters PDSA mean please stand up.” Consistently, nearly the entire audience will stand.10 Then I ask people to remain standing if they have run a PDSA test in the last month. At this point about 75% of the audience sits down. Then, I ask the 25% still standing to remain standing if they have run a PDSA in the past 2 weeks. This now gives me about 10% of the audience standing. Finally, I ask this small group to remain standing if they have run a PDSA in the week prior to coming to the workshop. Regardless of the size of the audience I typically end up with only 1–2% of the participants actually having run a PDSA test the previous week. My point is that in the health services field we have been reasonably successful in getting people to know what the letters PDSA stand for. What we have not achieved so far, however, is getting a majority of healthcare professionals to embrace and then apply the PDSA cycle to daily work. There are a number of very good examples of how the PDSA cycle has been successful in improving quality, reducing cost, and enhancing the customer experience but there is still considerable work to be done to make this the way healthcare professionals approach improvement.11 Sending out more policy statements, edicts, arbitrary targets, and goals or demanding that staff attend an in-service training session will not achieve the levels of excellence most organizations and governments envision or desire. What is required is a focused approach that creates and maintains the will, ideas, and execution skills needed to achieve excellence.

When working with teams I frequently find that they think that if they “do” a PDSA cycle once then they are done with testing. It is then time to implement the new idea and then spread it to all other locations within the system. This is a fundamental flaw in execution as well as strategy. The sequence of improvement is shown in FIGURE 11-6. Throughout this sequence the PDSA cycle and data are used to test a variety of ideas under different conditions and to gain new knowledge about the improvement idea. All improvement begins with an idea that should be initially tested on a small scale, preferably in a pilot unit or ward. Then, as shown in Figure 11-6, this testing should be conducted under a variety of conditions to determine how robust and reliable the new idea is. This is where quality control (QC) should start to come into the sequence. The new idea for med reconciliation, for example, might have worked well in the pediatric unit but failed in the geriatric unit. Why? What was learned about the conditions in one unit versus those in the other? Did the new idea not work as well because the patients are different, the physical layout of the units is different, or maybe the staff members on each unit have different levels of knowledge
about the science of improvement (SOI)? Testing under different conditions is where ideas are refined, adapted, and made more reliable. This is one of the most critical steps in the improvement sequence.

FIGURE 11-6 The sequence of improvement (figure labels: Theory and prediction; Developing a change; Testing a change; Test under a variety of conditions; Implementing a change; Make part of routine operations; Sustaining improvements and spreading changes to other locations; repeated Act, Plan, Do, Study cycles; Data are used throughout the sequence)
East London NHS Foundation Trust

Implementation differs from testing in several ways. When you are testing a new idea you are (1) trying and adapting existing knowledge on a small scale and (2) learning what works in the process or system of interest and what does not work well. When you move to implementation you are ready to (1) make the new idea a permanent part of the day-to-day operation of the process or system, (2) develop all the support and infrastructure required to maintain the change(s), (3) expect to see sustained improvement with few or no failures or back-sliding in performance, and (4) deal with the potential of increased resistance from individuals who may have to adopt new ways of working or thinking.

The data on your run or Shewhart chart should also be used to help determine whether you are ready to implement. If, for example, the chart shows that there has been a shift of the outcome indicator in the direction of goodness and it has been sustained for a period of time (e.g., four to five new data points staying in the general vicinity of the shift) then you have a higher degree of belief that the improvement has been sustained and the conditions for implementation appear to be in place. On the other hand, if you detect a shift in the data in the desired direction but the next four or five data points start drifting back to the less desirable level of performance, this is a reasonably strong indication that the improvement has not been sustained and that you are not ready for implementation of the new idea. Implementation requires three basic conditions to be met: (1) you have a high degree of belief that the new idea will in fact lead to improvement; (2) the cost of failure in terms of time, effort, morale, and resources is low; and (3) individuals are ready, willing, and eager to embrace the change. If any of these three conditions is not present, the chances of successful implementation will be greatly
diminished. The Improvement Guide (Langley et al., 2009) has a detailed chapter that describes all the conditions under which implementation can be successful. The authors provide an excellent implementation checklist that should be used to guide a team’s implementation journey.

The final step in the sequence of improvement (Figure 11-6) is sustaining the improvements and spreading changes to other locations. Like all the previous steps in the sequence, data play a central role in realizing a successful spread effort. This stage also requires running PDSA tests to see if the conditions are right to support spread. The classic reference on the adoption and diffusion (spread) of new ideas was written by Everett Rogers (1995). At the IHI we have focused extensively on the topic of spread in healthcare settings and have produced a variety of publications that can all be obtained from the IHI website (www.ihi.org) including a framework for spread (Barker, Reid, & Schall, 2016; Massoud et al., 2006) and the important role of scale-up in planning a spread initiative (McCannon, Schall, & Perla, 2008). These publications will help you clarify the nature of spread (i.e., having individuals adopt the new ideas and changes) and how scale-up (i.e., overcoming the process and structural issues that arise during spread) is an absolutely crucial component of every successful spread initiative. Chapter 9 in the Improvement Guide (Langley et al., 2009) also provides excellent guidance on how to develop a spread plan and guarantee its success. Finally, you should be aware of the key pitfalls that frequently arise when organizations engage in premature spread work. We refer to these pitfalls endearingly as the “seven spreadly sins.”12 The seven spreadly sins include:
1. Don’t bother testing; just start with large pilots!
2. Find one person willing to do it all!
3. Rely solely on vigilance and hard work!
4. If a pilot worked then spread the pilot unchanged! After all, if it worked well once why won’t it work that way in other locations?
5. Require the person and team who drove the pilot to be responsible for system-wide spread!
6. Look at process and outcome measures on a quarterly basis!
7. Early on expect marked improvement in outcomes without attention to process reliability.

The final aspect of linking your measurement efforts to improvement work comes by assessing three organizational drivers: will, ideas, and execution. These three organizational assets were first proposed by Dr. Tom Nolan to Dr. Don Berwick back in 1998 when Dr. Berwick was preparing his plenary address for the 10th Annual National Forum on Quality Improvement in Health Care (Berwick, 2004). Like many of the other concepts and components discussed in this chapter, these three characteristics interact to create the cultural and tactical milieu in which change can either flourish or flounder. FIGURE 11-7 depicts the interaction of these three drivers. For many years, this triumvirate of will, ideas, and execution has served as a fundamental departure point for assessing the likelihood that an organization will be able to achieve excellence. When conducting diagnostic visits to an organization we regularly ask leaders what level of will they have within their organization to change. Where do they get their ideas for improvement and how many new ideas do they generate every week? Finally, are they building within their staff the appropriate skills to take the ideas they have generated and execute change?

TABLE 11-1 provides a very brief assessment tool related to the level of will, ideas, and execution skills within an organization. You can use this assessment at the organizational level, at the department level, or at the team level. If you decide to use this assessment tool I would suggest sending it to the designated respondents prior to a meeting and instructing them to send it back to you for tabulation. Then present the results at the next meeting. The responses provide a wonderful opportunity to create a dialogue on why the responses were selected. I have used

FIGURE 11-7 The primary drivers of improvement (three interacting drivers surrounding QI)
■■ Will: Having the Will (desire) to change the current state to one that is better
■■ Ideas: Developing Ideas that will contribute to making processes and outcome better
■■ Execution: Having the capacity to apply CQI theories, tools and techniques that enable the Execution of the ideas

TABLE 11-1 The primary drivers of improvement

Key Components*        Self-Assessment

Will (to change)       Low    Medium    High
Ideas                  Low    Medium    High
Execution              Low    Medium    High

*All three components MUST be viewed together. Focusing on one or even two of the components will guarantee suboptimized performance. Systems
thinking lies at the heart of QI!
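
The tabulation step suggested for the Table 11-1 assessment (collect each respondent's Low, Medium, or High rating for will, ideas, and execution, then count the ratings before the next meeting) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `tabulate`, the dictionary layout, and the five responses are all hypothetical and are not part of the assessment tool itself.

```python
LEVELS = ("Low", "Medium", "High")
DRIVERS = ("will", "ideas", "execution")


def tabulate(responses):
    """Count how many respondents chose each level for each driver."""
    counts = {driver: {level: 0 for level in LEVELS} for driver in DRIVERS}
    for response in responses:
        for driver in DRIVERS:
            level = response[driver]
            if level not in LEVELS:
                raise ValueError(f"Unexpected rating {level!r} for {driver}")
            counts[driver][level] += 1
    return counts


# Hypothetical responses from five members of one team (illustrative only).
responses = [
    {"will": "High", "ideas": "High", "execution": "Low"},
    {"will": "High", "ideas": "Medium", "execution": "Low"},
    {"will": "Medium", "ideas": "High", "execution": "Medium"},
    {"will": "High", "ideas": "High", "execution": "Low"},
    {"will": "Medium", "ideas": "Medium", "execution": "Low"},
]

counts = tabulate(responses)
for driver in DRIVERS:
    row = ", ".join(f"{level}: {counts[driver][level]}" for level in LEVELS)
    print(f"{driver:<10} {row}")
```

Keeping the three drivers side by side in one table fits the footnote to Table 11-1: the components are meant to be viewed together, and the counts are a starting point for dialogue rather than a score.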

this assessment in many countries and have consistently discovered that the level of will to change is usually overestimated. Responses usually are in the medium-high to high categories. When we explore the reasons for this positive response pattern I get comments like, “Oh yes, we definitely have a high desire to change and make things better” or “Our mission, vision, and values all point us in the direction of changing things to make them better.” Yet, as we discuss how it is that their results are not near the targets or goals they have established, I get responses like, “Well we have not been able to convince the (fill in the name of the appropriate group of staff or employees) to actually change the way they do things.” This is not surprising. Health care is a fairly conservative, traditional, and very hierarchical industry that is not known for rapid change and the quick adoption of new ideas. This was explored very thoroughly by Balas and Boren in their classic article “Managing Clinical Knowledge for Health Care Improvement” (2000). The authors’ conclusion is a well-known fact within medical circles: “Studies suggest that it takes an average of 17 years for research evidence to reach clinical practice” (p. 66). Although there are healthcare organizations that have demonstrated a reasonably high level of will to change, my experience has led me to conclude that the will to change in a majority of healthcare settings is more accurately in the medium to low end of the responses. When I was conducting this little assessment in a workshop one day, I had a fellow raise his hand when I asked what the level of will was in their organizations. He wanted to know if the response options included zero or even negative numbers. Being realistic about the will to change is a starting point. Because the desire to improve the current state will most likely vary within an organization I think it is important
to do this assessment with different groups of staff and also to apply it across departments and functions. Building a culture of quality is not a simple challenge. Therefore, it is critical to begin understanding the degree to which the will to change varies within the organization.

When I ask what the level of ideas is within the organization I usually get a medium-high to high assessment. I agree with this response and think it is pretty accurate. Meta studies by have shown that each month over 10,000 new bits of knowledge and information (e.g., books, articles, journals, and blogs and other web-based content) are disseminated within the healthcare industry. This is a huge amount of material to digest. There is so much new information that many physicians have trouble keeping current on their discipline’s journals. Many organizations actually have a knowledge management staff that is charged with sorting and organizing new information. So, it does seem reasonable that access to new ideas gets a high rating.

But, when I get to asking the question about execution skills, I usually get a loud and resounding “low” response. Having the ability to blend the will to change with new ideas and produce results requires a special set of execution skills and knowledge. These skills are typically not taught in schools of medicine or management. So, it is not surprising that we usually have quick consensus that organizations need to develop and institute strategies and tactics aimed at building capacity and capability for improvement. This leads us to the final of the four transformational activities.

▸▸ Building Capacity and Capability for Improvement

Building capacity and capability for improvement is not accomplished by sending staff to one-off “training” sessions. The capacity and capability journey is not a singular event but rather an ongoing strategic and tactical commitment to prepare the organization for the future. Organizations interested in building capacity and capability need to start with an aim. For example, a straightforward aim to achieve organizational excellence might be: To build a renewable infrastructure that produces highly reliable quality and safety and customer service (fill in the date). This journey follows a path that is shown in FIGURE 11-8.

FIGURE 11-8 Milestones in the journey to build excellence (steps from Capacity to Capability to Sustainability to Excellence)

Are Capacity and Capability the Same Thing?
It is important to realize that the concepts of capacity and capability are not synonymous terms. Capacity refers to the following characteristics:
■■ The ability to receive, hold, or absorb content and new knowledge
■■ The maximum or optimum amount of production or output that can be delivered
■■ The ability to learn or retain information
■■ A measure of volume; the maximum amount of new knowledge that can be held
■■ The power, ability, or possibility of doing something or performing

Essentially building capacity refers to filling people with the knowledge, methods, and skills associated with the SOI. It can be considered an initial step in creating the potential for an organization to improve but by itself building
capacity provides no assurance that improvement thinking and applications will be part of the very fabric of daily life within the organization. Many organizations make the mistake of thinking that if they send staff to in-service training sessions and “fill them up” they have built capability. Wrong! All they have done is to increase their capacity. Capability, on the other hand, refers to:
■■ The power or ability to generate an outcome or results
■■ The ability to execute a specified course of action
■■ The sum of expertise and capacity
■■ Knowledge, skill, ability, or characteristics associated with desirable performance on a job, such as problem solving, analytical thinking, or leadership
■■ Motivation, beliefs, and values about work and the individual’s role in the organization

If capacity is focused on filling people with the appropriate amount or dose of knowledge (i.e., giving them potential) then capability is releasing this potential energy and turning it into kinetic energy. That is, allowing people with knowledge and skills of QI to (1) have protected time to apply the knowledge they have been given, (2) have access to structures and processes that support quality and safety initiatives, and (3) be part of a learning organization that values ongoing development and growth. In a 2010 article, Bevan highlighted the critical role of building capacity and capability: “A focus on building capacity and capability for improvement is a key strategy. Global analysis of healthcare systems that deliver outstanding performance in cost and quality shows their most common characteristic is a systematic approach to capability building for improvement” (p. 139).

Who Needs to Know What?
So, how does an organization begin to build capacity and capability for improvement? A key point in properly mastering this aspect of your quality journey is to first realize that not everyone in the organization needs to have the same depth of knowledge about the SOI. To help people understand this concept I have used the analogy of how medicines are dosed. When a medicine is prescribed, for example, a critical aspect of the effectiveness of the particular medication is the dose that is prescribed. Even for patients receiving the same medication, the dose will differ depending on the specific condition and needs of the patient. This notion of dosing also applies to an organization’s need to build capacity and capability for QI. Not everyone within an organization needs to have the same “dose” of the SOI. Therefore, it is incumbent on the leaders of an organization to have a serious dialogue about the dose of the SOI that needs to occur at various levels within their organization. For example, the dose that board members and senior leaders need will be different than the dose that middle managers and supervisors need. The dose of the SOI that those delivering care at the bedside need to be effective in their QI efforts will be different from that which supervisors receive and both of these doses of knowledge will be different from that which is needed for those expected to serve as team coaches and IAs.

The specific doses of the SOI are not determined by a strict mathematical formula. Instead each organization must be assessed in terms of its current understanding of the SOI, its demonstrated ability to actually improve results, and the existing levels of will, ideas, and execution skills within the organization. Such diagnostic analysis allows us to develop a dosing strategy that is tailor-made to each organization. Therefore, the dosing strategy for each organization will be different. When I have helped organizations develop a dosing strategy we typically follow these steps:

■■ Conduct a diagnostic assessment to obtain a clear picture of where the organization is in its quality journey (e.g., just beginning, evolving, or mature) and where it wants to be in the next 1, 3, and 5 years. In order to help organizations during this diagnostic
phase I have developed a number of assessment tools that can be used at different levels of the organization. The specific assessment tool, the target audience for each one and guidance for administering the tool are summarized in TABLE 11-2. If you are interested in using any of these tools within your organization you can find out how to access them by going to Jones and Bartlett Learning’s Instructor Resources Center which can be found at the following link: (Danielle needs to complete this link).

TABLE 11-2 QI assessment tools

Assessment Tool Name: From the Top
Target Audience: Board Members, Non-Execs and Senior Leaders
Guidance for Administration of the Tool: It is recommended that this survey be administered to the target audience(s) at a minimum of twice a year in order to gauge progress against a baseline on the 6 things all boards and senior leaders should do to improve quality and reduce harm: (1) setting aims to reduce harm this year, (2) getting data and hearing stories about the impact of harm, (3) establishing, monitoring and displaying system level measures, (4) changing the environment, policies and culture, (5) learning... starting with the board, and (6) establishing executive accountability.

Assessment Tool Name: Issues and Priorities for Healthcare Leaders
Target Audience: Board Members, Non-Execs and Senior Leaders
Guidance for Administration of the Tool: This is a shorter version of the first tool. With only 9 items it will take less time than the first tool to administer but will still identify a number of key issues that need to be addressed if the organization is serious about making QI part of daily work. This also should be administered at least twice a year.

Assessment Tool Name: Improvement Capability
Target Audience: Senior Leaders, Directors, Middle Managers and Supervisors
Guidance for Administration of the Tool: This tool has been designed to assess alignment of senior and middle management leaders. It involves selecting descriptive summaries related to 6 areas: (1) leadership for improvement, (2) results, (3) resources, (4) workforce and human resources, (5) data infrastructure and management, and (6) improvement knowledge and competence. For each of these 6 areas, the tool provides a brief description of levels of capability, ranging from just beginning, to developing, to making progress, to significant impact, to exemplary. Respondents are asked to select the one description that best describes where they believe the organization is on each dimension. The tool should be administered separately to relevant target audiences, tabulated, and then brought to a joint meeting to see how well aligned the various target audiences are around the 6 key areas.

Assessment Tool Name: Science of Improvement Self-Assessment Tool
Target Audience: All staff but especially QI team members
Guidance for Administration of the Tool: This tool has been designed to help individuals gain a better understanding of where they personally stand with respect to their knowledge of the basic principles, tools and methods of the Science of Improvement (SOI). It can be administered at the beginning of a program or workshop designed to build knowledge and skill sets related to the SOI and then at the end of the program. It can also be administered several months after a program offering to determine the level of retention of SOI knowledge. Six skills to support improvement are assessed: (1) supporting a change with data, (2) developing a change, (3) testing a change, (4) implementing a change, (5) spreading a change, and (6) the human side of change.

Assessment Tool Name: Quality Measurement Assessment Tool (short version)
Target Audience: QI team members, QI coaches and facilitators and QI experts
Guidance for Administration of the Tool: This is the first of two tools designed to assess an individual’s knowledge of and skill with measurement tools and methods. The tool itself is shown in Exhibit 11-2 of this chapter. It is short and can be administered as a pre-post assessment when conducting workshops related to the quality measurement journey. It can be administered to QI team members and those expected to actually engage in quality measurement activities.

Assessment Tool Name: Quality Measurement Assessment Tool (long version)
Target Audience: QI team members, QI coaches and facilitators and QI experts
Guidance for Administration of the Tool: This is a much more comprehensive measurement assessment tool. It should not be administered frequently or as often as the shorter version (Exhibit 11-2) and should be aimed primarily at those expected to function as measurement leads or experts for the organization.

■■ Determine the level of commitment that the board and senior leadership have for making quality their business strategy.
■■ Identify the total number of employees and staff within the organization and then stratify this total into appropriate categories (e.g., board members, senior management, physician leaders, nurses, allied health professionals, and support staff). If the organization is large, has multiple sites of care, and/or covers a broad geographic area we will use these factors to further stratify the number of employees and staff into relevant strata.
■■ Establish estimates of the amount of SOI knowledge and skills the individuals in the organization currently possess by applying various assessment tools on the SOI.

Once the assessments are made and the organizational demographics are determined decisions can be made about the appropriate dose of the SOI that needs to be administered to each of the relevant groups. FIGURE 11-9 provides an

(Shaded matrix. Columns, by group: Board; Sr. mgmt; Sr. clinicians; Nurse mgrs.; Admin mgrs.; QI team ldrs.; QI experts. Rows, by science of improvement topic: History of QI; Profound knowledge; Quality as a business strategy; Model for improvement; PDSA testing; Understanding variation; Scale-up and spread; Construction of control charts. Legend: minimal dose, moderate dose, maximum dose.)

Note that the intensity of the color reflects the “dose” of the science of improvement knowledge and skills that would be administered to each respective group. The mechanisms for administering the allocated dose would range from the IHI Open School to the Improvement Advisor Professional Development Program.

FIGURE 11-9 Dosing the SOI to selected groups within an organization

example of how the dosing notion can be applied to an organization. The column headings identify the relevant groups of individuals within the organization requiring varying doses of the SOI. The rows indicate selected content areas within the SOI and the shades of color indicate the dose or intensity (minimal, moderate, or maximum) of the content being administered to each group. Look at the last row in the matrix labeled “control chart construction.” Notice that the board, senior leaders, and senior clinicians would all receive a minimal dose of knowledge related to the selection and construction of control charts. Nurse managers and administrative managers need to have a moderate dose on control chart construction whereas the QI team leaders and QI experts need to have a more intense dose of control chart construction knowledge.

After the initial decisions have been made about who needs to know what, as shown in Figure 11-9, then we typically proceed to determine how to deploy the dosing of knowledge to the individuals within each group. It must be realized that there is not a single set of numbers, percentages, or a mathematical formula that can be used to determine the appropriate doses of the SOI for each group. As we have applied the dosing approach to a variety of organizations, we have stressed that the actual doses will vary depending on a number of organizational characteristics, including:

■■ The number of individuals within each group
■■ The mix of services provided across the organization
■■ The geographic region covered by the organization
■■ The amount of development already provided on the SOI
■■ The resources (time as well as money) committed to learning and employee development
■■ The level of commitment that the board and senior leadership are providing to make quality the organization’s strategic focus

With these conditions in mind, the dosing approach will produce different numbers for different organizations.

FIGURE 11-10 shows the initial 2016 dosing strategy I helped the East London Foundation Trust (ELFT) develop, and FIGURE 11-11 shows their updated strategy for 2017. These figures provide an update on the journey of each identified group, estimates of the numbers of individuals to be developed within each group and a diagram (the triangle) that shows the relative size of each group. Notice that in Figure 11-10 the QI experts are around the edge of all the other groups in the triangle. This was done to show that the QI experts are supporting

(Triangle diagram showing the relative size of each group, with the Experts placed around its edge. For each group the figure reports a status update under the heading “Where are we?”, an estimated number, a requirement, and a time frame.)

Front line staff
Where are we? On track to train over 400 people through 5 six-month waves of learning between 2014-16. First 3 waves delivered with the IHI.
Estimated number = 3,300
Requirement = introduction to quality improvement, identifying problems, change ideas, testing and measuring change
Time frame = train 10-20% in 2 years

Clinical leaders
Where are we? On track. All senior staff being encouraged to join QI training over next 2 years.
Estimated number = 250
Requirement = deeper understanding of improvement methodology, measurement and using data, leading teams in QI
Time frame = train 30-50% in 2 years

Directorate improvement leads
Where are we? New need recognized. Developing Improvement coaches program will train 30 QI coaches in 2015.
Estimated number = 25-30
Requirement = deeper understanding of improvement methodology, understanding variation, coaching teams and individuals
Time frame = train 100% in 2 years

Executives
Where are we? On track. Most Executives will have undertaken the ISIA, and all will have received Board training with the non-Executives.
Estimated number = 10
Requirement = setting direction and big goals, executive leadership, oversight of improvement, being a champion, understanding variation to lead
Time frame = train 100% in 2 years

Experts
Where are we? Currently have 3 improvement advisors, with 1.5 wte deployed to QI. Will need to build more capacity at this level.
Estimated number = 5
Requirement = deep statistical process control, deep improvement methods, effective plans for implementation & spread
Time frame = train 100% in 2 years

FIGURE 11-10 East London Foundation Trust 2016 dosing strategy

East London NHS Foundation Trust
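
The time frames in Figure 11-10 are stated as percentages of each group, so they translate directly into head counts. The short Python sketch below does that arithmetic. The function name `training_range` and the data layout are illustrative assumptions, and the pairing of each estimate with a group label is read from the figure. Where the figure gives a range (such as 25-30), both ends are carried through.

```python
def training_range(n_low, n_high, pct_low, pct_high):
    """Return (min, max) number of people to train for one group.

    n_low, n_high: estimated group size (equal when the figure gives one number).
    pct_low, pct_high: share of the group to train in the time frame, in percent.
    """
    return (round(n_low * pct_low / 100), round(n_high * pct_high / 100))


# (group, size low, size high, percent low, percent high), read from Figure 11-10.
GROUPS_2016 = [
    ("Front line staff", 3300, 3300, 10, 20),
    ("Clinical leaders", 250, 250, 30, 50),
    ("Directorate improvement leads", 25, 30, 100, 100),
    ("Executives", 10, 10, 100, 100),
    ("Experts", 5, 5, 100, 100),
]

if __name__ == "__main__":
    for name, n_lo, n_hi, p_lo, p_hi in GROUPS_2016:
        low, high = training_range(n_lo, n_hi, p_lo, p_hi)
        print(f"{name}: train {low} to {high} people in 2 years")
```

Under these readings, the front line staff target works out to between 330 and 660 people, and the clinical leaders target to between 75 and 125.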

(Triangle diagram showing the relative size of each group, now surrounded by the “Experts by experience.” For each group the figure reports progress to date and the group’s learning needs.)

Working upstream
Psychology trainees: Pocket QI, embedded into QI project teams with 4 bespoke learning sessions.
Nursing students: intro to QI delivered within undergraduate and postgrad syllabus, embedded into QI project teams during student placements.

All staff
Progress: 363 completed Pocket QI so far. All staff receive intro to QI at induction.
Estimated number needed to train = 4,000
Needs = introduction to QI & systems thinking, identifying problems, how to get involved

Staff involved in or leading QI projects
Progress: 690 graduated from ISIA in 6 waves. Wave 7 in 2017-18. Refresher training for ISIA grads.
Estimated number needed to train = 1,000
Needs = Model for improvement, PDSA, measurement and using data, leading teams

QI coaches
Progress: 47 QI coaches trained so far, with 35 currently active. Third cohort of 20 to be trained in 2017.
Estimated number needed = 50
Needs = deep understanding of method & tools, understanding variation, coaching teams

Sponsors
Progress: 58 current sponsors. All completed ISIA. Leadership, scale-up & refresher QI training in 2017.
Needs = Model for improvement, PDSA, measurement & variation, scale-up and spread, leadership for improvement

Internal experts (QI leads)
Progress: Currently have 6 improvement advisors, with 3 further QI leads in training.
Estimated number needed to train = 10
Needs = deep statistical process control, deep improvement methods, effective plans for implementation & spread

Board
Progress: All Executives have completed ISIA. Annual Board session with IHI & regular Board development.
Needs = setting direction and big goals, executive leadership, oversight of improvement, understanding variation

Experts by experience
Progress: Bespoke QI learning sessions for service users and carers. Over 95 attended so far. Build into recovery college syllabus 2017.
Needs = introduction to QI, how to get involved in improving a service, practical skills in confidence-building, presentation, contributing ideas

FIGURE 11-11 East London Foundation Trust 2017 dosing strategy

East London NHS Foundation Trust

all the other groups and helping them apply the SOI. In Figure 11-11, however, the QI experts are now part of the triangle as a group unto themselves. The group surrounding the triangle is now called the “Experts by Experience.” This group emerged from the various workshops we conducted and stepped forward to be local experts in training. They have deepened their knowledge of the SOI by participating in teams, leading improvement projects and working with the ELFT IAs to obtain a deeper dose of the SOI. These individuals have become part of the self-sustaining infrastructure that ELFT is building.

Finally, notice that ELFT has now developed 10 IAs. These individuals are graduates of our yearlong IHI Improvement Advisor Professional Development Program (IAPDP) and serve as the full-time deep knowledge experts in the SOI. They support improvement teams, the improvement coaches, and managers and also provide strategic guidance to senior leaders on the SOI. In short, the ELFT has deployed the dosing strategy in a very deliberate and effective way. They have viewed improvement as a journey, not a short trip. They have developed a very focused yet flexible strategy to apply the dosing approach as a primary way to build both capacity and capability.

What Core Skills of Improvement Get Dosed?
The first thing to realize when discussing the content being delivered on the SOI is that the topics do not differ for the various groups of individuals involved. The content of the SOI remains pretty constant across all the groups within an organization. What differs, however, is the dose or intensity of the content being delivered. For example, everyone in the organization needs to understand how the aims or target conditions for teams and the organization as a whole are being developed and how they are phrased. At the IHI, we build all team and organizational aim statements around the first question in the MFI (What are we trying to accomplish?). In constructing an aim statement, therefore, we expect it to identify:

■■ The boundaries of the system to be improved (scope, patient population, processes to address, providers, beginning and end, etc.)
■■ Specific numerical goals for the outcome measure(s) that are ambitious but achievable (How good do you want to be?)
■■ The time frame (By when do you plan to achieve the numerical goal?)
■■ Any guidance on issues or circumstances that may have an impact on the project’s success or progress (e.g., potential changes in sponsorship, resource issues, or operational issues such as a pending merger or even construction)

Basic awareness of the organization’s structure and format for an aim statement is a minimum requirement (i.e., a minimal dose). This will allow everyone in the organization to understand an aim statement when they read one or see it presented in a meeting. But team coaches and IAs, on the other hand, need to have a deeper dose or exposure to building aims from scratch, how to conduct the dialogue around whether the aim is reasonable or overly ambitious, and how to modify the aim in light of the baseline data collected. The topic of aim statements does not vary from one group to another. But the dose does. This same principle applies to all of the content topics included in learning about the SOI.

The essential skills required to make quality thinking and practice the guideposts for how an organization approaches work consist of the following content categories:

QI Philosophy and Theory
■■ Understanding the differences between quality assurance (QA), QC, and QI (see Chapter 1)
■■ Having a firm grounding in the SOI theories articulated by Dr. Walter Shewhart, Dr. W. Edwards Deming, and Dr. Joseph Juran.

■■ Selecting an approach to QI that can serve as a roadmap for the organization’s quality journey. Note that more important than the decision as to which approach or model to QI is selected is what Dr. Deming referred to as “constancy of purpose.” (Refer to note 9 in this chapter for more details on this point.)
■■ Organizations that follow a “flavor of the month” approach to QI will not only lack direction and focus but will also send very mixed messages to the staff about where the organization is headed.

QI Methods
■■ Applying systems thinking to all aspects of the organization
■■ Viewing all work as an interconnected set of processes
■■ Using analytic statistical thinking and methods to understand the variation that lives within the organization’s data. This requires a fundamental shift away from the use of enumerative statistical methods (e.g., comparing averages or using red, amber, and green rating and ranking schemas to make conclusions about performance).

QI Tools
There are a variety of QI tools that need to be understood and used to help diagnose, analyze, and drive improvement work. The various QI tools are critical to the success of improvement teams but they are not to be used in isolation from improvement strategies and action. Individuals and teams often become enamored with the use of tools such as flowcharting or cause and effect diagrams with little or no understanding of when the tools should be used and, more important, how a particular tool fits into the QI journey. There are many good resources on a wide spectrum of qualitative and quantitative tools used in QI work. A few that are quite relevant to health and social care services include publications by Graham and Cleary (1992), Murray and Murray (1997), AIP (2007), and Goal QPC (2008). In addition to these publications, you may be interested in a series of Whiteboard and On Demand videos I have made that provide overviews of the SOI and QI tools, methods, and concepts. These videos are all free on the IHI website (www.ihi.org) or by following these direct links:

■■ R. Lloyd’s Whiteboard videos on tools, methods and concepts:
https://tinyurl.com/q4s4pe7
■■ R. Lloyd’s On Demand Videos on the SOI:
https://tinyurl.com/moxlct3
https://tinyurl.com/knt8cyz
https://tinyurl.com/ldg252k

At this point many of you are probably using QI tools in your daily work. Tools such as flowcharts, cause and effect diagrams, Pareto charts, force field analysis, scatter plots, two-way tables and SPC charts are some of the more frequently used tools. The tools are usually organized into categories that capture the purpose of the various tools (e.g., understanding the system, gathering data and information, organizing information, analyzing variation, or understanding relationships). Many of the QI tools are aimed at making sense of or putting order into quantitative issues facing a team. Frequently overlooked, however, are a variety of tools designed to help teams identify ideas for improving a process, tools to assist a team in deciding which improvement idea(s) they would like to pursue, and tools for setting priorities. These tools are aimed at generating ideas (divergent thinking) and then helping a team decide which improvement idea(s) they wish to pursue (convergent thinking). The tools related to divergent and convergent thinking are summarized in FIGURE 11-12.

Many teams become overly focused on tools during their QI journey. Like the run and Shewhart charts, however, the tools are there to help a team along their way, not be an end in themselves. Just like each surgical tool or instrument has a specific purpose so too do the QI tools. Part of the dosing strategy is to (1) make people aware of the tools, (2) know how to construct and

interpret the tools, and, most important, (3) know when and under what conditions a specific tool should be used.

(Diagram: a team moves from “Few ideas or none” through divergent thinking to “Many ideas,” then through convergent thinking to “Few ideas.” Tools shown: Brainstorming, Nominal Group Technique, Affinity Diagram, Multi-voting, Rank ordering, Structured discussion.)

• Teams start with a few ideas for improvement or none.
• They need to engage in divergent thinking to open up their brains and generate ideas.
• Once a team generates many ideas, however, they need to engage in convergent thinking to reduce the many to the vital few that they can test.

FIGURE 11-12 Divergent and convergent thinking tools

The final part of understanding the core skills of improvement that need to be dosed is to develop programs that deliver the right dose of the SOI at the right time. FIGURE 11-13 provides an example of different SOI programs we offer at the IHI but that all deliver a different dose of the same content. For example, at the top of Figure 11-13 is the IHI Open School, which is designed to provide general awareness to large numbers through online modules that my colleagues and I have designed. Many universities around the globe have Open School Chapters for students enrolled in the health sciences. It is a light dose designed to build awareness but one that can be applied easily and broadly. Over a million modules in the Open School have been accessed to date by students and healthcare professionals. When organizations request the next dose of the SOI we offer face-to-face workshops. Two of our flagship programs are in the middle of Figure 11-13 and have been identified with a deeper shade of grey indicating a slightly heavier dose of the SOI. The Improvement Science in Action (ISIA) Program is a 3-day offering designed for a maximum of 200 participants. It provides all the essential SOI concepts, tools, and methods needed to initiate improvement projects. Virtual sessions are incorporated with this program so that participants can present their project updates and receive feedback. The Improvement Coach Professional Development Program (ICPDP) receives the deepest shade of color. This program is designed for 30–50 participants and provides a deeper dive into the human side of change or, as we at IHI refer to it, the “care and feeding of teams.” This program is a 7-day offering, delivered as a 2–3–2 day or

IHI Open School
• Designed for the masses to build awareness

ISIA & ICPDP
• ISIA designed for 100-200
• ICP designed for 30-50
• Building team and project based skills

IA
• Designed for 20-25 participants
• Building deep knowledge in the SOI

FIGURE 11-13 Dosing the content of IHI program offerings



a 3–4 day design. A virtual pre-call session is required as well as additional calls between the actual workshops. During the virtual sessions we provide additional SOI content and the participants provide updates on their experiences with coaching a team. The ICPDP is intended for individuals who have already attended the ISIA program and have a working knowledge of how to apply the SOI to projects and teams.

The deepest and most intense dive into the SOI shown in Figure 11-13 is found in the IAPDP. This 12-day workshop is spread out over a year of learning with eight of IHI’s senior faculty. Each participant must have a project that they are expected to move to the results stage of improvement. The program consists of three 4-day workshops with 11 virtual sessions spaced between the workshops. Upon completion of all the requirements participants are presented with a certificate of completion and can add the title of IHI IA to their resume.

Whether an organization has developed its own SOI offerings or participates in those provided by outside groups, the key point is that you need to have access to a variety of programs that (1) provide consistent messages about the SOI, (2) are dosed appropriately for the intended audience or group of employees, (3) are well grounded and designed around adult learning theory, (4) are focused on applying the SOI knowledge to actual improvement work, and (5) are flexible enough to accommodate the organization’s strategic and operational changes.

How Do We Create the Conditions for a Successful Dosing Strategy?
Creating the conditions for change is probably THE most important aspect of the quality journey. In agricultural circles there is a great bit of guidance that applies here.13 The expression is that farmers do not grow crops; they create conditions under which crops can grow. For farmers this is determined by finding out how well they prepare the soil, which then determines how the seeds they plant will take root and grow. Healthcare leaders and managers are like farmers in this respect. They are charged with creating the conditions under which the delivery and improvement of healthcare services can grow and flourish.

A basic premise of the dosing approach is the need to have an organization-wide plan that allows for dispensing the appropriate “dose” of the SOI to the appropriate individuals over a prescribed period of time. But the initial dose for any group of individuals will not be sufficient to transform the organization. For example, when we administer a medication to a patient we do not tell them, “Okay, take one pill twice a day then stop after the first day.” The appropriate dose needs to be administered for the proper period of time; otherwise it is not effective. Similarly, the dose of the SOI does not produce the desired results after one application. Whether you are receiving a light, moderate, or heavy dose of the SOI it typically needs to be administered multiple times over a defined period of time. The pitfalls to avoid, therefore, when deciding to undertake the dosing approach include (1) lack of constancy of purpose; (2) lack of leadership commitment; (3) lack of a strategy for deploying QI concepts, methods, and tools throughout the organization; (4) the belief that you can apply one dose of the SOI and it will solve all the organization’s problems; and (5) failure to connect the SOI learning to daily work.

In this chapter I have outlined four key activities that I believe will help leaders connect the dots and greatly enhance their organization’s chances of demonstrating excellence:

■■ Adopting quality as a business strategy
■■ Developing a learning system to support improvement
■■ Linking measurement to improvement
■■ Building capacity and capability for improvement

There is no doubt that all health and social service providers are going to be under increasing pressure to demonstrate greater value and improved performance. By connecting the dots with the suggestions discussed in this chapter,

I sincerely believe that we will be able not just to cope with these pressures but flourish in spite of them. As Dr. Deming wrote in Out of the Crisis, “Who will survive? Companies that adopt constancy of purpose for quality, productivity, and service, and go about it with intelligence and perseverance, have a chance to survive” (1992, p. 155). But, in every one of his seminars I attended he would always throw in an afterthought and say, “Survival is not mandatory!” Our goal should not be merely to survive but rather to thrive! I hope the information presented in this book has given you a framework to enhance your journey toward excellence and thriving.

Notes
1. From my perspective the dominant changes facing the healthcare industry across the world over the next 5 years will be centered on the following key topics:
• Declining trust in the medical community
• More public scrutiny of the healthcare profession by people who are not trained as clinicians
• Increased demand on the part of patients and families to become more involved in making decisions about their care and treatment
• Less autonomy and income for physicians
• Increased pressure to create safe environments for the delivery of medical services
• Greater demand for transparency and the release of data related to cost, quality, service, access, and safety
• Increased focus on containing healthcare costs and adding value for the money spent.
2. API develops methods, works with leaders and teams, and provides education and training to help organizations improve their products and services and to build their capability for ongoing improvement. The initial principals of API, Jerry Langley, Ron Moen, Tom Nolan, and Lloyd Provost, worked with Dr. Deming at his 4-day seminars throughout the 1980s. Cliff Norman and Kevin Nolan joined the API team in 1988. API has been a strategic partner with IHI for over 20 years building knowledge of the SOI within IHI and within IHI’s customers. Additional detail on API can be found at http://www.apiweb.org/.
3. I have modified the language describing these three principles slightly from how they appear in the API book Quality as a Business Strategy (1998). This was done only for clarity and style that made the original wording more compatible with the style of this book. The essence of the principles remains consistent with that which the API authors initially wrote.
4. Readers interested in the details behind each of these activities are encouraged to read chapter 13 in the 2009 edition of the Improvement Guide and pages 260–292 in the 1996 edition (Langley et al., 1996, 2009). Those interested in a deeper dive into QBS should obtain a copy of Quality as a Business Strategy (API, 1998), which is available through Process Improvement Products (www.pipproducts.com).
5. Two good references on the application of QI to nonmanufacturing situations can be found in Quality Is Personal by Harry Roberts and Bernard Sergesketter (1993) and Total Quality Ministry by Walt Kallestad and Steve Schey (1994). The Quality Is Personal book provides an excellent discussion on how to apply QI thinking and methods, especially checklists, to your own personal growth and development. As the name suggests the Total Quality Ministry book applies the total quality management (TQM) ideas to the operations of churches and ministerial practice. When I worked at Lutheran General Health System in Park Ridge, Illinois, I shared this book with several of the chaplains I worked with.

After reading the book one of the chaplains took the initiative to write a brief paper for the pastoral care staff titled “Total Quality Chaplaincy.” These are all good examples of how quality thinking and methods can be applied to any aspect of life.
6. I have had participants in class and even some colleagues who take issue with Deming’s term “profound knowledge.” I have had comments such as “Well, isn’t that profound?” Or, “Profound to whom? It doesn’t seem profound to me!” The term profound seems to stir in some folk a feeling that Deming was being too academic and even a bit intellectually exclusive. But as Langley et al. (2009, p. 75) write: “The word profound denotes the deep insight that this knowledge offers into how to make changes that will result in improvement in a variety of settings.” If I sense that people are either getting confused with this term or starting to be cynical about its descriptive qualities, I will merely say that Deming identified four key components that are essential to making improvement a mainstay of organizational performance. Then I name the four components. So you will have to decide how you wish to convey this important point that Deming was making. What is important is the use of the four components. Don’t let the word “profound” become a stumbling block to learning and change.
7. Deming originally called these four components (1) appreciation for a system, (2) knowledge about variation, (3) theory of knowledge, and (4) psychology. The slight change in the wording of these components was offered by Langley et al. (2009) to try to make the theory more accessible to others and more descriptive of how the theories are to be applied. Deming was still evolving and revising his SoPK when he passed away in 1993. Thinking and writing about the SoPK was continued by the API team. Additional details can be found in chapter 1 of the API book Quality as a Business Strategy (1998), which is available through Process Improvement Products (www.pipproducts.com) and in chapter 5 of Transforming Health Care Leadership by Maccoby, Norman, Norman, and Margolies (2013).
8. As you scan these two lists notice where salary appears in the rankings. First, it is a factor that appears on the extreme dissatisfaction side of the picture. Second, it falls in the fifth position, not the very first dissatisfying factor. So although salary is always thought of as a major motivator, Herzberg’s work demonstrates that salary is not quite as important as many think. I think this finding should be of particular interest to healthcare and social service organizations, which are typically not-for-profit organizations. Rarely are health and social service workers even eligible for stock options, incentives, or bonus plans. These individuals are usually quite pleased to receive an annual cost-of-living adjustment. So, the role of money needs to be viewed in light of a bigger systems view. According to Deming (1994, p. 113–114), “An award in the form of money for a job done for the sheer pleasure of doing it is demoralizing, overjustification. Money, above a certain level, is not enticement. Money may entice someone that knows that he is inferior. Certainly a boss should give a pat on the back for a job well done.”
9. There are a number of models or approaches to guide an organization’s improvement journey. Besides Deming’s approach based on his notion of profound knowledge and the API’s MFI, the other leading approaches or models that have been used in healthcare settings include:
• Juran’s Quality Trilogy and its related methods
• The Toyota Production System (TPS) or as it has become known in healthcare settings Lean Six Sigma and its
derivatives: Define, Measure, Analyze, Improve and Control (DMAIC) and Design for Six Sigma (DFSS)
• FOCUS PDSA
• Malcolm Baldrige National Quality Award
• European Foundation for Quality Management (EFQM), a.k.a. the European Quality Award
• International Organization for Standardization (known as ISO)
Although all of these approaches have been shown to work effectively, the key for me is the issue that Deming kept stressing: constancy of purpose. Too many organizations lack constancy of purpose and keep switching models or approaches to improvement. They are all too often looking for quick fixes to solve immediate problems, which most often are aligned with budget and financial maximization, not quality, safety, or customer experience. Organizations cannot be successful in the long run if they keep changing their approach to achieving organizational excellence. Although the various approaches or models do have common elements, trying to apply Juran’s quality trilogy one month, then Lean thinking the next month, and Six Sigma the next not only creates confusion among the staff but also creates strategic inconsistency for the organization. So, as your organization reviews and evaluates different approaches or models for improvement, my recommendation is to pick one and stick with it. Modify and adapt the approach or model you select to meet your local conditions and strategic objectives, but don’t keep jumping from one model to another. Only confusion and lack of consistency will result.
10. I asked this question years ago at the first learning session for the Scottish Patient Safety Program collaborative we were running. I had a resounding and overwhelmingly positive response. This was a wonderful response given that it was in 2007 and we were just getting this major countrywide collaboration started. I looked at my colleagues from the IHI and we all smiled at each other. One of my friends even gave me the old “thumbs up” gesture. All of us thought that this was a very good sign that they all knew what PDSA meant. At the coffee break we found out what PDSA actually meant not only in Scotland but also throughout the United Kingdom, and it was not PDSA. During lunch, those of us from across the pond were proudly presented by one of our Scottish hosts with PDSA pins. They were lovely little lapel pins that in bold blue and white letters stated “PDSA.” We thought this was great. Not only did everyone know what PDSA stood for but they already had pins to help reinforce the ideas. Then the presenter of the PDSA pins asked the audience to shout out what PDSA meant: People’s Dispensary for Sick Animals! The Yanks learned a good cultural lesson that day. The rest of the story is that I went to a People’s Dispensary for Sick Animals location in Edinburgh and made a donation in order to obtain all the PDSA pins they could provide. So, today, I proudly give out PDSA pins to those I mentor and teach. Those from the United Kingdom ask me why I am giving them a pin from the People’s Dispensary for Sick Animals. I simply say, “Now really, think about it for a minute.”
11. A number of the IHI’s strategic partners have demonstrated enhanced results by applying the MFI. Examples of these organizations include Kaiser Permanente, Cincinnati Children’s Hospital Medical Center, Virginia Mason, ELFT (UK), NHS Scotland, Jonkoping County Council (Sweden), and Hamad Medical Corporation (Qatar).
12. The term “the seven spreadly sins” emerged when we were running the Safer Patients
Initiative (SPI) in the United Kingdom. We rotated meetings among England, Wales, Scotland, and Northern Ireland. During one of the sessions in Northern Ireland we put slides up on the screen to discuss spread initiatives, with a rather bland title about the seven things you should not do if you want to be successful at spread. A nurse from Ulster hospital who was looking pensively at our rather bland title offered “the seven spreadly sins” as an alternative. We immediately said it was a brilliant revision. We changed the title on the spot and it has been with us ever since.
13. Because my PhD is in agricultural economics and rural sociology from the Pennsylvania State University I frequently provide a variety of ag stories and references while teaching. The question I get quite often is, “How did you end up in health care when you come out of an ag background?” Long story short, I have done research on rural health issues and community health. I also have worked in the community development field with a focus on social well-being and measurement of the impact of social policy. With no formal degrees in health care or medicine I bring what Deming called an “outside view.” My colleagues from Associates in Process Improvement, who taught with Dr. Deming and have written several of the leading books on the SOI, do not have any degrees in health care or medicine either. My point is that you do not need to have formal degrees in the healthcare field, or any other field for that matter, to be able to help people in that particular discipline make improvements. The SOI can be applied to any discipline.

References

Ackoff, R. Creating the Corporate Future. New York: John Wiley & Sons, 1981.
Associates in Process Improvement. Quality as a Business Strategy. Austin, TX: API-Austin, 1998.
Associates in Process Improvement. The Improvement Handbook: Model, Methods and Tools for Improvement. Austin, TX: Associates in Process Improvement, 2007.
Argyris, C. Overcoming Organizational Defenses. New York: Prentice Hall, 1990.
Balas, E., and S. Boren. “Managing Clinical Knowledge for Health Care Improvement.” In Yearbook of Medical Informatics 2000: Patient-Centered Systems, edited by J. Bemmel and A. T. McCray, 65–70. Stuttgart, Germany: Schattauer Verlagsgesellschaft mbH, 2000.
Barker, P., A. Reid, and M. Schall. “A Framework for Scaling up Health Interventions: Lessons from Large-Scale Improvement Initiatives in Africa.” Implementation Science 11 (2016): 12, Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Berwick, D. Escape Fire: Designs for the Future of Health Care. San Francisco: Jossey-Bass, 2004.
Bevan, H. “How Can We Build Skills to Transform the Healthcare System?” Journal of Research in Nursing 15, no. 2 (2010): 139–148.
Churchman, C. W. The Systems Approach. Philadelphia: Dell Publishing Co., 1968.
Deming, W. E. Out of the Crisis. Cambridge, MA: MIT Press, 1992.
Deming, W. E. The New Economics, 2nd ed. Cambridge, MA: MIT Press, 1994.
Drucker, P. Management Tasks, Responsibilities, Practices. New York: Harper & Row, 1973.
Forrester, J. Principles of Systems. Cambridge, MA: Productivity Press, 1986.
Forrester, J. “The Counterintuitive Behavior of Social Systems.” Technology Review 73, no. 3 (1971): 52–68.
Goal QPC. The Memory Jogger II: Healthcare Edition. Salem, NH: Goal QPC, 2008.
Graham, J., and M. Cleary. Practical Tools for Continuous Improvement: Vols. 1 and 2. Dayton, OH: PQ Systems Publishing, 1992.
Herzberg, F. “One More Time: How Do You Motivate Employees?” Harvard Business Review 81, no. 1 (2003): 86–96.
Kallestad, W., and S. Schey. Total Quality Ministry. Minneapolis: Augsburg Press, 1994.
Kohn, A. No Contest: The Case Against Competition. New York: Houghton Mifflin, 1986.
Kohn, A. Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A’s, Praise and Other Bribes. New York: Houghton Mifflin, 1993.
Langley, G., R. Moen, K. Nolan, T. Nolan, C. Norman, and L. Provost. The Improvement Guide: A Practical Approach to Enhancing Organizational Performance, 2nd ed. San Francisco: Jossey-Bass, 2009.
Lloyd, R. “Improvement Tip: Quality Is Not a Department.” Institute for Healthcare Improvement website, 2016.
http://www.ihi.org/resources/Pages/ImprovementStories/ImprovementTipQualityIsNotaDepartment.aspx
Maccoby, M., C. Norman, C. J. Norman, and R. Margolies. Transforming Health Care Leadership. New York: Wiley & Sons, 2013.
Massoud, M. R., G. A. Nielsen, K. Nolan, T. Nolan, M. W. Schall, and C. Sevin. A Framework for Spread: From Local Improvements to System-Wide Change. IHI Innovation Series white paper. Cambridge, MA: Institute for Healthcare Improvement; 2006. http://www.ihi.org/resources/Pages/IHIWhitePapers/default.aspx
McCannon, J., M. Schall, and R. Perla. Planning for Scale: A Guide for Designing Large-Scale Improvement Initiatives. IHI Innovation Series white paper. Cambridge, MA: Institute for Healthcare Improvement; 2008. http://www.ihi.org/resources/Pages/IHIWhitePapers/default.aspx
Moen, R., and C. Norman. “Circling Back: Clearing Up Myths About The Deming Cycle and Seeing How It Keeps Evolving.” Quality Progress (November 2010): 22–28.
Moen, R., and C. Norman. “Always Applicable: Deming’s System of Profound Knowledge Remains Relevant for Management and Quality Professionals Today.” Quality Progress (June 2016): 47–53.
Murray, S., and B. Murray. Practical Tools for Healthcare Quality. Dayton, OH: PQ Systems Publishing, 1997.
Nelson, E., P. Batalden, and M. Godfrey. Quality by Design: A Clinical Microsystems Approach. San Francisco: John Wiley & Sons, 2007.
Norman, C. Quality as a Business Strategy: An Overview. Austin, TX: Associates in Process Improvement, 2007.
Roberts, H., and B. Sergesketter. Quality Is Personal. New York: Free Press, 1993.
Rogers, E. Diffusion of Innovations, 4th ed. New York: Free Press, 1995.
Scherkenbach, W. Deming’s Road to Continual Improvement. Knoxville: SPC Press, 1991.
Schultz, L. Profiles in Quality: Learning from the Masters. White Plains, NY: Quality Resources, 1994.
Senge, P. The Fifth Discipline: The Art and Science of The Learning Organization. New York: Doubleday, 1990.
Senge, P., C. Roberts, R. Ross, B. Smith, and A. Kleiner. The Fifth Discipline Fieldbook: Strategies and Tools for Building a Learning Organization. New York: Doubleday, 1994.
von Bertalanffy, L. General System Theory: Foundations, Development, Applications, rev. ed. New York: Penguin University Books, 1968.